- Earlier we introduced hypothesis tests to examine the quality
of random number generators. Now we will apply these tests to hypotheses
about distributional forms of input data.
- Goodness-of-fit tests provide helpful guidance for evaluating
the suitability of a potential input model.
- The tests depend heavily on the amount of data. If very little
data are available, the test is unlikely to reject
*any* candidate distribution (because there is not enough evidence to reject); if a lot of data are available, the test will likely reject *all* candidate distributions (because none fits perfectly).
- Failing to reject a candidate should be viewed as one piece of
evidence in favor of that choice, while rejecting an input model is
only one piece of evidence against the choice.
- The chi-square test is valid for large sample sizes, for both discrete
and continuous distributional assumptions, when parameters are estimated
by maximum likelihood.
  - Arrange the *n* observations into a set of *k* class intervals or cells.
  - The test statistic is

    $$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

    where $O_i$ is the observed frequency in the *i*th class interval and $E_i$ is the expected frequency in that class interval.
  - The statistic $\chi_0^2$ approximately follows the chi-square distribution with *k-s-1* degrees of freedom, where *s* represents the number of parameters of the estimated distribution. E.g., the Poisson distribution has *s = 1* and the normal distribution has *s = 2*.
  - The hypotheses are
    - $H_0$: the random variable *X* conforms to the distributional assumption with the parameter(s) given by the parameter estimate(s)
    - $H_1$: the random variable *X* does not conform to the distribution
  - The critical value $\chi^2_{\alpha,\,k-s-1}$ is found in Table A.6. $H_0$ is rejected if $\chi_0^2 > \chi^2_{\alpha,\,k-s-1}$.
  - For the choice of *k*, the number of class intervals, see Table 10.5 on page 377.
  - Example 10.13 on page 377.
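
The chi-square procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the cell counts are hypothetical, the hypothesized distribution is fully specified (so *s = 0*), and the critical value is supplied by the caller from a chi-square table rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Return sum((O_i - E_i)^2 / E_i) over the k class intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_h0(observed, expected, critical_value):
    """Reject H0 when the statistic exceeds the tabulated critical value."""
    return chi_square_statistic(observed, expected) > critical_value


# Hypothetical data: k = 4 equally likely cells, no estimated parameters
# (s = 0), so the degrees of freedom are k - s - 1 = 3.
CRITICAL_005_3DF = 7.815  # tabulated chi-square value, alpha = 0.05, 3 d.f.

small_obs = [27, 25, 24, 24]          # n = 100
small_exp = [25.0] * 4
large_obs = [2700, 2500, 2400, 2400]  # same proportions, n = 10000
large_exp = [2500.0] * 4

small_stat = chi_square_statistic(small_obs, small_exp)
large_stat = chi_square_statistic(large_obs, large_exp)
```

With these made-up counts the observed proportions are identical in both samples, yet the statistic grows in proportion to *n*: the small sample is not rejected while the large one is. This is the sample-size dependence noted at the start of these notes.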

- Chi-square test with equal probabilities:
- If a continuous distributional assumption is being tested,
class intervals that are equal in probability rather than equal
in width of interval should be used.
- Example 10.14: Chi-square test for exponential distribution
(page 379)
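  - Test with intervals of equal probability (not necessarily equal width).
  - The number of intervals should be less than or equal to *n/5*. Here *n = 50*, so $k \le 10$; according to the recommendations in Table 10.5, 7 to 10 class intervals should be used.
  - Let *k = 8*; thus each interval has probability $p = 1/k = 0.125$.
  - The end points for each interval are computed from the cdf for the exponential distribution

    $$F(a_i) = 1 - e^{-\lambda a_i}$$

    where $a_i$ represents the end point of the *i*th interval.
  - Since $F(a_i)$ is the cumulative area from zero to $a_i$, $F(a_i) = ip$, thus

    $$ip = 1 - e^{-\lambda a_i}$$

    thus

    $$a_i = -\frac{1}{\lambda}\ln(1 - ip), \qquad i = 0, 1, \ldots, k$$

    Regardless of the value of $\lambda$, $a_0 = 0$ and $a_k = \infty$.
  - With $\hat{\lambda} = 0.084$ in this example and *k = 8*,

    $$a_1 = -\frac{1}{0.084}\ln(1 - 0.125) = 1.590$$

    Continuing with *i = 2, 3, ..., 7* results in 3.425, 5.595, 8.252, 11.677, 16.503, and 24.755.
  - See pages 379 and 380 for the completion of the example.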
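
The end-point calculation for the exponential case can be checked with a short Python sketch. The function name is hypothetical. The rate 0.084 is the value implied by the end points listed in the example, and *k = 8* is the value chosen there.

```python
import math


def exponential_endpoints(rate, k):
    """Return [a_0, ..., a_k] for k equal-probability intervals.

    Uses a_i = -(1/rate) * ln(1 - i/k), with a_0 = 0 and a_k = infinity.
    """
    p = 1.0 / k
    inner = [-math.log(1.0 - i * p) / rate for i in range(1, k)]
    return [0.0] + inner + [math.inf]


endpoints = exponential_endpoints(rate=0.084, k=8)
for i, a in enumerate(endpoints):
    print(f"a_{i} = {a:.3f}")
```

The computed interior end points agree with the values listed in the example to within rounding in the third decimal place.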

- Example 10.15 (Chi-square test for Weibull distribution)
on page 380
- Example 10.16 (Computing intervals for the normal distribution)
on page 381
  - For the given data, using the suggested estimators in Table
10.3 on page 370, we know (the original data were from Example 10.3
on page 360)

- Kolmogorov-Smirnov Goodness-of-fit test
- The chi-square test depends heavily on the choice of class
intervals. For the same data, different groupings
may lead to different conclusions (rejection or failure to reject).
- The K-S goodness-of-fit test is designed to overcome
this difficulty. The idea of the K-S test comes from the q-q plot.
- The K-S test is particularly useful when sample
sizes are small and when no parameters have been estimated
from the data.
- Example 10.7 on page 383, using the method described
in Section 8.4.1 on page 299. A few notes:
  - If the interarrival times are exponentially distributed, the arrival times are uniformly distributed on (0,T]
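
The note above suggests one way to apply the K-S test to interarrival data: convert the interarrival times to arrival times, scale them by the observation period *T*, and test the result for uniformity on (0, 1). The following Python sketch assumes that approach. The function names, the interarrival times, and the period *T = 10* are hypothetical, and the resulting statistic *D* still has to be compared against a tabulated K-S critical value, which is not computed here.

```python
def normalized_arrival_times(interarrival_times, period):
    """Accumulate interarrival times into arrival times and scale by T."""
    arrivals = []
    clock = 0.0
    for gap in interarrival_times:
        clock += gap
        arrivals.append(clock / period)
    return arrivals


def ks_statistic_uniform(values):
    """Return D = max(D+, D-) for values hypothesized uniform on (0, 1)."""
    ordered = sorted(values)
    n = len(ordered)
    # D+ = max(i/n - R_(i)), D- = max(R_(i) - (i-1)/n), i = 1..n
    d_plus = max((i + 1) / n - r for i, r in enumerate(ordered))
    d_minus = max(r - i / n for i, r in enumerate(ordered))
    return max(d_plus, d_minus)


# Hypothetical interarrival times observed over a period of T = 10.
gaps = [0.5, 0.9, 3.0, 3.7, 1.2]
scaled = normalized_arrival_times(gaps, period=10.0)
d_stat = ks_statistic_uniform(scaled)
```

For these made-up data the scaled arrival times are approximately 0.05, 0.14, 0.44, 0.81, and 0.93, which give *D+ = 0.26*, *D- = 0.21*, and therefore *D = 0.26*.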
