Input Data Analysis (3) Goodness-of-Fit Tests
Goodness-of-Fit Test
• Note:
– Failure to reject H0 should NOT be interpreted as "accepting H0 as being true"
– These tests are not very powerful for small to moderate sample sizes n (real field data are often limited in size): they are not sensitive to subtle deviations between the sample data and the fitted distribution
– If n is very large, these tests will almost always reject H0, since H0 is virtually never exactly true
– For practical purposes, however, a "nearly" correct fit is acceptable
The Chi-Square Test
• Pearson (1900)
• First, divide the entire range of the fitted distribution into k adjacent intervals of equal length, except possibly the first and the last: [a0, a1), [a1, a2), [a2, a3), …, [a_{k−1}, a_k], with
a0 = −∞, a_k = +∞   (first interval: (−∞, a1), last interval: [a_{k−1}, +∞))
• Then tally
N_j = number of X_i's in the jth interval [a_{j−1}, a_j), for j = 1, 2, …, k   (note: Σ_{j=1}^{k} N_j = n)
• Compute the expected proportion p_j of the X_i's that would fall in the jth interval
The Chi-Square Test
• For the continuous case:
p_j = ∫_{a_{j−1}}^{a_j} f̂(x) dx
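Putting the interval, tally, and expected-proportion steps together, here is a minimal Python sketch of the usual Pearson statistic Σ_j (N_j − n·p_j)² / (n·p_j). The function and variable names are illustrative, not from the slides:

```python
import math

def chi_square_statistic(data, cdf, cutpoints):
    """Pearson chi-square goodness-of-fit statistic.

    cutpoints are the interior boundaries a_1 < ... < a_{k-1}; the first
    interval is (-inf, a_1) and the last is [a_{k-1}, +inf), as above.
    cdf is the fitted distribution function F-hat.
    """
    n = len(data)
    edges = [-math.inf] + list(cutpoints) + [math.inf]
    chi2 = 0.0
    for j in range(len(edges) - 1):
        lo, hi = edges[j], edges[j + 1]
        # N_j = number of X_i's in the j-th interval [a_{j-1}, a_j)
        n_j = sum(1 for x in data if lo <= x < hi)
        # expected proportion p_j = F-hat(a_j) - F-hat(a_{j-1})
        p_j = (cdf(hi) if hi != math.inf else 1.0) \
            - (cdf(lo) if lo != -math.inf else 0.0)
        chi2 += (n_j - n * p_j) ** 2 / (n * p_j)
    return chi2
```

Choosing the cutpoints at quantiles of the fitted distribution gives equal-probability intervals (each p_j = 1/k), a common recommendation for this test.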
Kolmogorov-Smirnov (K-S) Test
• The empirical distribution Fn(x) from data X1, …, Xn:
Fn(x) = (number of X_i's ≤ x) / n, for all real numbers x
• The K-S statistic is the largest discrepancy between Fn and the fitted F̂:
Dn = sup_x { |Fn(x) − F̂(x)| }
Kolmogorov-Smirnov (K-S) Test
• Dn can be calculated by:
Dn+ = max_{1≤i≤n} { i/n − F̂(X_(i)) },   Dn− = max_{1≤i≤n} { F̂(X_(i)) − (i−1)/n }
and finally letting
Dn = max{ Dn+, Dn− }
• Notes:
– Direct computation of Dn+ and Dn− requires sorting the data to obtain the order statistics X_(i)
– For moderate values of n (up to several hundred), sorting can be done quickly by simple methods
– If n is large, sorting becomes expensive!
– A large value of Dn indicates a poor fit, so we reject the null hypothesis H0 if Dn exceeds some constant d_{n,1−α}, where α is the specified level of the test
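The two maxima above can be sketched in Python with a single sort (the function name is illustrative, not from the slides):

```python
def ks_statistic(data, cdf):
    """D_n = max(D_n+, D_n-) against a fitted CDF `cdf`, via one sort.

    Implements D_n+ = max_i { i/n - F(X_(i)) } and
    D_n- = max_i { F(X_(i)) - (i-1)/n } from the slide.
    """
    xs = sorted(data)  # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(xs[i]) for i in range(n))
    d_minus = max(cdf(xs[i]) - i / n for i in range(n))
    return max(d_plus, d_minus)
```

For example, against a U(0, 1) fit (`cdf = lambda x: x`), the data [0.2, 0.9, 0.5] give Dn = 0.9 − 2/3.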
Kolmogorov-Smirnov (K-S) Test
• Case 1: If all parameters of F̂ are known
– None of the parameters is estimated in any way from the data; the distribution of Dn then does not depend on F̂, assuming F̂ is continuous
– Instead of testing Dn > d_{n,1−α}, we reject H0 if
(√n + 0.12 + 0.11/√n) Dn > c_{1−α}
where the c_{1−α} are given in the all-parameters-known row of Table 6.14
– This case is the original form of the K-S test
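The Case 1 rejection rule is a one-liner; here is a sketch in Python where the critical value c_{1−α} is passed in as a parameter, since it must be looked up in the all-parameters-known row of Table 6.14 (the function name is illustrative):

```python
import math

def ks_reject_all_known(dn, n, c):
    """Case 1 (all parameters of F-hat known): reject H0 iff the
    adjusted statistic (sqrt(n) + 0.12 + 0.11/sqrt(n)) * D_n
    exceeds the tabulated critical value c = c_{1-alpha}."""
    adjusted = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * dn
    return adjusted > c
```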
Kolmogorov-Smirnov (K-S) Test
• Case 2: Suppose the hypothesized distribution is N(μ, σ²) with both μ and σ² unknown
– Estimate μ and σ² by X̄(n) and S²(n), respectively
– Define the distribution function F̂ to be that of N(X̄(n), S²(n))
– Using this F̂, Dn is computed in the same way
– We reject H0 if
(√n − 0.01 + 0.85/√n) Dn > c′_{1−α}
where the c′_{1−α} are in the N(X̄(n), S²(n)) row of Table 6.14
– This case includes a K-S test for the lognormal distribution if the X_i's are the logarithms of the basic data points
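Case 2 can be sketched end to end in Python: estimate the parameters, build the fitted normal CDF via the error function, compute Dn by sorting, then apply the adjustment. The function name is illustrative, not from the slides:

```python
import math

def ks_adjusted_normal(data):
    """Case 2: fit N(Xbar(n), S^2(n)) and return the adjusted statistic
    (sqrt(n) - 0.01 + 0.85/sqrt(n)) * D_n, to be compared against the
    c'_{1-alpha} row of Table 6.14."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # S(n)
    # CDF of the fitted normal, expressed via the error function
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - xbar) / (s * math.sqrt(2.0))))
    xs = sorted(data)
    d_plus = max((i + 1) / n - cdf(xs[i]) for i in range(n))
    d_minus = max(cdf(xs[i]) - i / n for i in range(n))
    dn = max(d_plus, d_minus)
    return (math.sqrt(n) - 0.01 + 0.85 / math.sqrt(n)) * dn
```

For the lognormal variant mentioned above, pass in the logarithms of the raw data.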
Kolmogorov-Smirnov (K-S) Test
• Case 3: Suppose the hypothesized distribution is expo(β) with β unknown
– β is estimated by its MLE X̄(n)
– Define F̂ to be the expo(X̄(n)) distribution function
– Using this F̂, Dn is computed
– We reject H0 if
(Dn − 0.2/n)(√n + 0.26 + 0.5/√n) > c″_{1−α}
where the c″_{1−α} are given in the expo(X̄(n)) row of Table 6.14
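A matching sketch for Case 3, again with illustrative names: the MLE is just the sample mean, and the adjustment differs from Case 2 in both the shift on Dn and the multiplier:

```python
import math

def ks_adjusted_expo(data):
    """Case 3: fit expo(Xbar(n)) by MLE and return the adjusted statistic
    (D_n - 0.2/n) * (sqrt(n) + 0.26 + 0.5/sqrt(n)), to be compared
    against the expo(Xbar(n)) row of Table 6.14."""
    n = len(data)
    xbar = sum(data) / n                      # MLE of the mean beta
    cdf = lambda x: 1.0 - math.exp(-x / xbar)  # fitted expo CDF
    xs = sorted(data)
    d_plus = max((i + 1) / n - cdf(xs[i]) for i in range(n))
    d_minus = max(cdf(xs[i]) - i / n for i in range(n))
    dn = max(d_plus, d_minus)
    return (dn - 0.2 / n) * (math.sqrt(n) + 0.26 + 0.5 / math.sqrt(n))
```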
Kolmogorov-Smirnov (K-S) Test
• Case 4: Suppose the hypothesized distribution is Weibull with both the shape parameter α and the scale parameter β unknown
– Estimate α and β by their respective MLEs α̂ and β̂
– F̂ is taken to be the Weibull(α̂, β̂) distribution function
– Dn is computed in the usual fashion
– We reject H0 if the adjusted K-S statistic √n·Dn is greater than the modified critical value c*_{1−α} given in Table 6.15
– Note that critical values are available only for certain sample sizes n, and that the critical values for n = 50 and n = ∞ are, fortunately, very similar