PAS204: Lecture 16.

The Neyman-Pearson Lemma

In this lecture we show how to construct optimal tests comparing two simple
hypotheses.

16.1 Two simple hypotheses

Remember that a simple hypothesis asserts that $\theta$ takes a single, specified
value. We now consider the case where we wish to conduct a hypothesis test
when both hypotheses are simple.

$$H_0 : \theta = \theta_0, \qquad H_1 : \theta = \theta_1.$$

In effect, we are supposing that $\theta$ can only have two possible values, $\theta_0$
or $\theta_1$.

Remark 16.1 It is really superfluous now to allow $\theta$ to be a vector, but
we stick with the general notation.
The power function for a test defined by critical region $C$ therefore also
takes only two values:

$$\pi_C(\theta_0) = P(X \in C \mid \theta = \theta_0), \qquad \pi_C(\theta_1) = P(X \in C \mid \theta = \theta_1).$$

The first of these is the size of the test $C$, $\alpha_C = \pi_C(\theta_0)$. It is also referred
to as the probability of the first kind of error.
In hypothesis testing, there are two kinds of error: to reject $H_0$ when it
is true and to fail to reject it when it is false. So the first kind of error is
to reject $H_0$ when it is true, and the probability of this is $\alpha_C$.
The probability of the second kind of error is
$\beta_C = P(X \notin C \mid \theta = \theta_1) = 1 - \pi_C(\theta_1)$.
Our goal is to make both probabilities small, and the ideal would be
$\alpha_C = \beta_C = 0$!
Remember that in practice we wish to maximise power subject to fixing
the test size. In the case of two simple hypotheses this reduces to fixing
$\alpha_C = \alpha$ and finding the test with minimum possible value of $\beta_C$. This
corresponds to controlling the probability of the first kind of error and
then minimising the probability of the second kind of error.

Remark 16.2 Notice the asymmetry between the two kinds of error. Making
the first kind of error is thought to be more serious and so must be
controlled. This corresponds to the asymmetry in the hypotheses: the null
hypothesis is the one to stick with unless we are suitably convinced that the
alternative is more plausible.

16.2 The Neyman-Pearson Lemma

A famous result called the Neyman-Pearson (N-P) Lemma identifies the
most powerful test of any given size for two simple hypotheses.

Definition 16.1 (Likelihood ratio) The likelihood ratio (LR) for comparing
two simple hypotheses is

$$\lambda(x) = \frac{L(\theta_1; x)}{L(\theta_0; x)} = \frac{f(x \mid \theta = \theta_1)}{f(x \mid \theta = \theta_0)}.$$

Remark 16.3 The unspecified proportionality constant in likelihoods cancels
out when we take the ratio. Therefore the value of the likelihood ratio
is unambiguous, and has absolute meaning.

Definition 16.2 (LR test) A likelihood ratio test is of the form

$$C_k = \{x : \lambda(x) \ge k\} = \{x : L(\theta_1; x) \ge k\, L(\theta_0; x)\}$$

for some $k$.
So the critical region includes all $x$ for which $\lambda$ is sufficiently large.

Remark 16.4 As we vary $k$ we get different tests. Notice that if $k' > k$
then $C_{k'} \subseteq C_k$. So increasing $k$ gives tests with decreasing size $\alpha_{C_k}$, and
also decreasing power $1 - \beta_{C_k}$.

Remark 16.5 In this ratio, remember that the alternative is on top.

The N-P Lemma says that the LR test $C_k$ is the most powerful among
all tests of size $\alpha_k = \pi_k(\theta_0) = P(L(\theta_1; x) \ge k\, L(\theta_0; x) \mid \theta = \theta_0)$.
Its proof is not particularly difficult, but it is not particularly interesting.

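Theorem 1 (Neyman-Pearson Lemma) Let $C_k$ be the Likelihood Ratio
test of $H_0 : \theta = \theta_0$ versus $H_1 : \theta = \theta_1$ defined by

$$C_k = \left\{ x : \frac{L(\theta_1; x)}{L(\theta_0; x)} \ge k \right\},$$

and with power function $\pi_k(\theta)$. Let $C$ be any other test such that
$\pi_C(\theta_0) \le \alpha_k = \pi_k(\theta_0)$, where $\pi_C(\theta)$ is the power function of $C$.
Then $\pi_k(\theta_1) \ge \pi_C(\theta_1)$.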
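The statement of the lemma can be checked by brute force on a small discrete sample space. The sketch below is only an illustration: the two probability mass functions `f0` and `f1`, the threshold `k = 2.0`, and all function names are hypothetical choices, not taken from the lecture. It builds the LR critical region, computes its size and power, and then enumerates every possible critical region of no greater size to confirm that none has higher power.

```python
from itertools import combinations

# Hypothetical pmfs on a five-point sample space (illustrative numbers only).
f0 = {0: 0.40, 1: 0.30, 2: 0.15, 3: 0.10, 4: 0.05}  # distribution under H0
f1 = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.30, 4: 0.40}  # distribution under H1


def prob(region, pmf):
    """Probability that X falls in the given region under the given pmf."""
    return sum(pmf[x] for x in region)


def lr_region(k):
    """LR critical region: all x with L(theta1; x) >= k * L(theta0; x)."""
    return {x for x in f0 if f1[x] >= k * f0[x]}


def best_power_by_enumeration(alpha):
    """Largest power over every critical region whose size is at most alpha."""
    points = list(f0)
    best = 0.0
    for r in range(len(points) + 1):
        for region in combinations(points, r):
            # Small tolerance guards against floating-point rounding in the sum.
            if prob(region, f0) <= alpha + 1e-12:
                best = max(best, prob(region, f1))
    return best


k = 2.0                       # assumed threshold for the illustration
C_k = lr_region(k)            # critical region of the LR test
alpha_k = prob(C_k, f0)       # size: probability of rejecting under H0
power_k = prob(C_k, f1)       # power: probability of rejecting under H1
print(sorted(C_k), alpha_k, power_k, best_power_by_enumeration(alpha_k))
```

With these numbers the LR region consists of the two points where the alternative is most favoured relative to the null, and the enumeration finds no region of the same or smaller size with greater power, as the lemma asserts.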

Example 16.1 (Normal sample, known variance) Let $X_1, X_2, \ldots, X_n$
be a sample from the $N(\mu, \sigma^2)$ distribution and suppose that $\sigma^2$ is known.
Therefore $\theta = \mu$. Consider two simple hypotheses

$$H_0 : \mu = \mu_0, \qquad H_1 : \mu = \mu_1.$$

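From previous examples, the LR is

$$\lambda(x) = \frac{L(\mu_1; x)}{L(\mu_0; x)} = \frac{\exp\left(-\frac{n}{2\sigma^2}(\bar{x} - \mu_1)^2\right)}{\exp\left(-\frac{n}{2\sigma^2}(\bar{x} - \mu_0)^2\right)} = \exp\left(-\frac{n}{2\sigma^2}\, Q\right),$$

where

$$Q = (\bar{x} - \mu_1)^2 - (\bar{x} - \mu_0)^2 = 2\bar{x}(\mu_0 - \mu_1) + (\mu_1^2 - \mu_0^2).$$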


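The LR test is therefore

$$\begin{aligned}
C_k &= \left\{x : \exp\left(-\tfrac{n}{2\sigma^2}\, Q\right) \ge k\right\} \\
    &= \left\{x : -\tfrac{n}{2\sigma^2}\, Q \ge \log k\right\} \\
    &= \left\{x : 2\bar{x}(\mu_0 - \mu_1) + (\mu_1^2 - \mu_0^2) \le -\tfrac{2\sigma^2}{n} \log k\right\} \\
    &= \left\{x : \bar{x}(\mu_0 - \mu_1) \le k^*\right\} \qquad (16.1)
\end{aligned}$$

where $k^* = -\frac{\sigma^2}{n} \log k - \frac{1}{2}(\mu_1^2 - \mu_0^2)$. Now we are going to divide by $(\mu_0 - \mu_1)$,
but if this is negative we must change the direction of the inequality.
Therefore, if we now define $k^{**} = k^*/(\mu_0 - \mu_1)$, we have the following
form for the LR test $C_k$:

if $\mu_0 > \mu_1$, we reject $H_0$ if $\bar{x} \le k^{**}$,
if $\mu_0 < \mu_1$, we reject $H_0$ if $\bar{x} \ge k^{**}$.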

So the LR criterion of "sufficiently large $\lambda$" translates into "sufficiently
small $\bar{x}$" or "sufficiently large $\bar{x}$" depending on whether $\mu_0$ is greater or less
than $\mu_1$.
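As a numerical sanity check on this reduction, the following sketch computes the likelihood ratio directly from the full normal log-likelihoods and compares it with the closed form that depends on the data only through the sample mean. The data values and the parameter values `mu0`, `mu1`, `sigma` are hypothetical, chosen purely for illustration, as are the function names.

```python
import math


def log_lik(mu, xs, sigma):
    """Full normal log-likelihood of the sample xs for mean mu, known sigma."""
    n = len(xs)
    const = -n * math.log(sigma * math.sqrt(2 * math.pi))
    return const - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)


def lr_direct(xs, mu0, mu1, sigma):
    """Likelihood ratio L(mu1; x) / L(mu0; x), computed from the densities."""
    return math.exp(log_lik(mu1, xs, sigma) - log_lik(mu0, xs, sigma))


def lr_via_mean(xbar, n, mu0, mu1, sigma):
    """Closed form exp(-n Q / (2 sigma^2)), Q = 2 xbar (mu0 - mu1) + (mu1^2 - mu0^2)."""
    q = 2 * xbar * (mu0 - mu1) + (mu1 ** 2 - mu0 ** 2)
    return math.exp(-n * q / (2 * sigma ** 2))


xs = [4.1, 5.3, 6.0, 4.8, 5.9]       # hypothetical observations
mu0, mu1, sigma = 5.0, 6.0, 1.0      # hypothetical hypotheses, known sigma
xbar = sum(xs) / len(xs)

print(lr_direct(xs, mu0, mu1, sigma), lr_via_mean(xbar, len(xs), mu0, mu1, sigma))
```

The two values agree because the terms of the likelihood that do not involve the mean cancel in the ratio. Since `mu0 < mu1` here, `lr_via_mean` increases with `xbar`, which is why the test rejects for sufficiently large sample means.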

Remark 16.6 You should always be very careful when dealing with inequalities.

Remark 16.7 Notice that the lines leading to (16.1) are all working towards
getting as simple as possible a function of the data on the left hand
side of the inequality and a constant on the right. The only way the data
appear in the LR is in the form of the sample mean $\bar{x}$, so it is this that we
are trying to isolate on the left hand side, and the example ends by finishing
this task. It is important to recognise that on the right hand side we then
just have a constant, $k^{**}$.

16.3 Test size

Having found the general form of the optimal tests, we have a whole family
of tests. By varying $k$, we can get a test of any desired size from $\alpha = 1$
($k = 0$, always reject $H_0$) to $\alpha = 0$ ($k = \infty$, never reject $H_0$). To complete
the analysis, we need to be able to find the value of $k$ that makes $C_k$ have
the desired size $\alpha_k = \alpha$, or else find the P-value corresponding to the
observed data.

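Example 16.2 Let us continue the preceding example. The calculation
will again depend on whether $\mu_0$ is greater or less than $\mu_1$, and we will deal
with the case $\mu_0 < \mu_1$. Then

$$\alpha_k = P(\bar{X} \ge k^{**} \mid \mu = \mu_0).$$

To derive this probability, we will use the fact that the distribution of $\bar{X}$
is $N(\mu, \sigma^2/n)$, and we will standardise so that we can work in terms of
standard normal probabilities.

$$\alpha_k = P\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge \frac{k^{**} - \mu_0}{\sigma/\sqrt{n}} \;\Big|\; \mu = \mu_0 \right),$$

and we know that if $\mu = \mu_0$ then $(\bar{X} - \mu_0)/(\sigma/\sqrt{n})$ has the standard normal
distribution. So if we let $Z \sim N(0, 1)$ we have

$$\alpha_k = P\left( Z \ge \frac{k^{**} - \mu_0}{\sigma/\sqrt{n}} \right) = 1 - \Phi\left( \frac{k^{**} - \mu_0}{\sigma/\sqrt{n}} \right). \qquad (16.2)$$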

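This equation links the test size $\alpha_k$ to the critical value $k^{**}$ for the test.
It immediately enables us to find the P-value (observed significance) for
the observed data. We simply equate the critical value $k^{**}$ to the observed
value $\bar{x}$ of $\bar{X}$, and obtain

$$P = 1 - \Phi\left( \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \right) = \Phi\left( \frac{\mu_0 - \bar{x}}{\sigma/\sqrt{n}} \right),$$

using the general result that $1 - \Phi(z) = \Phi(-z)$. Now suppose we wish to
choose $k$, or equivalently $k^{**}$, so as to obtain a test of specified size $\alpha$. This
means solving (16.2). Writing $Z_\alpha$ for the upper $100\alpha\%$ point of the standard
normal distribution, so that $1 - \Phi(Z_\alpha) = \alpha$, we have

$$\frac{k^{**} - \mu_0}{\sigma/\sqrt{n}} = Z_\alpha \quad \Rightarrow \quad k^{**} = \mu_0 + \frac{\sigma}{\sqrt{n}}\, Z_\alpha.$$

So we reject $H_0$ if $\bar{x} \ge \mu_0 + \frac{\sigma}{\sqrt{n}} Z_\alpha$. Therefore the test of size $\alpha$ formally has
critical region $C = \{x : \bar{x} \ge \mu_0 + \frac{\sigma}{\sqrt{n}} Z_\alpha\}$.

If $\mu_0 > \mu_1$, we end up with

$$P = \Phi\left( \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \right)$$

for observed significance, and
for a fixed size test we have $k^{**} = \mu_0 - \frac{\sigma}{\sqrt{n}} Z_\alpha$, and reject $H_0$ if
$\bar{x} \le \mu_0 - \frac{\sigma}{\sqrt{n}} Z_\alpha$.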
[These are the one-sided tests for a normal mean that you will have met in
Level 1 statistics.]
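The two calculations in this example, the P-value and the critical value for a fixed size, can be sketched in a few lines using Python's standard library. The function name `one_sided_z_test`, its arguments, and the numerical inputs are hypothetical and only serve the illustration; the flag `alternative_greater` selects the case where the mean under the alternative exceeds the mean under the null.

```python
import math
from statistics import NormalDist


def one_sided_z_test(xbar, n, mu0, sigma, alpha, alternative_greater=True):
    """Return (P-value, critical value, reject?) for the one-sided normal test.

    alternative_greater=True corresponds to mu0 < mu1 (reject for large xbar);
    False corresponds to mu0 > mu1 (reject for small xbar).
    """
    std = NormalDist()                    # standard normal N(0, 1)
    se = sigma / math.sqrt(n)             # standard deviation of the sample mean
    z_alpha = std.inv_cdf(1 - alpha)      # upper 100*alpha% point
    z = (xbar - mu0) / se                 # standardised observed mean
    if alternative_greater:
        p_value = 1 - std.cdf(z)
        critical = mu0 + se * z_alpha
        reject = xbar >= critical
    else:
        p_value = std.cdf(z)
        critical = mu0 - se * z_alpha
        reject = xbar <= critical
    return p_value, critical, reject


# Hypothetical inputs: n = 25 observations, known sigma = 1, H0: mu = 5.
p_value, critical, reject = one_sided_z_test(5.5, 25, 5.0, 1.0, 0.05)
print(p_value, critical, reject)
```

Rejecting when the sample mean passes the critical value is equivalent to rejecting when the P-value is at most the chosen size, so the two outputs always agree on the decision.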
There are several features to notice about this example.

Remark 16.8 First, note that we did not need to actually find $k$. It was
enough to find $k^{**}$. In Example 16.1 we found the general form of the LR
test. If $\mu_0 < \mu_1$ it is to reject $H_0$ if $\bar{x}$ is sufficiently large, and if $\mu_0 > \mu_1$
we reject $H_0$ if $\bar{x}$ is sufficiently small. In Example 16.2 we found just how
large or small $\bar{x}$ needed to be to get a test of a given size. We did not need
to have recorded what function $k^{**}$ was of $k$, $\sigma^2$, $\mu_0$ and $\mu_1$, because we did
not use the information again.

Remark 16.9 Something that we did need to know is the distribution of
$\bar{X}$, at least in the case $\mu = \mu_0$ (also for $\mu = \mu_1$ if we want to know $\beta$ or the
power).

16.4 The LR test statistic

Examples 16.1 and 16.2 illustrate a common finding, at least in simple
models, that the LR statistic $\lambda(x)$ is a monotone function of some simpler
test statistic $T = T(x)$. Then the LR test reduces to either $C_k = \{x : T \ge k'\}$
or $C_k = \{x : T \le k'\}$, for some $k'$, depending on whether the LR
is a monotone increasing or decreasing function of $T$. We do not need in
general to know the relationship between $k$ and $k'$. It is sufficient to know
the form of the test, that it amounts to seeing if $T$ is sufficiently large or
sufficiently small.
Then to identify the most powerful test of a given size $\alpha$, we need to
be able to derive the distribution of $T(X)$ given $\theta = \theta_0$, and hence to find
$k'$ such that $\alpha = P(T(X) \ge k' \mid \theta = \theta_0)$ or $\alpha = P(T(X) \le k' \mid \theta = \theta_0)$, as
appropriate.
This gives us a general procedure for finding optimal tests, in two steps.

1. Find the likelihood ratio $\lambda(x)$. Then manipulate the inequality $\lambda(x) \ge k$
   so as to express the test in terms of as simple a test statistic $T(x)$
   as possible. In doing so, the objective is to find the form of the LR
   test in terms of $T$. If the LR is a monotone function of $T$, the form
   of the test will be either to reject $H_0$ if $T$ is sufficiently large or to
   reject if $T$ is sufficiently small.

2. Derive the distribution of $T(X)$ given $\theta = \theta_0$, and thereby

   (a) find the P-value corresponding to the observed value $t = T(x)$
       of the test statistic, or
   (b) find the appropriate critical value of $T(x)$ to produce a test of
       the required size $\alpha$.

In practice, both steps can be mathematically tricky.

Here is another example to illustrate the process.

Example 16.3 (Exponential sample) Let $X_1, X_2, \ldots, X_n$ be a sample
from the exponential distribution $Ex(\lambda)$. We have simple hypotheses
$H_0 : \lambda = \lambda_0$ and $H_1 : \lambda = \lambda_1 > \lambda_0$. NB in this example $\lambda$ is the parameter,
not the likelihood ratio!
Step 1. The LR statistic is

$$\frac{L(\lambda_1; x)}{L(\lambda_0; x)} = \frac{\lambda_1^n \exp(-n\bar{x}\lambda_1)}{\lambda_0^n \exp(-n\bar{x}\lambda_0)} = \left(\frac{\lambda_1}{\lambda_0}\right)^n \exp(-n\bar{x}(\lambda_1 - \lambda_0)),$$

which is a monotone decreasing function of $T(x) = \bar{x}$. Hence the LR test
is $C_k = \{x : \bar{x} \le k'\}$ for some $k'$ (and we do not need to know any more
about it, like the relationship between $k'$ and $k$).
Step 2. We know from distribution theory that the sum of independent
exponential random variables has a gamma distribution, and specifically
that $n\bar{X} = \sum_{i=1}^{n} X_i \sim Ga(n, \lambda)$. We also know that a multiple
of a gamma random variable has a gamma distribution, and we know a
relationship between the gamma and chi squared distributions, so that
$2\lambda n\bar{X} \sim Ga(n, \frac{1}{2}) = \chi^2_{2n}$. We now use this result as follows.

$$\begin{aligned}
\alpha &= P(\bar{X} \le k' \mid \lambda = \lambda_0) \\
       &= P(2\lambda_0 n\bar{X} \le 2\lambda_0 n k' \mid \lambda = \lambda_0) \\
       &= P(Y \le 2\lambda_0 n k'), \qquad (16.3)
\end{aligned}$$

where $Y$ has the $\chi^2_{2n}$ distribution. This gives us the observed significance
$P = P(Y \le 2\lambda_0 n\bar{x})$ for the observed value $\bar{x}$ of the test statistic $\bar{X}$. However,
this simple case illustrates why traditionally hypothesis testing was
done with certain conventional fixed values of $\alpha$, because before the days
of computers it would have been impossible to calculate this probability
when required.
Consider, then, finding the critical value $k'$ to obtain a test of size $\alpha$.
Denoting the upper $100p\%$ point of the $\chi^2_m$ distribution as in Lecture 2 by
$\chi^2_{m,p}$ then we have

$$2\lambda_0 n k' = \chi^2_{2n,1-\alpha} \quad \Rightarrow \quad k' = \frac{\chi^2_{2n,1-\alpha}}{2\lambda_0 n}.$$

So the test of size $\alpha$ rejects $H_0$ if $\bar{x} \le \chi^2_{2n,1-\alpha}/(2\lambda_0 n)$. We can get $\chi^2_{2n,1-\alpha}$ from
tables (e.g. Neave Table 3.2) for a range of values of $\alpha$, including the usual
5%, 1% etc.
With modern computer software, of course, we can readily find the
P-value for any $\bar{x}$, or the critical value $k'$ for a test of any desired size.
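As one possible illustration of the software route, the sketch below computes both quantities using only Python's standard library. It relies on a standard fact that is not derived in this lecture: for an even number of degrees of freedom $2n$, the chi squared distribution function has the closed form $1 - e^{-y/2} \sum_{j=0}^{n-1} (y/2)^j / j!$. The function names, the bisection tolerance, and the numerical inputs are hypothetical.

```python
import math


def chi2_cdf_even(y, n):
    """P(Y <= y) for Y chi-squared with 2n degrees of freedom (closed form)."""
    if y <= 0:
        return 0.0
    half = y / 2
    term, total = 1.0, 1.0            # j = 0 term of the finite sum
    for j in range(1, n):
        term *= half / j              # (y/2)^j / j!, built up iteratively
        total += term
    return 1 - math.exp(-half) * total


def chi2_quantile_even(p, n, tol=1e-10):
    """Point q with P(Y <= q) = p, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, n) < p:   # expand the bracket until it contains q
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exp_rate_test(xbar, n, lam0, alpha):
    """Return (P-value, critical value k') for H0: rate = lam0 vs a larger rate."""
    p_value = chi2_cdf_even(2 * lam0 * n * xbar, n)
    # The lower alpha point of chi-squared(2n) is the upper 100(1 - alpha)% point.
    k_prime = chi2_quantile_even(alpha, n) / (2 * lam0 * n)
    return p_value, k_prime


# Hypothetical inputs: n = 10 observations, H0 rate 1.0, observed mean 0.5.
p_value, k_prime = exp_rate_test(0.5, 10, 1.0, 0.05)
print(p_value, k_prime)
```

The test rejects when the observed mean is at most `k_prime`, or equivalently when `p_value` is at most the chosen size. In practice a statistics library's chi squared routines would normally be used instead of the hand-written functions here.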

Remark 16.10 In this example, Step 1 was relatively straightforward. However,
to do Step 2 we needed to string together a rather complicated argument
around various results in distribution theory. Most of the facts you
are likely to need in arguments like this are on the handouts on distributions
and useful facts. When constructing hypothesis tests, it is very useful to
know your way around these handouts.

Remark 16.11 In Level 1 statistics, you may have constructed various
tests using more intuitive arguments. In effect, you will have identified a
suitable test statistic T (x) and also the form of the test heuristically. The
new thing in this course is the formal Step 1, where T and the form of test
are derived from the likelihood ratio. Instead of guessing what would make
a good statistic and form of test, we have the N-P Lemma to tell us what
the optimal test looks like. But Step 2 is essentially the same, whichever
approach we use up to that point.

