BMC Psychiatry: Evaluating Cutpoints For The MHI-5 and MCS Using The GHQ-12: A Comparison of Five Different Methods


BMC Psychiatry BioMed Central

Research article Open Access


Evaluating cutpoints for the MHI-5 and MCS using the GHQ-12: a
comparison of five different methods
Mark J Kelly*1, Frank D Dunstan1, Keith Lloyd2 and David L Fone1

Address: 1Dept. of Primary Care and Public Health, Centre for Health Sciences Research, School of Medicine, Cardiff University, Heath Park,
Cardiff, CF14 4YS, UK and 2Centre for Health Information, Research and Evaluation, School of Medicine, Swansea University, UK
Email: Mark J Kelly* - [email protected]; Frank D Dunstan - [email protected]; Keith Lloyd - [email protected];
David L Fone - [email protected]
* Corresponding author

Received: 31 May 2007
Accepted: 19 February 2008
Published: 19 February 2008
BMC Psychiatry 2008, 8:10 doi:10.1186/1471-244X-8-10
This article is available from: http://www.biomedcentral.com/1471-244X/8/10
© 2008 Kelly et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Background: The Mental Health Inventory (MHI-5) and the Mental Health Component Summary
score (MCS) derived from the Short Form 36 (SF-36) instrument are well validated and reliable
scales. A drawback of their construction is that neither has a clinically validated cutpoint to define
a case of common mental disorder (CMD). This paper aims to produce cutpoints for the MHI-5
and MCS by comparison with the General Health Questionnaire (GHQ-12).
Methods: Data were analysed from wave 9 of the British Household Panel Survey (2000),
providing a sample size of 14,669 individuals. Receiver Operating Characteristic (ROC) curves
were used to compare the scales and define cutpoints for the MHI-5 and MCS, using the following
optimisation criteria: the Youden Index, the point closest to (0,1) on the ROC curve, minimising
the misclassification rate, the minimax method, and prevalence matching.
Results: For the MHI-5, the Youden Index and the (0,1) methods both gave a cutpoint of 76,
minimising the misclassification rate gave a cutpoint of 60 and the minimax method and prevalence
matching gave a cutpoint of 68. For the MCS, the Youden Index and the (0,1) methods gave
cutpoints of 51.7 and 52.1 respectively, minimising the error rate gave a cutpoint of 44.8 and both
the minimax method and prevalence matching gave a cutpoint of 48.9. The correlation between the
MHI-5 and the MCS was 0.88.
Conclusion: The Youden Index and (0,1) methods are most suitable for determining a cutpoint
for the MHI-5, since they are least dependent on population prevalence. The choice of method is
dependent on the intended application. The MHI-5 performs remarkably well against the longer
MCS.

Background
The common mental disorders of anxiety and depression (CMD) are leading causes of morbidity and
disability and constitute a major public health burden [1]. The CMDs are most commonly measured in
population studies using the General Health Questionnaire (GHQ-12) [2]. Another frequently used scale
is the Mental Health Inventory (MHI-5) [3] which is included in the Short Form 36 (SF-36). The MHI-5 is
a well validated and reliable measure of mental health status [4], but an important limitation of its
use is that it was not developed with a validated cutpoint to define a case of CMD. The SF-36 can also be
used to construct the Mental Health Component Summary score (MCS) which is another measure of mental
health status that is widely used in population surveys but has no clinically validated cutpoint [5].
In this paper we aim to derive generalisable cutpoints for the MHI-5 and MCS using the GHQ-12 as a gold
standard, using five different optimisation criteria.

Methods
Dataset
Data from wave 9 of the British Household Panel Survey (BHPS) [6] were used in this analysis. The BHPS
is a longitudinal study carried out in England, Scotland and Wales (in wave 9). The first wave of the
BHPS was carried out in 1991 with a nationally representative sample of 5,500 households. The BHPS
follows households through time, with an annual interview of every member of the household aged 16 and
over. Individuals interviewed in the first sample who subsequently set up their own household continued
to participate in the survey, as well as every individual in the new household. All waves of the BHPS
include the GHQ-12, but wave 9 of the study (2000) also included the SF-36 version 1. There is complete
information on both of these instruments for all 14,669 individuals in the dataset. Of those present at
wave 8, 83.4% were successfully followed up at wave 9.

Mental Health Measures
The GHQ-12 comprises twelve questions, each with a set of Likert scale responses which score the
question as 0, 1, 2 or 3. There are two ways of scoring the GHQ-12. Either the sum of these responses is
used to provide a score ranging between 0 and 36 or alternatively, the response to each question is
deemed positive if it is greater than one and the number of positives provides the score. This results
in a score between 0 and 12 for each individual. This latter method is used in this study. Different
studies use different cutpoints between 2 and 4 to define a case of common mental disorder. In this
paper we use the most widely accepted convention of a score of three or more defined as a case [7]. The
SF-36 version 1 consists of eight subscales measuring Physical Functioning, Role Physical, Bodily Pain,
General Health, Vitality, Social Functioning, Role Emotional and Mental Health. The MHI-5 comprises
five questions. There are six possible responses to the questions, scored between 1 and 6. The score
for each individual therefore ranges between 5 and 30. This is then transformed into a variable ranging
from 0–100 using a standard linear transformation [5]. A different mental health score, which
incorporates all eight subscales of the SF-36, called the Mental Health Component Summary (MCS) can
also be constructed. It was calculated in the standard way [5], using UK norms [8] and factor loadings
[9]. On both the MHI-5 and the MCS high scores indicate good mental health, unlike the GHQ-12. Both the
MHI-5 and the GHQ-12 scales have discrete distributions. The GHQ-12 takes on only 13 different values,
while the MHI-5 produces only 26 different values. The MCS is a continuous variable, with 11,003
different values being calculated for the 14,669 individuals in the BHPS dataset.

Statistical Methods
Sensitivity and Specificity
In order to identify a cutpoint for any new measure, it needs to be compared to another scale which can
classify people as a case or a non-case. Ideally, this scale would be a gold standard and would produce
no misclassifications. In the field of mental health the acknowledged gold standard is a standardised
interview. A well validated scale, such as the GHQ-12, with an associated cutpoint to distinguish cases
from non-cases is a good alternative. The GHQ-12 classifies each individual in the dataset as a case or
a non-case. Our aim is to find the cutpoints on the MHI-5 and MCS that imitate the GHQ-12 cutpoint as
closely as possible. Individuals with mental health scores less than or equal to the cutpoint on the
MHI-5 or MCS will be defined as cases. The evaluation of a cutpoint involves the twin concepts of
sensitivity and specificity. The sensitivity of a test is the probability of a case testing positive
(i.e. a true positive). The specificity of a test is the probability of a non-case testing negative
(i.e. a true negative). Clearly a good test has a large sensitivity, but a test which automatically
classifies everyone as a case has a sensitivity of one (the maximum possible), even though it is
completely uninformative. There is a trade-off to be made, then, between sensitivity and specificity.
As the cutpoint is decreased, the sensitivity decreases, while the specificity increases.

Receiver Operating Characteristic Curves
For each possible cutpoint on the measure under investigation there is an associated sensitivity and
specificity. These can be summarised using a receiver operating characteristic (ROC) curve. A ROC curve
plots the sensitivity (i.e. true positive rate) on the y-axis against one minus the specificity (i.e.
false positive rate) on the x-axis. Each point on the curve represents a different cutpoint on the new
measure. A diagonal line at 45 degrees, known as the line of chance, would result from a test which
allocated subjects randomly.

Optimisation Criteria
There are several approaches to choosing a cutpoint on a ROC curve. Five of these will be investigated
in this study. Each method focuses on optimising a different criterion and so may produce a different
cutpoint. The five methods are: 1. the Youden Index [10], 2. the point closest to the upper left
corner, coordinates (0,1), as used by Holmes [11], 3. the misclassification rate, 4. the minimax
method [12] and 5. prevalence matching, as used by Hoeymans et al [13]. Only the first two have a
graphical interpretation on the ROC curve.
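The scoring rules and the case definition described in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the rescaling (raw - 5) / 25 x 100 is assumed to be the "standard linear transformation" referred to in the text, and any reverse-scoring of individual MHI-5 items is assumed to have been done beforehand.

```python
from typing import Sequence, Tuple


def ghq12_binary_score(item_responses: Sequence[int]) -> int:
    """Count the items answered 2 or 3 on the 0-3 Likert scale (score range 0-12)."""
    if len(item_responses) != 12:
        raise ValueError("GHQ-12 needs exactly twelve item responses")
    return sum(1 for r in item_responses if r > 1)


def is_ghq12_case(score: int, threshold: int = 3) -> bool:
    """Reference classification: a GHQ-12 score of three or more is a case."""
    return score >= threshold


def mhi5_transform(raw_total: int) -> float:
    """Rescale a raw MHI-5 total (5-30) linearly onto 0-100 (assumed formula)."""
    if not 5 <= raw_total <= 30:
        raise ValueError("raw MHI-5 total must lie between 5 and 30")
    return (raw_total - 5) * 100 / 25


def sensitivity_specificity(
    scores: Sequence[float], is_case: Sequence[bool], cutpoint: float
) -> Tuple[float, float]:
    """Scores <= cutpoint are flagged as cases; is_case holds the reference labels."""
    n_case = sum(1 for c in is_case if c)
    n_non = len(is_case) - n_case
    true_pos = sum(1 for s, c in zip(scores, is_case) if c and s <= cutpoint)
    true_neg = sum(1 for s, c in zip(scores, is_case) if not c and s > cutpoint)
    return true_pos / n_case, true_neg / n_non
```

Evaluating `sensitivity_specificity` at every observed score gives the points of the ROC curve, one (1 - specificity, sensitivity) pair per candidate cutpoint.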

Youden Index
In general, a good cutpoint is one which produces both a large sensitivity and a large specificity. An
intuitive method, therefore, is to maximise the sum of the sensitivity and specificity, S, i.e. satisfy
equation 1

S = max(Sensitivity + Specificity) (1)

This approach assumes that sensitivity and specificity are equally important. This is exactly
equivalent to the Youden index, shown in equation 2, since subtracting a constant does not affect the
optimal cutpoint. This can be interpreted as choosing the point on the ROC curve with the largest
vertical distance from the line of chance.

J = max(Sensitivity + Specificity - 1) (2)

Figure 1: Graphical illustration of the Youden Index (J) and the (0,1) criterion. 1. (0,1) refers to
the minimum distance between the point (0,1) and the ROC curve. 2. J refers to the Youden Index in
equation 2. Axes: Sensitivity (y) against 1-Specificity (x).
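The geometric reading of equation 2 can be written out in one line. In the sketch below, x and y are shorthand introduced here for the ROC coordinates (they are not symbols used in the paper), and the block assumes the amsmath package.

```latex
\begin{align*}
(x,\ y) &= (1 - \text{Specificity},\ \text{Sensitivity}) && \text{a point on the ROC curve} \\
y_{\text{chance}} &= x && \text{the line of chance} \\
y - y_{\text{chance}} &= \text{Sensitivity} - (1 - \text{Specificity})
  = \text{Sensitivity} + \text{Specificity} - 1
\end{align*}
```

The vertical gap between the curve and the diagonal is therefore the quantity maximised in equation 2.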

Shortest distance to upper left corner
The second optimisation method investigated in this paper is to choose the cutpoint associated with the
point on the ROC curve closest to the upper left corner. This entails finding the cutpoint which
minimises d in equation 3. This method also places equal weight on the sensitivity and specificity.

d = \sqrt{(1 - \text{Sensitivity})^2 + (1 - \text{Specificity})^2} (3)

The rationale behind this is that a perfect ROC curve would pass through the point (0,1) (i.e.
Sensitivity = 1 and Specificity = 1 for some cutpoint). Selecting the point on the curve which is
closest to this point of perfection is one way to choose a cutpoint. The Youden index and the (0,1)
criterion are illustrated in Figure 1.

Misclassification rate
Alternatively, the misclassification rate could be minimised. For this we define the false positive
rate (FPR) to be

FPR = (Non-case Prevalence)×(1-Specificity) (4)

and the false negative rate (FNR) to be

FNR = (Case Prevalence)×(1-Sensitivity) (5)

and it is the sum of these two terms that is minimised. This essentially gives weights to the
sensitivity and specificity based on the prevalence of cases. If the population has a very low
prevalence of cases, then more weight would be given to specificity. If the prevalence is high, then
sensitivity takes precedence. This presupposes that the penalty incurred for a false positive is equal
to that incurred for a false negative. If this does not hold, the sum can be weighted according to the
penalties incurred for false positives and negatives, i.e. minimise

θ×(1-Sensitivity) + (1 - θ)×(1-Specificity) (6)

where θ is the weight attached to the sensitivity. Choosing this weight may not be straightforward. For
instance, in this study it is difficult to compare the consequences of both types of misclassification.
Expression 6 is equivalent to equations 1 and 2 with θ set to 0.5, and with θ set to the population
case prevalence it reduces to the sum of equations 4 and 5, i.e. the misclassification rate.

Minimax Criterion
The minimax criterion involves minimising the frequency of the most common error. In a two by two
classification table, this is equivalent to minimising the maximum of the off-diagonal elements. This
involves minimising M in equation 7.

M = max(FPR, FNR) (7)

This is similar to minimising the misclassification rate, except instead of the sum of FPR and FNR
being minimised, the maximum of the two terms is minimised.

Prevalence Matching
The final optimisation criterion we consider is to choose a cutpoint which results in the proportion of
the screened
population classified as positives (or cases) being closest to the gold standard case prevalence. Those
classed as positives comprise both true and false positives and so expression 8 is minimised, where the
True Positive Rate (TPR) is the sensitivity multiplied by the case prevalence.

|TPR + FPR - P(Case)| (8)

It can be shown that in the continuous case (i.e. where the new measure is capable of infinite
subdivision) this method is equivalent to the minimax method. In discrete cases, they will produce very
similar results. It is important to clarify at this point, that unlike other studies which employ ROC
curves, the area under the curve is not a meaningful criterion to use here. The area under the curve
summarises the performance of an entire measure across all cutpoints. It is appropriate when two new
measures are being compared against a gold standard in order to determine which of the new measures
performs most similarly to the gold standard. It cannot, however, be used to determine an optimum
cutpoint on a scale.

Figure 2: MHI-5 ROC curve using a GHQ caseness criterion of 3 or more. 1. ROC curve based on a GHQ-12
caseness criterion of 3 or more. Vertical lines indicate the optimum cutpoints using the five different
optimisation criteria. Plot annotations: Youden & (0,1) => 76; Min(Error Rate) => 60; Prevalence
Matching & Minimax => 68. Axes: Sensitivity (y) against 1-Specificity (x).
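The five optimisation criteria defined in equations 2 to 8 can be compared side by side with a short Python sketch. The function name `choose_cutpoints` and the dictionary layout are illustrative assumptions, not the authors' implementation. The three candidate cutpoints in the usage example are the MHI-5 rows reported in Table 1, with the 25.3% GHQ-12 case prevalence quoted in the Results, whereas the paper searched over all observed scores.

```python
import math


def choose_cutpoints(roc, prevalence):
    """Pick a cutpoint per criterion.

    roc maps each candidate cutpoint to (sensitivity, specificity);
    prevalence is the reference case prevalence P(Case).
    """
    def fpr(spec):  # equation 4: false positives as a share of the population
        return (1 - prevalence) * (1 - spec)

    def fnr(sens):  # equation 5: false negatives as a share of the population
        return prevalence * (1 - sens)

    # Each entry is a loss to be minimised over the candidate cutpoints.
    losses = {
        "youden": lambda se, sp: -(se + sp - 1),                      # maximise J
        "closest_to_01": lambda se, sp: math.hypot(1 - se, 1 - sp),   # equation 3
        "misclassification": lambda se, sp: fpr(sp) + fnr(se),
        "minimax": lambda se, sp: max(fpr(sp), fnr(se)),              # equation 7
        "prevalence_matching":                                        # expression 8
            lambda se, sp: abs(prevalence * se + fpr(sp) - prevalence),
    }
    return {
        name: min(roc, key=lambda cut: loss(*roc[cut]))
        for name, loss in losses.items()
    }


# Usage: the three distinct MHI-5 cutpoints listed in Table 1.
mhi5_rows = {60: (0.473, 0.943), 68: (0.615, 0.882), 76: (0.756, 0.771)}
print(choose_cutpoints(mhi5_rows, prevalence=0.253))
```

Restricted to these three candidates, the sketch selects 76 for the Youden Index and (0,1) criteria, 60 for the misclassification rate and 68 for minimax and prevalence matching. Because TPR = P(Case) - FNR, expression 8 can be rewritten as |FPR - FNR|, which makes the link with the minimax criterion easier to see: both favour the cutpoint at which the two error frequencies are most nearly equal.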

Since the method uses the same dataset both to define cutpoints and assess the performance of those
cutpoints, there is the possibility of overestimating the sensitivity and specificity. This potential
source of bias is investigated by repeating the analysis using 75% of the dataset (randomly selected),
and then assessing the performance of the cutpoints produced on the remaining 25% of the dataset.

Results
MHI-5 Results
First consider the MHI-5. Maximising the Youden index leads to a cutpoint of 76 (a case of common
mental disorder is defined by a score of less than or equal to 76) for the MHI-5. Using the shortest
distance from (0,1) criterion the optimal cutpoint is also 76. In general, these two optimisation
methods will not always give the same cutpoint though the discrete nature of both scales means that in
practice they often will. Using the sample prevalence of 25.3% (according to the GHQ-12) to weight the
sum of the sensitivity and specificity (thereby minimising the error rate) the corresponding cutpoint
is 60. Using the prevalence matching method of choosing a cutpoint the optimal cutpoint is 68. This
produces a case prevalence of 24.4%, which is the closest to the GHQ-12 case prevalence of 25.3%. The
minimax method yields the same cutpoint as prevalence matching. The correlation between the GHQ-12 and
the MHI-5 is high (-0.65). Figure 2 shows the points on the ROC curve corresponding to each of the
cutpoints produced by the different optimisation criteria.

MCS Results
Next, we examine the MCS. The Youden index and the (0,1) methods produce very similar cutpoints of 51.7
and 52.1, respectively. Minimising the error rate produces a cutpoint of 44.8 while both prevalence
matching and the minimax method indicate a cutpoint of 48.9. Table 1 summarises the results and
illustrates the trade-off that must be made between sensitivity and specificity. Figure 3 shows the
points on the ROC curve corresponding to each of the cutpoints produced by the different optimisation
criteria. The correlation between the MCS and GHQ-12 is the same as for the MHI-5 at -0.65.

Assessment of bias
Using three-quarters of the data to derive cutpoints resulted in no change of optimum cutpoints for the
MHI-5. When these were applied to the unused 25% of the data in order to assess their performance no
systematic bias was observed, with the sensitivity and specificity for each cutpoint being equally
likely to increase as decrease. The situation was similar for the MCS, with most of the optimisation
criteria producing identical cutpoints to those produced by the full dataset (the only method which
produced a slightly different cutpoint was minimising the misclassification rate, which went from 44.8
to 45.1). Again, when these cutpoints were applied to the unused 25% of the data, they produced
sensitivities and specificities very close to those reported for the full dataset.
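The split-sample check described for the assessment of bias can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the helper names are hypothetical, only the Youden Index is shown as the selection rule, and the random split relies on Python's standard `random` module with an arbitrary seed.

```python
import random


def sens_spec(scores, is_case, cutpoint):
    """Sensitivity and specificity when scores <= cutpoint are flagged as cases."""
    n_case = sum(1 for c in is_case if c)
    n_non = len(is_case) - n_case
    true_pos = sum(1 for s, c in zip(scores, is_case) if c and s <= cutpoint)
    true_neg = sum(1 for s, c in zip(scores, is_case) if not c and s > cutpoint)
    # nan marks a rate that is undefined because one class is absent
    sens = true_pos / n_case if n_case else float("nan")
    spec = true_neg / n_non if n_non else float("nan")
    return sens, spec


def youden_cutpoint(scores, is_case):
    """Observed score maximising sensitivity + specificity - 1 (lowest one on ties)."""
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, is_case, cut)
        if sens + spec - 1 > best_j:
            best_cut, best_j = cut, sens + spec - 1
    return best_cut


def split_sample_check(scores, is_case, train_fraction=0.75, seed=0):
    """Derive a cutpoint on a random subset and evaluate it on the remainder."""
    order = list(range(len(scores)))
    random.Random(seed).shuffle(order)
    n_train = round(train_fraction * len(order))
    train, held_out = order[:n_train], order[n_train:]

    def pick(rows, values):
        return [values[i] for i in rows]

    cut = youden_cutpoint(pick(train, scores), pick(train, is_case))
    return {
        "cutpoint": cut,
        "train": sens_spec(pick(train, scores), pick(train, is_case), cut),
        "held_out": sens_spec(pick(held_out, scores), pick(held_out, is_case), cut),
    }
```

Comparing the `train` and `held_out` pairs shows whether the sensitivity and specificity are systematically lower on data that played no part in choosing the cutpoint.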


Table 1: MHI-5 and MCS cutpoints and associated test characteristics for five optimisation criteria

Scale  Optimisation Criterion  Cutpoint  Sensitivity  Specificity  Positivity Rate^1  Error Rate^2 (%)
MHI-5  Youden Index            76        0.756        0.771        0.362              23.3
MHI-5  (0,1)^3                 76        0.756        0.771        0.362              23.3
MHI-5  Misclassification Rate  60        0.473        0.943        0.163              17.6
MHI-5  Minimax method          68        0.615        0.882        0.244              18.5
MHI-5  Prevalence Matching     68        0.615        0.882        0.244              18.5
MCS    Youden Index            51.7      0.745        0.787        0.348              22.4
MCS    (0,1)                   52.1      0.759        0.772        0.362              23.1
MCS    Misclassification Rate  44.8      0.476        0.941        0.164              17.6
MCS    Minimax method          48.9      0.630        0.874        0.253              18.8
MCS    Prevalence Matching     48.9      0.630        0.874        0.253              18.8

^1 Positivity rate refers to the proportion of the sample defined to be a case using each cutpoint.
^2 Error rate refers to the proportion of the sample classified differently to the GHQ-12. This comprises both false negatives and false positives.
^3 (0,1) refers to the criterion which minimises the distance between the point (0,1) and the ROC curve
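The positivity and error rates in Table 1 follow from the sensitivity, specificity and case prevalence. As a worked check for the MHI-5 cutpoint of 76, assuming the GHQ-12 case prevalence of 25.3% quoted in the Results (P, Se and Sp are shorthand introduced here, and the block assumes the amsmath package):

```latex
\begin{align*}
\text{Positivity rate} &= P\,\mathit{Se} + (1 - P)(1 - \mathit{Sp})
  = 0.253 \times 0.756 + 0.747 \times 0.229 \approx 0.191 + 0.171 = 0.362 \\
\text{Error rate} &= P\,(1 - \mathit{Se}) + (1 - P)(1 - \mathit{Sp})
  = 0.253 \times 0.244 + 0.747 \times 0.229 \approx 0.062 + 0.171 = 0.233
\end{align*}
```

Both values agree with the tabulated 0.362 and 23.3%.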

Comparison of the MHI-5 and the MCS
It is also worth noting that the shorter MHI-5 performs remarkably similarly to the longer MCS, with
the correlation between the two being 0.88. Table 1 shows that the error rates produced by the five
optimisation methods are very similar for the two scales, and that the MCS is only marginally more
efficient at discriminating cases of CMD than the MHI-5, despite employing over seven times as many
questions. Figure 4 illustrates how the cutpoints on the MHI-5 and the MCS vary with GHQ-12 case
prevalence (the GHQ-12 case prevalence was varied by changing the cutpoint on the GHQ-12).

Figure 3: MCS ROC curve using a GHQ caseness criterion of 3 or more. 1. ROC curve based on a GHQ-12
caseness criterion of 3 or more. Vertical lines indicate the optimum cutpoints using the five different
optimisation criteria. Plot annotations: Youden => 51.7; (0,1) => 52.1; Min(Error Rate) => 44.8;
Prevalence Matching & Minimax => 48.9. Axes: Sensitivity (y) against 1-Specificity (x).

Figure 4: Relationship between prevalence and MHI-5 and MCS cutpoints for four optimisation methods.
1. Case prevalence is altered by varying the cutpoint used to define caseness on the GHQ-12 from 1 to
12. 2. Solid line denotes the Youden Index. 3. Dashed line denotes the (0,1) method. 4. Dotted line
denotes the minimising the error rate method. 5. Dashed and dotted line denotes the prevalence matching
method. 6. The minimax method is excluded since it is predominantly coincidental with the prevalence
matching method. Axes: Cutpoint (y) against Case Prevalence (x), with separate sets of curves for the
MHI-5 and the MCS.
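The kind of sweep summarised in Figure 4 can be sketched by redefining caseness at each GHQ-12 threshold and re-deriving the cutpoint. The sketch below is illustrative only: the function names are hypothetical, the Youden Index stands in for whichever criterion is being traced, and thresholds that leave no cases or no non-cases are skipped because sensitivity or specificity would be undefined.

```python
def youden_cutpoint(scores, is_case):
    """Observed score maximising sensitivity + specificity - 1 (lowest one on ties)."""
    n_case = sum(1 for c in is_case if c)
    n_non = len(is_case) - n_case
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        sens = sum(1 for s, c in zip(scores, is_case) if c and s <= cut) / n_case
        spec = sum(1 for s, c in zip(scores, is_case) if not c and s > cut) / n_non
        if sens + spec - 1 > best_j:
            best_cut, best_j = cut, sens + spec - 1
    return best_cut


def cutpoint_by_ghq_threshold(scale_scores, ghq_scores, thresholds=range(1, 13)):
    """Return (GHQ threshold, case prevalence, chosen cutpoint) for each threshold."""
    rows = []
    for threshold in thresholds:
        is_case = [g >= threshold for g in ghq_scores]
        n_case = sum(1 for c in is_case if c)
        if n_case == 0 or n_case == len(is_case):
            continue  # one class is empty, so no ROC point can be computed
        rows.append((threshold, n_case / len(is_case),
                     youden_cutpoint(scale_scores, is_case)))
    return rows
```

Plotting the third element of each row against the second gives one line of such a figure; repeating the sweep with a different selection rule gives the others.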

Discussion
Main Findings
For the MHI-5 the five methods produce three distinct cutpoints. Both graphical approaches (the Youden
Index and the point closest to the upper left corner) produce a cutpoint of 76. Prevalence matching and
the minimax method both indicate that 68 is the optimal cutpoint, while minimising the
misclassification rate provides an optimal cutpoint of 60. The five methods produced similar results
for the MCS with the two graphical approaches producing cutpoints of 51.7 and 52.1. Both prevalence
matching and the minimax method gave an MCS cutpoint of 48.9 and minimising the misclassification rate
produced an MCS cutpoint of 44.8. It is important to point out that the prevalence matching, minimax
and misclassification rate methods give lower cutpoints than the Youden or (0,1) criteria because the
case prevalence is less than 50% (25.3%). If the case prevalence were greater than 50% this situation
would be reversed and the three aforementioned methods would give cutpoints greater than the Youden or
(0,1) criteria.

The relationship between the optimum cutpoint and the population case prevalence for the five
optimisation methods and for both the MHI-5 and the MCS was investigated in Figure 4 (the minimax
method was excluded since it was largely coincidental with prevalence matching). For the error rate
minimisation and prevalence matching methods the optimal cutpoint varies greatly with population
prevalence, while the Youden index and (0,1) methods are relatively independent of population
prevalence. This is the case for both scales. This invariance under different population prevalences is
a property that is extremely useful for studies that span large and heterogeneous areas, such as
international comparisons. Both methods also have intuitive interpretations as described earlier, and
so there is very little to choose between them.

When the misclassification rate was minimised there was still an error rate of 17.6% for both the MHI-5
and MCS, which may imply that they measure slightly different constructs to the GHQ-12. This finding is
echoed by Hoeymans et al [13] who noted that the MHI-5 was uncorrelated with age, whereas older age
groups scored higher on the GHQ-12 (indicating worse mental health).

Weinstein et al [14] drew attention to the fact that the comparative nature of the GHQ-12 response
choices is not conducive to detecting chronic disorders. A subject suffering from chronic anxiety
disorder may well answer the question "Have you recently lost much sleep over worry?" with the response
choice "no more than usual", if their condition is a long-standing one. The MHI-5 and MCS avoid this
problem by employing less comparative response choices. Another explanation for the lack of complete
agreement between the GHQ-12 and the two SF-36 mental health measures is that they were designed
differently. The MHI-5 includes one or more questions on each of the following mental health
dimensions: anxiety, depression, loss of behavioural/emotional control and psychological well-being
[3], while the MCS is a weighted sum of all eight health dimensions of the SF-36. The GHQ-12 on the
other hand includes items on depression, anxiety, social performance and somatic complaints [2].
However, the high correlations between the GHQ-12 and both the MHI-5 and the MCS indicate that, despite
these differences, the three scales perform very similarly. This can be seen in Table 1 where the five
optimisation methods produce cutpoints on both scales with very similar properties in terms of
sensitivity, specificity, positivity rate and error rate. As mentioned previously the correlation
between the MHI-5 and the MCS is high at 0.88.

More generally, this study has found that the minimax method and prevalence matching methods give very
similar results. Indeed, in this study they produce identical cutpoints. This is not a coincidence, as
the two criteria become equivalent if the scale in question is continuous (and the probability of
caseness is calculated from the same dataset).

Investigators should give careful consideration to which of these cutpoints is most appropriate for
their study, since selecting which criterion should be optimised depends primarily on the intended
application of the resulting cutpoint. For instance, a study whose primary goal is to identify cases in
a given locality might do well to minimise the misclassification rate. However, a study interested in
comparing CMD internationally should consider utilising the Youden Index or the (0,1) method, as these
methods are most appropriate when the study area encompasses regions with different case prevalences.
Prevalence matching has the advantage of simplicity but will inevitably lead to different cutpoints in
different populations. The minimax method approximates to prevalence matching when the scale in
question is continuous.

Comparison with Previous Studies
One study of 7,359 adults representative of the Dutch general population used the GHQ-12 to derive a
MHI-5 cutpoint using the prevalence matching method [13]. They used a less severe CMD case criterion of
two or more on the GHQ-12, giving a case prevalence of 22.8%. The MHI-5 cutpoint which matched this
prevalence most closely was 72, resulting in a case prevalence of 20.6%. To illustrate how this
approach can lead to different results in different populations, we carried out the equivalent
procedure in the BHPS dataset. Using a GHQ-12 caseness criterion of two or more classifies 32.9% of the
dataset as cases. The MHI-5 cutpoint which best matches this prevalence is 76 (providing a case
prevalence of 36.2%).

One small study compared four psychiatric case-finding instruments in 69 patients presenting to general
practice in Wales and chose cutpoints which provided an undefined "similar sensitivity and specificity
values for each instrument" [15]. The Revised Clinical Interview Schedule was used to define a case of
CMD. This study identified an MHI-5 cutpoint quoted as 71/72.

A report published in Dutch compared the MHI-5 with the GHQ-12 in order to ascertain a cutpoint [16].
They sampled 7,065 independently living individuals aged 18 to 64 from the general population. A score
of two or more on the GHQ-12 was used to define caseness, which classified 24.4% of the population as a
case. They used the Youden Index and prevalence matching methods. The Youden Index indicated an MHI-5
cutpoint of 72, leading to a case prevalence of 22.8%. The Composite International Diagnostic Interview
(CIDI) was used to determine whether individuals suffered from any of the following disorders:
depression, bipolar disorder, dysthymia, panic disorder, agoraphobia, specific phobia, social phobia,
generalised anxiety disorder, obsessive compulsive disorder, schizophrenia, anorexia and bulimia. The
percentage of the population diagnosed with at least one of these disorders was found to be 12.2%. The
MHI-5 cutpoint which matched this prevalence most closely was 60, producing a case prevalence of 11.2%.

Three other studies have defined a cutpoint by comparing MHI-5 scores with a range of different
validated clinical interview schedules. These are summarised in turn below. The wide range of cutpoints
found reflects the wide variety in sample sizes, study settings and outcomes of interest.

A study of 95 non-psychiatric patients who were HIV seropositive used the Structured Clinical Interview
for DSM-III-R (SCID-NP-HIV) psychiatric disorders [11] and found a cutpoint of 52 using the (0,1)
method. This study was investigating more severe disorders than the CMD and so produced a very low
cutpoint. Applying this cutpoint to the BHPS dataset would identify only 8.3% of the individuals as
cases. A study of 4,036 German nationals resident in an area of approximately 50 km in diameter
surrounding Lubeck used the Munich Composite International
Diagnostic Interview (M-CIDI) and found a cutpoint of achievable. Administering the GHQ-12 is, by comparison,
65 [17]. This study used the (0,1) method. This low cut- inexpensive and efficient. Also, it can be argued that the
point can be attributed to the fact that the M-CIDI is used GHQ-12 is a well validated and reliable scale, with vali-
to diagnose DSM-IV Axis 1 psychiatric disorders which are dated cutpoints, and as such is a reasonable instrument
more extreme conditions than the common mental disor- against which to measure other scales.
ders.
A further potential criticism concerns the crude nature of
Another study investigated the validity of the MHI-5 for cutpoints resulting in the loss of information in the varia-
assessing major depression using 1,444 functionally impaired, community dwelling elderly Americans. The gold standard against which the MHI-5 was compared was the MINI-International Neuro-Psychiatric Interview Major Depressive Episode (MINI-MDE) module. The Youden index optimisation criterion produced a cutpoint quoted as 59/60. Again, the study focussed on major depression and so produced a lower cutpoint than the cutpoint of 76 indicated by our paper.

A Norwegian study used the MHI-5 as the gold standard to define cutpoints for a different measure [18]. Postal questionnaire surveys with MHI-5 information were returned by 6865 (70.5% response rate) individuals and cutpoints of 52 and 56 were used successively.

To our knowledge no study has attempted to identify a cutpoint of the MCS.

Strengths and Limitations of the Study
This study compares two measures derived from a questionnaire that is frequently employed in population research [18], using a large, representative sample of the UK population and five different optimisation criteria. The main strength of our study is that the sample size used is nearly twice that of the next largest study. So, while it is well known that optimal cutpoints vary as a function of the population being investigated, severity of caseness, case prevalence and gold standard employed, this study compares the cutpoints derived from five optimisation criteria, for the MHI-5 and the MCS. This facilitates an objective and comprehensive assessment of the best cutpoint for use in different studies.

ble being dichotomised. One method which seeks to avoid this problem is to use stratum-specific likelihood ratios (SSLRs) [19], defined as the ratio of the probability of a given test result when the disease is present to the probability of the same test result when the disease is absent. Instead of plotting these against one another as in a ROC curve, the SSLR approach examines the ratio of the two. These SSLRs can be calculated for each possible score on the scale in question. Nomograms can then be constructed providing the probability that a given individual is a case depending on their score. While this approach certainly retains more information than a simple cutpoint and may be intuitively appealing, it does not avoid the problem of having to choose a cutpoint in many situations since the SSLRs may still need to be summarised for practical purposes. Furthermore, the SSLR is more useful for diagnostic purposes, while both the GHQ-12 and the MHI-5 are intended as screening tools as opposed to diagnostic tools. The use of depression/anxiety or case finding instruments has limited impact on the recognition, management or outcome of depression/anxiety in primary care [20,21]. As such, a cutpoint on either scale is only appropriate for use in research on populations and would not be suitable for diagnostic purposes.

In the year that wave 9 of the BHPS was carried out (2000), an updated version of the SF-36 was released [5]. The MHI-5 used in the SF-36 version two excludes the response choice "a good bit of the time" as validation studies found that this choice was not consistently ordered in relation to the other categories [22]. However, it has been shown that there is little difference in the performance of the six and five choice response scales [5,17].
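As an illustration of the stratum-specific likelihood ratios discussed above, the following is a minimal sketch rather than the method of any cited study. The function names, the three score strata and the toy scores are all hypothetical. The conversion from an SSLR to a post-test probability uses Bayes' theorem in odds form, which is the calculation a nomogram performs graphically.

```python
import math


def stratum_specific_lrs(scores, is_case, strata):
    """Return {stratum: SSLR}, where
    SSLR = P(score in stratum | case) / P(score in stratum | non-case)."""
    n_case = sum(1 for c in is_case if c)
    n_non = len(is_case) - n_case
    sslrs = {}
    for low, high in strata:
        case_hits = sum(1 for s, c in zip(scores, is_case) if c and low <= s <= high)
        non_hits = sum(1 for s, c in zip(scores, is_case) if not c and low <= s <= high)
        if non_hits == 0:
            # No non-cases fall in this stratum, so the ratio is unbounded.
            sslrs[(low, high)] = math.inf
        else:
            # Same as (case_hits / n_case) / (non_hits / n_non), written as one division.
            sslrs[(low, high)] = (case_hits * n_non) / (non_hits * n_case)
    return sslrs


def post_test_probability(prevalence, lr):
    """Convert a pre-test probability (prevalence) and a likelihood ratio
    into a post-test probability using odds."""
    if math.isinf(lr):
        return 1.0
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Hypothetical toy data: scores on a 0-100 scale where low scores indicate poorer mental health.
case_scores = [20, 28, 36, 40, 44, 48, 56, 64, 72, 88]
noncase_scores = [44, 60, 68, 72, 80, 84, 88, 92, 96, 100]
scores = case_scores + noncase_scores
is_case = [True] * len(case_scores) + [False] * len(noncase_scores)

strata = [(0, 50), (51, 75), (76, 100)]  # illustrative strata, not taken from the paper
sslrs = stratum_specific_lrs(scores, is_case, strata)
for stratum, lr in sslrs.items():
    print(stratum, round(lr, 3), round(post_test_probability(0.2, lr), 3))
```

In this toy example the lowest stratum has an SSLR well above 1 and the highest stratum an SSLR well below 1, so each score band shifts the probability of caseness by a different amount instead of being collapsed into a single case/non-case split.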

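The Youden index and (0,1) criteria referred to in this discussion can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: a case is assumed to be a score at or below the cutpoint, the (0,1) criterion is taken in its usual form as the ROC point closest to perfect classification, and the function names and toy data are hypothetical.

```python
import math


def roc_points(scores, is_case, cutpoints):
    """Return (cutpoint, sensitivity, specificity) tuples, classing
    'score <= cutpoint' as a case (assumed direction for a low-is-worse scale)."""
    n_case = sum(1 for y in is_case if y)
    n_non = len(is_case) - n_case
    points = []
    for c in cutpoints:
        true_pos = sum(1 for s, y in zip(scores, is_case) if y and s <= c)
        true_neg = sum(1 for s, y in zip(scores, is_case) if not y and s > c)
        points.append((c, true_pos / n_case, true_neg / n_non))
    return points


def youden_cutpoint(points):
    """Cutpoint maximising J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1.0)[0]


def closest_to_01_cutpoint(points):
    """Cutpoint minimising the distance from (1 - specificity, sensitivity)
    to the top-left corner (0, 1) of the ROC plot."""
    return min(points, key=lambda p: math.hypot(1.0 - p[2], 1.0 - p[1]))[0]


# Hypothetical toy data, with the gold standard given as True/False.
case_scores = [40, 52, 60, 68, 76, 84]
noncase_scores = [64, 80, 84, 88, 92, 100]
scores = case_scores + noncase_scores
is_case = [True] * len(case_scores) + [False] * len(noncase_scores)

points = roc_points(scores, is_case, cutpoints=[60, 68, 76, 84])
print("Youden cutpoint:", youden_cutpoint(points))
print("(0,1) cutpoint:", closest_to_01_cutpoint(points))
```

Both criteria depend only on sensitivity and specificity, each of which is computed within the case or non-case group separately, which is why neither involves the case prevalence directly.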
A criticism of the approach adopted in this paper regards the use of the GHQ-12 as the comparative gold standard. In ROC curve analysis, a measure is supposed to be compared to a gold standard which can categorise the sample without error. In the field of mental health a standardised interview schedule is considered the gold standard and would be preferable to using the GHQ-12. A scale such as the GHQ-12 is likely to have lower sensitivity and specificity than a standardised interview schedule and this may affect the resulting cutpoints. Unfortunately, the BHPS did not administer a standardised interview schedule. A disadvantage to using a standardised interview schedule is that it is resource intensive, limiting the sample size

Conclusion
Of the five optimisation methods used in this study, the Youden Index and the (0,1) method are the most suitable for the determination of a generalisable cutpoint, since they are least dependent on the population case prevalence. Both approaches indicate that the best cutpoint to define a case of CMD using the MHI-5 is 76, while for the MCS the Youden Index indicates a cutpoint of 51.7 and the (0,1) method a cutpoint of 52.1. The MHI-5 has the advantage over the GHQ-12 of brevity, consisting of only five multiple choice questions, and performs very similarly to the longer MCS. Further validation studies, ideally using a standardised interview schedule and a large population, spanning different countries, are required to confirm our findings.

Competing interests
The author(s) declare that they have no competing interests.

Authors' contributions
DLF was responsible for the inception and design of the study. The analysis was performed by MJK, supervised by FDD. MJK drafted the paper. All authors contributed to the intellectual content of the paper as well as offering revisions on all drafts. KL was especially involved in commenting upon the psychiatric component to the paper. All authors read and approved the final manuscript.

Acknowledgements
We thank two reviewers for their constructive suggestions.

References
1. Weich S: Prevention of the common mental disorders: a public health perspective. Psychological Medicine 1997, 27:757-764.
2. Goldberg D, Williams P: A User's Guide to the General Health Questionnaire Windsor: NFER-Nelson; 1988.
3. Ware J, Kosinski M, Gandek B: SF-36® health survey: Manual & Interpretation Guide Lincoln: Quality Metric Incorporated; 2000.
4. Ware E, Gandek B: Overview of the SF-36 survey and the international quality of life assessment (IQOLA). Journal of Clinical Epidemiology 1998, 51(11):903-912.
5. Ware J, Kosinski M, Dewey J: How to score version 2 of the SF-36® health survey Lincoln: Quality Metric Incorporated; 2000.
6. Taylor M, Brice J, Buck N, Prentice-Lane E: British Household Panel Survey User Manual Volume A: Introduction, technical report and appendices Colchester: University of Essex; 2005.
7. Goldberg D, Gater G, Sartorius N, Ustun T, Piccinelli M, Gureje O, Rutter C: The validity of two versions of the GHQ in the WHO study of mental illness in general health care. Psychological Medicine 1997, 27:191-197.
8. Jenkinson C, Layte R, Lawrence K: Development and testing of the medical outcomes study 36-item short form health survey summary scale scores in the United Kingdom: Results from a large-scale survey and clinical trial. Medical Care 1997, 35(4):410-416.
9. Jenkinson C, Stewart-Brown S, Petersen S, Paice C: Assessment of the SF-36 version 2 in the United Kingdom. Journal of Epidemiology and Community Health 1999, 53:46-50.
10. Youden W: An index for rating diagnostic tests. Cancer 1950, 3:32-35.
11. Holmes W: A short, psychiatric, case-finding measure for HIV seropositive outpatients. Medical Care 1998, 36(2):237-243.
12. Hand D: Screening vs Prevalence Estimation. Applied Statistics 1987, 36:1-7.
13. Hoeymans N, Garssen A, Westert G, Verhaak P: Measuring mental health of the Dutch population: a comparison of the GHQ-12 and the MHI-5. Health and Quality of Life Outcomes 2004, 2:23-29.
14. Weinstein W, Berwick D, Goldman P, Murphy J, Barsky A: A comparison of three psychiatric screening tests using receiver operating characteristics (ROC) analysis. Medical Care 1989, 27(6):593-607.
15. Winston M, Smith J: A trans-cultural comparison of four psychiatric case-finding instruments in a Welsh community. Social Psychiatry and Psychiatric Epidemiology 2000, 35:569-575.
16. Perenboom R, Oudshoorn K, van Herten L, Hoeymans N, Bijl R: Life-expectancy in good mental health: establishing cut-offs for the MHI-5 and GHQ-12 (in Dutch). Leiden: TNO-report 2000.
17. Rumpf H, Meyer C, Hapke U, John U: Screening for mental health: validity of the MHI-5 using DSM-IV Axis 1 psychiatric disorders as gold standard. Psychiatry Research 2001, 105:243-253.
18. Strand BH, Dalgard OS, Tambs K, Rognerund M: Measuring the mental health status of the Norwegian population: a comparison of the instruments SCL-25, SCL-10, SCL-5 and MHI-5 (SF-36). Nordic Journal of Psychiatry 2003, 57(2):113-118.
19. Furukawa T, Goldberg D, Rabe-Hesketh S, Ustun T: Stratum-specific likelihood ratios of two versions of the General Health Questionnaire. Psychological Medicine 2001, 31:519-529.
20. Gilbody S, House A, Sheldon T: Routinely administered questionnaires for depression and anxiety: systematic review. British Medical Journal 2001, 322:406-409.
21. Gilbody S, House A, Sheldon T: Screening and case finding instruments for depression. The Cochrane Database of Systematic Reviews Wiley; 2005.
22. Keller S, Ware J, Gandek B, Aaronson N, Alonso J, Giovanni A, Bjorner J, Brazier J, Bullinger M, Fukuhara S, Kaasa S, Leplege A, Sanson-Fisher R, Sullivan M, Wood-Dauphine S: Testing the equivalence of translations of widely used response choice labels: results from the IQOLA project. Journal of Clinical Epidemiology 1998, 51:933-944.

Pre-publication history
The pre-publication history for this paper can be accessed here:
http://www.biomedcentral.com/1471-244X/8/10/prepub
