BMC Psychiatry: Evaluating Cutpoints For The MHI-5 and MCS Using The GHQ-12: A Comparison of Five Different Methods
Address: 1Dept. of Primary Care and Public Health, Centre for Health Sciences Research, School of Medicine, Cardiff University, Heath Park,
Cardiff, CF14 4YS, UK and 2Centre for Health Information, Research and Evaluation, School of Medicine, Swansea University, UK
Email: Mark J Kelly* - [email protected]; Frank D Dunstan - [email protected]; Keith Lloyd - [email protected];
David L Fone - [email protected]
* Corresponding author
Abstract
Background: The Mental Health Inventory (MHI-5) and the Mental Health Component Summary
score (MCS) derived from the Short Form 36 (SF-36) instrument are well validated and reliable
scales. A drawback of their construction is that neither has a clinically validated cutpoint to define
a case of common mental disorder (CMD). This paper aims to produce cutpoints for the MHI-5
and MCS by comparison with the General Health Questionnaire (GHQ-12).
Methods: Data were analysed from wave 9 of the British Household Panel Survey (2000),
providing a sample size of 14,669 individuals. Receiver Operating Characteristic (ROC) curves
were used to compare the scales and define cutpoints for the MHI-5 and MCS, using the following
optimisation criteria: the Youden Index, the point closest to (0,1) on the ROC curve, minimising
the misclassification rate, the minimax method, and prevalence matching.
Results: For the MHI-5, the Youden Index and the (0,1) methods both gave a cutpoint of 76,
minimising the misclassification rate gave a cutpoint of 60 and the minimax method and prevalence
matching gave a cutpoint of 68. For the MCS, the Youden Index and the (0,1) methods gave
cutpoints of 51.7 and 52.1 respectively, minimising the error rate gave a cutpoint of 44.8 and both
the minimax method and prevalence matching gave a cutpoint of 48.9. The correlation between the
MHI-5 and the MCS was 0.88.
Conclusion: The Youden Index and (0,1) methods are most suitable for determining a cutpoint
for the MHI-5, since they are least dependent on population prevalence. The choice of method is
dependent on the intended application. The MHI-5 performs remarkably well against the longer
MCS.
BMC Psychiatry 2008, 8:10 http://www.biomedcentral.com/1471-244X/8/10
used to construct the Mental Health Component Summary score (MCS) which is another measure of mental health status that is widely used in population surveys but has no clinically validated cutpoint [5]. In this paper we aim to derive generalisable cutpoints for the MHI-5 and MCS using the GHQ-12 as a gold standard, using five different optimisation criteria.

On both the MHI-5 and the MCS high scores indicate good mental health, unlike the GHQ-12. Both the MHI-5 and the GHQ-12 scales have discrete distributions. The GHQ-12 takes on only 13 different values, while the MHI-5 produces only 26 different values. The MCS is a continuous variable, with 11,003 different values being calculated for the 14,669 individuals in the BHPS dataset.
Hoeymans et al [13]. Only the first two have a graphical interpretation on the ROC curve.

Youden Index
In general, a good cutpoint is one which produces both a large sensitivity and a large specificity. An intuitive method, therefore, is to maximise the sum of the sensitivity and specificity, S, i.e. satisfy equation 1

This approach assumes that sensitivity and specificity are equally important. This is exactly equivalent to the

[Figure: ROC curve (y-axis: Sensitivity) marking the point (0,1) and the Youden index J]
itives comprise both true and false positives and so expression 8 is minimised, where the True Positive Rate (TPR) is the sensitivity multiplied by the case prevalence.

|TPR + FPR - P(Case)| (8)
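Expression 8 can be unpacked a little further. The short derivation below is only a sketch: it assumes that FPR is defined analogously to TPR, as the proportion of the whole sample that is a false positive, and it introduces FNR, Se and Sp (false negative proportion, sensitivity, specificity) as shorthand that does not appear in the text.

```latex
\begin{align*}
\mathrm{TPR} &= \mathrm{Se}\cdot P(\text{Case}), &
\mathrm{FPR} &= (1-\mathrm{Sp})\cdot\bigl(1-P(\text{Case})\bigr), \\
\mathrm{FNR} &= (1-\mathrm{Se})\cdot P(\text{Case}), &
P(\text{Case}) &= \mathrm{TPR} + \mathrm{FNR}, \\
\bigl|\mathrm{TPR} + \mathrm{FPR} - P(\text{Case})\bigr|
  &= \bigl|\mathrm{FPR} - \mathrm{FNR}\bigr| . &&
\end{align*}
```

Under these assumptions, expression 8 is zero exactly when false positives and false negatives are equally frequent in the sample.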
It can be shown that in the continuous case (i.e. where the new measure is capable of infinite subdivision) this method is equivalent to the minimax method. In discrete cases, they will produce very similar results. It is important to clarify at this point, that unlike other studies which employ ROC curves, the area under the curve is not a meaningful criterion to use here. The area under the curve

[Figure 2: MHI-5 ROC curve (y-axis: Sensitivity). Marked cutpoints: Youden & (0,1) => 76; Min(Error Rate) => 60; Prevalence Matching & Minimax => 68]
Since the method uses the same dataset both to define cutpoints and assess the performance of those cutpoints, there is the possibility of overestimating the sensitivity and specificity. This potential source of bias is investigated by repeating the analysis using 75% of the dataset (randomly selected), and then assessing the performance of the cutpoints produced on the remaining 25% of the dataset.

Results
MHI-5 Results
First consider the MHI-5. Maximising the Youden index leads to a cutpoint of 76 (a case of common mental disorder is defined by a score of less than or equal to 76) for the MHI-5. Using the shortest distance from (0,1) criterion the optimal cutpoint is also 76. In general, these two optimisation methods will not always give the same cutpoint though the discrete nature of both scales means that in practice they often will. Using the sample prevalence of 25.3% (according to the GHQ-12) to weight the sum of the sensitivity and specificity (thereby minimising the error rate) the corresponding cutpoint is 60. Using the prevalence matching method of choosing a cutpoint the optimal cutpoint is 68. This produces a case prevalence of 24.4%, which is the closest to the GHQ-12 case prevalence of 25.3%. The minimax method yields the same cutpoint as prevalence matching. The correlation between the GHQ-12 and the MHI-5 is high (-0.65). Figure 2 shows the points on the ROC curve corresponding to each of the cutpoints produced by the different optimisation criteria.

MCS Results
Next, we examine the MCS. The Youden index and the (0,1) methods produce very similar cutpoints of 51.7 and 52.1, respectively. Minimising the error rate produces a cutpoint of 44.8 while both prevalence matching and the minimax method indicate a cutpoint of 48.9. Table 1 summarises the results and illustrates the trade off that must be made between sensitivity and specificity. Figure 3 shows the points on the ROC curve corresponding to each of the cutpoints produced by the different optimisation criteria. The correlation between the MCS and GHQ-12 is the same as for the MHI-5 at -0.65.

Assessment of bias
Using three-quarters of the data to derive cutpoints resulted in no change of optimum cutpoints for the MHI-5. When these were applied to the unused 25% of the data in order to assess their performance no systematic bias was observed, with the sensitivity and specificity for each cutpoint being equally likely to increase as decrease. The situation was similar for the MCS, with most of the optimisation criteria producing identical cutpoints to those produced by the full dataset (the only method which produced a slightly different cutpoint was the minimising the misclassification method, which went from 44.8 to 45.1). Again, when these cutpoints were applied to the unused 25% of the data, they produced sensitivities and specificities very close to those reported for the full dataset.
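The cutpoint criteria and the 75%/25% split-sample check described above can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the analysis carried out on the BHPS: the simulated scores, the random seed and the helper names (roc_table, pick_cutpoints, evaluate) are assumptions made for the example. The false positive term is taken to be a proportion of the whole sample, and the minimax method is left out because its definition is not restated here.

```python
import numpy as np

def roc_table(score, case):
    """Sensitivity and specificity of every rule 'score <= c' (low score = case)."""
    cuts = np.unique(score)
    sens = np.array([(score[case] <= c).mean() for c in cuts])
    spec = np.array([(score[~case] > c).mean() for c in cuts])
    return cuts, sens, spec

def pick_cutpoints(score, case):
    """Cutpoints chosen by four of the optimisation criteria."""
    cuts, sens, spec = roc_table(score, case)
    prev = case.mean()                      # case prevalence on the gold standard
    tpr = prev * sens                       # true positives as a share of the sample
    fpr = (1 - prev) * (1 - spec)           # false positives as a share of the sample
    fnr = prev * (1 - sens)                 # false negatives as a share of the sample
    return {
        "youden": cuts[np.argmax(sens + spec - 1)],
        "closest_to_01": cuts[np.argmin((1 - sens) ** 2 + (1 - spec) ** 2)],
        "min_error": cuts[np.argmin(fnr + fpr)],
        "prevalence_match": cuts[np.argmin(np.abs(tpr + fpr - prev))],
    }

def evaluate(score, case, cut):
    """Sensitivity and specificity of a fixed cutpoint on a given sample."""
    pred = score <= cut
    return {"sens": pred[case].mean(), "spec": (~pred[~case]).mean()}

# Simulated data (assumption): 25% cases, scores on a 0-100 scale in steps of 4.
rng = np.random.default_rng(0)
n = 4000
case = rng.random(n) < 0.25
raw = np.where(case, rng.normal(55, 18, n), rng.normal(80, 14, n))
score = np.clip(np.round(raw / 4) * 4, 0, 100)

# Derive cutpoints on a random 75% of the sample, then check them on the other 25%.
idx = rng.permutation(n)
train, test = idx[: int(0.75 * n)], idx[int(0.75 * n):]
cutpoints = pick_cutpoints(score[train], case[train])
for name, cut in cutpoints.items():
    print(name, cut,
          evaluate(score[train], case[train], cut),
          evaluate(score[test], case[test], cut))
```

If the derivation and hold-out figures printed for a cutpoint are close, there is little sign of optimistic bias from reusing the same data, which is the pattern the split-sample check is intended to reveal.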
Table 1: MHI-5 and MCS cutpoints and associated test characteristics for five optimisation criteria
Scale  Optimisation Criterion  Cutpoint  Sensitivity  Specificity  Positivity Rate (note 1)  Error Rate (note 2) %
Note 1: Positivity rate refers to the proportion of the sample defined to be a case using each cutpoint.
Note 2: Error rate refers to the proportion of the sample classified differently to the GHQ-12. This comprises both false negatives and false positives.
Note 3: (0,1) refers to the criterion which minimises the distance between the point (0,1) and the ROC curve.
Comparison of the MHI-5 and the MCS
It is also worth noting that the shorter MHI-5 performs remarkably similarly to the longer MCS, with the correlation between the two being 0.88. Table 1 shows that the error rates produced by the five optimisation methods are very similar for the two scales. It also shows that the MCS is only marginally more efficient at discriminating cases of CMD than is the MHI-5, despite employing over seven times as many questions. Figure 4 illustrates how the cutpoints on the MHI-5 and the MCS vary with GHQ-12 case prevalence (the GHQ-12 case prevalence was varied by changing the cutpoint on the GHQ-12).
Figure 3: MCS ROC curve using a GHQ caseness criterion of 3 or more (y-axis: Sensitivity). 1. ROC curve based on a GHQ-12 caseness criterion of 3 or more. Vertical lines indicate the optimum cutpoints using the five different optimisation criteria. Marked cutpoints: 44.8, 48.9 (Prevalence Matching & Minimax), 51.7 and 52.1.
Figure 4: Relationship between prevalence and MHI-5 and MCS cutpoints for four optimisation methods (x-axis: Case Prevalence; y-axis: Cutpoint; panels: MHI-5 and MCS). 1. Case prevalence is altered by varying the cutpoint used to define caseness on the GHQ-12 from 1 to 12. 2. Solid line denotes the Youden Index. 3. Dashed line denotes the (0,1) method. 4. Dotted line denotes the minimising the error rate method. 5. Dashed and dotted line denotes the prevalence matching method. 6. The minimax method is excluded since it is predominantly coincidental with the prevalence matching method.
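The dependence of a cutpoint on case prevalence can be illustrated with a small Python sketch. It is not a reproduction of Figure 4: the sensitivity and specificity values below are hypothetical, and they are held fixed while the prevalence is varied, whereas in the paper the prevalence is changed by moving the GHQ-12 threshold, which also changes the sensitivity and specificity. The function names are likewise assumptions made for the example.

```python
import numpy as np

# Hypothetical test characteristics at four candidate cutpoints (illustrative only).
cuts = np.array([60, 68, 76, 84])
sens = np.array([0.50, 0.66, 0.80, 0.90])
spec = np.array([0.95, 0.90, 0.80, 0.60])

def youden_cut():
    """Cutpoint maximising sensitivity + specificity - 1; prevalence plays no part."""
    return cuts[np.argmax(sens + spec - 1)]

def min_error_cut(prev):
    """Cutpoint minimising the misclassification rate at a given case prevalence."""
    error = prev * (1 - sens) + (1 - prev) * (1 - spec)
    return cuts[np.argmin(error)]

for prev in (0.10, 0.25, 0.50, 0.75):
    print(f"prevalence={prev:.2f}  youden={youden_cut()}  min_error={min_error_cut(prev)}")
```

With these hypothetical values the Youden cutpoint stays at 76 throughout, while the error-minimising cutpoint moves from 60 at a prevalence of 0.10 to 84 at a prevalence of 0.75. The two coincide at a prevalence of 0.50, where the misclassification rate weights sensitivity and specificity equally.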
Weinstein et al [14] drew attention to the fact that the comparative nature of the GHQ-12 response choices is not conducive to detecting chronic disorders. A subject suffering from chronic anxiety disorder may well answer the question "Have you recently lost much sleep over worry?" with the response choice "no more than usual", if their condition is a long-standing one. The MHI-5 and MCS avoid this problem by employing less comparative response choices. Another explanation for the lack of complete agreement between the GHQ-12 and the two SF-36 mental health measures is that they were designed differently. The MHI-5 includes one or more questions on each of the following mental health dimensions: anxiety, depression, loss of behavioural/emotional control and psychological well-being [3], while the MCS is a weighted sum of all eight health dimensions of the SF-36. The GHQ-12 on the other hand includes items on depression, anxiety, social performance and somatic complaints [2]. However, the high correlations between the GHQ-12 and both the MHI-5 and the MCS indicate that, despite these differences, the three scales perform very similarly. This can be seen in Table 1 where the five optimisation methods produce cutpoints on both scales with very similar properties in terms of sensitivity, specificity, positivity rate and error rate. As mentioned previously the correlation between the MHI-5 and the MCS is high at 0.88.

More generally, this study has found that the minimax method and prevalence matching methods give very similar results. Indeed, in this study they produce identical cutpoints. This is not a coincidence, as the two criteria become equivalent if the scale in question is continuous (and the probability of caseness is calculated from the same dataset).

Investigators should give careful consideration to which of these cutpoints is most appropriate for their study, since selecting which criterion should be optimised depends primarily on the intended application of the resulting cutpoint. For instance, a study whose primary goal is to identify cases in a given locality might do well to minimise the misclassification rate. However, a study interested in comparing CMD internationally should consider utilising the Youden Index or the (0,1) method, as these methods are most appropriate when the study area encompasses regions with different case prevalences. Prevalence matching has the advantage of simplicity but will inevitably lead to different cutpoints in different populations. The minimax method approximates to prevalence matching when the scale in question is continuous.

Comparison with Previous Studies
One study of 7,359 adults representative of the Dutch general population used the GHQ-12 to derive a MHI-5 cutpoint using the prevalence matching method [13]. They used a less severe CMD case criterion of two or more on the GHQ-12, giving a case prevalence of 22.8%. The MHI-5 cutpoint which matched this prevalence most closely was 72, resulting in a case prevalence of 20.6%. To illustrate how this approach can lead to different results in different populations, we carried out the equivalent procedure in the BHPS dataset. Using a GHQ-12 caseness criterion of two or more classifies 32.9% of the dataset as cases. The MHI-5 cutpoint which best matches this prevalence is 76 (providing a case prevalence of 36.2%).

One small study compared four psychiatric case-finding instruments in 69 patients presenting to general practice in Wales and chose cutpoints which provided an undefined "similar sensitivity and specificity values for each instrument" [15]. The Revised Clinical Interview Schedule was used to define a case of CMD. This study identified an MHI-5 cutpoint quoted as 71/72.

A report published in Dutch compared the MHI-5 with the GHQ-12 in order to ascertain a cutpoint [16]. They sampled 7,065 independently living individuals aged 18 to 64 from the general population. A score of two or more on the GHQ-12 was used to define caseness, which classified 24.4% of the population as a case. They used the Youden Index and prevalence matching methods. The Youden Index indicated an MHI-5 cutpoint of 72, leading to a case prevalence of 22.8%. The Composite International Diagnostic Interview (CIDI) was used to determine whether individuals suffered from any of the following disorders: depression, bipolar disorder, dysthymia, panic disorder, agoraphobia, specific phobia, social phobia, generalised anxiety disorder, obsessive compulsive disorder, schizophrenia, anorexia and bulimia. The percentage of the population diagnosed with at least one of these disorders was found to be 12.2%. The MHI-5 cutpoint which matched this prevalence most closely was 60, producing a case prevalence of 11.2%.

Three other studies have defined a cutpoint by comparing MHI-5 scores with a range of different validated clinical interview schedules. These are summarised in turn below. The wide range of cutpoints found reflects the wide variety in sample sizes, study settings and outcomes of interest.

A study of 95 non-psychiatric patients who were HIV seropositive used the Structured Clinical Interview for DSM-III-R (SCID-NP-HIV) psychiatric disorders [11] and found a cutpoint of 52 using the (0,1) method. This study was investigating more severe disorders than the CMD and so produced a very low cutpoint. Applying this cutpoint to the BHPS dataset would identify only 8.3% of the individuals as cases. A study of 4,036 German nationals resident in an area of approximately 50 km in diameter surrounding Lubeck used the Munich Composite International
Diagnostic Interview (M-CIDI) and found a cutpoint of 65 [17]. This study used the (0,1) method. This low cutpoint can be attributed to the fact that the M-CIDI is used to diagnose DSM-IV Axis 1 psychiatric disorders which are more extreme conditions than the common mental disorders.

Another study investigated the validity of the MHI-5 for assessing major depression using 1,444 functionally impaired, community dwelling elderly Americans. The gold standard against which the MHI-5 was compared was the MINI-International Neuro-Psychiatric Interview Major Depressive Episode (MINI-MDE) module. The Youden index optimisation criterion produced a cutpoint quoted as 59/60. Again, the study focussed on major depression and so produced a lower cutpoint than the Youden index cutpoint of 76 indicated by our paper.

A Norwegian study used the MHI-5 as the gold standard to define cutpoints for a different measure [18]. Postal questionnaire surveys with MHI-5 information were returned by 6,865 (70.5% response rate) individuals and cutpoints of 52 and 56 were used successively.

To our knowledge no study has attempted to identify a cutpoint of the MCS.

Strengths and Limitations of the Study
This study compares two measures derived from a questionnaire that is frequently employed in population research [18], using a large, representative sample of the UK population and five different optimisation criteria. The main strength of our study is that the sample size used is nearly twice that of the next largest study. So, while it is well known that optimal cutpoints vary as a function of the population being investigated, severity of caseness, case prevalence and gold standard employed, this study compares the cutpoints derived from five optimisation criteria, for the MHI-5 and the MCS. This facilitates an objective and comprehensive assessment of the best cutpoint for use in different studies.

achievable. Administering the GHQ-12 is, by comparison, inexpensive and efficient. Also, it can be argued that the GHQ-12 is a well validated and reliable scale, with validated cutpoints, and as such is a reasonable instrument against which to measure other scales.

A further potential criticism concerns the crude nature of cutpoints resulting in the loss of information in the variable being dichotomised. One method which seeks to avoid this problem is to use stratum-specific likelihood ratios (SSLRs) [19], defined as the ratio of the probability of a given test result when the disease is present and the probability of the same test result when the disease is absent. Instead of plotting these against one another as in a ROC curve, the SSLR approach examines the ratio of the two. These SSLRs can be calculated for each possible score on the scale in question. Nomograms can then be constructed providing the probability that a given individual is a case depending on their score. While this approach certainly retains more information than a simple cutpoint and may be intuitively appealing, it does not avoid the problem of having to choose a cutpoint in many situations since the SSLRs may still need to be summarised for practical purposes. Furthermore, the SSLR is more useful for diagnostic purposes, while both the GHQ-12 and the MHI-5 are intended as screening tools as opposed to diagnostic tools. The use of depression/anxiety or case finding instruments has limited impact on the recognition, management or outcome of depression/anxiety in primary care [20,21]. As such, a cutpoint on either scale is only appropriate for use in research on populations and would not be suitable for diagnostic purposes.

In the year that wave 9 of the BHPS was carried out (2000), an updated version of the SF-36 was released [5]. The MHI-5 used in the SF-36 version two excludes the response choice "a good bit of the time" as validation studies found that this choice was not consistently ordered in relation to the other categories [22]. However, it has been shown that there is little difference in the performance of the six and five choice response scale [5,17].
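The stratum-specific likelihood ratios described above can be sketched as follows. This is a minimal Python illustration on a toy dataset, not the procedure of reference [19]: the function names, the toy scores and the prior probability of 0.25 are assumptions made for the example. The second function applies Bayes' theorem in odds form, which is the calculation a likelihood-ratio nomogram carries out graphically.

```python
import numpy as np

def sslr(score, case):
    """Likelihood ratio for each observed score: P(score | case) / P(score | non-case)."""
    ratios = {}
    for s in np.unique(score):
        p_case = (score[case] == s).mean()
        p_noncase = (score[~case] == s).mean()
        # A score never seen among non-cases has an unbounded ratio.
        ratios[s.item()] = p_case / p_noncase if p_noncase > 0 else float("inf")
    return ratios

def post_test_probability(prior, likelihood_ratio):
    """Probability of being a case after observing a score with the given ratio."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

# Toy data (assumption): three cases and five non-cases on a three-value scale.
score = np.array([0, 0, 0, 4, 4, 8, 8, 8])
case = np.array([True, True, False, True, False, False, False, False])

ratios = sslr(score, case)
for s, lr in ratios.items():
    print(s, round(lr, 3), round(post_test_probability(0.25, lr), 3))
```

Each score keeps its own ratio, so no information is lost to dichotomisation, but a decision rule still has to be imposed on the resulting probabilities if individuals are to be classified.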
lation, spanning different countries, are required to confirm our findings.

Competing interests
The author(s) declare that they have no competing interests.

Authors' contributions
DLF was responsible for the inception and design of the study. The analysis was performed by MJK, supervised by FDD. MJK drafted the paper. All authors contributed to the intellectual content of the paper as well as offering revisions on all drafts. KL was especially involved in commenting upon the psychiatric component to the paper. All authors read and approved the final manuscript.

disorders as gold standard. Psychiatry Research 2001, 105:243-253.
18. Strand BH, Dalgard OS, Tambs K, Rognerund M: Measuring the mental health status of the Norwegian population: a comparison of the instruments SCL-25, SCL-10, SCL-5 and MHI-5 (SF-36). Nordic Journal of Psychiatry 2003, 57(2):113-118.
19. Furukawa T, Goldberg D, Rabe-Hesketh S, Ustun T: Stratum-specific likelihood ratios of two versions of the General Health Questionnaire. Psychological Medicine 2001, 31:519-529.
20. Gilbody S, House A, Sheldon T: Routinely administered questionnaires for depression and anxiety: systematic review. British Medical Journal 2001, 322:406-409.
21. Gilbody S, House A, Sheldon T: Screening and case finding instruments for depression. The Cochrane Database of Systematic Reviews Wiley; 2005.
22. Keller S, Ware J, Gandek B, Aaronson N, Alonso J, Giovanni A, Bjorner J, Brazier J, Bullinger M, Fukuhara S, Kaasa S, Leplege A, Sanson-Fisher R, Sullivan M, Wood-Dauphine S: Testing the equivalence of translations of widely used response choice labels: results from the IQOLA project. Journal of Clinical Epidemiology 1998, 51:933-944.