
Journal of Consulting and Clinical Psychology Copyright 2001 by the American Psychological Association, Inc.

2001, Vol. 69, No. 6, 875-899 0022-006X/01/$5.00 DOI: 10.1037//0022-006X.69.6.875

A Multidimensional Meta-Analysis of Treatments for Depression, Panic, and Generalized Anxiety Disorder: An Empirical Examination of the Status of Empirically Supported Therapies
Drew Westen and Kate Morrison
Boston University

This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
This document is copyrighted by the American Psychological Association or one of its allied publishers.

The authors report a meta-analysis of high-quality studies published from 1990-1998 on the efficacy of manualized psychotherapies for depression, panic disorder, and generalized anxiety disorder (GAD) that bear on the clinical utility and external validity of empirically supported therapies. The results suggest that a substantial proportion of patients with panic improve and remain improved; that treatments for depression and GAD produce impressive short-term effects; that most patients in treatment for depression and GAD do not improve and remain improved at clinically meaningful follow-up intervals; and that screening procedures used in many studies raise questions about generalizability, particularly in light of a systematic relation across studies between exclusion rates and outcome. The data suggest the importance of reporting, in both clinical trials and meta-analyses, a range of outcome indices that provide a more comprehensive, multidimensional portrait of treatment effects and their generalizability. These include exclusion rates, percent improved, percent recovered, percent who remained improved or recovered at follow-up, percent seeking additional treatment at follow-up, and data on both completer and intent-to-treat samples.
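
As a rough illustration of how the outcome indices discussed in this article relate to one another, the following Python sketch computes percent improved under three denominators (completers, intent-to-treat, and screened), an exclusion rate, and Cohen's d. Every number is hypothetical and chosen only for illustration; none comes from the studies reviewed. The function names (`percent_improved`, `pooled_sd`, `cohens_d`) and the degrees-of-freedom-weighted pooling formula are assumptions introduced here, and dropouts are assumed to count as not improved.

```python
import math


def percent_improved(n_improved: int, denominator: int) -> float:
    """Percent improved = number improved / N, expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * n_improved / denominator


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation, weighted by degrees of freedom.

    Assumption: this conventional pooling formula is used for illustration;
    the article does not spell out how the standard deviations were pooled.
    """
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(mean_treatment: float, mean_control: float, sd: float) -> float:
    """Difference between group means in standard-deviation units."""
    return (mean_treatment - mean_control) / sd


# Hypothetical trial counts (illustrative only).
screened = 300   # patients screened who appeared to have the disorder
entered = 100    # patients who entered treatment (intent-to-treat sample)
completed = 80   # patients who completed treatment (20% dropout)
improved = 40    # patients judged improved; dropouts assumed not improved

completer_rate = percent_improved(improved, completed)    # 50.0
itt_rate = percent_improved(improved, entered)            # 40.0
screened_rate = percent_improved(improved, screened)      # lower-bound estimate
exclusion_rate = 100.0 * (screened - entered) / screened  # percent screened out

# Hypothetical symptom scores (lower = fewer symptoms), so a beneficial
# treatment yields a negative d under this sign convention.
sd = pooled_sd(8.0, 50, 8.0, 50)
d = cohens_d(10.0, 18.0, sd)

print(f"completer: {completer_rate:.1f}%  intent-to-treat: {itt_rate:.1f}%")
print(f"screened: {screened_rate:.1f}%  excluded: {exclusion_rate:.1f}%")
print(f"Cohen's d: {d:.2f}")
```

With these made-up counts, a 20% dropout rate moves an apparent 50% success rate among completers to 40% in the intent-to-treat sample, and the screened denominator lowers the estimate further.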

Drew Westen and Kate Morrison, Center for Anxiety and Related Disorders and Department of Psychology, Boston University.
Preparation of this article was supported in part by Grants MH59685 and MH60892 from the National Institute of Mental Health to Drew Westen.
We thank David Barlow, Glen Gabbard, Robert Rosenthal, Laura Arkowitz-Westen, and Sherwood Waldron for their comments on a draft of this article.
Correspondence concerning this article should be addressed to Drew Westen, Center for Anxiety and Related Disorders and Department of Psychology, Boston University, 648 Beacon Street, 6th Floor, Boston, Massachusetts 02215. Electronic mail may be sent to [email protected].

The past decade has seen increasing efforts to create evidence-based practice guidelines in both clinical psychology and psychiatry (see Nathan, 1998). Spurred in part by the publication of a task force report by the Division of Psychotherapy (Division 12) of the American Psychological Association (APA) in 1995, many researchers have called for psychotherapy practice and training to be limited to treatments that have demonstrated efficacy in randomized controlled trials (Calhoun, Moras, Pilkonis, & Rehm, 1998; Chambless & Hollon, 1998; Persons & Silberschatz, 1998). These treatments, typically referred to as empirically validated therapies, or empirically supported therapies (ESTs; Kendall, 1998), differ substantially from treatments conducted by many practicing clinicians. A major shift in training is nearing completion in university-based clinical psychology programs, as advocates of traditional psychotherapies retire and have in large part passed the torch to younger, more empirically informed colleagues who share the sentiment of Calhoun et al. (1998) that the longer-term, more exploratory therapies of a previous era are "less essential and outdated" (p. 151).

That psychotherapy training and practice in the 21st century will, and should, be evidence-based is indisputable (Kopta, Lueger, Saunders, & Howard, 1999; Nathan et al., 2000; Roth & Fonagy, 1996). The fact that many clinicians are content to practice without reference to research that might refine their therapeutic technique, as well as widely held beliefs in the popular culture and in many circles in psychiatry that medication should be the first-line treatment for disorders such as panic and depression (see DeRubeis, Gelfand, Tang, & Simons, 1999), provided a powerful and appropriate impetus for the efforts of the Division 12 task force to create procedures for separating the therapeutic wheat from the chaff. The question is not whether practice should be evidence-based but what kind of evidence should be the basis of practice.

Although the APA task force did not exclude designs other than randomized controlled trials from their 1995 report, over the past 20 years psychotherapy researchers have come to a consensus that the gold standard in treatment research is a straightforward application of experimental method as used in other areas of psychology, aimed at allowing researchers to draw causal inferences. In this design, patients are screened for inclusion to maximize homogeneity and minimize the presence of co-occurring conditions that could render findings difficult to interpret. Treatments are designed for a single disorder, typically an Axis I condition such as major depression, panic, or social phobia, rather than for nonspecific or multiple problems. Treatments are relatively brief, of fixed duration, and manualized, so that within-treatment variance can be minimized, allowing tight experimental control.

In many respects, these characteristics exemplify good science. Recently, however, a number of researchers have begun to


express concern about some of the limitations of this method, particularly regarding the balance between internal and external validity (e.g., Goldfried & Wolfe, 1995, 1998; Ingram, Hayes, & Scott, 2000; Seligman, 1995). In this article, we address some of these concerns empirically, by examining the findings of high-quality studies of three highly prevalent disorders, focusing on variables that bear on clinical utility and external validity. We describe the results of what we will call a multidimensional meta-analysis, which presents a range of statistics bearing on outcome, including but not limited to effect size, that we believe are important in assessing the strengths and limitations of treatments of psychological disorders.

Key Distinctions

We draw attention here to four distinctions that we believe are essential in drawing accurate inferences from the data: the multiple potential meanings of efficacy, initial response versus sustained efficacy, treatment of states versus treatment of disorders, and empirically unsupported versus empirically untested therapies.

The Multiple Meanings of Efficacy: How Do We Measure Success?

Particularly in the past decade, psychotherapy research has seen an enormous increase in the sophistication with which outcome has been conceptualized, measured, and analyzed (Haaga & Stiles, 2000; Kendall, 1999). Perhaps the most basic way of measuring success is to compare mean outcome in different conditions and to test for statistical significance. Mean differences are particularly useful in that their magnitude can be summarized meta-analytically with measures of effect size. In treatment research, the most common index of magnitude of effect is the difference between the means of the experimental and control groups divided by the standard deviation (pooled, at Time 1, or at Time 2), yielding an effect size estimate (in this case, Cohen's d) in standard-deviation units.

Because effect size does not yield information on variability of response (for example, on the percent of patients who experience clinically significant improvement), a second measure is percent improved. Complicating matters, however, are some important decisions about what to include in both the numerator and denominator of the equation (percent improved = number improved/N), decisions that can lead to very different conclusions.

Questions about how to choose a numerator have been central to the extensive literature on clinical significance of change in psychotherapy (e.g., Jacobson, Roberts, Berns, & McGlinchey, 1999; Jacobson & Truax, 1991; Kendall, Marrs-Garcia, Nath, & Sheldrick, 1999). One common choice is to represent improvement in terms of the number of patients whose scores on outcome measures now fall some number of standard deviation units below the pretreatment mean or published means on clinical samples, or some number of units away from their own pretreatment score, adjusted for standard error of measurement (Jacobson & Truax, 1991). For example, a patient whose posttreatment score is two standard deviations below published norms for depressed samples on the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) might be considered improved or recovered. Another appropriate numerator is the number of patients whose scores at termination and follow-up fall within a predefined range (e.g., one or two standard deviations) of published norms in nonclinical samples (see Kendall & Sheldrick, 2000).¹

¹ A problem with the widely used criterion of 2 standard deviations above the mean of nonclinical samples is that it ignores base rates of the disorder in the population. If 5%-10% of the population is depressed at any given time, individuals with BDI scores 2 standard deviations above the mean in nonclinical samples are likely to be depressed.

Still another, even more conservative, meaning of efficacious treatment or clinically significant improvement is complete recovery; that is, absolute absence of symptoms (Jacobson et al., 1999). For depression, a researcher could choose a cutoff of, for example, 6 on the BDI, below which no one could argue that a treated patient remains depressed. Absolute absence is probably most appropriate for symptoms such as panic, or purging in bulimia, which the patient either continues or does not continue to have.

An equally important question about percent improved pertains not to the numerator of the equation (the number improved) but to the denominator (what we consider the relevant N). The most important distinction is between intent-to-treat and completer analyses (see Kendall, 1999). Until very recently, researchers have often chosen which of these to report, and many have opted to report data on completers only. Although this is clearly an important metric, it only tells part of the story, particularly if dropouts are substantial. This metric also tends to favor drug treatments, because it removes from the analysis participants who had trouble tolerating side effects.

Another potential meaning of N is the number who entered treatment; that is, examining the intent-to-treat sample. Intent-to-treat analyses increase the denominator and hence deflate estimates of efficacy for treatments with substantial dropout rates. Even with a 20% dropout rate, which is relatively small, including dropouts in the denominator can push an apparent 50% success rate to 40%, leading to the conclusion that the treatment helps only a minority of patients who begin it.² Intent-to-treat analyses are particularly important in comparing the effects of psychotherapy and medication trials (because many patients have difficulty tolerating side effects of medications), but they can also sometimes produce different conclusions than completer analyses even when comparing psychotherapies (e.g., Foa et al., 1999).

² An even more conservative way of estimating efficacy is to use the number of participants randomized as the denominator, on the assumption that attrition between randomization and entry into the study could reflect, in part, patients' initial attitudes toward the treatment on the basis of what they have understood from the investigator about its goals, methods, duration, and so forth.

A still more conservative estimate, which might be called effective efficacy, describes the range of values within which one can expect a treatment to be effective for the typical patient in clinical practice with a given symptom or disorder, as extrapolated from efficacy trials. To calculate the lower bounds of generalizability of the treatment, the researcher can choose as the denominator the number of patients screened with symptoms of the disorder (whether or not they met thresholds defined by the Diagnostic and Statistical Manual of Mental Disorders [4th ed., DSM-IV; American Psychiatric Association, 1994]) or, more usefully, the number

screened who had the disorder (if this is available from the research report) prior to application of exclusion criteria such as comorbid conditions. Use of the number of patients screened (whether or not they were included) as the denominator is clearly overly conservative, but it is useful for estimating the absolute lower limit of clinical generalizability because clinicians in everyday practice do not have the luxury of screening out patients who they have reason to believe will not respond. The higher the exclusion rate in efficacy research, and the more the exclusion rate is correlated with positive outcome, the more useful this lower limit becomes. For most controlled trials, the upper bound of effective efficacy would be the familiar intent-to-treat value. A higher upper limit estimate would be appropriate, however, if researchers deliberately excluded patients likely to succeed; for example, if they studied treatment-refractory cases.

A third measure of efficacy is the percent of patients who seek additional treatment. A considerable debate exists on how to think about patients who seek further treatment (Kendall, 1999), as some patients may seek further treatment because they found the treatment they received useful. In general, however, patients who are satisfied with the gains they have made or who are genuinely free of their psychological symptoms are, presumably, unlikely to seek further treatment, unless they have other comorbid conditions that were left untreated or their treatment length was inadequate.

A fourth, and often overlooked, measure of efficacy is the absolute magnitude of mean symptoms at termination or follow-up. A treatment for depression might appear efficacious enough to recommend it as the treatment of choice if it produces a strong effect size (e.g., d = 0.8), but this conclusion may not be warranted if the average patient continues to suffer substantial, if substantially diminished, depressive symptoms. Such a finding would not only suggest incomplete treatment but also might predict relapse.

None of these indexes is inherently more valid than the others, and none should be taken as the sole index of a treatment's efficacy. A treatment that achieves 90% success with the 30% of patients who find it useful enough to complete is efficacious for a subset of the population, just as a treatment that produces an effect size of 1.2 relative to placebo may be useful even if it does not restore many patients to health, if other treatments are unlikely to do better. In the present study, we report data using each of these indexes, providing a range of values on the efficacy of ESTs for depression, panic, and generalized anxiety disorder (GAD). This allows readers to consider the range of conclusions that can be drawn from the existing literature.

Initial Response Versus Sustained Efficacy

A second key issue is the distinction between initial response and sustained efficacy (that is, sustained improvement over time). Many psychological disorders, such as depression and GAD, show some initial response to virtually any kind of psychosocial intervention. In naturalistic studies, 15% of patients improve significantly after making the initial call to a therapist's office and before attending a first session (see Kopta, Howard, Lowry, & Beutler, 1994). Further, Ilardi and Craighead (1994) reported that most of the treatment effects demonstrated in studies of cognitive therapy for depression occur by the fifth session, with treatment effects leveling off asymptotically after that. Although their data have not gone unchallenged (Tang & DeRubeis, 1999), naturalistic studies have typically produced similar findings (Howard, Lueger, Maling, & Martinovich, 1993), and recent research on bulimia has similarly found that patients whose symptoms do not decrease by 70% in the first six sessions are unlikely to respond to brief psychotherapy in clinical trials (Agras et al., 2000; see also Wilson, 1999).

A key question is whether the treatment effects that occur before the 1st session, within the first 5 to 6 sessions, or over the complete course of a brief (6-to-20 session) treatment are lasting. In other areas of medicine a distinction between initial response and what is usually called efficacy is commonplace. A treatment for HIV that leads to initial suppression of the virus but does not continue to inhibit the virus beyond a few months would not be considered efficacious, although it might be used in combination with another treatment if the two together proved beneficial. As we will see, the limited data on long-term outcome of ESTs suggest that initial response may bear little relationship to efficacy at clinically meaningful follow-up intervals (Shea, Elkin, et al., 1992; Snyder, Wills, & Grady-Fletcher, 1991). Apropos are data from effectiveness studies suggesting that longer-term treatments may produce more robust effects and may be useful in the treatment of the polysymptomatic presentations that are the norm in clinical practice (Kopta et al., 1994; Morrison & Westen, 2000; Seligman, 1995).

Treatment of States Versus Treatment of Disorders

A related distinction is between the treatment of states and the treatment of disorders. A trip to the emergency room can be highly efficacious for the treatment of intense despair and suicidality, but it is not a treatment for a longstanding or intermittent depressive disorder or for a personality disorder that provides a diathesis for depression. Data on the natural course of depression suggest that major depressive episodes (depressive states) resolve in gradual stepwise reductions of symptoms over time, with 54% of patients recovering at 6 months, 70% within 1 year, and 88% by 5 years (Keller, Lavori, Mueller, & Endicott, 1992). However, the risk of repeated episodes exceeds 85% over 10 to 15 years (Mueller et al., 1999). Individuals afflicted with major depressive disorder will experience, on average, four lifetime major depressive episodes of 20 weeks duration each and are likely to experience a variety of other depressive symptoms at other points (Judd, 1997).

Of relevance is research by Howard, Kopta and colleagues (Howard et al., 1993; Kopta et al., 1994) on phases in treatment associated with different processes of change. Howard et al. (1993) distinguished acute distress, which tends to resolve quickly; chronic distress, which resolves more slowly; and characterological problems, which tend to require the most time to address. After an initial "remoralization" period, in which the patient's hope is restored, symptom relief tends to occur in a "remediation" period that typically lasts approximately 16 weeks in naturalistic studies. The researchers suggest that enduring "rehabilitation" of longstanding problems is likely to require substantially longer, depending on the patient's degree and type of characterological impairment.

To what degree the same techniques are effective in treating states (acute anxiety, depression, etc.) versus disorders (GAD, recurrent major depression, etc.) or diatheses for those disorders is

unknown. The mechanisms of change may or may not be the same, and data on initial response or follow-up at brief intervals may be of little relevance to the question of whether a treatment is efficacious for treating a recurrent disorder, particularly one for which personality pathology provides an underlying vulnerability (see, e.g., Westen & Harnden-Fischer, 2001). Only long-term follow-up data can answer that question. As we show, such data are difficult to find even for disorders about which statements are common in the literature about treatment of choice.

Indeed, a treatment that is efficacious in the long run may be deleterious in the short run, or vice versa. Empirically, disclosure of painful events initially leads to both subjective pain and immune suppression; however, disclosure ultimately produces improvement in both mood and immune functioning (Pennebaker, 1997). If treatment is limited to a few weeks and success measured at the end of that time, treatments likely to be highly beneficial in the long run may well appear inferior to other treatments, or even appear iatrogenic. On the basis of data obtained at limited follow-up intervals, researchers using dismantling designs, for example, could easily abandon useful techniques and preserve others that are less efficacious in the long run.

Empirically Unsupported Versus Empirically Untested

A final distinction is between empirically unsupported treatments (those that have been tested and demonstrated to lack efficacy) and empirically untested treatments (those that have not been tested). To infer that one treatment is more efficacious than another because one has been subjected to empirical scrutiny using a particular set of procedures and the other (e.g., long-term exploratory therapies) has not is a logical error, and a common one in the literature. To put it another way, we need to be careful to distinguish empirically invalidated from empirically unvalidated treatments.

Sometimes, of course, researchers do not test certain therapies because clinical experience suggests that they are not likely to be efficient or effective (e.g., psychoanalysis for fear of dogs following a dog bite). We suspect, however, that the differences between tested and untested treatments too often rest on less rational or empirical bases. For example, the fact that interpersonal therapy (IPT) is the only empirically supported treatment for bulimia other than cognitive-behavioral therapy (CBT) has less to do with clinical experience (e.g., that large numbers of clinicians had been practicing IPT with patients with bulimia and found that it had substantially improved their results) than with the facts that (a) the infrastructure was already in place to begin studying it (e.g., by adapting existing manuals) and (b) its proponents understand the importance of rigorous scientific investigation of treatment response. As we argue below, one might do well to apply scientific methods to the selection of treatments to test rather than assume that the best way to separate therapeutic gold from inert or iatrogenic lead is to use scientific methods to test hypotheses about a narrow range of treatments preselected for reasons that may be relatively arbitrary.

The Present Study

This article reports a multidimensional meta-analytic investigation designed to examine the evidentiary basis of ESTs for three common disorders (depression, panic, and GAD) and provide the entire range of efficacy coefficients described above (effect size, percent treatment success indexed using different numerators and denominators, etc.). We also address a series of questions bearing on external validity; most important, what percent of patients are screened out and does a relationship exist between exclusion rates and measures of efficacy?

Method

Selection of Studies

To maximize the quality of studies (and to limit the magnitude of the task), we identified a sample of studies on psychotherapy for adults with diagnoses of depression, panic (with or without agoraphobia), and GAD using a manual and computer search of high-quality, high-impact journals (selected a priori, before examining any data) that routinely publish efficacy research. We included studies published in the decade of the 1990s (until 1999, when the data were analyzed) from the following journals (in alphabetical order): The American Journal of Psychiatry, Archives of General Psychiatry, Behavior Research and Therapy, Behavior Therapy, The British Journal of Psychiatry, Cognitive Therapy and Research, The Journal of Consulting and Clinical Psychology, The Journal of Psychotherapy Practice and Research, Psychotherapy, and Psychotherapy Research. Inclusion of only studies of the most recently refined treatments that used current methods (that is, beginning the search in 1990, rather than 1980 or 1970) and published in major publication outlets for efficacy studies (rather than including all published and unpublished studies, including file-drawer studies that often reduce effect size estimates; see Rosenthal, 1991) means that the findings can only be generalized to research that is methodologically relatively strong. Because of the paucity of studies uncovered with this method for GAD, we conducted an exhaustive computer search of Psychological Abstracts, using the key word generalized anxiety disorder, which turned up two additional articles.

To be included in this review, a study had to test the efficacy of a specific psychosocial treatment against a waiting-list control condition, an alternative psychotherapy, a pharmacotherapy, or some combination of these.³ We included both initial publications and follow-up studies, provided the follow-up interval was 12 months or longer (an interval we chose because of its clinical meaningfulness for disorders such as major depression that tend to remit spontaneously after 20 weeks and show high rates of recurrence within 2 to 5 years). To be included, studies also had to include valid measures of outcome for the primary symptom⁴ and to be experimental in design (including randomized patient assignment, standardized treatments, and blind outcome assessment); hence, studies on the naturalistic end of the continuum were excluded (e.g., Luborsky et al., 1996). We also excluded studies if they (a) represented reanalyses of data already included in the meta-analysis, (b) were limited to highly specific subtypes or subpopulations of the disorder in question (e.g., depressed patients who somatize, elderly patients, adolescent patients, or fundamentalist Christians), or (c) were not primarily face-to-face psychosocial treatments (e.g., self-administered therapies such as bibliotherapy, biofeedback, telephone or computer-administered treatment, or didactic educational interventions administered in large groups).

³ Outcome studies examining maintenance therapies were not included because of the lack of comparability of findings. The small number of studies assessing such therapies typically test the impact of infrequent, ongoing therapeutic contact where the goal is primarily prophylactic. The majority of these studies focused on depression, and most found that about two-thirds of patients relapsed by 2-3 years (Frank & Kupfer, 1993; Frank et al., 1990). Also excluded were composite studies where it was not explicit which treatments which patients had received (e.g., Chambless & Williams, 1995), as well as studies that were primarily medication trials and did not include a placebo or a placebo plus psychosocial comparison group.

⁴ A number of studies of panic disorder neglected to report inclusion of any direct measure of panic symptoms (e.g., Rijiken et al., 1992; Telch et al., 1993; Wade et al., 1993).

Within these parameters, then, we included studies if they met the following criteria: The study (a) was published between 1990-1998, (b) was published in English, (c) had an experimental design, and (d) included outcome measures specifically targeted to symptoms of depression, panic, or GAD. Follow-up studies that appeared during this time period that represented continuations of research published before 1990 were included if they met study criteria (e.g., if they added a follow-up at intervals of 1 or 2 years). Thirty-four studies met our inclusion criteria and are included in the review (with asterisks in the reference section). Twenty-three others were excluded prior to examination of the data because they did not meet minimal criteria for randomized controlled trials (e.g., outcome was not assessed masked to experimental condition; see Appendix A).⁵

Procedure

It is worth noting at the outset that decisions about how to code or define variables reflected our consistent effort to give the treatments under consideration the "benefit of the doubt." We chose to do this because we had undertaken this study from a critical perspective; were aware of pervasive allegiance effects in psychotherapy research (including the likelihood of our own); and, hence, wanted to prevent where possible any intrusion or appearance of bias. Thus, at each step we attempted to make methodological choices, prior to examination of the data where possible, that maximized efficacy estimates.

We report meta-analytic data for each disorder at each of three assessment periods: termination, 12-18-month follow-up, and 24+-month follow-up. Variables assessed (described more thoroughly below) included number of participants, percent of patients who met initial criteria who were included (screened into the study), percent of patients who completed treatment, percent of patients who improved with treatment, percent who remained improved at each follow-up interval, effect size, mean posttreatment symptomatology at each follow-up interval (e.g., mean BDI scores at termination), and percent at follow-up who sought additional treatment.

Because we were interested in what the average patient can expect to receive from the average EST, and because research by Luborsky et al. (1999) and others has documented that up to 70% of the variance in outcome across state-of-the-art efficacy studies can be predicted by investigator allegiance, data reported here are collapsed across all active treat-

our primary analyses, we used whatever definition the researchers reported, to maximize, once again, the likelihood of positive findings, because researchers tend to put their best foot (or data) forward. We also coded the data for stringency of improvement criteria using a 1-4 scale and, in a secondary analysis, correlated stringency with percent improved across studies. This yielded no significant correlations for any of the three disorders, suggesting that the collapsed data provide a reasonable summary metric.⁸

With respect to the denominator, we provide data on percent improved in three ways: as the number of patients considered improved by the investigators divided by (a) the number who completed treatment; (b) the number who entered treatment, whether or not they completed (intent-to-treat sample); and (c) the number of patients screened who appeared to have the disorder, where this could be discerned, to provide a lower estimate of generalizability, presuming that researchers were not excluding patients arbitrarily.

For measures of posttreatment and follow-up symptomatology, we report mean scores on outcome measures for each disorder. Because different outcome measures may vary in their reliability and validity, to calculate effect size, we used a hierarchical procedure, giving priority to the most widely used, reliable, and valid measures of each symptom and to those with the least demand characteristics (i.e., masked ratings made by objective observers). Thus, for depression, we used scores from the Hamilton Rating Scale for Depression (HRSD; Hamilton, 1967); when this measure was not available, we used the BDI. For GAD, we used the Hamilton Anxiety Rating Scale (HARS; Hamilton, 1959) first, followed by the State-Trait Anxiety Inventory, Trait version (STAI-T; Spielberger, Gorsuch, & Lushene, 1970). (In secondary analyses, we compared effect sizes using these different instruments, which turned out to have no impact on the findings.) For panic disorder, for which researchers used a variety of measures, we relied, first, on the patient's self-reported frequency of panic attacks and, second, on the mean number of panic symptoms endorsed. Where none of these measures was available, we selected the outcome measure that produced the largest effect size, again maximizing the likelihood of obtaining positive outcomes. For panic, as for the other disorders, these different measures tended to yield equivalent effect size estimates.

To measure effect size, we used Cohen's d (Cohen, 1988), which was calculated using the following formula: (mean of treatment group - mean of control group)/pooled pretest standard deviations. We used the pretreatment standard deviations, once again, to maximize effect size. Experimental conditions were considered controls if (a) they were not recognized as an established treatment (e.g., the investigators created an "expressive therapy" condition to compare with the experimental treatment of interest); (b) the investigators made no attempt to standardize treatments in those
conditions; or (c) the authors explicitly referred to them as a placebo,
ments in each study, excluding conditions described as control conditions
or interventions that included pharmacotherapy. Appendixes B-D list each
study, its active and control conditions, and the data we extracted and 5
For example, we excluded Fava et al. (1995) because the investigators
analyzed.
used "definite" panic attack as their primary outcome measure but speci-
With respect to specific variables that may require clarification, number
fied no criteria for coding attacks as more or less definite, relied on the
of participants refers to the number of people who actually began treatment
determinations of a nonmasked rater, and included no self-report measures
(i.e., the number who were randomized to any given treatment condition
that would permit estimation of agreement between participants and the
minus those who never attended the first treatment session).6 Percent
nonmasked clinician.
completed refers to the percentage of patients who completed the treat- 6
ments. Percent included describes the percentage of patients who survived This served to make our final estimate of completion rate a comparably
the screening process, which typically occurred after a patient was referred liberal one, given that we did not factor in the attrition between random-
or self-referred for the disorder under investigation and often after a ization and commencement of treatment.
7
telephone screen. Researchers frequently did not provide either of these Ns, A table listing criteria for improvement in each study is available from
and rarely reported both. Where possible, we chose the number screened Drew Westen.
8
out after an initial screen, to provide the most generous estimate of It is worth noting that very few investigators reported whether their
treatment effects. criteria for improvement were selected a priori or after examining the data.
With respect to percent improved, definitions of improvement varied This information is unfortunately essential for evaluating statistical signif-
substantially from study to study, a problem we addressed in two ways.7 In icance of any findings.
control, or "treatment-as-usual" condition.9 (The only gray area we encountered in making these determinations was the differential use of applied relaxation, which was sometimes described as a treatment and sometimes as a control condition for depression.)

Finally, because many studies did not include either a control group or the descriptive statistics necessary to calculate between-treatment effect sizes, we also calculated within-treatment effects using the equation (pretest mean - posttest mean)/pretest standard deviation. Although widely reported, this statistic is not very meaningful because it confounds genuine effects with placebo effects, the effects of passage of time, the effects of common factors, and the fact that people tend to seek treatment when their distress is very acute and, hence, does not allow the kind of causal inferences that are the virtue of experimental designs.

Results

The results cover 12 studies of depression, 17 studies of panic, and 5 studies of GAD (Appendixes B-D). (Thirty-four studies, of course, represent the results of more than 34 research reports, because for data analytic purposes, all articles describing the same sample constitute a single study.) Together these studies included a total of 2,414 participants. We calculated both unweighted and weighted means (weighting for variables such as sample size and stringency of criteria for determining clinically significant improvement), but as in all cases the results were similar, we report here only the unweighted means. Where appropriate, we also report medians, to avoid undue impact of outliers. Tables 1-3 summarize the most important findings.10

Inclusion and Completion Rates

Inclusion. For all three disorders, the majority of patients were excluded from participating in the average study. Inclusion rates were 32% for depression, 36% for panic, and 35% for GAD. For most studies (across all disorders), researchers appropriately excluded patients with psychotic, bipolar, or organic disorders. Additional exclusion criteria were as follows.

The prototypical study of treatment for depression excluded patients for suicidality or comorbid substance use disorders. Several studies also excluded patients who had one or more of the following: GAD, panic disorder, antisocial personality disorder, severe obsessional symptoms, schizotypal features, or significant physical problems. Several excluded patients if these comorbid conditions were considered primary but did not define how that determination was made (or report reliability of that determination). The majority of studies required a diagnosis of major depressive disorder for inclusion.

The prototypical exclusion criteria for panic were moderate to severe agoraphobic avoidance, any concurrent Axis I or Axis II disorder in need of immediate treatment, major depression deemed primary, and recent previous therapy. In addition, most studies excluded patients for suicidality or substance abuse.

The prototypical GAD exclusion criteria included major depression, substance use disorders, and suicidality. Many GAD studies also excluded patients with dysthymic disorder, somatic disorders, panic, past psychosocial treatment, current substance abuse, or obsessive-compulsive disorder.

Studies of depression tended to be less restrictive than studies of panic or GAD, which generally excluded the most common comorbid conditions. However, exclusion criteria for all three disorders often eliminated more troubled and difficult-to-treat patients, such as patients with borderline features (who are likely, for example, to be suicidal and to have substance use disorders).

Completion. The percentages of completers in these studies were relatively high: 74% for depression, 86% for panic, and 84% for GAD.

Initial Response

Effect size. With respect to effect size, initial response was generally impressive. For studies that included and reported adequate data on control groups (n = 13, or 38% of the total sample), the average effect size at termination (Cohen's d) was small for depression but large for panic and GAD (here we report medians, because the standard deviations tended to be much larger than the means, so that the medians are more readily interpretable): Mdns = 0.3, 0.8, and 0.9, respectively. These findings are strong and particularly meaningful because they control for both time and at least some nonspecific factors (depending on the nature and credibility of the placebo condition). Pre-post effect sizes were very large (we revert to means here because of substantially smaller relative variance): for depression, M = 2.23 (SD = 0.78); for panic, M = 1.55 (SD = 1.24); and for GAD, M = 2.09 (SD = 0.76). As noted above, however, this metric is difficult to interpret because of the number of variables that could account for changes over many months, particularly in the absence of comparable data for participants in placebo control conditions.

Percent improved. Roughly 73% of studies reported on the percentage of participants classified as improved posttreatment. Of depressed patients who completed treatment, 54% were deemed improved. The comparable percentages for panic and GAD were 63% and 52%, respectively. For the intent-to-treat group (including those who did not complete), improvement rates were 37% for depression, 54% for panic, and 44% for GAD.11 The lower limit estimate of efficacy, percent improved of those screened (including those not treated because of exclusion criteria), was low (not surprisingly, given high exclusion rates): 14% for depression, 19% for panic, and 10% for GAD.

Posttreatment symptomatology. A variable that has received little attention in either primary studies or meta-analytic reviews bearing on the question of the efficacy of ESTs is whether they actually lead to what most patients and clinicians would consider cure. At termination, depressed patients averaged 8.68 (SD =

9 We have reported our procedures in such detail and listed raw data for individual studies in the appendixes, in part to allow researchers whose studies were included or excluded to assess whether they believe our representations of their studies are appropriate, and in part because of the lack of standardization across studies that characterizes the literature on all three disorders.
10 As can be seen from the tables, the number of studies drops substantially for many analyses, particularly at follow-up. Except in extended follow-ups, we only report data in the table when provided by a minimum of two studies.
11 Seven of 24 studies of treatment for panic (29%) did not report panic symptoms at termination and hence could not be included in these analyses.
Table 1
Meta-Analysis of Outcome at Treatment Termination

                                            Depression                    Panic                         Generalized anxiety
Outcome-related variable                  M     SD    Mdn  Study n      M     SD    Mdn  Study n      M     SD    Mdn  Study n

Number of participants                 92.3   46.6   76.0       12   56.8   38.3   48.0       17   62.6   26.0   65.0
Included (%)                           31.9   13.3   31.3       11   35.8   33.1   23.8        7   34.6   21.4   35.4
Completed (%)
  Of those entered                     74.3   14.1   78.4       12   86.3   11.9   85.4       16   83.7    6.3   83.3
  Of those screened                    23.5   11.2   20.6       11   34.7   33.0   22.2        6   29.2   17.1   32.3
Improved (%)
  Studies reporting                   58.0a                      7  82.4a                     14 100.0a
  Of completers                        50.8    9.5   50.0        7   63.3   12.2   66.1       14   52.1   21.8   55.3
  Of intent-to-treat sample            36.8    9.0   33.0        7   53.8   12.2   52.3       14   43.5   19.1   41.8
  Of those screened                    13.6    6.2   13.7        5   19.5   18.8   15.5        6   10.4    5.4    7.8
Effect size at termination
  Studies with control group          41.7a                      5  41.2a                      7  60.0a
  Treatment vs. control                 0.5    1.1    0.3        3   0.7b    1.1    0.8        7    1.2    0.6    0.9
  Pretreatment vs. posttreatment        2.2    0.8    2.4        8    1.5    1.2    1.0       14    2.1    0.8    1.6
Posttreatment mean symptoms
  Depressed patients
    HRSD                                8.7    6.5    9.2
    BDI                                11.0    8.6    9.5
  Panic patients
    Panics/week                                                       0.7    1.2    0.6       15
    No. of panic symptoms                                             4.1    3.5    4.1        2
  GAD patients
    HARS                                                                                           11.0    6.2   10.3
    STAI-T                                                                                         47.5    9.3   48.6

Note. HRSD = Hamilton Rating Scale for Depression; BDI = Beck Depression Inventory; GAD = generalized anxiety disorder; HARS = Hamilton Anxiety Rating Scale; STAI-T = State-Trait Anxiety Inventory-Trait version.
a Percentage of studies reporting. b Mean here reflects data dropping the highest and lowest values because of an uncharacteristically low outlier with a negative effect size; the unadjusted mean was 0.3.
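
As a rough illustration of the two effect sizes summarized in Table 1, the sketch below follows the formulas stated in the Procedure section. The function names and the sample numbers are hypothetical, and the way the two pretest standard deviations are pooled (weighting each variance by its degrees of freedom) is an assumption, because the article says only "pooled pretest standard deviations" without specifying the method.

```python
from math import sqrt


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pool two standard deviations, weighting each variance by its degrees of freedom.

    Assumption: this weighting is one common choice, not necessarily the authors'.
    """
    pooled_variance = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return sqrt(pooled_variance)


def between_group_d(mean_treatment, mean_control, pooled_pretest_sd):
    """Treatment vs. control: (treatment mean - control mean) / pooled pretest SD."""
    return (mean_treatment - mean_control) / pooled_pretest_sd


def within_group_d(pretest_mean, posttest_mean, pretest_sd):
    """Pretreatment vs. posttreatment: (pretest mean - posttest mean) / pretest SD."""
    return (pretest_mean - posttest_mean) / pretest_sd


# Hypothetical single-study numbers on a symptom scale where lower scores are better.
sd = pooled_sd(sd_a=5.0, n_a=30, sd_b=5.0, n_b=30)
print(between_group_d(9.0, 14.0, sd))  # negative: the treated group has fewer symptoms
print(within_group_d(20.0, 9.0, 5.0))  # positive: symptoms fell from pretest to posttest
```

Taken literally, the between-group formula yields a negative value when treatment lowers symptom scores, so the positive entries in Table 1 presumably reflect magnitudes or a reversed subtraction; that reading is an inference, not something the article states.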

6.49) on the HRSD and 10.98 (SD = 8.60) on the BDI. Although this was a substantial improvement from pretreatment, the degree of continued symptomatology did not constitute a return to mental health, and the large standard deviations point to substantial variability in treatment response. In fact, both of these means are above the criteria used by researchers to indicate clinically significant depression. For example, Pilkonis, Heape, Ruddy, and Serrao (1991) recommended HRSD scores below 6 and BDI scores below 9 as indicative of recovery. Using Jacobson and Truax's (1991) criterion of patients' scores being closer to published norms of nonclinical than of clinical samples, Bouchard et al. (1996) used a cutoff of 7.86 for the BDI to indicate clinically significant change.

Similar findings emerged for panic: The average patient continued to panic slightly less than once a week (0.74 times per week, SD = 1.20) and endorsed a total of 4.14 panic symptoms (SD = 3.49) of the 7 required for a DSM-IV diagnosis of panic disorder (enough, in fact, to qualify as limited symptom attacks). For GAD, the average patient continued to score 11.03 (SD = 6.18) on the HARS and 47.45 (SD = 9.33) on the STAI-T.

By and large, these findings, relative to published norms (where available), suggest that the average patient receives substantial benefit but continues, even at termination, to have mild symptoms of the disorder for which he or she was treated.

Follow-Up

One of the most striking things about Tables 1-3 is the sheer lack of data on follow-up at 12 months or longer. We could locate only 9 experimental studies with follow-up at 12-18 months and only 4 with extended follow-up at 24+ months for all three disorders combined. Of these 13 studies, several have methodological problems that render conclusions drawn from them potentially problematic.12

12-18 months. The follow-up data at 12-18 months were largely similar to the data at termination except that (a) a clearly observable waxing and waning of symptoms in both improved and unimproved patients was apparent, which is characteristic of the natural course of the disorders; and (b) of those who initially improved, in most cases, there was a small-to-moderate decrease in efficacy over time.

With respect to effect size, data comparing treated and untreated groups are rare at 12-18 months, primarily reflecting the ethical problem of keeping patients treatment free for such an extended period. For depression, the single exception, the National Institute of Mental Health (NIMH) Treatment of Depression Collaborative Research Project found no difference between control and treatment conditions at 18 months. The single GAD study that included data on a comparison group without offering them further treat-

12 For example, McLean and Hakstian (1990) created their own outcome measure and reported no analyses on the final outcome interval, instead only providing data averaged across multiple follow-up intervals. The data they did report, however, showed a main effect for time interval to follow-up significant at p < .001, which was greater than the main effect for treatment, which was significant at only p < .05.
Table 2
Meta-Analysis of Outcome at 12-18 Month Follow-Up

                                            Depression                    Panic                         Generalized anxiety
Outcome-related variable                  M     SD    Mdn  Study n      M     SD    Mdn  Study n      M     SD    Mdn  Study n

Improved (%)
  Of completers                        67.5    0.7   67.5        2   75.3   12.8   75.8        4   60.1   22.3   60.3        2
  Of intent-to-treat sample                                          71.6   14.9   71.8        4   48.9   21.0   48.9        2
  Of those screened                                                  29.8   22.5   19.3        3    9.9                      1
Remained improved (%)
  Of completers                        36.6   14.1   30.4            85.5
  Of intent-to-treat sample            28.5   13.3   22.9            73.1
  Of those screened                    10.1    4.3    9.1            20.3
Effect size
  Treatment (follow-up) vs. control                                                                 0.7
  Pretreatment vs. posttreatment        2.5    0.2    2.5             1.0    0.3    1.1             3.3
Mean symptoms at follow-up
  Depressed patients
    HRSD
    BDI                                 7.8    8.2
  Panic patients
    Panics/week                                                       0.7    1.4    0.7
    No. of panic symptoms
  GAD patients
    HARS                                                                                            6.3    5.9
    STAI-T                                                                                         39.6   10.8
% of patients seeking additional treatment
  Patients                             27.8    1.1   27.8            35.3   16.8   36.9            44.8   22.2   44.8

Note. HRSD = Hamilton Rating Scale for Depression; BDI = Beck Depression Inventory; GAD = generalized anxiety disorder, HARS = Hamilton
Anxiety Rating Scale; STAI-T = State-Trait Anxiety Inventory—Trait version.
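
The "Improved (%)" rows in Tables 1 and 2 report the same outcome against three denominators: completers, the intent-to-treat sample, and everyone screened. A minimal sketch of how the three percentages relate for a single hypothetical study follows. The function name and the counts are invented for illustration, and the sketch assumes that every patient counted as improved also completed treatment, which the article does not state for each study.

```python
def improvement_rates(n_screened, n_entered, n_completed, n_improved):
    """Percent improved against three denominators for one study.

    Assumption: improved patients are a subset of completers, so dropouts
    and screened-out patients count as not improved.
    """
    ordered = n_screened >= n_entered >= n_completed >= n_improved >= 0
    if not ordered or n_completed == 0:
        raise ValueError("expected screened >= entered >= completed >= improved, with completers > 0")
    return {
        "of_completers": 100.0 * n_improved / n_completed,
        "of_intent_to_treat": 100.0 * n_improved / n_entered,
        "of_screened": 100.0 * n_improved / n_screened,
    }


# Hypothetical counts: 300 screened, 100 began treatment, 80 completed, 40 improved.
rates = improvement_rates(n_screened=300, n_entered=100, n_completed=80, n_improved=40)
for label, pct in rates.items():
    print(f"{label}: {pct:.1f}%")
```

The tabled means are averages across studies, and different studies contribute to different rows, so the published rows need not equal products of one another the way the single-study percentages here do.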

ment found a moderate-to-strong treatment versus control effect size of .65. By and large, the only effect size data available at 1 year and beyond were pre-post data, which generally showed that patients tended to improve even beyond their initial termination posttreatment levels, suggesting continued treatment effects posttreatment or spontaneous remission over time.

With respect to percent improved, a crucial distinction in follow-up studies is between percent improved and percent remained improved. The former capitalizes on chance factors and naturally occurring symptom fluctuation (i.e., treatment failures can appear to become treatment successes over time in disorders that naturally remit or fluctuate over several months to a year), whereas the latter does not. Thus, we focus here on the stability of treatment effects; namely, on how likely patients who improved with treatment were to remain improved (what we called sustained efficacy). Of the 11 studies reporting on 12-18-month follow-up, 5 (45.0%) provided data on the percent of patients who were improved or recovered at termination and remained improved. For depression, 36.6% of patients who completed remained improved, and 28.5% of the intent-to-treat sample got better and stayed better. Unfortunately, none of the studies for GAD and only 1 study for panic provided information on whether the same participants remained improved. The single panic study produced impressive results: 86% of the completer sample and 73.0% of the intent-to-treat sample improved and maintained their treatment gains between termination and 12-18-month follow-up.

With respect to posttreatment symptomatology, depressed patients at 12-18 months averaged scores of 7.79 (SD = 8.18) on the BDI, which represents an improvement from the first posttreatment assessment, although this finding is based on only one study. Similar findings emerged for panic, where at follow-up the average patient continued to panic 0.69 (SD = 1.40) times per week. In the single study providing these data on GAD, the average patient continued to score 12.0 on the HARS.

With respect to percent seeking additional treatment, slightly under one third of studies provided data on this variable between termination and 12-18-month assessment (n = 11; 29%) and, when they did, the data were often not systematically gathered (e.g., unstructured telephone contact). For depression, of those studies providing relevant data, 28% of patients reported receiving further treatment. For panic and GAD, the rates were 35% and 45%, respectively. On average, then, 36% of all treatment completers engaged in some form of additional psychosocial therapy within 12-18 months.

Two years. At 2 years and beyond, data are almost nonexistent. Hence, most of the data we report here are from single studies. With respect to effect size, no comparisons between active and control groups were available.

With respect to percent improved, the data reported here are based on single studies for both depression and GAD and on two studies for panic. Once again, we focus here on the percent who remained improved. Of the four studies reporting on 24+-month follow-up, two (50%) provided data on the percent of patients who were recovered at termination and remained improved. For depression, 38% of patients who completed treatment remained improved. Stated differently, 27% of those who initially entered
Table 3
Meta-Analysis of Outcome at 24+-Month Follow-Up

Depression Panic Generalized anxiety

Outcome-related variable M SD Mdn Study n M SD Mdn Study n M SD Mdn Study n

Improved (%)
Of completers 74.0 71.0 5.1 71.0 2 32.0 1
Of intent-to-treat sample 54.6 50.9 0.9 50.9 2 24.0 1
Of those screened 27.4
Remained improved (%)
Of completers 37.5 54.0 1
Of intent-to-treat sample 27.4 46.4 1
Of those screened 7.6
Effect size
Treatment (follow-up) vs. control


Pretreatment vs. posttreatment 1.9 1 2.1 2.5 2.1 2


Mean symptoms at follow-up
Depressed patients
HRSD
BDI 8.8 7.9 1
Panic patients
Panics/week 0.9 1.6 1
No. of panic symptoms
GAD patients
HARS 13.0 1
STAI-T
% of patients seeking additional
treatment
Patients 49.5 10.6 49.5 2 49.0 49.0 2

Note. HRSD = Hamilton Rating Scale for Depression; BDI = Beck Depression Inventory; GAD = generalized anxiety disorder, HARS = Hamilton
Anxiety Rating Scale; STAI-T = State-Trait Anxiety Inventory—Trait version.
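
The follow-up tables separate "Improved (%)" from "Remained improved (%)", a distinction the text motivates by noting that symptoms fluctuate naturally over time. The sketch below shows, for an invented ten-patient sample, how the two rates can diverge. The function name and the per-patient records are hypothetical; the article reports only study-level percentages, not patient-level data.

```python
def follow_up_rates(improved_at_termination, improved_at_follow_up):
    """Return (percent improved at follow-up, percent remained improved).

    Each argument is a list of booleans, one entry per patient in the same order.
    "Remained improved" requires improvement at both assessments.
    """
    if len(improved_at_termination) != len(improved_at_follow_up) or not improved_at_termination:
        raise ValueError("expected two non-empty lists of equal length")
    n = len(improved_at_termination)
    improved = sum(1 for later in improved_at_follow_up if later)
    remained = sum(
        1
        for first, later in zip(improved_at_termination, improved_at_follow_up)
        if first and later
    )
    return 100.0 * improved / n, 100.0 * remained / n


# Hypothetical sample: some early responders relapse and some nonresponders improve later.
at_termination = [True, True, True, False, False, True, False, False, True, False]
at_follow_up = [True, False, True, True, True, True, False, True, False, False]
improved_pct, remained_pct = follow_up_rates(at_termination, at_follow_up)
print(f"improved at follow-up: {improved_pct:.0f}%, remained improved: {remained_pct:.0f}%")
```

In this invented sample, the patients who improve only after termination are the ones who inflate the first figure relative to the second, which is the pattern the text attributes to chance and symptom fluctuation.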

treatment (and 8% of those who were originally screened) improved and remained improved 2 years later. These data are comparable to the 18-month follow-up data from the NIMH TDCRP. For panic, the percentages are, again, much more impressive: Fifty-four percent of completers and 46% of the intent-to-treat sample remained improved. For GAD, we could not locate any study that provided relevant data.

With respect to posttreatment symptomatology, all results reported here are based on single studies. Depressed patients at 24+ months averaged 8.80 (SD = 7.90) on the BDI, panic patients continued to panic 0.87 (SD = 1.55) times per week, and no data were available for GAD.

Finally, with respect to the percent seeking additional treatment, of the four studies reporting on extended follow-up, four studies reported whether patients sought additional treatment between termination and 2 years after treatment. Roughly half of the patients received further treatment, an increase beyond the 28% found to have done the same by 12-18 months.

Relation Between Exclusion Rates and Outcome

One of the advantages of meta-analytic techniques is that they allow researchers to uncover mediators and moderators that might not be apparent from single studies. Of particular importance to assessing the generalizability of the findings of these studies is the correlation between the percentage of patients who improved with treatment and the percentage of patients excluded in each study. The experimental wisdom of stringent inclusion and exclusion criteria is maximization of diagnostic homogeneity. The parallel pitfall is diminished external validity. When researchers exclude 70% of depressed patients from efficacy studies, they cannot legitimately generalize to any but a minority of depressed patients unless, at the very least, other data suggest that the remaining 70% of patients are likely to respond similarly, in which case exclusion would have been unnecessary.

We consider these analyses preliminary. However, the meta-analytic data provide preliminary evidence that researchers may have been wise to exclude patients in several of these studies on the basis of concerns about treatment efficacy in polysymptomatic patients.13 We report statistical significance here conservatively, considering studies, not patients, as sampling units. This means, with sample sizes typically of less than 10 studies for each analysis, that we have minimal power to detect significant findings.14 However, the findings are highly suggestive and form a clear pattern, and many of the effect sizes are large by Cohen's (1988) standards.

13 We are currently embarking on a set of multidimensional meta-analyses of psychotherapies and pharmacotherapies for a range of disorders, using broader inclusion criteria for studies (e.g., inclusion of studies from the 1980s), to try to discern the disorders for which these findings do and do not apply. We have recently found, for example, that outcome is uncorrelated with exclusion criteria in studies of bulimia.
14 We report significance values here for any findings that reached conventional significance levels. All other analyses produced ps > .05.
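
The moderator analysis in this section treats each study, not each patient, as one observation. The sketch below shows one way such a study-level correlation could be computed when studies of several disorders are pooled: values are standardized within each disorder and then correlated across all studies, with degrees of freedom equal to the number of studies minus 2. The article says that data were standardized within each of the three samples before aggregating but does not give the exact procedure, so the z-score step, the function names, and the numbers are all assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def zscores(values):
    """Standardize values to mean 0 and SD 1 (needs 2+ values that are not all equal)."""
    if len(values) < 2:
        raise ValueError("need at least two studies per disorder")
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("values must vary within a disorder")
    return [(v - m) / s for v in values]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pooled_study_correlation(studies_by_disorder):
    """Correlate two study-level variables after standardizing within each disorder.

    studies_by_disorder maps a disorder name to a list of
    (percent_excluded, percent_improved) pairs, one pair per study.
    Returns (r, df), where df is the number of studies minus 2.
    """
    zx, zy = [], []
    for pairs in studies_by_disorder.values():
        xs, ys = zip(*pairs)
        zx.extend(zscores(xs))
        zy.extend(zscores(ys))
    return pearson_r(zx, zy), len(zx) - 2


# Hypothetical study-level data: (percent excluded, percent improved).
example = {
    "depression": [(60.0, 45.0), (70.0, 55.0), (80.0, 50.0)],
    "panic": [(50.0, 60.0), (65.0, 62.0), (75.0, 70.0)],
}
r, df = pooled_study_correlation(example)
print(f"r({df}) = {r:.2f}")
```

Because the degrees of freedom track the number of studies, a result written as r(14) implies 16 contributing studies, which helps explain why the authors describe their power as minimal.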
Two findings were consistent across all three samples: a systematic relation between the percent of patients excluded from studies and (a) the percentage of patients who improved and (b) the percent who sought additional treatment. For simplicity, we report here only the findings collapsing across all three diagnoses (that is, including studies of all three disorders in each analysis, standardizing the data within each of the three samples, and then aggregating across samples where appropriate). The correlation between percent excluded and percent improved at termination was r(14) = .41 (where the sampling unit, and, hence, the degrees of freedom, refers to studies rather than patients). Thus, the more patients excluded in a given study, the higher the percent of patients who showed improvement. Too few studies were available at 12-18 months and 2 years to provide reliable data (because many studies did not report inclusion rates).

The correlation between percent excluded and percent seeking additional treatment at any point beyond 12 months was r(8) = -.40. Thus, the more patients excluded from a study, the fewer who subsequently sought further treatment. No other variables (notably effect size) demonstrated a systematic relation to exclusion rates across all three disorders.

Because exclusion rates can reflect many factors, in a secondary analysis, we simply counted the exclusion criteria for each study as described in the methods section of each article and correlated number of exclusion criteria with various measures of outcome.15 In this analysis, the correlation with percent improved largely disappeared, but the correlation with percent seeking additional treatment doubled, to r(8) = -.81, p < .02. In addition, using this rough index of stringency of exclusion criteria, we identified a number of other highly suggestive findings, which, once again, we consider only preliminary. For all three disorders, number of exclusion criteria predicted both pre-post effect size and absolute levels of pathology as measured by instruments such as the BDI, usually at both termination and follow-up. For example, in studies of depression, number of exclusion criteria predicted pre-post effect size at r(8) = .71, p = .05. The correlation between stringency of exclusion criteria and BDI and HRSD scores at termination was r(9) = -.63 and r(6) = -.41, respectively. In other words, the more stringent the exclusion criteria, the healthier the patients at the end of treatment. For panic, the correlation between number of exclusion criteria and pre-post effect size was r(14) = .48, p = .08. Exclusion criteria predicted mean number of panic attacks at 12-18-month follow-up, r(5) = -.53. Using the combination of (a) percent of patients excluded and (b) number of exclusion criteria to predict percent improved and pre-post effect size using simultaneous multiple regression, we were able to generate multiple Rs in the range of .50 to .75.

Discussion

We believe that an appropriately conservative, scientific attitude toward these findings leads to the following general conclusion. The average EST for the disorders we examined leads to substantial initial improvement in pathological states for roughly half of the patients who pass a series of rigorous inclusion and exclusion criteria that vary substantially across studies. The average patient in these studies who receives an active treatment is substantially better off than the average control patient at the end of treatment, supporting the utility of these treatments in reducing various forms of psychological distress. The limited data available suggest, however, that the majority of patients do not show sustained improvement over 1 to 2 years, particularly for generalized affect states (depression and GAD). We may do well to use terms such as empirically supported in more qualified ways, particularly in review articles, acknowledging limitations of samples, time frames, and ways of indexing outcome that can lead to different conclusions.

Exclusion Rates and External Validity

The greatest impediment to generalizing from these studies to clinical practice is the high exclusion rate from clinical trials for all three disorders. Many of the exclusion criteria used in the studies described here, such as the exclusion of patients with bipolar disorder or psychotic disorders in studies of treatments for unipolar depression, are appropriate and do not jeopardize external validity. In the average study, however, two thirds of patients who present for treatment with symptoms of the disorder are excluded, and the more patients excluded and the more stringent the exclusion criteria, the more successful the treatment. For clinicians who cannot pick and choose their patients, the applicability of these findings to clinical practice is largely unknown. Parallel findings obtained with a completely different methodology have recently emerged with respect to efficacy trials for treatment of alcohol-related disorders: Humphreys and Weisner (2000) applied the prototypical exclusion criteria in alcohol treatment studies to two large community samples and found that typical exclusion criteria result in unrepresentative samples more heavily composed of White, stable, higher functioning patients with less-substantial comorbidity.

Although many exclusion criteria are scientifically and ethically appropriate, it is important to distinguish those that limit external validity. Here we note three.

First, exclusion of patients with particular forms of co-occurring disorders presents a serious challenge to external validity if (a) these comorbidities are common in clinical practice or (b) they affect treatment response, course, length, and so on. Existing data suggest that comorbidity is common, for instance, between depression and many Axis I and Axis II disorders and that this can, in fact, affect treatment response. Research using both community and clinical samples has shown that most individuals with one Axis I disorder have at least one other Axis I or Axis II condition (e.g., Kessler et al., 1996; Oldham et al., 1995; Shea, Widiger, & Klein, 1992). By definition (because of the need for standardization of treatment focused on alleviating symptoms of a specific, usually Axis I, symptom), comorbid conditions are outside the scope of ESTs, except insofar as treatment of one condition can have secondary effects on other symptoms, which it often does.

Recent research suggests that the presence of multiple symptoms can have a substantial impact on both outcome in randomized controlled trials of ESTs and on the nature of treatment in everyday practice. For example, Frank et al. (2000) have recently shown

15 A table listing inclusion and exclusion criteria for each study is available from Drew Westen. Limitations of space prevented their publication here.
that depressed patients with a lifetime history of panic-agoraphobia spectrum symptoms fare worse in clinical trials of both psychotherapy and medication (selective serotonin reuptake inhibitors), and many researchers (e.g., Brown, Chorpita, & Barlow, 1998; Zimmerman, McDermut, & Mattia, 2000) have shown that the overlap of anxiety and mood disorders is very high, if not the norm. A naturalistic study of patients in treatment with experienced clinicians for clinically significant depressive, panic, or other anxiety symptoms found that clinicians of all theoretical orientations report treating the vast majority of patients for multiple problems; that the presence of Axis I and Axis II pathology tends to double treatment length; and that the presence of subclinical personality pathology (such as problems with self-esteem, constriction, inhibition, attachment, etc.), which was virtually ubiquitous in everyday practice, has a substantial impact on treatment length as well (Morrison & Westen, 2000). Given that not all studies have found comorbidity to influence outcomes, future research clearly needs to begin to distinguish the disorders and treatments for which comorbidity does and does not affect outcome and, hence, those for which it may threaten external validity.

Second, a focus on patients who meet full criteria for Axis I disorders such as major depression and GAD, and the exclusion of patients whose pathology is subclinical using current standards, is scientifically sensible because it minimizes diagnostic variation and ensures that patients included in randomized controlled trials do in fact share the syndrome under investigation. However, internal validity often comes at a cost to external validity. Somewhere on the order of 50% of patients treated in everyday practice for mood, anxiety, personality, and many other psychiatric disorders appear to have subclinical pathology (e.g., Westen & Arkowitz-Westen, 1998; Zinbarg, Barlow, Liebowitz, & Street, 1994). It may be that these patients are easier to treat than patients who meet full criteria, in which case the findings reported here may represent an underestimate of efficacy. At this point, however, we might do well to base the next generation of efficacy studies on data showing the kinds of patients who present for treatment in clinical practice rather than on patients who meet DSM-IV cutoffs.

Third, we would have liked to report the percent of patients excluded because they had bipolar disorder, had psychotic disorders, did not meet DSM-III-R (3rd ed., rev.; American Psychiatric Association, 1987) or DSM-IV criteria for the target disorder, were excluded before or after an initial telephone screen, and so forth. Unfortunately, those data, which are crucial for evaluating external validity, are rarely published.

It is important to note that most studies do not eliminate all comorbid conditions, and the trend, exemplified by the increasing emphasis on testing laboratory-based treatments in the community, is toward greater inclusion. Further, recent research taking some of these treatments into the community has produced some very promising early findings. One group of investigators (Stuart, Treat, & Wade, 2000) has replicated success rates for CBT for panic in a community mental health center (although with nonblind assessment at follow-up); another found that patients treated with exposure and response prevention for obsessive-compulsive disorder who were excluded from randomized controlled trials fared as well as patients included in research trials (Franklin, Abramowitz, Kozak, Levitt, & Foa, 2000). These are very encouraging findings. Our aim is not to replace one inaccurate set of conclusions frequently drawn from this body of literature (for example, that CBT and IPT are the treatment of choice for depression and that other treatments should not be practiced, taught, or reimbursed) with another equally inaccurate, unbalanced conclusion: that these treatments are not useful. If researchers have excluded patients with substantial comorbidities and subclinical disturbances, it is not surprising that they have excluded 70% of patients who were referred by clinicians, self-referred, or passed initial phone screens, given the rates in the clinical population of subclinical pathology and comorbid conditions. The highly variable exclusion criteria used in these studies, however, make it difficult to know precisely who the population is to which many of these findings generalize and, within that population, who the 25% to 45% of those who enter treatment are who are likely to show a sustained response over 2 years. Identifying these subgroups should be a primary goal of future efficacy research.

Initial Response

In terms of initial response, compared with appropriate placebo control conditions, the studies summarized here demonstrate moderate-to-strong effect sizes: medians of 0.3 for depression, 0.8 for panic, and 0.9 for GAD. These effect sizes are clinically meaningful and comparable to those found over the past 2 decades of psychotherapy research (Smith & Glass, 1977; Wampold et al., 1997).

With respect to percent improved, the studies document considerable variability in initial response for patients with all three disorders. Roughly half of patients who complete treatment experience initial clinical improvement, and of those who enter treatment (intent-to-treat group), approximately 40% will gain from it. The percentages are higher for patients treated for panic than for the other two disorders studied.

With respect to posttreatment symptomatology, the data suggest that in these treatments patients can expect a significant reduction in mean levels of symptomatology, which is clearly clinically meaningful. However, particularly for depression and GAD, the average patient will maintain a mild but clinically significant level of symptoms after treatment, and the high standard deviations on all measures (and data on percent improved) suggest that a substantial number of patients will continue to be highly symptomatic.

An appropriately positive rendering of these findings is that we have many efficacious treatments for reducing the severity of depressed and anxious states. The average patient will experience a substantial reduction in symptomatology, and roughly half of those who complete these treatments will benefit significantly from them. These are impressive and important findings.

Another way of describing the data, however, is less encouraging: Roughly 40% of patients from an unknown subpopulation who undertake these treatments will receive help, and the average patient who completes them will remain clinically depressed or anxious. For depression and GAD, many patients will experience improvement, but few will become asymptomatic. Whether return to healthy functioning would require longer treatments, sustained relapse prevention efforts (e.g., periodic booster sessions), or other or supplemental forms of treatment that address what appear to be some relatively durable diatheses is unknown.
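The difference between completer and intent-to-treat figures in this section comes down to the choice of denominator. The following is a minimal sketch in Python, using hypothetical counts (not data from any study in these meta-analyses) and an illustrative function name. It also assumes, as a simplification, that patients who drop out are counted as not improved.

```python
def improvement_rates(n_entered: int, n_completed: int, n_improved: int) -> dict:
    """Percent improved among completers and among all who entered treatment.

    Assumption (for illustration only): dropouts are counted as not improved,
    so every improved patient is also a completer.
    """
    if not (0 <= n_improved <= n_completed <= n_entered) or n_completed == 0:
        raise ValueError("expected 0 <= improved <= completed <= entered, with completed > 0")
    return {
        # Denominator: only patients who finished the treatment
        "completer": 100 * n_improved / n_completed,
        # Denominator: everyone who began the treatment
        "intent_to_treat": 100 * n_improved / n_entered,
    }


# Hypothetical trial: 100 patients enter, 80 complete, 40 improve
rates = improvement_rates(n_entered=100, n_completed=80, n_improved=40)
print(f"completer: {rates['completer']:.0f}%")              # completer: 50%
print(f"intent-to-treat: {rates['intent_to_treat']:.0f}%")  # intent-to-treat: 40%
```

With these invented counts the same 40 improved patients appear as 50% of completers but 40% of those who entered treatment, which is why reporting both figures matters.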

Sustained Efficacy

In terms of sustained efficacy (the ability of these treatments to produce lasting symptomatic changes rather than solely an initial response), perhaps the most striking finding of these three meta-analyses is the paucity of follow-up data at 12-18 months (nine studies in as many years) and the virtual nonexistence of follow-up data at 2 years or longer (four studies across three disorders over 9 years).16 The major study comparing experimental and placebo-control groups at 12-18 months (the NIMH TDCRP) found no differences in depressive symptoms. Pre-post effect size data suggest that some unspecified combination of variables leads to symptomatic improvement over time. Unfortunately, to our knowledge, no one has systematically compared these pre-post effects with data on the natural course of these illnesses left untreated, so no conclusions can be drawn about treatment impact at 1 to 2 years.

16 The major exception is the work of Frank and colleagues (Frank et al., 1990; Frank & Kupfer, 1993), who have studied 3-year maintenance trials to try to determine what kinds of longer term relapse prevention procedures might be efficacious. One or two other studies not included in our meta-analyses examined 2-year follow-up data but focused on special populations, such as the elderly, and hence were not included (Gallagher-Thompson, Hanley-Peterson, & Thompson, 1990).

With respect to percent improved, the findings suggest differences across the three disorders. The most positive rendering of the data on depression (namely, the number of patients who improved and remained improved of those who completed the treatment) is in the range of 36% to 38% at both 12-18 months and 2 years. Including patients who began the treatment but did not complete it, however, drops the improvement rate at 2 years to 27%. Thus, roughly one fourth of carefully screened patients with major depression, who are not suicidal and do not abuse alcohol or other drugs, can expect to improve and to remain improved 2 years later if they embark on a brief course of manualized treatment for depression. By any standards, it is difficult to construe these data as evidence for the hypothesis that these treatments show genuine efficacy for the treatment of depressive disorders (which recur with troubling regularity after any index episode) as opposed to depressed states.

For panic disorder, the data are much more positive. These findings accord with those of others who have reviewed the relevant literatures and concluded that the evidence for the efficacy of psychosocial (particularly cognitive-behavioral) treatments of anxiety disorders is higher than for other disorders (Roth & Fonagy, 1996). Roughly half of patients who complete treatment can expect to improve and to remain improved at 2 years. Of those who enter treatment, improvement rates remain a hefty 46%. These are impressive results for a brief, cost-effective treatment of a syndrome that is often chronic and highly resistant to extinction. A more recent study found, further, that cognitive-behavioral treatment for panic helped prevent relapse and recurrence of panic following discontinuation of alprazolam several years after treatment (Bruce, Spiegel, & Hegel, 1999). As noted by Brown and Barlow (1995), however, more conservative measures of end-state functioning produce results more comparable to those reported here of the other two disorders. This suggests the need for continued treatment development to address functional impairments common in panic patients that may be related to panic, to comorbid conditions common in panic patients, or to personality variables associated with panic.

For GAD, the available data suggest that ESTs can produce an initial response, but data on sustained recovery are largely unavailable.

Percent of patients seeking additional treatment, particularly within months of terminating treatment, is a crucial variable bearing on genuine or sustained efficacy. Somewhere between one quarter and one half of patients treated for these disorders seek further treatment within 12-18 months, and roughly half seek further treatment by 2 years. (Unfortunately, few studies reported whether the patients who sought additional treatment sought it from the investigators and, hence, found the treatment useful, or whether they sought it elsewhere.) The negative correlation between exclusion rates and percent seeking additional treatment across disorders, and the extraordinarily high negative correlation between number of exclusion criteria and percent seeking treatment (r < -.80), suggest that for the substantial portion of the population of patients excluded from many of these studies, these treatments may not provide effective enough relief to prevent them from seeking further treatment within 1 to 2 years.

Limitations and Implications

The meta-analyses reported here have two primary limitations. First, to maximize the quality of studies (and to limit the magnitude of the task), we included only studies published in high-quality, high-profile journals, limiting the study to the decade of the 1990s. The list of journals was selected a priori and was based on methodological quality as well as on an initial literature search to determine the journals with the highest volume of outcome studies, so that we could be assured of missing few methodologically solid studies published during that period. Within these constraints, we were liberal with respect to inclusion. For example, we included studies with no control group, for two reasons: because many such studies reported data on percent improved, which more carefully controlled studies often did not, and because for established treatments researchers often included only comparison groups because of ethical concerns about assigning some patients to inert conditions. Nevertheless, our decision to begin in the 1990s rather than the 1970s or 1980s, and to include only journals to which researchers tend to send their best studies, could have led to unrecognized biases, which should be examined in future research. A more inclusive meta-analysis of controlled trials of psychotherapy for bulimia recently completed in our laboratory, which imposed no restrictions on journal quality or year of publication, is producing very similar findings.

Second, we do not believe we are immune to investigator allegiance effects. One of the major lessons of this study for us was just how much subjectivity routinely enters meta-analytic research, such as choice of studies to include or disqualify, choice of data to include or exclude within studies (e.g., when the investigators report complementary analyses with different sample sizes in different tables), and choice of variables to study or not to study. The problem of remaining blind in meta-analyses has not, we
believe, been adequately addressed in the psychotherapy literature, because investigators' biases can subtly influence all of these decisions. Indeed, our own current research on emotional constraints on decision making (Westen & Arkowitz, 2001) made us suspect of our own such decisions and led us to make decisions wherever possible before collecting the data or before examining the data of a particular study (or, when this was not possible, to select the data from a given study that gave its authors the benefit of the doubt). As Abelson (1995) has argued, the aim of using statistics is not simply to lay out "the facts" but to make a principled argument. Although we endeavored to keep both our methods and our arguments principled, we recognize that countervailing "principles" may have guided our judgments when our conscious principles were not engaged; hence, we have included the data for individual studies so that researchers can independently evaluate our judgments.

Within the context of these limitations, we would point to three implications for future research. First, investigators should routinely report the range of efficacy estimates described here and report data and methodological details necessary for evaluating external as well as internal validity. The majority of the highest quality research reports in the period we studied did not include all or most of the information that would allow consumers of this research to judge the meaningfulness and generalizability of the findings, such as numbers of patients excluded for various reasons, reliability of diagnosis of both pre- and post-interviews, effect size estimates on both completer and intent-to-treat samples, percent of patients improved or recovered (and how and when criteria for improvement and recovery were selected), data on comorbidity and effect sizes for patients with and without key comorbidities (even if sample size does not permit significance testing), data on follow-up at clinically meaningful intervals (or reasons the researchers chose not to follow up large-N studies beyond 6-to-12 months), percent of patients seeking additional treatment (including reasons for seeking further treatment, kinds of treatment sought, and whether patients sought treatment or referrals from the investigators), and percent of patients who remain improved at follow-up. Meta-analysts and authors of review articles similarly should routinely report such data rather than solely reporting effect sizes.

In many respects, we are calling attention to criteria for conducting efficacy research that have already been outlined in the literature (see, in particular, Kendall, 1999). We are suggesting, however, that the conclusions often drawn from this body of research need to be reexamined because of the extent to which research has diverged from these criteria. An analogy to a central issue in construct validation is of relevance here, namely Campbell and Fiske's (1959) concept of the multitrait-multimethod matrix. The construct validity of a measure is not threatened by a critical review of 35 studies when the reviewer looks for flaws in each study and inevitably finds them, as any competent graduate student could do. Every study is flawed, and if the errors are randomly distributed (that is, if the 35 studies do not share the same flaw), construct validity is not seriously threatened. If, on the other hand, 31 of the studies rely on self-reports as criterion variables and the studies that rely on other methods do not yield similar data, construct validity is in fact undermined. The same point can be made with respect to the studies we examined here: To the extent that most studies share particular characteristics of design or reporting, or show negative findings in certain key areas that have not been previously emphasized or aggregated meta-analytically (e.g., exclusion rates, mean posttreatment symptomatology, or follow-up data at longer intervals), then one may have reason to question what has often been taken as the central tendency of these studies, namely, that certain treatments can now be described with confidence as the treatment of choice, particularly for depression and GAD.

Two recent studies provide models for collecting and reporting data that permit independent assessment of their clinical utility. In a study of CBT for panic, Brown and Barlow (1995) assessed and reported data on variables crucial for assessing the sustained efficacy of treatments for panic, such as percent of patients seeking additional help, data on whether the patients who remain improved at 12 months are the same as those who remain improved at 24 months, measures of end-state functioning, and measures of general levels of adaptation and other Axis I pathology. In another study, the investigators included a figure, in the form of a tree with branches, which shows precisely how many patients are "lost" at each step of the way (Barlow, Gorman, Shear, & Woods, 2000), which we believe should become standard practice in all articles reporting efficacy data. Such figures allow readers to see the full picture, essentially presenting the data necessary for the reader to calculate improvement or recovery using almost any appropriate numerator and denominator.

A second implication concerns the nature of treatments to be tested in future efficacy research. The data on depression, and to a lesser degree (because of the absence of data at clinically meaningful follow-up intervals) on GAD, do not strike us as encouraging, especially for treatments that have undergone 20 years of testing and empirical refinement. When compared with the results of treatments for specific anxiety conditions (such as panic and simple phobia) and with what most consumers would reasonably expect is meant by "empirically supported," sustained efficacy rates of 25%-30% over 12-24 months are poor by almost any standards and suggest that we should begin testing different treatments for these disorders.

Whether the problem is one of inadequate treatment duration, inadequate theory or technique, or the intractability of the problems with current technologies is unclear. However, suggestive data come from a number of sources. That patients who complete brief treatments aimed at addressing generalized mood pathology would continue to show subclinical pathology, and that they would be highly vulnerable to relapse, makes considerable sense in light of data on the polysymptomatic nature of patients in clinical practice. Most treatments for depression in naturalistic samples unconstrained by managed care take roughly half a year for CBT and upward of 1 to 2 years for other forms of therapy, and as noted above, treatment length doubles across therapeutic modalities in the presence of comorbid conditions (Morrison & Westen, 2000). The data from effectiveness studies similarly show that 3 to 4 months of treatment rarely lead to improvement in more than 50% of patients who complete treatment and that longer treatments are associated with better outcomes (Kopta et al., 1994; Seligman, 1995).

The lack of sustained efficacy of treatments for depression and GAD also makes sense in light of developments in the cognitive
neurosciences and social psychology demonstrating the importance of distinguishing between implicit and explicit cognitive, affective, and motivational processes that have different correlates and may require different forms of intervention (Westen, 1998; 1999; Westen, Feit, & Zittel, 1999). Suggestive data relevant to treatment come from studies using implicit measures of depression, such as emotional Stroop tasks (Williams, Mathews, & MacLeod, 1996) or dream content (A. T. Beck, 1976), which often find continued attentional biases toward depressive words and thematic content among remitted depressives. This suggests that changes in state may or may not be accompanied by changes in diatheses for those states encoded in implicit networks. The hypothesized existence of implicit cognitive, affective, and motivational processes (such as the activation of affect-laden networks of association outside of awareness) is precisely what motivated the development of longer term treatments a century ago.17

17 We suspect that future treatments may benefit from integration of CBT and IPT techniques tested in the laboratory aimed at pulling people out of depressed or anxious states and from the focus on changing implicit associational networks characteristic of longer term treatments that have not received empirical attention, but at this point, this is largely speculative.

A third implication, germane to recent calls for a closer link between research and practice (Goldfried & Wolfe, 1998; National Advisory Mental Health Council, 1999; Street, Niederehe, & Lebowitz, 2000), pertains to the utility of supplementing efficacy studies (controlled clinical trials) with effectiveness studies, which have different sources of systematic error, and of distinguishing two ways effectiveness research can be implemented. In one approach, which might be called Type I effectiveness studies, researchers begin by developing experimental treatments through controlled clinical trials and, ultimately, test them on larger, more generalizable naturalistic samples. This is the primary way effectiveness has been interpreted and is guiding large-scale multisite effectiveness studies now in progress.

An alternative approach, which might be called Type II effectiveness studies, begins with treatments "designed" by clinicians, whose knowledge of research design may be limited but whose intervention strategies are likely to have evolved considerably through basic operant and social learning processes as well as theory and training. This approach acknowledges that those of us who do research necessarily have to limit our clinical hours and, hence, cannot extensively test potential innovations clinically before committing research efforts to them. Thus, this approach involves the creation of practice research networks consisting of large numbers of clinicians in the community. The aim is to capitalize on natural variation in clinical practice and therapeutic outcome to help develop prototypes of successful treatments that can help us determine which interventions to test experimentally.

Using this approach, a researcher might enlist the participation of a randomly selected sample of doctoral-level clinicians to recruit the next patient who presents for treatment of clinically significant depression, regardless of comorbidities or other presenting problems. Alternatively, a researcher might select clinicians nominated by peers from their own therapeutic orientation (e.g., psychodynamic and CBT) as expert clinicians, to maximize their chance of studying the best interventions available from each of the major treatments in wide use in clinical practice. Patients, clinicians, and independent assessors would then provide periodic data on a range of measures bearing on outcome over a period of several years, including not only self-reported symptoms but implicit measures of the same constructs, measures of personality, measures of other symptoms and disorders, and measures of adaptive functioning. In addition, clinicians would audiotape randomly selected therapy hours, which would be independently coded to assess therapeutic interventions and process (see also Borkovec & Castonguay, 1998).

Aside from simply asking whether one treatment works better than another, this design would allow researchers to discover which features of actual treatments (intervention variables) are associated with outcome and to see what intervention strategies appear to work with what kinds of patients (intervention-outcome correlations), using variables such as severity, comorbidity, and personality as moderators. Instead of requiring individual investigators to predict a priori which treatments are most likely to work and, hence, worthy of a 10-year program of research, this approach allows us to determine empirically which interventions within the range of treatments practiced in the community are associated with success with particular kinds of patients (within, of course, the constraints provided by correlational data). We can then focus our experimental sights on the interventions that are most likely to pay off, as well as on promising experimentally derived interventions by researchers who have the benefit of thorough knowledge of the empirical literature.

The reality is that we do not know whether the small number of treatments tested in the laboratory fare better or worse in actual clinical practice than many of the interventions currently used by clinicians, or, more importantly, than the interventions used by the subset of clinicians who, empirically, produce the best results, if we were to find out who those clinicians are. We are thus suggesting that we expand the range of science to include clinical practice and the selection of treatments to test and that we use multiple research designs to try to converge on the most accurate conclusions (see Borkovec & Castonguay, 1998; Goldfried & Wolfe, 1998).

The conflict between the "two disciplines of scientific psychology" (Cronbach, 1957), experimental and correlational research, is an old one in the history of our field. The extent to which, and at what point in a program or domain of research, researchers should sacrifice external validity for experimental control, or sacrifice some control over variables for external validity, is a philosophical and subjective decision. There is no single, scientifically correct answer to it. When internal and external validity sharply collide, as they have in psychotherapy research, the best strategy is to avoid erring consistently on one side or the other.

References

References marked with an asterisk are studies included in the meta-analyses.

Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ: Erlbaum.
Agras, W. S., Crow, S. J., Halmi, K. A., Mitchell, J. E., Wilson, G. T., & Kraemer, H. C. (2000). Outcome predictors for the cognitive behavior
treatment of bulimia nervosa: Data from a multisite study. American Bruce, T. J., Spiegel, D. A., & Hegel, M. T. (1999). Cognitive-behavioral
Journal of Psychiatry, 157, 1302-1308. therapy helps prevent relapse and recurrence of panic disorder following
American Psychiatric Association. (1987). Diagnostic and statistical man- alprazolam discontinuation: A long-term follow-up of the Peoria and
ual of mental disorders (3rd ed., rev.). Washington, DC: Author. Dartmouth studies. Journal of Consulting and Clinical Psychology, 67,
American Psychiatric Association. (1994). Diagnostic and statistical man- 151-156.
ual of mental disorders (4th ed.). Washington, DC: Author. *Butler, G., Fennell, M., Robson, P., & Gelder, M. (1991). Comparison of
American Psychological Association Division of Clinical Psychology. behavior therapy and cognitive-behavior therapy in the treatment of
(1995). Training in and dissemination of empirically-validated psycho- generalized anxiety disorder. Journal of Consulting and Clinical Psy-
logical treatments: report and recommendations. Clinical Psychology: chology, 59, 167-175.
Science and Practice, 48, 3-23. Calhoun, K. S., Moras, K., Pilkonis, P. A., & Rehm, L. (1998). Empirically
*Arntz, A., & Van den Hout, M. (1996). Psychological treatments of panic supported treatments: Implications for training. Journal of Consulting
disorder without agoraphobia: Cognitive therapy versus applied relax- and Clinical Psychology, 66, 151-162.
ation. Behavior Research and Therapy, 34, 113-121. Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant
Barlow, D., Gorman, J., Shear, M. K., & Woods, S. W. (2000). Cognitive- validation by the multitrait-multimethod matrix. Psychological Bulle-
behavioral therapy, imipramine, or their combination for panic disorder: tin, 56, 81-105.
A randomized controlled trial. Journal of the American Medical Asso- Chambless, D., & Hollon, S. (1998). Defining empirically supported ther-
ciation, 283, 2529-2536. apies. Journal of Consulting and Clinical Psychology, 66, 7-18.
*BarIow, D., Rapee, R., & Brown, T. (1992). Behavioral treatment of Chambless, D., & Williams, K. (1995). A preliminary study of African
generalized anxiety disorder. Behavior Therapy, 23, 551-570. Americans with agoraphobia: Symptom severity and outcome of treat-
*Beach, D., & O'Leary, K. (1992). Treating depression in the context of ment with in vivo exposure. Behavior Therapy, 26, 501-515.
marital discord: Outcome and predictors of response to marital therapy *Clark, D., Salkovskis, P., Hackman, A., Middleton, H., Anastasiades, P.,
versus cognitive therapy. Behavior Therapy, 23, 507-528. & Gelder, M. (1994). A comparison of cognitive therapy, applied
Beck, A. T. (1976). Cognitive therapy and the emotional disorders. New relaxation and imipramine in the treatment of panic disorder. British
York: International Universities Press. Journal of Psychiatry, 164, 759-769.
*Beck, A. T., Sokol, L., Clark, D., Berchick, R., & Wright, F. (1992). A Cohen, J. (1988). Statistical power analysis for the behavioral sciences
crossover study of focused cognitive therapy for panic disorder. Amer- (2nd ed.). Mahwah, NJ: Erlbaum.
ican Journal of Psychiatry, 149, 778-783. *Cote, G., Gauther, J., Laberge, B., Cormier, H., & Plamondon, J. (1994).
*Beck, J. G., Stanley, M., Baldwin, L., Deagle, E, & Averill, P. (1994). Reduced therapist contact in the cognitive-behavioral treatment of panic
Comparison of cognitive therapy and relaxation training for panic dis- disorder. Behavior Therapy, 25, 123-145.
order. Journal of Consulting and Clinical Psychology, 62, 818-826. Cox, B. J., Swinson, R. P., & Fergus, K. D. (1993). Changes in fear versus
Beck, A., Ward, C., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An avoidance ratings with behavioral treatments for agoraphobia. Behavior
inventory for measuring depression. Archives of General Psychiatry, 4, Therapy, 24, 619-624.
561-571. *Craske, M., Brown, T., & Barlow, D. (1991). Behavioral treatment of
*Beutler, L., Engle, D., Mohr, D., Daldrup, R., Bergan, J., Meredith, K., & panic disorder: A two-year follow-up. Behavior Therapy, 22, 289-304.
Merry, W. (1991). Predictors of differential response to cognitive, ex- *Crits-Christoph, P., Connolly, M., Azarian, K., Crits-Christoph, K., &
periential, and self-directed psychotherapeutic procedures. Journal of Shappell, S. (1996). An open trial of brief supportive-expressive psy-
Consulting and Clinical Psychology, 59, 333-340. chotherapy in the treatment of generalized anxiety disorder. Psychother-
*Borden, J., Clum, G., & Salmon, P. (1991). Mechanisms of change in the apy, 33, 418-429.
treatment of panic. Cognitive Therapy and Research, 15, 257-272. Cronbach, L. (1957). The two disciplines of scientific psychology. Amer-
*Borkovec, T., & Costello, E. (1993). Efficacy of applied relaxation and ican Psychologist, 57, 671-684.
cognitive-behavioral therapy in the treatment of generalized anxiety *De Beurs, E., van Balkom, A., Lange, A., Koele, P., & van Dyck, R.
disorder. Journal of Consulting and Clinical Psychology, 61, 611-619. (1995). Treatment of panic disorder with agoraphobia: Comparison of
Borkovec, T., & Castonguay, L. (1998). What is the scientific meaning of fluvoxamine, placebo, and psychological panic management combined
empirically supported therapy? Journal of Consulting and Clinical Psy- with exposure and of exposure in vivo alone. American Journal of
chology, 66, 136-142. Psychiatry, 152, 683-691.
*Bouchard, S., Gaudier, J., Laberge, B., French, D., Pelletier, M., & DeRubeis, R. J., Evans, M. D., Hollon, S. D., Garvey, M. J., Grove, W., &
Godbout, C. (1996). Exposure versus cognitive restructuring in the Tuason, V. (1990). How does cognitive therapy work? Cognitive change
treatment of panic disorder with agoraphobia. Behavior Research and and symptom change in cognitive therapy and pharmacotherapy for
Therapy, 34, 213-224. depression. Journal of Consulting and Clinical Psychology, 58, 862-
Bowman, D., Scogin, F., Floyd, M., Patton, E., & Gist, L. (1997). Efficacy 869.
of self-examination therapy in the treatment of generalized anxiety DeRubeis, R. J., Gelfand, L., Tang, T., & Simons, A. D. (1999). Medica-
disorder. Journal of Counseling Psychology, 44, 267-273. tions versus cognitive behavior therapy for severely depressed outpa-
Bowman, D., Scogin, F., & Lyrene, B. (1995). The efficacy of self- tients: Meta-analysis of four randomized comparisons. American Jour-
examination therapy and cognitive bibliotherapy in the treatment of mild nal of Psychiatry, 156, 1007-1013.
to moderate depression. Psychotherapy Research, 5, 131-140. *Durham, R., Murphy, T., Allan, T., Richard, K., Trevling, L., & Fenton,
*Brown, T. A., & Barlow, D. (1995). Long-term outcome in cognitive- G. (1994). Cognitive therapy, analytic psychotherapy and anxiety man-
behavioral treatment of panic disorder Clinical predictors and alterna- agement training for generalised anxiety disorder. British Journal of
tive strategies for assessment. Journal of Consulting and Clinical Psy- Psychiatry, 165, 315-323.
chology, 63, 754-765. Elkin, L, Gibbons, R. D., Shea, M. T., Sotsky, S. M., Watkins, A., Pilkonis,
Brown, T. A., Chorpita, B. F., & Barlow, D. H. (1998). Structural rela- P., & Hedeker, D. (1995). Initial severity and differential treatment
tionships among dimensions of the DSM-IV anxiety and mood disorders outcome in the National Institute of Mental Health Treatment of De-
and dimensions of negative affect, positive affect, and autonomic pression Collaborative Research Program. Journal of Consulting and
arousal. Journal of Abnormal Psychology, 107, 179-192. Clinical Psychology, 63, 841-847.

*Emanuels-Zuurveen, L., & Emmelkamp, P. (1996). Individual behavioural-cognitive therapy vs. marital therapy for depression in maritally distressed couples. British Journal of Psychiatry, 169, 181-188.
Emmelkamp, P. M., Van Dyck, R., Bitter, M., Heins, R., et al. (1992). Spouse-aided therapy with agoraphobics. British Journal of Psychiatry, 160, 51-56.
*Evans, M., Hollon, S., DeRubeis, R., Piasecki, J., Grove, W., Garvey, M., & Tuason, V. (1992). Differential relapse following cognitive therapy and pharmacotherapy for depression. Archives of General Psychiatry, 49, 802-808.
Fava, G., Zielezny, M., Savron, G., & Grandi, S. (1995). Long-term effects of behavioral treatment for panic disorder with agoraphobia. British Journal of Psychiatry, 166, 87-92.
Foa, E., Dancu, C., Hembree, E., Jaycox, L., Meadows, E. A., & Street, G. P. (1999). A comparison of exposure therapy, stress inoculation training, and their combination for reducing posttraumatic stress disorder in female assault victims. Journal of Consulting and Clinical Psychology, 67, 194-200.
Frank, E., & Kupfer, D. (1993). Does a placebo tablet affect psychotherapeutic treatment outcome? Results from the Pittsburgh study of maintenance therapies in recurrent depression. Psychotherapy Research, 2, 102-111.
Frank, E., Kupfer, D., Perel, J., Cornes, C., Jarrett, D., Mallinger, A., Thase, M., McEachran, A., & Grochocinski, V. (1990). Three-year outcomes for maintenance therapies in recurrent depression. Archives of General Psychiatry, 47, 1093-1099.
Frank, E., Shear, M. K., Rucci, P., Cyranowski, J., Endicott, J., Fagiolini, A., Grochocinski, V., Houck, P., Kupfer, D., Maser, J., & Cassano, G. (2000). Influence of panic-agoraphobic spectrum symptoms on treatment response in patients with recurrent major depression. American Journal of Psychiatry, 157, 1101-1107.
Franklin, M. E., Abramowitz, J. S., Kozak, M. J., Levitt, J. T., & Foa, E. B. (2000). Effectiveness of exposure and ritual prevention for obsessive-compulsive disorder: Randomized compared with nonrandomized samples. Journal of Consulting and Clinical Psychology, 68, 594-602.
Gallagher-Thompson, D., Hanley-Peterson, P., & Thompson, L. W. (1990). Maintenance of gains versus relapse following brief psychotherapy for depression. Journal of Consulting and Clinical Psychology, 58, 371-374.
Goldfried, M., & Wolfe, B. (1995). Psychotherapy practice and research: Repairing a strained alliance. American Psychologist, 51, 1007-1016.
Goldfried, M., & Wolfe, B. (1998). Toward a clinically valid approach to therapy research. Journal of Consulting and Clinical Psychology, 66, 143-150.
Haaga, D., & Stiles, W. (2000). Randomized clinical trials in psychotherapy research: Methodology, design, and evaluation. In R. Ingram & C. R. Snyder (Eds.), Handbook of psychological change (pp. 14-39). New York: Wiley.
Hamilton, M. (1959). The assessment of anxiety states by rating. British Journal of Medical Psychology, 32, 50-55.
Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296.
Hardy, G. E., Barkham, M., Shapiro, D. A., Reynolds, S., et al. (1995). Credibility and outcome of cognitive-behavioural and psychodynamic-interpersonal therapy. British Journal of Clinical Psychology, 34, 555-569.
*Hollon, S., DeRubeis, R., Evans, M., Weimer, M., Garvey, M., Grove, W., & Tuason, V. (1992). Cognitive therapy and pharmacotherapy for depression. Archives of General Psychiatry, 49, 774-781.
Howard, K. I., Lueger, R., Maling, M., & Martinovich, Z. (1993). A phase model of psychotherapy: Causal mediation of outcome. Journal of Consulting and Clinical Psychology, 54, 106-110.
Humphreys, K., & Weisner, C. (2000). Use of exclusion criteria in selecting research subjects and its effects on the generalizability of alcohol treatment outcome studies. American Journal of Psychiatry, 157, 588-594.
Ilardi, S., & Craighead, W. (1994). The role of nonspecific factors in cognitive-behavior therapy for depression. Clinical Psychology: Science and Practice, 1, 138-156.
Ingram, R. E., Hayes, A., & Scott, W. (2000). Empirically supported treatments: A critical analysis. In R. Ingram & C. R. Snyder (Eds.), Handbook of psychological change (pp. 40-60). New York: Wiley.
*Jacobson, N. J., Fruzzetti, A., Dobson, K., Schmaling, K., & Salusky, S. (1991). Marital therapy as treatment for depression. Journal of Consulting and Clinical Psychology, 59, 547-557.
Jacobson, N. J., Roberts, L. J., Berns, S. B., & McGlinchey, J. B. (1999). Methods for defining and determining the clinical significance of treatment effects: Description, application, and alternatives. Journal of Consulting and Clinical Psychology, 67, 300-307.
Jacobson, N. J., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Judd, L. (1997). The clinical course of unipolar major depressive disorders. Archives of General Psychiatry, 54, 989-991.
Keller, M. B., Lavori, P., Mueller, T., & Endicott, J. (1992). Time to recovery, chronicity, and levels of psychopathology in major depression: A 5-year prospective follow-up of 431 subjects. Archives of General Psychiatry, 49, 809-816.
Kendall, P. C. (1998). Empirically supported psychological therapies. Journal of Consulting and Clinical Psychology, 66, 3-6.
Kendall, P. C. (1999). Therapy outcome research methods. In P. C. Kendall, J. N. Butcher, & G. N. Holmbeck (Eds.), Handbook of research methods in clinical psychology (2nd ed., pp. 330-363). New York: Wiley.
Kendall, P. C., Marrs-Garcia, A., Nath, S., & Sheldrick, R. (1999). Normative comparisons for the evaluation of clinical significance. Journal of Consulting and Clinical Psychology, 67, 285-299.
Kendall, P. C., & Sheldrick, R. C. (2000). Normative data for normative comparisons. Journal of Consulting and Clinical Psychology, 68, 767-773.
Kessler, R., Nelson, C., McGonagle, K., Liu, M., Swartz, M., & Blazer, D. (1996). Comorbidity of DSM-III-R major depressive disorder in the general population: Results from the US National Comorbidity Survey. British Journal of Psychiatry, 168, 17-30.
*Klosko, J., Barlow, D., Tassinari, R., & Cerny, J. (1990). A comparison of alprazolam and behavior therapy in treatment of panic disorder. Journal of Consulting and Clinical Psychology, 58, 77-84.
Kopta, S., Howard, K., Lowry, J., & Beutler, L. (1994). Patterns of symptomatic recovery in psychotherapy. Journal of Consulting and Clinical Psychology, 62, 1009-1016.
Kopta, S. M., Lueger, R. J., Saunders, S. M., & Howard, K. I. (1999). Individual psychotherapy outcome and process research: Challenges leading to greater turmoil or a positive transition? Annual Review of Psychology, 50, 441-469.
Luborsky, L., Diguer, L., Cacciola, J., Barber, J., Moras, K., Schmidt, K., & DeRubeis, R. (1996). Factors in outcomes of short-term dynamic psychotherapy for chronic vs. nonchronic major depression. Journal of Psychotherapy Practice and Research, 5, 152-159.
Luborsky, L., Diguer, L., Seligman, D. A., Rosenthal, R., Krause, E. D., Johnson, S., Halperin, G., Bishop, M., Berman, J. S., & Schweizer, E. (1999). The researcher's own therapy allegiances: A "wild card" in comparisons of treatment efficacy. Clinical Psychology: Science and Practice, 6, 95-106.
*Marks, I., Swinson, R., Basoglu, M., Kuch, K., Noshirvani, H., O'Sullivan, G., Lelliott, P., Kirby, M., McNamee, G., Sengun, S., & Wickwire, K. (1993). Alprazolam and exposure alone and combined in

panic disorder with agoraphobia. British Journal of Psychiatry, 162, 776-787.
*McLean, P., & Hakstian, A. (1990). Relative endurance of unipolar depression treatment effects: Longitudinal follow-up. Journal of Consulting and Clinical Psychology, 58, 482-488.
*Michelson, L., Marchione, K., Greenwald, M., Glanz, L., Testa, S., & Marchione, N. (1990). Panic disorder: Cognitive-behavioral treatment. Behavior Research and Therapy, 28, 141-151.
Morrison, K., & Westen, D. (2000). The external validity of psychotherapy trials: An empirical investigation. Unpublished manuscript, Boston University.
Mueller, T. I., Leon, A. C., Keller, M. B., Solomon, D. A., Endicott, J., Coryell, W., Warshaw, M., & Maser, J. D. (1999). Recurrence after recovery from major depressive disorder during 15 years of observational follow-up. American Journal of Psychiatry, 156, 1000-1006.
Nathan, P. (1998). Practice guidelines: Not yet ideal. American Psychologist, 53, 290-299.
Nathan, P. E., Stuart, S. P., & Dolan, S. L. (2000). Research on psychotherapy efficacy and effectiveness: Between Scylla and Charybdis? Psychological Bulletin, 12, 964-981.
National Advisory Mental Health Council, Clinical Treatment and Services Research Workgroup. (1999). Bridging science and service (NIMH Publication No. 99-4353). Bethesda, MD: National Institute of Mental Health.
*Neimeyer, R. (1990). The role of homework skill acquisition in the outcome of group cognitive therapy for depression. Behavior Therapy, 21, 281-292.
Oldham, J., Skodol, A., Kellman, H., Hyler, S., Doidge, N., Rosnick, L., & Gallaher, P. (1995). Comorbidity of Axis I and Axis II disorders. American Journal of Psychiatry, 152, 571-578.
O'Leary, K. D., & Beach, S. R. (1990). Marital therapy: A viable treatment for depression and marital discord. American Journal of Psychiatry, 147, 183-186.
*Ost, L., & Westling, B. (1995). Applied relaxation vs cognitive-behavior therapy in the treatment of panic disorder. Behavior Research and Therapy, 33, 145-158.
Ost, L., Westling, B., & Hellstrom, K. (1993). Applied relaxation, exposure in vivo and cognitive methods in the treatment of panic disorder with agoraphobia. Behavior Research and Therapy, 31, 383-394.
Pennebaker, J. (1997). Writing about emotional experiences as a therapeutic process. Psychological Science, 8, 162-166.
Persons, J., & Silberschatz, G. (1998). Are results of randomized controlled trials useful to psychotherapists? Journal of Consulting and Clinical Psychology, 66, 126-135.
Pilkonis, P., Heape, C., Ruddy, J., & Serrao, P. (1991). Validity in the diagnosis of personality disorders: The use of the LEAD standard. Psychological Assessment, 3, 46-54.
Power, K., Simpson, R., Swanson, V., Wallace, L., Feistner, A., & Sharp, D. (1990). A controlled comparison of cognitive-behaviour therapy, diazepam, and placebo, alone and in combination, for the treatment of generalized anxiety disorder. Journal of Anxiety Disorders, 4, 267-292.
Propst, L. R., Ostrom, R., Watkins, P., Dean, T., & Mashburn, D. (1992). Comparative efficacy of religious and nonreligious cognitive-behavioral therapy for the treatment of clinical depression in religious individuals. Journal of Consulting and Clinical Psychology, 60, 94-103.
Rijiken, H., Kraaimaat, F., De Ruiter, C., & Garssen, B. (1992). A follow-up study on short-term treatment of agoraphobia. Behavior Research and Therapy, 30, 63-66.
Rosenthal, R. (1991). Meta-analytic procedures for social research. Thousand Oaks, CA: Sage.
Roth, A., & Fonagy, P. (1996). What works for whom? A critical review of psychotherapy research. New York: Guilford Press.
*Schulberg, H., Block, M., Madonia, M., Scott, P., Rodriguez, E., Imber, S., Perel, J., Lave, J., Houck, P., & Coulehan, J. (1996). Treating major depression in primary care practice. Archives of General Psychiatry, 53, 913-917.
Seligman, M. (1995). The effectiveness of psychotherapy. American Psychologist, 50, 965-974.
*Shapiro, D., Barkham, M., Rees, A., Hardy, G., Reynolds, S., & Startup, M. (1994). Effects of treatment duration and severity of depression on the effectiveness of cognitive-behavioral and psychodynamic-interpersonal psychotherapy. Journal of Consulting and Clinical Psychology, 62, 522-534.
*Shapiro, D., & Firth-Cozens, J. (1990). Two-year follow-up of the Sheffield psychotherapy project. British Journal of Psychiatry, 157, 389-391.
*Shapiro, D., Rees, A., Barkham, M., Hardy, G., Reynolds, S., & Startup, M. (1995). Effects of treatment duration and severity of depression on the maintenance of gains after cognitive-behavioral and psychodynamic-interpersonal psychotherapy. Journal of Consulting and Clinical Psychology, 63, 378-387.
*Shea, M., Elkin, I., Imber, S., Sotsky, S., Watkins, J., Collins, J., Pilkonis, P., Beckham, E., Glass, D., Dolan, R., & Parloff, M. (1992). Course of depressive symptoms over follow-up: Findings from the National Institute of Mental Health Treatment of Depression Collaborative Research Program. Archives of General Psychiatry, 49, 782-787.
Shea, M., Widiger, T., & Klein, M. (1992). Comorbidity of personality disorders and depression: Implications for treatment. Journal of Consulting and Clinical Psychology, 60, 857-868.
*Shear, M., Pilkonis, P., Cloitre, M., & Leon, A. (1994). Cognitive behavioral treatment compared with nonprescriptive treatment of panic disorder. Archives of General Psychiatry, 51, 395-401.
Smith, M. L., & Glass, G. V. (1977). Meta-analysis of psychotherapy outcome studies. American Psychologist, 32, 752-760.
Snyder, D. K., Wills, R. M., & Grady-Fletcher, A. (1991). Long-term effectiveness of behavioral versus insight-oriented marital therapy: A 4-year follow-up study. Journal of Consulting and Clinical Psychology, 59, 138-141.
Spielberger, C. D., Gorsuch, R. L., & Lushene, R. E. (1970). The State-Trait Anxiety Inventory (STAI) test manual. Palo Alto, CA: Consulting Psychologists Press.
Street, L., Niederehe, G., & Lebowitz, B. (2000). Toward greater public health relevance for psychotherapeutic intervention research: An NIMH workshop report. Clinical Psychology: Science and Practice, 7, 127-137.
Stuart, G. L., Treat, T. A., & Wade, W. A. (2000). Effectiveness of an empirically based treatment for panic disorder delivered in a service clinic setting: 1-year follow-up. Journal of Consulting and Clinical Psychology, 68, 506-512.
Tang, T., & DeRubeis, R. J. (1999). Sudden gains and critical sessions in cognitive-behavioral therapy for depression. Journal of Consulting and Clinical Psychology, 67, 894-904.
*Taylor, S., Woody, S., Koch, W., McLean, P., & Anderson, K. (1996). Suffocation false alarms and efficacy of cognitive behavioral therapy for panic disorder. Behavior Therapy, 27, 115-126.
Telch, M., Lucas, J., Schmidt, N., Hanna, H., Jaimez, T., & Lucas, R. (1993). Group cognitive behavioral treatment of panic disorder. Behavior Therapy and Research, 31, 279-287.
Telch, M. J., Schmidt, N. B., Jaimez, T. L., Jacquin, K. M., & Harrington, P. (1995). Impact of cognitive-behavioral treatment on quality of life in panic disorder patients. Journal of Consulting and Clinical Psychology, 63, 823-830.
Thase, M. E., Bowler, K., & Harden, T. (1991). Cognitive behavior therapy of endogenous depression: II. Preliminary findings in 16 unmedicated inpatients. Behavior Therapy, 22, 469-477.
Thase, M. E., Reynolds, C. F., Frank, E., Simons, A. D., et al. (1994a). Do

depressed men and women respond similarly to cognitive behavior therapy? American Journal of Psychiatry, 151, 500-505.
Thase, M. E., Reynolds, C. F., Frank, E., Simons, A. D., et al. (1994b). Response to cognitive-behavioral therapy in chronic depression. Journal of Psychotherapy Practice & Research, 3, 204-214.
Thase, M. E., & Simons, A. D. (1992). Cognitive behavior therapy and relapse of nonbipolar depression: Parallels with pharmacotherapy. Psychopharmacology Bulletin, 28, 117-122.
*Thase, M. E., Simons, A. D., McGeary, J., Cahalane, J., Hughes, C., Harden, T., & Friedman, E. (1992). Relapse after cognitive behavior therapy of depression: Potential implications for longer courses of treatment. American Journal of Psychiatry, 149, 1046-1052.
Van den Hout, M., Arntz, A., & Hoekstra, R. (1994). Exposure reduced agoraphobia but not panic, and cognitive therapy reduced panic but not agoraphobia. Behaviour Research & Therapy, 32, 447-451.
Wade, S., Monroe, S., & Michelson, L. (1993). Chronic life stress and treatment outcome in agoraphobia with panic attacks. American Journal of Psychiatry, 150, 1491-1495.
Wampold, B., Mondin, G., Moody, M., Stich, F., Benson, K., & Ahn, H. (1997). A meta-analysis of outcome studies comparing bona fide psychotherapies: Empirically "all must have prizes." Psychological Bulletin, 122, 203-215.
Westen, D. (1998). The scientific legacy of Sigmund Freud: Toward a psychodynamically informed psychological science. Psychological Bulletin, 124, 333-371.
Westen, D. (1999). Psychodynamic theory and technique in relation to research on cognition and emotion: Mutual implications. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion (pp. 727-746). New York: Wiley.
Westen, D., & Arkowitz-Westen, L. (1998). Limitations of Axis II in diagnosing personality pathology in clinical practice. American Journal of Psychiatry, 155, 1767-1771.
Westen, D., & Arkowitz, J. (2001). May it please the court: Emotional constraint satisfaction in high-stakes political and legal decision making. Unpublished manuscript, Boston University.
Westen, D., Feit, A., & Zittel, C. (1999). Methodological issues in research using projective techniques. In P. C. Kendall, J. N. Butcher, & G. Holmbeck (Eds.), Handbook of research methods in clinical psychology (2nd ed., pp. 224-240). New York: Wiley.
Westen, D., & Harnden-Fischer, J. (2001). Personality profiles in eating disorders: Rethinking the distinction between Axis I and Axis II. American Journal of Psychiatry, 158, 547-562.
Westen, D., & Morrison, K. (2000). The efficacy of short-term psychotherapies for anxiety and depression: A reappraisal. Unpublished manuscript, Boston University.
Whisman, M. A., Miller, I. W., Norman, W. H., & Keitner, G. I. (1991). Cognitive therapy with depressed inpatients: Specific effects on dysfunctional cognitions. Journal of Consulting and Clinical Psychology, 59, 282-288.
Whisman, M. A., Miller, I. W., Norman, W. H., & Keitner, G. I. (1995). Hopelessness depression in depressed inpatients: Symptomatology, patient characteristics, and outcome. Cognitive Therapy & Research, 19, 377-398.
*Williams, S. L., & Falbo, J. (1996). Cognitive and performance-based treatments for panic attacks in people with varying degrees of agoraphobic disability. Behavior Research and Therapy, 34, 253-264.
Williams, J. M., Mathews, A., & MacLeod, C. (1996). The emotional Stroop task and psychopathology. Psychological Bulletin, 120, 3-24.
Wilson, G. T. (1999). Rapid response to cognitive behavior therapy. Clinical Psychology: Science and Practice, 6, 289-292.
Zimmerman, M., McDermut, W., & Mattia, J. (2000). Frequency of anxiety disorders in psychiatric outpatients with major depressive disorder. American Journal of Psychiatry, 157, 1337-1340.
Zinbarg, R. E., Barlow, D. H., Liebowitz, M., & Street, L. (1994). The DSM-IV field trial for mixed anxiety-depression. American Journal of Psychiatry, 151, 1153-1162.

Appendix A

Studies Excluded From Meta-Analysis

Disorder and study              Reason

Depression
  Bowman et al. (1995)          Treatments were self-administered.
  Bowman et al. (1997)          Treatments were self-administered.
  DeRubeis et al. (1990)        Data previously included with Hollon (1992).
  Elkin et al. (1995)           Data previously included with Shea (1992).
  Hardy et al. (1995)           Data previously included with Shapiro (1995).
  O'Leary & Beach (1990)        Data previously included with Beach & O'Leary (1992).
  Propst et al. (1992)          Patients were members of an atypical population (fundamentalist and evangelical Christians).
  Thase & Simons (1992)         Study did not include a comparison group.
  Thase et al. (1991)           Data previously included with Thase et al. (1992).
  Thase et al. (1994a)          Data previously included with Thase et al. (1992).
  Thase et al. (1994b)          Data previously included with Thase et al. (1992).
  Whisman et al. (1991)         Study reported measures of cognitive dysfunction but not depression outcome.
  Whisman et al. (1995)         Study did not report depression outcome.
Panic
  Chambless & Williams (1995)   Study did not include or report measure of panic outcome; sample composed of participants from several prior studies in which treatment type, duration, and frequency varied along with other variables relevant to our analyses.
  Cox et al. (1993)             Study did not include or report measure of panic outcome.
  Emmelkamp et al. (1992)       Study focused exclusively on agoraphobia with no direct measure of panic.
  Fava et al. (1995)            Study was an open trial (a multiple case report) and not a controlled study; sole assessor was not blind.
  Ost et al. (1993)             Study did not include or report measure of panic outcome.
  Rijiken et al. (1992)         Study did not include or report measure of panic outcome.
  Telch et al. (1993)           Assessors inspected participants' panic diaries and excluded entries they classified as limited symptom attacks with no blind or reliability.
  Telch et al. (1995)           Study did not include or report measure of panic outcome.
  Van den Hout et al. (1994)    Study did not provide information necessary to calculate variables of interest, including improvement rate, effect size, inclusion/exclusion criteria, or sample selection.
  Wade et al. (1993)            Study did not include or report measure of panic outcome.
GAD
  Power et al. (1990)           Study was actually an open trial, which included only one measure of anxiety (HARS) rated nonblind by the same clinician who performed most of the treatments in all conditions.

Note. GAD = generalized anxiety disorder; HARS = Hamilton Anxiety Rating Scale.

(Appendixes continue)
896 WESTEN AND MORRISON

•*s S" ii

s* ^^
i NO w"J NO f) — ""J °^ f~ "* — 'T O V) -^
J^ d CM d d —* —- d d —« — 1 CM' d CM s-
o
u
i. ^ vO
d o
^" ^"

^"
d
-~
o
1
'
OO
d
—•
d o
^ [
1
CM
d
NO
rJ
O N O N O O ^ O
d d d d —
be
c
2

X Q OO CM X

y to cs -^ s
u
0) >

d ^ I I I I I I I 3 I 51 i i 'I
8
u
l O N N O O O N C M ON T}-«n rM — O j O O O ^ ;t a.
a I l d d - ^ d r * i -^ -4o" CM ri — 1 d —^ — U
"S3
"o
t.
ON
1
o d d | j | I ® O C M ' | | d u
UJ £
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

'S
S
This document is copyrighted by the American Psychological Association or one of its allied publishers.

[Table note, partly legible: "sx = symptoms; cells with ..."]

[p. 897: rotated table page; contents not recoverable from the extracted text]
[p. 898: rotated table page; cell values not recoverable from the extracted text]

[p. 899: rotated table page; contents not recoverable from the extracted text]
