

Psychological Assessment
The Stroop Test as a Measure of Performance Validity in
Adults Clinically Referred for Neuropsychological
Assessment
Laszlo A. Erdodi, Sanya Sagar, Kristian Seke, Brandon G. Zuccato, Eben S. Schwartz, and Robert
M. Roth
Online First Publication, February 22, 2018. http://dx.doi.org/10.1037/pas0000525

CITATION
Erdodi, L. A., Sagar, S., Seke, K., Zuccato, B. G., Schwartz, E. S., & Roth, R. M. (2018, February 22).
The Stroop Test as a Measure of Performance Validity in Adults Clinically Referred for
Neuropsychological Assessment. Psychological Assessment. Advance online publication.
http://dx.doi.org/10.1037/pas0000525

The Stroop Test as a Measure of Performance Validity in Adults Clinically Referred for Neuropsychological Assessment

Laszlo A. Erdodi, Sanya Sagar, Kristian Seke, and Brandon G. Zuccato
University of Windsor

Eben S. Schwartz
Waukesha Memorial Hospital, Waukesha, Wisconsin

Robert M. Roth
Geisel School of Medicine at Dartmouth/Dartmouth-Hitchcock Medical Center

This study was designed to develop performance validity indicators embedded within the Delis-Kaplan Executive Function System (D-KEFS) version of the Stroop task. Archival data from a mixed clinical sample of 132 patients (50% male; M age = 43.4; M education = 14.1) clinically referred for neuropsychological assessment were analyzed. Criterion measures included the Warrington Recognition Memory Test—Words and 2 composites based on several independent validity indicators. An age-corrected scaled score ≤6 on any of the 4 trials reliably differentiated psychometrically defined credible and noncredible response sets with high specificity (.87–.94) and variable sensitivity (.34–.71). An inverted Stroop effect was less sensitive (.14–.29), but comparably specific (.85–.90), to invalid performance. Aggregating the newly developed D-KEFS Stroop validity indicators further improved classification accuracy. Failing the validity cutoffs was unrelated to self-reported depression or anxiety. However, it was associated with elevated somatic symptom report. In addition to processing speed and executive function, the D-KEFS version of the Stroop task can function as a measure of performance validity. A multivariate approach to performance validity assessment is generally superior to univariate models.

Public Significance Statement


The Stroop test can function as a performance validity indicator by identifying unusual patterns of
responding. Invalid performance was associated with higher levels of self-reported somatic symptoms.

Keywords: Stroop task, performance validity, embedded validity indicators

The validity of the neuropsychological evaluation hinges on the examinees' ability and willingness to demonstrate their typical level of cognitive functioning (Bigler, 2015). Therefore, there is a broad consensus within the profession that a thorough performance validity assessment is an essential part of the examination (Bush, Ruff, & Heilbronner, 2014; Chafetz et al., 2015; Heilbronner et al., 2009). As a result, the administration of multiple, nonredundant performance validity tests (PVTs) has become a widely accepted practice standard (Boone, 2013; Larrabee, 2014).

Although stand-alone instruments are considered the gold standard for validity assessment (Green, 2013), embedded validity indicators (EVIs) are increasing in popularity. EVIs are derived from traditional neuropsychological tests originally designed to measure cognitive ability, but were subsequently coopted as PVTs. Many EVIs have strong empirical support and a long presence in the research literature. Some predate the most acclaimed stand-alone PVTs, such as those based in verbal fluency (Hayward, Hall, Hunt, & Zubrick, 1987), digit span (Greiffenstein, Baker, & Gola, 1994), or symbol substitution (Trueblood, 1994) tasks.

EVIs have several advantages over stand-alone PVTs. First, they allow clinicians to use multiple validity indicators without adding new measures, resulting in significant savings in test material and administration time. Compressing the battery also lowers the demand on patients' mental stamina, which is especially important when assessing individuals with complex medical and psychiatric history (Lichtenstein, Erdodi, & Linnea, 2017). EVIs may also be more resistant to coaching, as they are less likely to be identified as PVTs than stand-alone instruments (Chafetz et al., 2015; Schutte, Axelrod, & Montoya, 2015). Finally, they automatically address concerns about the generalizability of the PVT scores to the rest of the battery (Bigler, 2014). Overall, these features enable EVIs to achieve the ideal of ongoing monitoring of test-taking effort (Boone, 2009) without placing a significant additional burden on either the examiner or examinee.

Laszlo A. Erdodi and Sanya Sagar, Department of Psychology, University of Windsor; Kristian Seke, Brain-Cognition-Neuroscience Program, University of Windsor; Brandon G. Zuccato, Department of Psychology, University of Windsor; Eben S. Schwartz, Waukesha Memorial Hospital, Waukesha, Wisconsin; Robert M. Roth, Geisel School of Medicine at Dartmouth/Dartmouth-Hitchcock Medical Center.

Correspondence concerning this article should be addressed to Laszlo A. Erdodi, 168 Chrysler Hall South, 401 Sunset Avenue, Windsor, ON N9B 3P4, Canada. E-mail: [email protected]


The Stroop (1935) paradigm, of which there are many variants, has the potential to function as an EVI. The task usually consists of at least three trials (MacLeod & MacDonald, 2000). In the first trial, the participant is asked to read a series of color words, printed in black ink, as quickly as possible. In the second trial, the participant is asked to look at a series of color squares, and name the colors as quickly as possible. The third trial is the test of interference and the evoker of the classic Stroop effect: the participant is asked to look at a series of color words, printed in incongruent ink colors, and name the color of the ink instead of reading the word, as quickly as possible. For example, if the word "red" is printed in green ink, the examinee is asked to say "green" instead of "red." Because reading words is more automatized than naming ink colors, inhibiting the overlearned response requires additional cognitive resources, which results in increased completion time relative to the word reading and color naming trials (MacLeod & MacDonald, 2000).

The Stroop task within the Delis-Kaplan Executive Function System (D-KEFS; Delis, Kaplan, & Kramer, 2001) includes a fourth trial (inhibition/switching) designed to further increase the cognitive load by requiring examinees to switch back and forth between two sets of rules. On Trial 4, half of the words are enclosed in boxes. The examinee is instructed to name the color of the ink for free-standing items (as in Trial 3 of the classic Stroop task), but read the word (rather than name the ink color) for items inside a box. Trial 4 was meant to be more difficult than the interference trial to capture more subtle executive deficits. However, the empirical evidence on this difficulty gradient is mixed (Lippa & Davis, 2010).

The Stroop paradigm has been shown to be sensitive to neuropsychiatric conditions with executive dysfunction as a common feature, such as traumatic brain injury (TBI; Larson, Kaufman, Schmalfuss, & Perlstein, 2007; Schroeter et al., 2007) and attention-deficit/hyperactivity disorder (ADHD; Lansbergen, Kenemans, & Van Engeland, 2007). However, there is limited research examining the utility of the Stroop paradigm as a measure of noncredible performance. Arentsen and colleagues (2013) introduced validity cutoffs for the word reading (≥66 s), color naming (≥93 s), and interference (≥191 s) trials in the Comalli Stroop Test (Comalli, Wapner, & Werner, 1962). All of these cutoffs achieved specificity ≥.90 in a mixed clinical population, with .29–.53 sensitivity.

A raw residual score (i.e., predicted score minus actual score) of ≤−47 on the word reading trial of the Stroop Color and Word Test (Golden & Freshwater, 2002) discriminated noncredible from credible responders at .95 specificity and .29 sensitivity using Slick, Sherman, and Iverson's (1999) criteria for malingered neurocognitive dysfunction (Guise, Thompson, Greve, Bianchini, & West, 2014). Other studies (Egeland & Langfjaeran, 2007; Osimani, Alon, Berger, & Abarbanel, 1997) have found that noncredible performers may display slower overall reaction time (RT) and an inverted Stroop effect (i.e., better performance on the interference trial than the word reading or color naming trials). While Osimani and colleagues (1997) did not perform signal detection analyses, Egeland and Langfjaeran (2007) reported unacceptably low specificity (.59) for the inverted Stroop effect, even though the majority of noncredible performers exhibited this violation of the difficulty gradient. Furthermore, the inverted Stroop effect as an index of validity has not been replicated consistently in the literature (Arentsen et al., 2013).

To our knowledge, the potential of the D-KEFS Stroop to function as an EVI has not been investigated. Given its more nuanced difficulty gradient because of the unique combined inhibition/switching task (Trial 4), it may be particularly useful as a measure of performance validity. The purpose of this study is to examine the utility of the D-KEFS Stroop in differentiating credible and noncredible response sets in a clinical setting.

Method

Participants

Data were collected from a consecutive sequence of 132 patients (50% male, 89.4% right-handed), clinically referred for neuropsychological assessment at a northeastern academic medical center. The vast majority of them (>95%) were White, reflecting the demographic composition of the region. Age (M = 43.4, SD = 16) followed a bimodal distribution, with one peak around 20 years and another around 55 years. Mean level of education was 14.1 years (SD = 2.8). Overall intellectual functioning was in the average range (M FSIQ = 101.2, SD FSIQ = 16.4), as were scores on a single word reading test (M WRAT-4 = 104.4, SD WRAT-4 = 14.8). The most common primary diagnosis was psychiatric (46.2%), followed by TBI (35.6%), neurological disorders (14.4%), and other medical conditions (3.8%). Within the psychiatric subsample, most patients had been diagnosed with depression (45.9%), followed by somatic (19.7%) and anxiety disorders (13.1%). The majority of the TBI patients (81.1%) had sustained a mild injury. Likewise, the average self-reported depression was in the mild range (M BDI-II = 16.4, SD BDI-II = 14.8). Most patients (45.6%) scored in the minimal range (≤13), while 21.6% scored in the mild (14–19), 17.6% in the moderate (20–28), and 15.2% in the severe (≥29) range for self-reported depression.

Procedures

Data were collected through a retrospective chart review from patients assessed between December 2012 and July 2014. The main inclusion criterion was a complete administration of the D-KEFS Stroop. The study was approved by the ethics board of the hospital where the data were collected, and that of the university where the research project was finalized. Relevant guidelines regulating research with human participants were followed throughout the study.

The names and abbreviations of the tests administered are provided in Table 1. The percentage of the sample with scores on each test is also listed. A core battery of tests was administered to most patients, while the rest of the instruments were selected based on the specific referral question. Therefore, they vary from patient to patient.

The main stand-alone PVT was Warrington's Recognition Memory Test—Words (RMT). Failure was defined as an accuracy score of ≤43 or a completion time of ≥192 s (Erdodi, Tyson, et al., 2017). In addition, a composite of 11 validity indicators labeled "Effort Index Eleven" (EI-11) was developed to provide a comprehensive measure of performance validity (Erdodi, Abeare, et al., 2017; Erdodi, Kirsch, Lajiness-O'Neill, Vingilis, & Medoff, 2014; Erdodi & Roth, 2017).
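To make the RMT failure rule concrete, a minimal sketch of the decision logic described above follows; this is our illustration, not the authors' code, and the function name is hypothetical.

```python
# Sketch of the RMT failure rule described above (accuracy <= 43 OR
# completion time >= 192 s counts as a Fail). Names are illustrative.
def rmt_fail(accuracy: int, completion_time_s: float) -> bool:
    """Return True if the RMT response set is classified as a failure."""
    return accuracy <= 43 or completion_time_s >= 192

# Example: 45/50 correct in 205 seconds fails on the time criterion alone.
print(rmt_fail(accuracy=45, completion_time_s=205))  # True
```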

Table 1
List of Tests Administered: Abbreviations, Scales, and Norms

| Test name | Abbreviation | Norms | % ADM |
|---|---|---|---|
| Beck Depression Inventory, 2nd Edition | BDI-II | — | 94.4 |
| Beck Anxiety Inventory | BAI | — | 69.7 |
| California Verbal Learning Test, 2nd Edition | CVLT-II | Manual | 100.0 |
| Complex Ideational Material | CIM | Heaton | 32.6 |
| Conners' Continuous Performance Test, 2nd Edition | CPT-II | Manual | 78.8 |
| Delis-Kaplan Executive Function System–Stroop | D-KEFS | Manual | 100.0 |
| Finger Tapping Test | FTT | Heaton | 81.1 |
| Letter and Category Fluency Test | FAS & Animals | Heaton | 84.1 |
| Personality Assessment Inventory | PAI | Manual | 43.9 |
| Recognition Memory Test–Words | RMT | — | 100.0 |
| Rey 15-Item Test | Rey-15 | — | 81.8 |
| Rey Complex Figure Test | RCFT | Manual | 96.2 |
| Trail Making Test (A & B) | TMT (A & B) | Heaton | 56.8 |
| Wechsler Adult Intelligence Scale, 4th Edition | WAIS-IV | Manual | 99.2 |
| Wechsler Memory Scale, 4th Edition | WMS-IV | Manual | 99.2 |
| Wide Range Achievement Test, 4th Edition | WRAT-4 | Manual | 83.3 |
| Wisconsin Card Sorting Test | WCST | Manual | 91.7 |

Note. Heaton = demographically adjusted norms published by Heaton, Miller, Taylor, and Grant (2004); Manual = normative data published in the technical manual; % ADM = percentage of the sample to which the test was administered.

The constituent PVTs were dichotomized into Pass (= 0) and Fail (= 1) along published cutoffs. Some PVTs have multiple indicators; failing any indicator was considered as failing the entire PVT (= 1). Failing multiple indicators nested within the same measure was counted as a single failure (= 1). Missing data were coded as Pass (= 0), although it is recognized that this may increase error variance by potentially misclassifying noncredible patients as credible.

The value of the EI-11 is the sum of failures on its components. Given the relatively large number of indicators, and that the most liberal cutoffs were used to maximize sensitivity (see Table 2), the EI-11 is prone to false positive errors by design. To correct for that, the more conservative threshold of ≥3 independent PVT failures was used to define Fail on the EI-11.

Table 2
Base Rates of Failure for EI-11 Components, Cutoffs, and References for Each Indicator

| Test | BRFail | Indicator | Cutoff | Reference |
|---|---|---|---|---|
| Rey-15 | 10.6 | Free recall | ≤9 | Lezak, 1995; Boone et al., 2002 |
| TMT | 15.9 | A + B (s) | ≥137 | Shura et al., 2016 |
| Digit Span | 25.0 | RDS | ≤7 | Greiffenstein et al., 1994; Pearson, 2009 |
| | | ACSS | ≤6 | Axelrod et al., 2006; Spencer et al., 2013; Trueblood, 1994 |
| | | LDF | ≤4 | Heinly et al., 2005 |
| WCST | 14.4 | FMS | ≥2 | Larrabee, 2003; Suhr & Boyer, 1999 |
| | | LRE | >1.9 | Greve et al., 2002; Suhr & Boyer, 1999 |
| CIM | 6.1 | Raw | ≤9 | Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2016 |
| | | T-score | ≤29 | Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2016 |
| LM WMS-IV | 15.9 | I ACSS | ≤3 | Bortnik et al., 2010 |
| | | II ACSS | ≤4 | Bortnik et al., 2010 |
| | | Recognition | ≤20 | Bortnik et al., 2010; Pearson, 2009 |
| VR WMS-IV | 18.2 | Recognition | ≤4 | Pearson, 2009 |
| CVLT-II | 12.9 | Hits Recognition | ≤10 | Bauer et al., 2005; Greve et al., 2009; Wolfe et al., 2010 |
| | | FCR | ≤15 | Bauer et al., 2005; D. Delis (personal communication, May 10, 2012) |
| RCFT | 34.1 | Copy raw | ≤26 | Lu et al., 2003; Reedy et al., 2013 |
| | | 3-min raw | ≤9.5 | Lu et al., 2003; Reedy et al., 2013 |
| | | TP Recognition | ≤6 | Lu et al., 2003; Reedy et al., 2013 |
| | | Atyp RE | ≥1 | Blaskewitz et al., 2009; Lu et al., 2003 |
| FAS | 9.8 | T-score | ≤33 | Curtis et al., 2008; Sugarman & Axelrod, 2015 |
| Animals | 16.7 | T-score | ≤33 | Hayward et al., 1987; Sugarman & Axelrod, 2015 |

Note. BRFail = base rate of failure (% of the sample that failed one or more indicators within the test); TMT = Trail Making Test; RDS = reliable digit span; ACSS = age-corrected scaled score; LDF = longest digit span forward; WCST = Wisconsin Card Sorting Test; FMS = failure to maintain set; LRE = logistic regression equation; CIM = Complex Ideational Material from the Boston Diagnostic Aphasia Battery; WMS-IV = Wechsler Memory Scale, 4th Edition; LM = Logical Memory; VR = Visual Reproduction; CVLT-II = California Verbal Learning Test, 2nd Edition; FCR = forced choice recognition; RCFT = Rey Complex Figure Test; TP Recognition = recognition true positives; Atyp RE = atypical recognition errors.

At the same time, to maintain the purity of the credible group, Pass was defined as ≤1. Hence, patients with EI-11 scores of two were considered Borderline and excluded from the analyses (Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2017).
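As a compact illustration of the EI-11 scoring logic just described, the sketch below dichotomizes the component PVTs, sums the failures, and applies the Pass/Borderline/Fail thresholds. This is our sketch, not the authors' code; all names are hypothetical.

```python
# Hypothetical sketch of EI-11 scoring: each constituent PVT is coded
# Pass = 0 / Fail = 1 (missing data conservatively coded as Pass),
# failures are summed, and the total is trichotomized.
from typing import Optional, Sequence

def ei11_score(pvt_failures: Sequence[Optional[bool]]) -> int:
    """Sum of failed PVTs; a missing result (None) is coded as Pass."""
    return sum(1 for failed in pvt_failures if failed)

def ei11_classify(score: int) -> str:
    """Pass <= 1; Borderline = 2 (excluded from analyses); Fail >= 3."""
    if score <= 1:
        return "Pass"
    if score == 2:
        return "Borderline"
    return "Fail"

# Example: 11 component PVTs, two failed and one missing -> score of 2.
results = [False, True, None, False, False, True, False, False, False, False, False]
print(ei11_classify(ei11_score(results)))  # Borderline
```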
Relying on a mixture of PVTs representing a wide range of sensory modalities, cognitive domains, and testing paradigms is a desirable feature of the EI-11, as it provides an ecologically valid index of performance validity. However, this heterogeneity could also become a source of error variance, especially when the purpose of the instrument is to establish the credibility of the performance on a specific test, and not on the overall neurocognitive profile. The issue of modality-specificity as a confound in signal detection analyses was raised as a theoretical concern (Leighton, Weinborn, & Maybery, 2014) and has found empirical support (Erdodi, Abeare, et al., 2017; Erdodi, Tyson, et al., 2017).
Therefore, because the D-KEFS Stroop is timed, another validity composite was developed based on constituent PVTs that were based on processing speed measures, labeled "Erdodi Index Seven" (EI-7PSP). The unique feature of the EI-7PSP is that instead of the traditional Pass/Fail dichotomy, each of its components is coded on a 4-point scale ranging from zero (unequivocal Pass) to three (unequivocal Fail), with one and two reflecting intermediate levels of failure (see Table 3). As such, the EI-7PSP captures both the number and extent of PVT failures, recognizing the underlying continuity in test-taking effort (Erdodi, Roth, Kirsch, Lajiness-O'Neill, & Medoff, 2014; Erdodi, Tyson, et al., 2016).

Because the practical demands of clinical classification require a dichotomous outcome, EI-7PSP scores ≤1 were defined as Pass, and ≥4 as Fail. EI-7PSP values of two and three represent an indeterminate range, as they could reflect either multiple failures at a liberal cutoff or a single failure at the most conservative cutoff. As these performances are considered "near-passes" by some (Bigler, 2012, 2015), patients with EI-7PSP scores in this range were excluded from signal detection analyses in the interest of obtaining diagnostically pure criterion groups, following methodological guidelines established by previous researchers (Axelrod, Meyers, & Davis, 2014; Greve & Bianchini, 2004). The majority of the sample (62.1%) performed in the passing range; only 15.9% scored ≥4 (see Table 4).
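The graded bands in Table 3 can be read as small lookup rules. As an illustration, a sketch of the FAS component coding and the final trichotomization follows, under our own naming (not the authors' code); the band boundaries are taken from Table 3.

```python
# Hypothetical sketch of EI-7PSP logic: components are coded 0-3 rather
# than Pass/Fail, then the component codes are summed and trichotomized.
def code_fas(t_score: float) -> int:
    """FAS T-score -> EI-7 code (Table 3: >33 = 0, 32-33 = 1, 28-31 = 2, <=27 = 3)."""
    if t_score > 33:
        return 0
    if t_score >= 32:
        return 1
    if t_score >= 28:
        return 2
    return 3

def ei7_classify(total: int) -> str:
    """Pass <= 1; indeterminate 2-3 (excluded from analyses); Fail >= 4."""
    if total <= 1:
        return "Pass"
    if total <= 3:
        return "Borderline"
    return "Fail"

# Example: FAS T = 30 contributes 2 points; with 2 more points from other
# components, the total of 4 falls in the Fail range.
print(code_fas(30))                     # 2
print(ei7_classify(code_fas(30) + 2))   # Fail
```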
Data Analysis

Descriptive statistics (mean, SD, BRFail) are reported for the relevant variables. The main inferential statistics were one-way analyses of variance (ANOVAs) and independent sample t tests. Effect size estimates are reported as Cohen's d and partial eta squared (η²). Classification accuracy (sensitivity and specificity) was calculated using standard formulas (Grimes & Schulz, 2005). The emerging standard for specificity is ≥.90 (Boone, 2013), with the minimum acceptable value at .84 (Larrabee, 2003).
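For readers who want the standard formulas spelled out, a minimal sketch follows: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), computed from paired EVI and criterion outcomes. This is our illustration, not the authors' code; the names are hypothetical.

```python
# Sensitivity/specificity from parallel lists of booleans (True = Fail).
def classification_accuracy(evi_fail, criterion_fail):
    tp = sum(e and c for e, c in zip(evi_fail, criterion_fail))
    fn = sum((not e) and c for e, c in zip(evi_fail, criterion_fail))
    tn = sum((not e) and (not c) for e, c in zip(evi_fail, criterion_fail))
    fp = sum(e and (not c) for e, c in zip(evi_fail, criterion_fail))
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: the EVI catches the one true Fail but also flags one
# credible case, giving perfect sensitivity and specificity of 2/3.
sens, spec = classification_accuracy(
    evi_fail=[True, False, True, False],
    criterion_fail=[True, False, False, False],
)
print(sens, spec)  # 1.0 0.666...
```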
Results

One-way ANOVAs using the trichotomized EI-11 (Pass-Borderline-Fail) as the independent variable, and the RMT accuracy score, completion time, and the EI-7PSP scores as the dependent variables, were statistically significant. Associated effect sizes were large (η²: .16–.23). Scores in the Pass range were always significantly lower than scores in the Fail range. However, scores in the Borderline range did not differ consistently from the other two classification ranges (see Table 5).

These analyses were repeated using the trichotomized EI-7PSP (Pass-Borderline-Fail) as the independent variable, and the RMT accuracy score, completion time, and the EI-11 scores as dependent variables. All contrasts were significant with large effects (η²: .11–.34). As before, scores in the Pass range were always significantly lower than scores in the Fail range, but scores in the Borderline range did not differ consistently from the other two classification ranges (see Table 6). Overall, these findings provide empirical support for eliminating participants with EI-11 and EI-7PSP scores in the Borderline range when computing the classification accuracy of the D-KEFS Stroop to minimize error variance and establish criterion groups with neurocognitive profiles that are either clearly valid or invalid.

Mean D-KEFS Stroop age-corrected scaled scores (ACSS) were in the average range on all four trials. However, they were significantly below the nominal mean of 10, with small to medium effect sizes (Cohen's d: .25–.44). Skew and kurtosis were well within ±1.0 (see Table 7). However, visual inspection revealed bimodal distributions with one peak in the impaired range and another in the average-to-high average range.

Trial 1 ACSS ≤7 failed to clear the minimum threshold for specificity (.84; Larrabee, 2003) against the RMT and EI-11. The ≤6 cutoff produced good combinations of sensitivity (.43–.71) and specificity (.86–.94) against all three reference PVTs. Lowering the cutoff to ≤5 produced negligible changes in classification accuracy. The more conservative ≤4 cutoff resulted in predictable tradeoffs: improved specificity (.93–.99) at the expense of sensitivity (.29–.43).

Trial 2 ACSS ≤7 cleared the minimum threshold for specificity against the EI-11 and EI-7PSP, but fell short against the RMT. The ≤6 cutoff produced good combinations of sensitivity (.45–.62) and specificity (.87–.91) against all three reference PVTs. Lowering the cutoff to ≤5 improved specificity across all reference PVTs (.92–.96) with minimal loss in sensitivity (.38–.57). The more conservative ≤4 cutoff produced excellent specificity (.96–.99) with relatively well-preserved sensitivity (.33–.48).

Trial 3 ACSS ≤7 cleared the minimum threshold for specificity against the EI-11 and EI-7PSP, but once again, fell short of expectations against the RMT. Lowering the cutoff to ≤6 resulted in the predictable tradeoffs, but still failed to reach minimum specificity against the RMT. Lowering the cutoff to ≤5 improved specificity across all reference PVTs (.87–.99) with minimal loss in sensitivity (.26–.62). The more conservative ≤4 cutoff produced excellent specificity (.94–1.00) with acceptable sensitivity (.26–.52).

Trial 4 ACSS ≤7 failed to clear the minimum threshold for specificity against the RMT and EI-11. The ≤6 cutoff produced acceptable combinations of sensitivity (.29–.48) and specificity (.84–.91) against all three reference PVTs. Lowering the cutoff to ≤5 produced negligible changes in classification accuracy. The more conservative ≤4 cutoff resulted in predictable tradeoffs: improved specificity (.92–.96) at the expense of sensitivity (.21–.33).

To evaluate whether the pattern of performance across the trials can reveal invalid responding, two additional derivative validity indices were examined: the Trials 4/3 raw score ratio, and the (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio. The former index is a measure of the absolute difficulty gradient, as examinees are expected to take longer to finish Trial 4, given its added cognitive load of switching between two sets of rules. Indeed, on average, patients produced higher completion times on Trial 4 relative to Trial 3 (Trials 4/3 raw score ratio: M = 1.17, SD = 0.32, range: 0.65–2.59). The distribution was bimodal, with the bulk of the sample forming a bell-shaped distribution around the mean and a small group of positive outliers. The latter index, a ratio of aggregated easy (Trials 1 and 2) versus hard (Trials 3 and 4) trials, is a measure of the relative difficulty gradient. Norm-referencing (i.e., age-correction) is expected to equalize the increase in task demands from the first two to the last two trials. As expected, the overall (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio was close to 1.00: M = 1.05, SD = 0.44, range: 0.33–3.50. Again, the distribution was bimodal, with a bell-shaped majority around the mean and a small group of positive outliers.
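A minimal sketch of the two derivative indices just defined follows, with the study's most liberal cutoffs flagged in the comments. The sketch and its names are ours, not the authors' code.

```python
# Trials 4/3 raw score (completion time) ratio and the aggregated
# (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio described above.
def ratio_4_3(trial3_time: float, trial4_time: float) -> float:
    """Raw time ratio; values <= 0.90 violate the absolute difficulty gradient."""
    return trial4_time / trial3_time

def ratio_easy_hard(acss1: int, acss2: int, acss3: int, acss4: int) -> float:
    """Easy/hard ACSS ratio; values <= 0.80 suggest an inverted gradient."""
    return (acss1 + acss2) / (acss3 + acss4)

# A credible profile takes longer on Trial 4 than Trial 3 (ratio > 1) and
# scores about equally well on easy and hard trials (ratio near 1.00).
print(ratio_4_3(trial3_time=65, trial4_time=78))  # 1.2
print(ratio_easy_hard(9, 10, 9, 10))              # 1.0
```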

Table 3
The Components of the EI-7PSP With Base Rates of Failure Corresponding to Each Cutoff

| EI-7PSP component | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| FTT number of failures | 0 | 1 | 2 | — |
| Base rate | 96.2 | .8 | 3.0 | — |
| FAS T-scores | >33 | 32–33 | 28–31 | ≤27 |
| Base rate | 87.9 | 5.3 | 2.3 | 4.5 |
| Animals T-scores | >33 | 25–33 | 21–24 | ≤20 |
| Base rate | 83.3 | 8.3 | 3.8 | 4.5 |
| TMT A + B raw scores | <137 | 137–221 | 222–255 | ≥256 |
| Base rate | 84.1 | 10.6 | 2.3 | 3.0 |
| CPT-II number of failures | 0 | 1 | 2 | ≥3 |
| Base rate | 66.7 | 15.2 | 5.3 | 12.9 |
| WAIS-IV CD (ACSS) | >5 | 5 | 4 | ≤3 |
| Base rate | 90.2 | 1.5 | 5.3 | 3.0 |
| WAIS-IV SS (ACSS) | >5 | 5 | 4 | ≤3 |
| Base rate | 85.6 | 6.8 | 4.5 | 3.0 |

Note. EI-7PSP = "Erdodi Index Seven" based on processing speed measures; FTT failures = Finger Tapping Test, number of scores at ≤35 (men)/28 (women) dominant hand and ≤66 (men)/58 (women) combined mean raw scores (Arnold et al., 2005; Axelrod, Meyers, & Davis, 2014); FAS = letter fluency T-score (Curtis et al., 2008; Sugarman & Axelrod, 2015); Animals = category fluency T-score (Sugarman & Axelrod, 2015); CPT-II failures = Conners' Continuous Performance Test, 2nd Edition, number of T-scores > 70 on Omissions, Hit Reaction Time Standard Error, Variability, and Perseverations (Erdodi, Pelletier, & Roth, 2016; Erdodi, Roth, et al., 2014; Lange et al., 2013; Ord, Boettcher, Greve, & Bianchini, 2010); WAIS-IV CD (ACSS) = age-corrected scaled score on the Coding subtest of the Wechsler Adult Intelligence Scale—Fourth Edition (Erdodi, Abeare, et al., 2017; Etherton et al., 2006; N. Kim et al., 2010; Trueblood, 1994); WAIS-IV SS (ACSS) = age-corrected scaled score on the Symbol Search subtest of the Wechsler Adult Intelligence Scale—Fourth Edition (Erdodi, Abeare, et al., 2017; Etherton et al., 2006; Trueblood, 1994).

Table 4
Frequency Distribution of the EI-7PSP With Classification Ranges

| EI-7PSP | f | % | % Cumulative | Classification |
|---|---|---|---|---|
| 0 | 61 | 46.2 | 46.2 | Pass |
| 1 | 21 | 15.9 | 62.1 | Pass |
| 2 | 12 | 9.1 | 71.2 | Borderline |
| 3 | 17 | 12.9 | 84.1 | Borderline |
| 4 | 5 | 3.8 | 87.9 | Fail |
| 5 | 2 | 1.5 | 89.4 | Fail |
| 6 | 4 | 3.0 | 92.4 | Fail |
| 7 | 3 | 2.3 | 94.7 | Fail |
| 8 | 0 | .0 | 94.7 | Fail |
| 9 | 1 | .8 | 95.5 | Fail |

A Trials 4/3 raw score ratio ≤0.90 cleared the minimum threshold for specificity against all reference PVTs, but sensitivity was low (.14–.24). Lowering the cutoff to ≤0.85 resulted in predictable tradeoffs, with a notable increase in specificity (.95–.97), but further loss in sensitivity (.10–.14). Lowering the cutoff to ≤0.80 produced negligible changes in classification accuracy. The more conservative ≤0.75 cutoff produced excellent specificity (.98–1.00), but very low sensitivity (.07–.14).

A (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio ≤0.80 failed to achieve minimum specificity against any of the reference PVTs. Lowering the cutoff to ≤0.75 cleared the lower threshold for specificity against all reference PVTs (.88–.89), with acceptable sensitivity (.26–.29). Lowering the cutoff to ≤0.70 produced predictable tradeoffs: improved specificity (.92–.94) at the expense of sensitivity (.21–.24). The more conservative ≤0.65 cutoff produced excellent specificity (.98–1.00), but low sensitivity (.12–.24).

Finally, the effect of cumulative failures on independent D-KEFS Stroop validity indicators was examined (see Table 8). Failing at least two of the six newly introduced embedded PVTs produced good combinations of sensitivity (.61–.81) and specificity (.86–.87) against the EI-11 and EI-7PSP, but fell short of the minimum specificity standard against the RMT. Failing at least three indicators cleared the specificity threshold against all three reference PVTs (.84–.93), at the expense of sensitivity (.36–.57). Failing at least four indicators produced consistently high specificity (.92–.99), with further loss in sensitivity (.29–.52). Failing five indicators (the highest value observed) was associated with perfect specificity, but low sensitivity (.10–.14).

Given the high base rate of psychiatric disorders in the sample, we examined the relationship between self-reported emotional distress and PVT failure. There was no difference in BDI-II scores between patients who passed and those who failed the three reference PVTs and five of the newly developed validity cutoffs embedded in the D-KEFS Stroop. Trial 4 ACSS ≤6 was an isolated exception; failing this cutoff was associated with lower levels of depression (d = .42). Similarly, no difference was found in BAI scores between patients who passed and those who failed the three reference PVTs and four of the newly developed validity cutoffs embedded in the D-KEFS Stroop. The exceptions were Trial 2 ACSS ≤6 and the (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio ≤0.75. In both cases, failing the cutoff was associated with increased levels of anxiety (d: .41–.55).

To further explore the potential contribution of psychiatric symptoms to PVT failure, we performed a series of t tests using Pass/Fail status on the PVTs as independent variables, and the PAI clinical scales as dependent variables (see Table 9). All patients passed the validity cutoff on the Negative Impression Management scale. The Somatic Concerns scale was the only scale with significant contrasts. Effect sizes ranged from medium (d = .54) to large (d = .77). No difference emerged against the EI-7PSP and the derivative D-KEFS Stroop validity indices. Within the Somatic Concerns scale, effect sizes were generally larger on the Conversion subscale (d: .50–1.06), but again, contrasts involving the derivative D-KEFS Stroop validity indices failed to reach significance.
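For the Pass/Fail contrasts reported above, Cohen's d with a pooled standard deviation is the conventional computation; a minimal sketch follows. This is our illustration, not the authors' code, and because the excerpt does not report the Fail-group n, the example below assumes illustrative equal group sizes.

```python
# Cohen's d with a pooled SD for two independent groups.
from math import sqrt

def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Example from the SOM row of Table 9 (RMT column): Fail M = 67.4
# (SD = 16.1) vs. Pass M = 59.1 (SD = 11.1). With assumed equal group
# sizes this lands on the reported d = .60.
print(round(cohens_d(67.4, 16.1, 29, 59.1, 11.1, 29), 2))  # 0.6
```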

Table 5
Results of One-Way ANOVAs on RMT and EI-7PSP Scores Across EI-11 Classification Ranges

| Outcome measure | Statistic | PASS (EI-11 0–1, n = 76) | BOR (EI-11 = 2, n = 18) | FAIL (EI-11 ≥3, n = 38) | F | p | η² | Significant post hocs |
|---|---|---|---|---|---|---|---|---|
| RMT Accuracy | M | 47.4 | 44.5 | 42.2 | 12.1 | <.001 | .16 | PASS vs. BOR; PASS vs. FAIL |
| | SD | 3.4 | 6.6 | 7.7 | | | | |
| RMT Time | M | 130.0 | 147.3 | 191.3 | 11.1 | <.001 | .15 | PASS vs. FAIL; BOR vs. FAIL |
| | SD | 63.4 | 52.9 | 74.4 | | | | |
| EI-7PSP | M | 0.8 | 1.8 | 4.2 | 19.1 | <.001 | .23 | PASS vs. FAIL; BOR vs. FAIL |
| | SD | 1.6 | 1.4 | 4.5 | | | | |

Note. Post hoc pairwise contrasts were computed using the least significant difference method; EI-11 = Effort Index Eleven; BOR = Borderline; RMT Accuracy = Recognition Memory Test–Words (accuracy score); RMT Time = Recognition Memory Test–Words (completion time in seconds); EI-7PSP = "Erdodi Index Seven" based on processing speed measures; ANOVA = analysis of variance.

The Somatization subscale was only associated with failing the RMT (d = .54). Significant differences reemerged on the Health Concerns subscale, with effect sizes ranging from medium (d = .52) to large (d = .84). Contrasts involving the EI-7PSP, D-KEFS Stroop Trial 2, and the derivative validity indices failed to reach significance.

Discussion

This study explored the potential of the D-KEFS Stroop to function as a PVT. A scaled score below 1 SD of the normative mean on any of the four trials was a reliable indicator of psychometrically defined invalid performance. Violating the difficulty gradient (i.e., scoring better on difficult tasks than on easier tasks) was also reliably associated with failure on reference PVTs. All six EVIs produced bimodal distributions with a distinct cluster of outliers in the range of noncredible impairment, indicating that valid and invalid performance may start to diverge at the level of descriptive statistics. Overall, results suggest that in addition to measuring basic processing speed and executive function, the D-KEFS Stroop is also an effective PVT. This finding is consistent with earlier investigations using different versions of the Stroop task (Arentsen et al., 2013; Egeland & Langfjaeran, 2007; Guise et al., 2014; Osimani et al., 1997).

Labeling a score that is only 1 SD below the mean as invalid may appear an extreme measure at first, as it implies that as many as 16% of the original normative sample demonstrated invalid performance. However, the practice is not without precedent. Shura, Miskey, Rowland, Yoash-Gantz, and Denning (2016) demonstrated that an ACSS ≤7 (Low Average) on Letter-Number Sequencing was a reliable indicator of noncredible responding. Moreover, Baker, Connery, Kirk, and Kirkwood (2014) found a recognition discriminability z-score of ≤−0.5 (Average) to be the marker of invalid performance on the California Verbal Learning Test—Children's Version.

This phenomenon of the noncredible range of performance expanding into the traditional range of normal cognitive functioning has recently been labeled the "invalid-before-impaired paradox." Erdodi and Lichtenstein (2017) argued that this apparent psychometric anomaly has multiple possible explanations, one of which is that few (if any) normative samples are screened for invalid performance. Therefore, noncredible responding contaminates the scaling process used to establish ACSSs.

Table 6
Results of One-Way ANOVAs on RMT and EI-11 Scores Across EI-7PSP Classification Ranges

| Outcome measure | Statistic | PASS (EI-7PSP 0–1, n = 82) | BOR (EI-7PSP 2–3, n = 29) | FAIL (EI-7PSP ≥4, n = 21) | F | p | η² | Significant post hocs |
|---|---|---|---|---|---|---|---|---|
| RMT Accuracy | M | 47.0 | 43.3 | 42.7 | 8.08 | <.001 | .11 | PASS vs. BOR; PASS vs. FAIL |
| | SD | 4.1 | 7.2 | 7.7 | | | | |
| RMT Time | M | 123.2 | 189.9 | 207.9 | 21.5 | <.001 | .25 | PASS vs. BOR; PASS vs. FAIL |
| | SD | 53.8 | 70.4 | 75.3 | | | | |
| EI-11 | M | 1.0 | 2.2 | 4.2 | 33.6 | <.001 | .34 | PASS vs. FAIL; PASS vs. BOR; BOR vs. FAIL |
| | SD | 1.3 | 1.7 | 2.2 | | | | |

Note. Post hoc pairwise contrasts were computed using the least significant difference method; EI-7PSP = "Erdodi Index Seven" based on processing speed measures; BOR = Borderline; RMT Accuracy = Recognition Memory Test–Words (accuracy score); RMT Time = Recognition Memory Test–Words (completion time in seconds); EI-11 = Effort Index Eleven; ANOVA = analysis of variance.

In turn, later research discovers that scores commonly interpreted as within normal limits are in fact indicative of invalid performance. They concluded that EVI cutoffs that reach into the range of functioning traditionally considered intact provide valuable information about the credibility of the response set and, therefore, should not be automatically discounted.

Table 7
D-KEFS Stroop Age-Corrected Scaled Scores Across the Four Trials for the Entire Sample (N = 132)

| | Trial 1 (Color naming) | Trial 2 (Word reading) | Trial 3 (Inhibition) | Trial 4 (Inhibition/switching) |
|---|---|---|---|---|
| M | 8.6 | 9.2 | 9.0 | 9.1 |
| SD | 3.4 | 3.5 | 3.8 | 3.5 |
| Median | 10 | 10 | 9.5 | 10 |
| Skew | −.57 | −.66 | −.53 | −.63 |
| Kurtosis | −.33 | −.33 | −.27 | −.12 |
| Range | 1–15 | 1–15 | 1–15 | 1–15 |

Note. D-KEFS = Delis-Kaplan Executive Function System.

Within the D-KEFS Stroop, derivative validity indicators behaved differently both relative to reference PVTs and to single-trial validity cutoffs. First, the derivative validity indicators had consistently lower BRFail, suggesting that pattern violations are less common manifestations of noncredible responding than abnormally slow completion time. This finding is congruent with earlier reports that an inverted or absent Stroop effect does not occur in credible examinees; therefore, it is highly specific to invalid performance (Osimani et al., 1997). As a direct consequence of this, derivative validity indicators were generally less sensitive, which may also reflect the inconsistency in the literature regarding the inverted Stroop effect as an index of performance validity. While some studies found that noncredible performers perform better on more difficult trials, this pattern of performance failed to demonstrate adequate classification accuracy (Arentsen et al., 2013; Egeland & Langfjaeran, 2007).

Despite the variability in sample characteristics, methodology, version of the Stroop task, reference PVTs, and BRFail, our findings are broadly consistent with the extant literature in that the inverted Stroop effect is more common in noncredible examinees, but has limited discriminant power.

Table 8
Classification Accuracy of Validity Indicators Embedded in the D-KEFS Stroop Task Against Reference PVTs

Reference PVTs: RMT (n = 132, BRFail = 31.8); EI-11 (n = 114, BRFail = 33.3); EI-7PSP (n = 103, BRFail = 20.4).

| D-KEFS Stroop | Cutoff | BRFail | RMT SENS | RMT SPEC | EI-11 SENS | EI-11 SPEC | EI-7PSP SENS | EI-7PSP SPEC |
|---|---|---|---|---|---|---|---|---|
| Trial 1 | ≤7 | 34.1 | .55 | .76 | .61 | .82 | .76 | .88 |
| | ≤6 | 23.5 | .43 | .86 | .53 | .92 | .71 | .94 |
| | ≤5 | 21.2 | .43 | .89 | .50 | .93 | .62 | .94 |
| | ≤4 | 13.6 | .29 | .93 | .37 | .96 | .43 | .99 |
| Trial 2 | ≤7 | 26.5 | .48 | .83 | .52 | .87 | .62 | .88 |
| | ≤6 | 23.5 | .45 | .87 | .45 | .88 | .62 | .91 |
| | ≤5 | 17.4 | .38 | .92 | .44 | .95 | .57 | .96 |
| | ≤4 | 13.6 | .33 | .96 | .39 | .98 | .48 | .99 |
| Trial 3 | ≤7 | 31.1 | .45 | .76 | .55 | .88 | .71 | .85 |
| | ≤6 | 22.0 | .31 | .82 | .42 | .91 | .62 | .93 |
| | ≤5 | 17.4 | .26 | .87 | .37 | .93 | .62 | .99 |
| | ≤4 | 12.1 | .26 | .94 | .32 | .97 | .52 | 1.00 |
| Trial 4 | ≤7 | 26.5 | .33 | .77 | .47 | .83 | .67 | .88 |
| | ≤6 | 19.7 | .29 | .84 | .34 | .87 | .48 | .91 |
| | ≤5 | 16.7 | .24 | .88 | .29 | .90 | .48 | .95 |
| | ≤4 | 12.1 | .21 | .92 | .21 | .92 | .33 | .96 |
| Trials 4/3 raw score | ≤.90 | 14.3 | .14 | .86 | .21 | .90 | .24 | .85 |
| | ≤.85 | 6.8 | .10 | .96 | .13 | .97 | .14 | .95 |
| | ≤.80 | 5.3 | .07 | .97 | .11 | .99 | .14 | .98 |
| | ≤.75 | 3.8 | .07 | .98 | .11 | .99 | .14 | .99 |
| Trials (1 + 2)/(3 + 4) ACSS | ≤.80 | 22.3 | .29 | .80 | .32 | .82 | .29 | .81 |
| | ≤.75 | 16.7 | .26 | .88 | .27 | .88 | .29 | .89 |
| | ≤.70 | 11.4 | .21 | .93 | .18 | .92 | .24 | .94 |
| | ≤.65 | 7.6 | .14 | .96 | .16 | .96 | .24 | .99 |
| Cumulative failures | ≥2 | 31.1 | .50 | .78 | .61 | .86 | .81 | .87 |
| | ≥3 | 22.0 | .36 | .84 | .45 | .92 | .57 | .93 |
| | ≥4 | 14.4 | .29 | .92 | .32 | .93 | .52 | .99 |
| | ≥5 | 3.0 | .10 | 1.00 | .11 | 1.00 | .14 | 1.00 |

Note. D-KEFS = Delis-Kaplan Executive Function System; PVT = performance validity test; RMT = Warrington Recognition Memory Test–Words [Pass: accuracy score >43 and time-to-completion <192 s; Fail: accuracy score ≤43 or time-to-completion ≥192 s (Erdodi, Kirsch, et al., 2014; Erdodi, Tyson, et al., 2017; M. S. Kim et al., 2010)]; EI-11 = Effort Index Eleven [Pass ≤1; Fail ≥3 (Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2017)]; EI-7PSP = "Erdodi Index Seven" based on processing speed measures [Pass ≤1; Fail ≥4 (Erdodi, Roth, et al., 2014; Erdodi, Tyson, et al., 2016, 2017)]; BRFail = base rate of failure (percentage); SENS = sensitivity; SPEC = specificity; Trial 1 = Color Naming age-corrected scaled score (ACSS); Trial 2 = Word Reading ACSS; Trial 3 = Inhibition ACSS (classic Stroop task); Trial 4 = Inhibition/Switching ACSS; Cumulative failures = number of validity indices failed (Trials 1–4 ACSS ≤6; Trials 4/3 raw score ratio ≤.90; Trials (1 + 2)/(3 + 4) ACSS ratio ≤.75).
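The cumulative-failure index in Table 8 is a simple count over the six embedded indicators; a minimal sketch follows (ours, not the authors' code; names are hypothetical).

```python
# Count failures across the six D-KEFS Stroop validity indicators from
# Table 8's note: Trials 1-4 ACSS <= 6, Trials 4/3 raw ratio <= .90,
# and (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio <= .75.
def stroop_cumulative_failures(acss, raw_ratio_4_3, acss_ratio_easy_hard):
    """acss is a (trial1, trial2, trial3, trial4) tuple of age-corrected scaled scores."""
    failures = sum(score <= 6 for score in acss)
    failures += raw_ratio_4_3 <= 0.90
    failures += acss_ratio_easy_hard <= 0.75
    return failures

# Example: two low trial scores with intact difficulty gradients yield 2
# failures, meeting the most liberal (>= 2) multivariate cutoff.
print(stroop_cumulative_failures((6, 8, 5, 9), 1.15, 0.95))  # 2
```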

Table 9
Results of Independent t Tests Comparing Scores on PAI Somatization Scales as a Function of Passing or Failing PVTs

| PAI scale | PVT outcome | RMT | EI-11 | EI-7PSP | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trials 4/3 | Trials (1+2)/(3+4) |
|---|---|---|---|---|---|---|---|---|---|---|
| | n (Pass) | 58 | 49 | 47 | 58 | 58 | 58 | 58 | 58 | 58 |
| SOM | Pass M | 59.1 | 58.4 | 59.2 | 59.3 | 60.0 | 59.4 | 60.3 | 61.2 | 61.0 |
| | Pass SD | 11.1 | 11.8 | 11.8 | 11.7 | 11.8 | 11.7 | 12.0 | 13.0 | 12.5 |
| | Fail M | 67.4 | 65.9 | 65.2 | 70.0 | 69.9 | 69.2 | 70.3 | 61.9 | 63.3 |
| | Fail SD | 16.1 | 15.7 | 18.5 | 15.7 | 17.8 | 15.9 | 19.3 | 13.9 | 17.2 |
| | p | <.05 | <.05 | .11 | <.01 | <.05 | <.05 | <.05 | .44 | .33 |
| | d | .60 | .54 | — | .77 | .66 | .70 | .62 | — | — |
| SOMCONV | Pass M | 56.4 | 52.7 | 56.1 | 55.6 | 56.4 | 55.7 | 57.3 | 58.2 | 57.2 |
| | Pass SD | 11.1 | 9.2 | 12.2 | 11.2 | 12.2 | 11.6 | 12.7 | 14.2 | 13.0 |
| | Fail M | 64.4 | 65.4 | 64.3 | 71.6 | 72.1 | 71.0 | 70.2 | 60.6 | 65.5 |
| | Fail SD | 19.5 | 17.8 | 19.0 | 18.1 | 18.8 | 17.9 | 22.1 | 14.6 | 20.1 |
| | p | <.05 | <.05 | <.05 | <.01 | <.01 | <.01 | <.05 | .31 | .07 |
| | d | .50 | .90 | .51 | 1.06 | .99 | 1.01 | .72 | — | — |
| SOMSOM | Pass M | 57.6 | 59.9 | 58.5 | 58.7 | 58.9 | 58.5 | 59.2 | 59.9 | 60.0 |
| | Pass SD | 14.2 | 14.6 | 13.6 | 14.0 | 13.9 | 14.1 | 13.8 | 15.0 | 14.1 |
| | Fail M | 65.2 | 60.6 | 60.2 | 64.1 | 64.6 | 64.9 | 63.8 | 59.0 | 57.6 |
| | Fail SD | 13.9 | 15.2 | 19.0 | 16.3 | 17.7 | 15.4 | 20.6 | 12.6 | 17.4 |
| | p | <.05 | .43 | .37 | .14 | .15 | .10 | .23 | .44 | .33 |
| | d | .54 | — | — | — | — | — | — | — | — |
| SOMH-CON | Pass M | 59.2 | 58.9 | 59.1 | 59.7 | 60.3 | 59.9 | 60.1 | 61.0 | 60.9 |
| | Pass SD | 9.6 | 10.8 | 10.1 | 10.4 | 10.3 | 10.3 | 10.7 | 10.8 | 10.8 |
| | Fail M | 65.8 | 65.4 | 65.0 | 66.9 | 65.6 | 66.1 | 69.3 | 61.3 | 61.9 |
| | Fail SD | 13.3 | 11.9 | 14.0 | 12.4 | 15.2 | 13.2 | 11.1 | 12.6 | 13.0 |
| | p | <.05 | <.05 | .07 | <.05 | .11 | <.05 | <.05 | .47 | .41 |
| | d | .57 | .57 | — | .63 | — | .52 | .84 | — | — |

Note. D-KEFS = Delis-Kaplan Executive Function System; PVT = performance validity test; PAI = Personality Assessment Inventory; SOM = Somatic Concerns scale; SOMCONV = Conversion subscale; SOMSOM = Somatization subscale; SOMH-CON = Health Concerns subscale; RMT = Warrington Recognition Memory Test–Words [Pass: accuracy score >43 and time-to-completion <192 s; Fail: accuracy score ≤43 or time-to-completion ≥192 s (Erdodi, Kirsch, et al., 2014; Erdodi, Tyson, et al., 2017; M. S. Kim et al., 2010)]; EI-11 = Effort Index Eleven [Pass ≤1; Fail ≥3 (Erdodi & Roth, 2017; Erdodi, Tyson, et al., 2017)]; EI-7PSP = "Erdodi Index Seven" based on processing speed measures [Pass ≤1; Fail ≥4 (Erdodi, Kirsch, et al., 2014; Erdodi, Tyson, et al., 2017; Erdodi, Abeare, et al., 2017)]; Trial 1 = Color Naming age-corrected scaled score (cutoff for failure ≤6); Trial 2 = Word Reading age-corrected scaled score (cutoff for failure ≤6); Trial 3 = Inhibition (classic Stroop task) age-corrected scaled score (cutoff for failure ≤6); Trial 4 = Inhibition/Switching age-corrected scaled score (cutoff for failure ≤6); Trials 4/3 = the ratio of Trial 4 and Trial 3 raw scores (cutoff for failure ≤.90); Trials (1 + 2)/(3 + 4) = the ratio of the sum of Trials 1 and 2 over the sum of Trials 3 and 4 age-corrected scaled scores (cutoff for failure ≤.75).

Arentsen and colleagues (2013) note that the interference trial may be associated with poor specificity because, while most PVTs are designed to appear difficult but are in fact easy, the opposite is true for the Stroop task: the interference trial is actually difficult for most individuals.

As such, the inverted Stroop effect as an EVI follows the reverse logic compared with classic stand-alone PVTs, as performing well on a difficult task is meant to expose noncredible responding rather than performing poorly on an easy task. Although the inverted Stroop effect seems less effective at separating valid and invalid response sets, it appears to tap different manifestations of noncredible performance. Therefore, it may provide valuable nonredundant information for the multivariate model of validity assessment (Boone, 2013; Larrabee, 2003).

An emergent finding of the cross-validation analyses is the modality-specificity of classification accuracy (Leighton et al., 2014). Of the three reference PVTs, one was a traditional stand-alone measure based on the forced choice recognition paradigm (RMT), one was a composite measure based on the number of independent PVT failures (EI-11), and one was a composite of validity indicators specifically selected to match the target constructs in the Stroop task (EI-7PSP). The four base trials of the D-KEFS Stroop and the (Trials 1 + 2)/(Trials 3 + 4) ACSS ratio produced the best overall classification accuracy against the EI-7PSP. The Trials 4/3 raw score ratio was a marginal exception, reiterating the divergence in the psychometric properties of the derivative validity indices. Nevertheless, all newly introduced D-KEFS Stroop based validity cutoffs had the highest sensitivity against the EI-7PSP; in several cases, sensitivity values were double those observed against the RMT.

These findings resonate with earlier studies (Erdodi, Abeare, et al., 2017; Erdodi, Tyson, et al., 2017; Lichtenstein et al., 2017), and serve as a reminder that the choice of criterion measure can influence the perceived utility of the test being evaluated. In addition, they illustrate the importance of methodological pluralism in cross-validating PVTs (Boone, 2013; Larrabee, 2014) at the group level, and determining the veracity of an individual response set (Larrabee, 2003, 2008; Vallabhajosula & van Gorp, 2001), as it can protect against instrumentation artifacts. Knowing that a new cutoff performs well against several different reference PVTs increases confidence in the reliability of its signal detection performance (Erdodi & Roth, 2017).

Combining the newly developed EVIs within the D-KEFS Stroop improved overall classification accuracy. Cutoffs based on cumulative failures produced superior signal detection profiles relative to individual EVIs at comparable BRFail, consistent with previous research (Larrabee, 2003, 2008). Even though the internal logic behind the practice of aggregating multiple validity indicators prioritizes sensitivity over specificity (Proto et al., 2014), at the appropriate cutoffs, multivariate models actually reduce false positive rates (Davis & Millis, 2014; Larrabee, 2014).

Passing or failing the newly developed validity cutoffs within the D-KEFS Stroop was largely unrelated to depression and anxiety, consistent with previous reports investigating the relationship between depression and PVT failure (Considine et al., 2011; Rees, Tombaugh, & Boulay, 2001). However, patients who failed the reference PVTs and the newly introduced validity cutoffs in Trials 1–4 of the D-KEFS Stroop reported higher levels of somatization on the PAI, even though no systematic differences were observed on any of the other clinical scales. This finding is consistent with previous reports on the relationship between the somatization scale of the PAI and PVT failures (Whiteside et al., 2010).

In this study, we introduced a range of validity cutoffs for each of the four base trials of the D-KEFS Stroop, as well as two derivative validity indices, recognizing the need for flexible, population-specific cutoff scores (Bigler, 2015). To our knowledge, this is the first attempt to develop EVIs within the D-KEFS version of the Stroop task. In addition, we examined the relationship between PVT failures and self-reported psychiatric symptoms. The signal detection profiles of the new validity indicators across the engineered differences among the reference PVTs provided an opportunity to reflect on instrumentation artifacts as potential confounds in the cross-validation methodology used to calibrate new validity indices.

The results of the study should be interpreted in the context of its limitations. The sample was geographically restricted and unusually high functioning for a clinical setting. However, the overall intellectual functioning in our sample was comparable with previous research involving patients with neurological disorders from the Northeastern United States (Blonder, Gur, Gur, Saykin, & Hurtig, 1989; Erdodi, Pelletier, & Roth, 2016; Saykin et al., 1995). In addition, the sample was diagnostically heterogeneous. Therefore, it is unclear if the newly introduced cutoffs will perform similarly across patients with different neuropsychiatric conditions. Until replicated in different clinical populations, these cutoffs should only be applied to patients with clinical characteristics that are similar to the present sample, as they may be associated with unacceptably high false positive error rates in examinees with severe neurological conditions.

Further, as indeterminate cases were excluded from the analyses to maximize the diagnostic purity of the criterion groups, this practice may have inflated classification accuracy estimates. Moreover, the time or sequence of administration was not available for the D-KEFS Stroop, even though these factors have been raised as potential confounds in the clinical interpretation of cognitive tests in general (Erdodi & Lajiness-O'Neill, 2014), and of PVT failures specifically (Bigler, 2015). Finally, in the absence of data on litigation status, the criterion groups (Valid/Invalid) were psychometrically defined. Given that external incentive to appear impaired has been previously suggested as a relevant diagnostic criterion for noncredible neurocognitive performance (Slick, Sherman, & Iverson, 1999), the newly introduced cutoffs would benefit from cross-validation using known-group designs that incorporate incentive status. As always, future research using different samples, diagnostic categories, and reference PVTs is needed to establish the generalizability of these findings.

References

Arentsen, T. J., Boone, K. B., Lo, T. T., Goldberg, H. E., Cottingham, M. E., Victor, T. L., . . . Zeller, M. A. (2013). Effectiveness of the Comalli Stroop Test as a measure of negative response bias. The Clinical Neuropsychologist, 27, 1060–1076. http://dx.doi.org/10.1080/13854046.2013.803603

Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., & McPherson, S. (2005). Sensitivity and specificity of finger tapping test scores for the detection of suspect effort. The Clinical Neuropsychologist, 19, 105–120. http://dx.doi.org/10.1080/13854040490888567

Axelrod, B. N., Fichtenberg, N. L., Millis, S. R., & Wertheimer, J. C. (2006). Detecting incomplete effort with Digit Span from the Wechsler Adult Intelligence Scale-Third Edition. The Clinical Neuropsychologist, 20, 513–523. http://dx.doi.org/10.1080/13854040590967117

Axelrod, B. N., Meyers, J. E., & Davis, J. J. (2014). Finger Tapping Test performance as a measure of performance validity. The Clinical Neuropsychologist, 28, 876–888. http://dx.doi.org/10.1080/13854046.2014.907583

Baker, D. A., Connery, A. K., Kirk, J. W., & Kirkwood, M. W. (2014). Embedded performance validity indicators within the California Verbal Learning Test, Children's Version. The Clinical Neuropsychologist, 28, 116–127. http://dx.doi.org/10.1080/13854046.2013.858184

Bauer, L., Yantz, C. L., Ryan, L. M., Warden, D. L., & McCaffrey, R. J. (2005). An examination of the California Verbal Learning Test II to detect incomplete effort in a traumatic brain-injury sample. Applied Neuropsychology, 12, 202–207. http://dx.doi.org/10.1207/s15324826an1204_3

Bigler, E. D. (2012). Symptom validity testing, effort, and neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 632–640. http://dx.doi.org/10.1017/S1355617712000252

Bigler, E. D. (2014). Effort, symptom validity testing, performance validity testing and traumatic brain injury. Brain Injury, 28, 1623–1638. http://dx.doi.org/10.3109/02699052.2014.947627

Bigler, E. D. (2015). Neuroimaging as a biomarker in symptom validity and performance validity testing. Brain Imaging and Behavior, 9, 421–444. http://dx.doi.org/10.1007/s11682-015-9409-1

Blaskewitz, N., Merten, T., & Brockhaus, R. (2009). Detection of suboptimal effort with the Rey Complex Figure Test and recognition trial. Applied Neuropsychology, 16, 54–61. http://dx.doi.org/10.1080/09084280802644227

Blonder, L. X., Gur, R. E., Gur, R. C., Saykin, A. J., & Hurtig, H. I. (1989). Neuropsychological functioning in hemiparkinsonism. Brain and Cognition, 9, 244–257. http://dx.doi.org/10.1016/0278-2626(89)90034-1

…timal effort with the Rey Complex Figure Test and recognition trial. Applied Neuropsychology, 16, 54–61. http://dx.doi.org/10.1080/09084280802644227

Blonder, L. X., Gur, R. E., Gur, R. C., Saykin, A. J., & Hurtig, H. I. (1989). Neuropsychological functioning in hemiparkinsonism. Brain and Cognition, 9, 244–257. http://dx.doi.org/10.1016/0278-2626(89)90034-1

Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23, 729–741. http://dx.doi.org/10.1080/13854040802427803

Boone, K. B. (2013). Clinical practice of forensic neuropsychology. New York, NY: Guilford Press.

Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., & Razani, J. (2002). The Rey 15-item recognition trial: A technique to enhance sensitivity of the Rey 15-item memorization test. Journal of Clinical and Experimental Neuropsychology, 24, 561–573. http://dx.doi.org/10.1076/jcen.24.5.561.1004

Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler, E., Cottingham, M. E., . . . Zeller, M. A. (2010). Examination of various WMS-III logical memory scores in the assessment of response bias. The Clinical Neuropsychologist, 24, 344–357. http://dx.doi.org/10.1080/13854040903307268

Bush, S. S., Heilbronner, R. L., & Ruff, R. (2014). Psychological assessment of symptom and performance validity, response bias, and malingering: Official position of the Association of Psychological Advancement in Psychological Injury and Law. Psychological Injury and Law, 7, 197–205. http://dx.doi.org/10.1007/s12207-014-9198-7

Chafetz, M. D., Williams, M. A., Ben-Porath, Y. S., Bianchini, K. J., Boone, K. B., Kirkwood, M. W., . . . Ord, J. S. (2015). Official position of the American Academy of Clinical Neuropsychology Social Security Administration policy on validity testing: Guidance and recommendations for change. The Clinical Neuropsychologist, 29, 723–740. http://dx.doi.org/10.1080/13854046.2015.1099738

Comalli, P. E., Jr., Wapner, S., & Werner, H. (1962). Interference effects of Stroop color-word test in childhood, adulthood, and aging. The Journal of Genetic Psychology, 100, 47–53. http://dx.doi.org/10.1080/00221325.1962.10533572

Considine, C. M., Weisenbach, S. L., Walker, S. J., McFadden, E. M., Franti, L. M., Bieliauskas, L. A., . . . Langenecker, S. A. (2011). Auditory memory decrements, without dissimulation, among patients with major depressive disorder. Archives of Clinical Neuropsychology, 26, 445–453. http://dx.doi.org/10.1093/arclin/acr041

Curtis, K. L., Thompson, L. K., Greve, K. W., & Bianchini, K. J. (2008). Verbal fluency indicators of malingering in traumatic brain injury: Classification accuracy in known groups. The Clinical Neuropsychologist, 22, 930–945. http://dx.doi.org/10.1080/13854040701563591

Davis, J. J., & Millis, S. R. (2014). Examination of performance validity test failure in relation to number of tests administered. The Clinical Neuropsychologist, 28, 199–214. http://dx.doi.org/10.1080/13854046.2014.884633

Delis, D. C., Kaplan, E., & Kramer, J. H. (2001). Delis-Kaplan Executive Function System (D-KEFS). San Antonio, TX: Psychological Corporation.

Egeland, J., & Langfjaeran, T. (2007). Differentiating malingering from genuine cognitive dysfunction using the Trail Making Test-ratio and Stroop Interference scores. Applied Neuropsychology, 14, 113–119. http://dx.doi.org/10.1080/09084280701319953

Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski, B., Zuccato, B. G., & Roth, R. M. (2017). Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) processing speed scores as measures of noncredible responding: The third generation of embedded performance validity indicators. Psychological Assessment, 29, 148–157. http://dx.doi.org/10.1037/pas0000319

Erdodi, L. A., Kirsch, N. L., Lajiness-O'Neill, R., Vingilis, E., & Medoff, B. (2014). Comparing the Recognition Memory Test and the Word Choice Test in a mixed clinical sample: Are they equivalent? Psychological Injury and Law, 7, 255–263. http://dx.doi.org/10.1007/s12207-014-9197-8

Erdodi, L. A., & Lajiness-O'Neill, R. (2014). Time-related changes in Conners' CPT-II scores: A replication study. Applied Neuropsychology: Adult, 21, 43–50. http://dx.doi.org/10.1080/09084282.2012.724036

Erdodi, L. A., & Lichtenstein, J. D. (2017). Invalid before impaired: An emerging paradox of embedded validity indicators. The Clinical Neuropsychologist. Advance online publication. http://dx.doi.org/10.1080/13854046.2017.1323119

Erdodi, L. A., Pelletier, C. L., & Roth, R. M. (2016). Elevations on select Conners' CPT-II scales indicate noncredible responding in adults with traumatic brain injury. Applied Neuropsychology: Adult, 22, 851–858.

Erdodi, L., & Roth, R. (2017). Low scores on BDAE Complex Ideational Material are associated with invalid performance in adults without aphasia. Applied Neuropsychology: Adult, 24, 264–274. http://dx.doi.org/10.1080/23279095.2016.1154856

Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O'Neill, R., & Medoff, B. (2014). Aggregating validity indicators embedded in Conners' CPT-II outperforms individual cutoffs at separating valid from invalid performance in adults with traumatic brain injury. Archives of Clinical Neuropsychology, 29, 456–466. http://dx.doi.org/10.1093/arclin/acu026

Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE Complex Ideational Material—A measure of receptive language or performance validity? Psychological Injury and Law, 9, 112–120. http://dx.doi.org/10.1007/s12207-016-9254-6

Erdodi, L. A., Tyson, B. T., Shahein, A. G., Lichtenstein, J. D., Abeare, C. A., Pelletier, C. L., . . . Roth, R. M. (2017). The power of timing: Adding a time-to-completion cutoff to the Word Choice Test and Recognition Memory Test improves classification accuracy. Journal of Clinical and Experimental Neuropsychology, 39, 369–383. http://dx.doi.org/10.1080/13803395.2016.1230181

Etherton, J. L., Bianchini, K. J., Heinly, M. T., & Greve, K. W. (2006). Pain, malingering, and performance on the WAIS-III Processing Speed Index. Journal of Clinical and Experimental Neuropsychology, 28, 1218–1237. http://dx.doi.org/10.1080/13803390500346595

Golden, C., & Freshwater, S. (2002). A manual for the Adult Stroop Color and Word Test. Chicago, IL: Stoelting.

Green, P. (2013). Spoiled for choice: Making comparisons between forced-choice effort tests. In K. B. Boone (Ed.), Clinical practice of forensic neuropsychology. New York, NY: Guilford Press.

Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224. http://dx.doi.org/10.1037/1040-3590.6.3.218

Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology, 19, 533–541. http://dx.doi.org/10.1016/j.acn.2003.08.002

Greve, K. W., Bianchini, K. J., Mathias, C. W., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered performance with the Wisconsin Card Sorting Test: A preliminary investigation in traumatic brain injury. The Clinical Neuropsychologist, 16, 179–191. http://dx.doi.org/10.1076/clin.16.2.179.13241

Greve, K. W., Ord, J. S., Bianchini, K. J., & Curtis, K. L. (2009). Prevalence of malingering in patients with chronic pain referred for psychologic evaluation in a medico-legal context. Archives of Physical Medicine and Rehabilitation, 90, 1117–1126. http://dx.doi.org/10.1016/j.apmr.2009.01.018

Grimes, D. A., & Schulz, K. F. (2005). Refining clinical diagnosis with likelihood ratios. The Lancet, 365, 1500–1505. http://dx.doi.org/10.1016/S0140-6736(05)66422-7

Guise, B. J., Thompson, M. D., Greve, K. W., Bianchini, K. J., & West, L. (2014). Assessment of performance validity in the Stroop Color and Word Test in mild traumatic brain injury patients: A criterion-groups validation design. Journal of Neuropsychology, 8, 20–33. http://dx.doi.org/10.1111/jnp.12002
Hayward, L., Hall, W., Hunt, M., & Zubrick, S. R. (1987). Can localised brain impairment be simulated on neuropsychological test profiles? Australian and New Zealand Journal of Psychiatry, 21, 87–93. http://dx.doi.org/10.3109/00048678709160904

Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan Battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz, FL: Psychological Assessment Resources.

Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., Millis, S. R., & Conference Participants. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129. http://dx.doi.org/10.1080/13854040903155063

Heinly, M. T., Greve, K. W., Bianchini, K. J., Love, J. M., & Brennan, A. (2005). WAIS digit span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12, 429–444. http://dx.doi.org/10.1177/1073191105281099

Kim, M. S., Boone, K. B., Victor, T., Marion, S. D., Amano, S., Cottingham, M. E., . . . Zeller, M. A. (2010). The Warrington Recognition Memory Test for words as a measure of response bias: Total score and response time cutoffs developed on "real world" credible and noncredible subjects. Archives of Clinical Neuropsychology, 25, 60–70. http://dx.doi.org/10.1093/arclin/acp088

Kim, N., Boone, K. B., Victor, T., Lu, P., Keatinge, C., & Mitchell, C. (2010). Sensitivity and specificity of a digit symbol recognition trial in the identification of response bias. Archives of Clinical Neuropsychology, 25, 420–428. http://dx.doi.org/10.1093/arclin/acq040

Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S., Bhagwat, A., & French, L. M. (2013). Clinical utility of the Conners' Continuous Performance Test-II to detect poor effort in U.S. military personnel following traumatic brain injury. Psychological Assessment, 25, 339–352. http://dx.doi.org/10.1037/a0030915

Lansbergen, M. M., Kenemans, J. L., & van Engeland, H. (2007). Stroop interference and attention-deficit/hyperactivity disorder: A review and meta-analysis. Neuropsychology, 21, 251–262. http://dx.doi.org/10.1037/0894-4105.21.2.251

Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425. http://dx.doi.org/10.1076/clin.17.3.410.18089

Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 666–679. http://dx.doi.org/10.1080/13854040701494987

Larrabee, G. J. (2014). False-positive rates associated with the use of multiple performance and symptom validity tests. Archives of Clinical Neuropsychology, 29, 364–373. http://dx.doi.org/10.1093/arclin/acu019

Larson, M. J., Kaufman, D. A., Schmalfuss, I. M., & Perlstein, W. M. (2007). Performance monitoring, error processing, and evaluative control following severe TBI. Journal of the International Neuropsychological Society, 13, 961–971. http://dx.doi.org/10.1017/S1355617707071305

Leighton, A., Weinborn, M., & Maybery, M. (2014). Bridging the gap between neurocognitive processing theory and performance validity assessment among the cognitively impaired: A review and methodological approach. Journal of the International Neuropsychological Society, 20, 873–886. http://dx.doi.org/10.1017/S135561771400085X

Lezak, M. D. (1995). Neuropsychological assessment. New York, NY: Oxford University Press.

Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2017). Introducing a forced-choice recognition task to the California Verbal Learning Test—Children's Version. Child Neuropsychology, 23, 284–299.

Lippa, S. M., & Davis, R. N. (2010). Inhibition/switching is not necessarily harder than inhibition: An analysis of the D-KEFS color-word interference test. Archives of Clinical Neuropsychology, 25, 146–152. http://dx.doi.org/10.1093/arclin/acq001

Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness of the Rey-Osterrieth Complex Figure Test and the Meyers and Meyers recognition trial in the detection of suspect effort. The Clinical Neuropsychologist, 17, 426–440. http://dx.doi.org/10.1076/clin.17.3.426.18083

MacLeod, C. M., & MacDonald, P. A. (2000). Interdimensional interference in the Stroop effect: Uncovering the cognitive and neural anatomy of attention. Trends in Cognitive Sciences, 4, 383–391. http://dx.doi.org/10.1016/S1364-6613(00)01530-8

Ord, J. S., Boettcher, A. C., Greve, K. W., & Bianchini, K. J. (2010). Detection of malingering in mild traumatic brain injury with the Conners' Continuous Performance Test-II. Journal of Clinical and Experimental Neuropsychology, 32, 380–387.

Osimani, A., Alon, A., Berger, A., & Abarbanel, J. M. (1997). Use of the Stroop phenomenon as a diagnostic tool for malingering. Journal of Neurology, Neurosurgery & Psychiatry, 62, 617–621. http://dx.doi.org/10.1136/jnnp.62.6.617

Pearson, N. C. S. (2009). Advanced clinical solutions for WAIS-IV and WMS-IV: Administration and scoring manual. San Antonio, TX: The Psychological Corporation.

Proto, D. A., Pastorek, N. J., Miller, B. I., Romesser, J. M., Sim, A. H., & Linck, J. F. (2014). The dangers of failing one or more performance validity tests in individuals claiming mild traumatic brain injury-related postconcussive symptoms. Archives of Clinical Neuropsychology, 29, 614–624. http://dx.doi.org/10.1093/arclin/acu044

Reedy, S. D., Boone, K. B., Cottingham, M. E., Glaser, D. F., Lu, P. H., Victor, T. L., . . . Wright, M. J. (2013). Cross validation of the Lu and colleagues (2003) Rey-Osterrieth Complex Figure Test effort equation in a large known-group sample. Archives of Clinical Neuropsychology, 28, 30–37. http://dx.doi.org/10.1093/arclin/acs106

Rees, L. M., Tombaugh, T. N., & Boulay, L. (2001). Depression and the Test of Memory Malingering. Archives of Clinical Neuropsychology, 16, 501–506. http://dx.doi.org/10.1093/arclin/16.5.501

Saykin, A. J., Stafiniak, P., Robinson, L. J., Flannery, K. A., Gur, R. C., O'Connor, M. J., & Sperling, M. R. (1995). Language before and after temporal lobectomy: Specificity of acute changes and relation to early risk factors. Epilepsia, 36, 1071–1077. http://dx.doi.org/10.1111/j.1528-1157.1995.tb00464.x

Schroeter, M. L., Ettrich, B., Schwier, C., Scheid, R., Guthke, T., & von Cramon, D. Y. (2007). Diffuse axonal injury due to traumatic brain injury alters inhibition of imitative response tendencies. Neuropsychologia, 45, 3149–3156. http://dx.doi.org/10.1016/j.neuropsychologia.2007.07.004

Schutte, C., Axelrod, B. N., & Montoya, E. (2015). Making sure neuropsychological data are meaningful: Use of performance validity testing in medicolegal and clinical contexts. Psychological Injury and Law, 8, 100–105. http://dx.doi.org/10.1007/s12207-015-9225-3

Shura, R. D., Miskey, H. M., Rowland, J. A., Yoash-Gantz, R. E., & Denning, J. H. (2016). Embedded performance validity measures with postdeployment veterans: Cross-validation and efficiency with multiple measures. Applied Neuropsychology: Adult, 23, 94–104. http://dx.doi.org/10.1080/23279095.2015.1014556

Slick, D. J., Sherman, E. M., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 545–561. http://dx.doi.org/10.1076/1385-4046(199911)13:04;1-Y;FT545
Spencer, R. J., Axelrod, B. N., Drag, L. L., Waldron-Perrine, B., Pangilinan, P. H., & Bieliauskas, L. A. (2013). WAIS-IV reliable digit span is no more accurate than age-corrected scaled score as an indicator of invalid performance in a veteran sample undergoing evaluation for mTBI. The Clinical Neuropsychologist, 27, 1362–1372. http://dx.doi.org/10.1080/13854046.2013.845248

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–662. http://dx.doi.org/10.1037/h0054651

Sugarman, M. A., & Axelrod, B. N. (2015). Embedded measures of performance validity using verbal fluency tests in a clinical sample. Applied Neuropsychology: Adult, 22, 141–146. http://dx.doi.org/10.1080/23279095.2013.873439

Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin Card Sorting Test in the detection of malingering in student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology, 21, 701–708. http://dx.doi.org/10.1076/jcen.21.5.701.868

Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 16, 597–607. http://dx.doi.org/10.1080/01688639408402671

Vallabhajosula, B., & van Gorp, W. G. (2001). Post-Daubert admissibility of scientific evidence on malingering of cognitive deficits. Journal of the American Academy of Psychiatry and the Law, 29, 207–215.

Whiteside, D., Clinton, C., Diamonti, C., Stroemel, J., White, C., Zimberoff, A., & Waters, D. (2010). Relationship between suboptimal cognitive effort and the clinical scales of the Personality Assessment Inventory. The Clinical Neuropsychologist, 24, 315–325. http://dx.doi.org/10.1080/13854040903482822

Wolfe, P. L., Millis, S. R., Hanks, R., Fichtenberg, N., Larrabee, G. J., & Sweet, J. J. (2010). Effort indicators within the California Verbal Learning Test-II (CVLT-II). The Clinical Neuropsychologist, 24, 153–168. http://dx.doi.org/10.1080/13854040903107791

Received October 5, 2016
Revision received July 11, 2017
Accepted July 17, 2017