Wonderlic Test Critique As A Measure of Success For NFL Players
Stephanie A. Chadwick
University of Denver
Wonderlic Critique 2
assessments of physical skills including reaction time, speed, coordination, and other relevant
bodily tasks. The National Football League (NFL), however, uses a more wide-ranging set of
tests, collectively known as the NFL Combine (History, 2010). Other well-known standardized
tests include the SAT, the GRE, and perhaps one of the best known, the Stanford-Binet
Intelligence Quotient (IQ) test. The NFL uses none of these, but does use an IQ test called the
Wonderlic Personnel Test-Quicktest (WPT-Q) to assess general intelligence. Although the NFL
has not made any official statements about why it uses the WPT-Q (or any other intelligence
test), Wonderlic, Inc. (2010) currently suggests that “How well a player will learn the playbook
and adapt within the scope of the team is forecast by the WPT. Of course, this is true for any
team, in any workplace” (Football and “The Wonderlic” section, ¶ 11). This definition has been
slightly amended over the years (Wonderlic, 2004), but in general low scores on the
Wonderlic tend to indicate an aptitude or “learning problem” (Mulligan, 2004). The overall
purpose of the Wonderlic test may be to assess general intelligence (g), which evidently
determines the ability to learn the playbook. Using psychometric criteria, however, many have
questioned the use of the Wonderlic Personnel Test as a measure of intelligence and as a tool for
measuring success and performance in the NFL. Unfortunately for this examination, official Wonderlic, Inc.
materials were inaccessible due to their proprietary nature, but an analysis of recent research on
History
The Wonderlic Personnel Test, developed in 1937 by Eldon Wonderlic, was first
published in 1950 as a pre-employment test (Wonderlic, 2010). Since the original publication,
Wonderlic, Inc.—now run by Eldon Wonderlic’s grandson, Charles Wonderlic—has amended the
original WPT and republished the test. Wonderlic, Inc. has also developed several versions of the
WPT, including the slightly shorter Wonderlic Personnel Test-Quicktest (WPT-Q) used in the
NFL, in addition to a variety of other tests for employers and educators. The Wonderlic
Personnel Test costs about $10¹ (Blumberg, 2006) and is a 12-minute timed test with 50
questions that can be completed on a computer or with paper and pencil; the Wonderlic
Personnel Test-Quicktest is simply a shorter version of the WPT and takes only eight minutes. Bertsch
& Pesta (2009) report that the WPT has a population mean score of 22 out of 50 and a
population SD of 7. The NFL average score on the Wonderlic in 2004 was 19 (Mulligan,
2004). The Wonderlic is most often used as an employment selection tool and is reported to be
taken by almost three million people every year on behalf of almost seven thousand clients (Hatch, 2009).
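To put the figures above on a common scale, the following minimal sketch converts the 2004 NFL average into a standard (z) score using the population mean and SD reported in the text. The percentile step additionally assumes that WPT scores are approximately normally distributed, which the sources cited here do not state; the function names are illustrative only.

```python
from statistics import NormalDist

POP_MEAN = 22.0       # population mean reported in the text
POP_SD = 7.0          # population SD reported in the text
NFL_MEAN_2004 = 19.0  # 2004 NFL average reported in the text


def z_score(score, mean=POP_MEAN, sd=POP_SD):
    """Distance of a score from the population mean, in SD units."""
    return (score - mean) / sd


def percentile_if_normal(score, mean=POP_MEAN, sd=POP_SD):
    """Percentile rank of a score, ASSUMING a normal distribution of scores."""
    return NormalDist(mean, sd).cdf(score) * 100.0


if __name__ == "__main__":
    z = z_score(NFL_MEAN_2004)
    pct = percentile_if_normal(NFL_MEAN_2004)
    print(f"z = {z:.2f}; approximate percentile under normality = {pct:.0f}")
```

Under these assumptions the NFL average sits a little less than half a standard deviation below the population mean, or roughly the lower third of the general population.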
Reliability
Reliability is one of the two primary criteria for evaluating the psychometric value of
a psychological test. The stability or consistency of a set of test scores from one person or from a
group of people establishes reliability (Johnson & Christensen, 2008). There are many different
types of reliability, any of which is a suitable method for verifying the stability of test scores, including
consistency. Concerning the reliability of the Wonderlic Personnel Test, Bell, Matthews, Lassiter,
& Leverett (2002) found that “Internal consistency reliabilities of the WPT range from .88 to .94
while alternate-form reliability estimates range from .73 to .95 (Wonderlic, 1992). Test-retest
reliabilities range from .82 to .94 (Dodrill, 1983; Wonderlic, 1992)” (p. 116). More recently,
Geisinger (2001) reported, from the Wonderlic manual, test-retest reliabilities of .82 to .94 and an
internal consistency of .88. It seems as though reliability has remained fairly stable over time
and is within the range of minimum psychometric criteria² for a psychological test.
¹ The Fourteenth Mental Measurements Yearbook (2001) reports the price of one test in 2000 as
$1.80. The price of the test may vary depending on the volume of use and the type of form used.
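Internal consistency is commonly summarized with coefficient (Cronbach's) alpha, the statistic named in the rule of thumb from Johnson & Christensen (2008). The sketch below shows how such a coefficient can be computed from item-level scores. The data are invented for illustration and are not Wonderlic items or results, and the sources cited here do not say which formula produced the published WPT figures.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance(col) for col in zip(*rows)]  # one variance per item
    total_var = variance([sum(r) for r in rows])       # variance of sum scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents x 4 right/wrong items (1 = correct).
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    alpha = cronbach_alpha(toy_scores)
    print(f"alpha = {alpha:.2f}")
    print("meets .70 research threshold:", alpha >= 0.70)
    print("meets .90 clinical threshold:", alpha >= 0.90)
```

For this toy data alpha works out to .80, which would clear the .70 threshold for research use but not the .90 threshold suggested for decisions about single individuals.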
Validity
The second measure of psychometric value is validity. There are three dimensions to
validity: Content, Criterion, and Construct validity. The validity of a test establishes whether the test measures exactly, and
only, what it claims to measure, and unlike reliability, all three of these dimensions are equally
Content-related validity is the process of making a “judgment of the degree to which the
evidence suggests that the items, tasks, or questions on your test adequately represent the domain
of interest” (Johnson & Christensen, 2008, p. 152). Unfortunately, the content validity of the
Wonderlic Personnel Test was not addressed in the literature. The only evidence of
content-related validity came from Geisinger (2001), who reported that the creator,
Eldon Wonderlic, adapted the test from a well-known and often-used test, the Otis Self-Administering
Tests of Mental Ability. Eldon Wonderlic was also an industrial psychologist, so his
understanding of current practices for assessing employee qualifications may have addressed the
² “A popular rule of thumb is that the size of coefficient alpha should generally be, at a minimum, greater than or
equal to .70 for research purposes and somewhat greater than that value (e.g., ≥ .90) for clinical purposes (i.e., for
assessing single individuals)” (Johnson & Christensen, 2008, p. 149).
Collecting the necessary questions to create a test is the first way to establish validity, and
ensuring a test predicts what it claims to predict, a criterion, is the second. According to
Wonderlic, Inc. (2010), “The WPT-Q is a short-form measure of general intelligence or cognitive
ability-the most powerful predictor of job success” (¶ 1). To further define success, Michael
Callans, President of Wonderlic Consulting, states that, “‘Corporations could learn a lot from the
NFL's use of testing,’ …Intelligence and personality determine a candidate's success, on the field
or off. Team owners recognize that the most successful [draft] choices are those players who not
only have the strength to play the game but the mental acuity to win. And they don't rely on gut
instinct to tell them who's right for the job. They get proof through testing” (Wonderlic, 2004, ¶
9). The proof, or the measures of mental acuity as they relate to “success” in the NFL, are often
subjective and relative terms for the public, the players, and especially the team managers and
owners. Nonetheless, independent measures of success serve in this analysis as the criteria against
which we can determine whether the Wonderlic Personnel Test measures what it claims.
Kuzmits & Adams (2008) offer similar definitions of success for three positions:
quarterbacks, wide receivers, and running backs. The first definition is draft order, and two other
definitions are measured over the first three years of a player’s career: salary and games
played. Other measures vary depending on the player’s position, and can consist of quarterback
rating³, average yards gained per carry for running backs, and average yards gained per reception for
wide receivers (Kuzmits & Adams, 2008). In a replication study, Kuzmits & Adams (2008)
then correlated these measures of success with the players’ scores obtained from the NFL
Combine and found results similar to previous research: the study “failed to show a
relationship between the WPT and NFL scores” (p. 1726). Indeed, of the 30 correlations run,
most of the coefficients were less than r = .20, and only two were considered statistically
significant. The lack of a relationship between the WPT and measures of success indicates a
³ A quarterback’s rating is based on four criteria: “percentage of completions per attempt, average yards gained per
attempt, percentage of touchdown passes per attempt, and percentage of interceptions per attempt” (NCAA and NFL
passing efficiency computation, 2008, p. 1723).
evidence is that which measures a relationship between the current measure and events which
your test and a second, validated criterion-related test (called a focal test) at the same time, and
verifying that the test scores correlate. Wonderlic, Inc. (2010) claims the WPT is a test of general
intelligence and cognitive ability. To gather concurrent evidence between the WPT and
another measure of general intelligence, Matthews and Lassiter (2007) compared the Wonderlic
test to measures of general intelligence (g) in the Woodcock-Johnson Revised Test of
Cognitive Abilities (WJ-R), which also measures crystallized intelligence (Gc) and fluid
intelligence (Gf). Fluid intelligence is “an ability to solve novel problems quickly and
(Matthews & Lassiter, 2007, pp. 707-709). The authors concluded that the WPT did
show concurrent validity with the WJ-R full battery (r = .55, p ≤ .01), but
weaker associations with crystallized intelligence (r = .34, p ≤ .05) and fluid intelligence
(r = .26, n.s.), which may indicate problems with convergent validity, a type of construct validity
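One conventional way to read correlations of this size is to square them, which gives the proportion of variance two measures share. The sketch below applies that step to the coefficients reported above; the helper name is illustrative, and the calculation is a generic interpretation aid rather than an analysis taken from Matthews & Lassiter (2007).

```python
def shared_variance(r):
    """Proportion of variance shared by two measures correlated at r (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlations between the WPT and WJ-R scores as reported in the text.
reported_correlations = {
    "WJ-R full battery": 0.55,
    "crystallized intelligence (Gc)": 0.34,
    "fluid intelligence (Gf)": 0.26,
}

if __name__ == "__main__":
    for label, r in reported_correlations.items():
        pct = shared_variance(r) * 100
        print(f"{label}: r = {r:.2f}, shared variance of about {pct:.0f}%")
```

Read this way, the WPT shares about 30% of its variance with the full battery, about 12% with Gc, and about 7% with Gf. By the same arithmetic, a criterion correlation below r = .20 corresponds to under 4% shared variance.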
In addition to content and criterion validity, construct validity is the third appraisal of
validity. McIntire & Miller (2007) describe the process of establishing construct validity as a
“gradual accumulation of evidence that the scores on the test relate to the observable behaviors
in the ways predicted by the theory underlying the construct” (p. 228). Although construct
validity is by far the most difficult to define and prove, methodologists have established two
specific strategies: discriminant (constructs that should not be related are not) and convergent
(construct is related to test scores) validity (McIntire & Miller, 2007). To address discriminant
and convergent validity, Schraw (2001) reported that, “construct validity of the [WPT]
instrument is also nicely addressed; its correlations with instruments such as the [Wechsler Adult
Intelligence scale] WAIS Full Scale IQ and the General Aptitude Test Battery's 'Aptitude G' (for
general mental ability or intelligence) are high—in the range of .70-.92. In contrast, the WPT is
uncorrelated with a wide variety of personality assessment measures” (¶ 6). Although Schraw’s
(2001) description of the WPT as highly correlated with other measures of intelligence sounds
similar to criterion-related, concurrent validity, there are specific differences. Using a tool
concurrently with another tool at the outset of design is a valuable way to establish concurrent
validity. Over time, however, the test must continue to correlate with a host of other
psychometrically sound tools that measure the same construct (e.g., IQ), which is convergent
validity, and must not measure constructs it does not claim to measure (e.g., personality
trait factors), which is discriminant validity. In addition to finding concurrent validity (described
above), Matthews & Lassiter (2007) found problems with the WPT’s ability to measure constructs
from newer theories of intelligence, which, as mentioned, include fluid and crystallized
intelligence. Whether Wonderlic is
Unfortunately, even with all of this evidence of validity, McDonald (2005) would agree
that while the Wonderlic Personnel Test does measure intelligence to a statistically significant
degree, in regard to the success of a player, within the modern draft era, there exists no statistically
significant relationship between intelligence and quarterback performance at either the collegiate
or professional level. Likewise, more intelligent quarterbacks are neither selected earlier nor
compensated more for their mental abilities (Summary and Conclusions section, ¶ 2). Considering
that the criterion and content validity evidence presented here both pose problems for the NFL in
terms of defining successful players by their intelligence, it seems the Wonderlic should be reconsidered
as a tool.
Conclusion
The Wonderlic Personnel Test is a very commonly used tool for employment selection
purposes. There is little debate about whether the Wonderlic carries psychometric value in terms
measure for more current theories of intelligence including crystallized and fluid intelligence.
Although the justification for using the Wonderlic, or any IQ test, for the NFL draft is somewhat
unclear, the evidence does not seem to show a relationship between this part of the Combine and
successful outcomes for players. The NFL also appears to be the only professional sports
organization to use this test (Hatch, 2009), and in using it the league risks eliminating
players who may be well suited for play but did not score well on the test. One possible flaw
with this examination, and others, is that the authors of many of these studies may be defining
“success” and “performance” differently from how the NFL defines them. Unfortunately, until the
NFL comes clean about its specific use of the Wonderlic or its definition of such terms, the
.
References
Bell, N., Matthews, T., Lassiter, K., & Leverett, J. (2002). Validity of the Wonderlic Personnel
Assessment. North American Journal of Psychology, 4(1), 113. Retrieved from Academic
Bertsch, S., & Pesta, B. (2009). The Wonderlic Personnel Test and elementary cognitive tasks as
Blumberg, J., & C., S. (2006). Choose your weapon. Inc., 28(8), 96. Retrieved from
Geisinger, K. (2001). Review of the Wonderlic personnel test and scholastic level exam. In B. S.
Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 1360–
Hatch, C. (2009). Fourth and short on equality. The disparate impact of the NFL’s use of the
Wonderlic Intelligence Test and the case for a football-specific test. Connecticut Law
History. (2010). NFL Scouting Combine. Retrieved May 5, 2010, from www.nflcombine.com.
Johnson, B., & Christensen, L. (2008). Educational research: Quantitative, qualitative, and mixed
Kuzmits, F., & Adams, A. (2008). The NFL combine: Does it predict performance in the National
Football League? Journal of Strength and Conditioning Research, 22(6), 1721-1727.
Retrieved May 6, 2010, from ProQuest Health and Medical Complete. (Document
ID: 1669462821).
Matthews, D. T., & Lassiter, K. S. (2007). What does the Wonderlic Personnel Test measure?
at http://www.thesportjournal.org/tags/2005?page=2
McIntire, S. A., & Miller, L. A. (2007). Foundations of psychological testing (2nd ed.). Sage
Publications.
Mulligan, M. (2004, April 22). Wonderlic scores have NFL teams wondering - Angelo
acknowledges the disparity in test results has a lot of NFL people questioning the validity
of the scores. Chicago Sun-Times (IL) 125. Retrieved May 5, 2010, from NewsBank on-
NCAA and NFL passing efficiency computation. (2008). In Kuzmits, F., & Adams, A. (2008).
The NFL combine: Does it predict performance in the National Football League? Journal of
Strength and Conditioning Research, 22(6), 1721-1727. Retrieved May 6, 2010, from
Plake, B. S. & Impara, J. C. (Eds.). (2001). The fourteenth mental measurements yearbook.
Schraw, G. (2001). Review of the Wonderlic personnel test and scholastic level exam. In B. S.
Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 1360–
Wonderlic, Inc. (2004) “How Smart is Your First Round Draft Pick?” Wonderlic.com. Retrieved
reposted on http://www.freerepublic.com/focus/f-news/1312628/posts