MATHLETICS
HOW GAMBLERS, MANAGERS, AND FANS USE MATHEMATICS IN SPORTS
2ND EDITION
Princeton University Press is committed to the protection of copyright and the intellectual property our authors entrust to us. Copyright promotes the progress and integrity of knowledge. Thank you for supporting free speech and the global exchange of ideas by purchasing an authorized edition of this book. If you wish to reproduce or distribute any part of it in any form, please obtain permission.
press.princeton.edu
This book has been composed in Adobe Text Pro, Bee Four, and Courier New
Printed on acid-free paper. ∞
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
FROM KOSTAS:
To my wife, who puts up with my long hours of work,
and to my dad, who introduced me to the world of sports
but left this world sooner than he should have.
FROM WAYNE:
To Vivian, Greg, and Jen. Thanks for putting up with me!
FROM SCOTT:
To “Vegas Howie,” who shared his love of sports
and numbers with so many people.
“In God we trust; all others must bring data.”
—W. Edwards Deming
CONTENTS
Preface
Acknowledgments
Abbreviations
PART I BASEBALL
PART II FOOTBALL
Bibliography
Index
PREFACE
When WW (Wayne Winston) began writing the first edition of Mathletics in 2005, many North American sports teams did not have analytics departments, and the famed M.I.T. Sloan School Sports Analytics Conference had not yet held its first annual meeting. Very few schools taught sports analytics courses. Now every major North American sports team (baseball, football, basketball, and hockey) has an analytics department, tickets to the Sloan School Conference quickly sell out, and many schools teach sports analytics courses. Google searches for the term “sports analytics” have quadrupled since 2007, and the number of scholarly publications involving sports analytics has surely grown exponentially. New data from cameras in venues and fitness devices worn by athletes have opened up new areas of analysis. The joke is that some teams (like the Houston Rockets) have more analytics personnel than players!
The first edition of Mathletics was well received, but with the rapid development of sports analytics, the time seems ripe for a second edition. I am fortunate to have two great co-authors: Scott Nestler (SN) of Notre Dame and Kostas Pelechrinis (KP) of the University of Pittsburgh.
If you want a self-contained introduction to the use of math in sports, we feel this is the book for you. No prior knowledge is assumed, and even a bright high school student should be able to absorb most of the material. Even if your career does not involve sports, we are sure that anyone using analytics in his or her career will benefit from learning how analytics lead to better outcomes for
WHAT’S NEW
We have added 17 new chapters and made substantial changes to all the
chapters in the first edition. A summary of the major changes follows:
BOOK RESOURCES
cal approaches used to gain insights and make decisions, the specific tool used to implement them is of secondary importance. Furthermore, using Excel makes the material accessible to people who are not proficient in programming, including, for example, high school students and non-STEM graduates. Nevertheless, recognizing the importance of programming skills in today’s world, we have also provided a companion website and code repository with Python implementations of the most challenging analytical tasks covered in the book. While these scripts are not meant to teach Python programming from scratch, we hope that they will be a valuable additional resource for people with minimal programming background. Whenever possible, the code repository content will also be dynamic and periodically updated with more recent data.
The website for this book is http://www.mathleticsbook.com. You will be able to find a variety of resources there, including errata, datasets, Excel files, Python scripts, and R scripts (forthcoming). Code will also be available in a GitHub repository at https://github.com/mathletics-book/.
CONTACT US
We hope you enjoy this book as much as we enjoyed writing it. Feel free to contact us via email.
BASEBALL
CHAPTER 1
$$\frac{R^2}{R^2 + 1} = \text{estimate of percentage of games won} \qquad (1')$$
where R is the scoring ratio (runs scored divided by runs allowed).
A B C D E F G H I J
1 exp 2.000 MAD: 0.021
2 Year Team Wins Losses Runs Opp Runs Ratio Pred W–L% Act W–L% Error
3 2016 ARI 69 93 752 890 0.845 0.42 0.43 0.009
4 2016 ATL 68 93 649 779 0.833 0.41 0.42 0.010
5 2016 BAL 89 73 744 715 1.041 0.52 0.55 0.030
6 2016 BOS 93 69 878 694 1.265 0.62 0.57 0.041
7 2016 CHC 103 58 808 556 1.453 0.68 0.64 0.043
8 2016 CHW 78 84 686 715 0.959 0.48 0.48 0.002
9 2016 CIN 68 94 716 854 0.838 0.41 0.42 0.007
10 2016 CLE 94 67 777 676 1.149 0.57 0.58 0.011
11 2016 COL 75 87 845 860 0.983 0.49 0.46 0.028
12 2016 DET 86 75 750 721 1.040 0.52 0.53 0.011
13 2016 HOU 84 78 724 701 1.033 0.52 0.52 0.002
14 2016 KCR 81 81 675 712 0.948 0.47 0.50 0.027
15 2016 LAA 74 88 717 727 0.986 0.49 0.46 0.036
16 2016 LAD 91 71 725 638 1.136 0.56 0.56 0.002
17 2016 MIA 79 82 655 682 0.960 0.48 0.49 0.008
18 2016 MIL 73 89 671 733 0.915 0.46 0.45 0.005
19 2016 MIN 59 103 722 889 0.812 0.40 0.36 0.033
20 2016 NYM 87 75 671 617 1.088 0.54 0.54 0.005
21 2016 NYY 84 78 680 702 0.969 0.48 0.52 0.034
errors is called the MAD (mean absolute deviation).¹ We find that for our dataset the predicted winning percentages of the Pythagorean Theorem were off by an average of 2.05% per team.
Instead of blindly assuming that winning percentage can be approximated by using the square of the scoring ratio, perhaps we should try a more general formula to predict winning percentage, such as

$$\frac{R^{\text{exp}}}{R^{\text{exp}} + 1}. \qquad (2)$$
If we vary exp in (2) we can make (2) better fit the actual dependence
of winning percentage on the scoring ratio for different sports.
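To make formula (2) and the MAD concrete, here is a minimal Python sketch. It assumes that actual winning percentage is wins / (wins + losses); the five team rows are copied from the 2016 MLB table above (all five played 162 games), and the function and variable names are our own illustrative choices, not those of the book’s companion scripts.

```python
def predicted_win_pct(runs_for: float, runs_against: float, exp: float = 2.0) -> float:
    """Formula (2): R**exp / (R**exp + 1), where R = runs scored / runs allowed."""
    ratio = runs_for / runs_against
    return ratio**exp / (ratio**exp + 1)


def mean_absolute_deviation(predicted, actual):
    """Average of |predicted - actual|; absolute values stop errors from cancelling."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# (team, wins, losses, runs scored, runs allowed), copied from the 2016 rows above
TEAMS_2016 = [
    ("ARI", 69, 93, 752, 890),
    ("BAL", 89, 73, 744, 715),
    ("BOS", 93, 69, 878, 694),
    ("CHW", 78, 84, 686, 715),
    ("CIN", 68, 94, 716, 854),
]


def mad_for_exponent(teams, exp: float) -> float:
    """MAD of formula (2) over the given team rows for one value of exp."""
    predicted = [predicted_win_pct(rf, ra, exp) for _, _, _, rf, ra in teams]
    actual = [w / (w + l) for _, w, l, _, _ in teams]
    return mean_absolute_deviation(predicted, actual)


for team, w, l, rf, ra in TEAMS_2016:
    print(team, round(predicted_win_pct(rf, ra), 2), round(w / (w + l), 2))
print("MAD on these five rows (exp = 2):", round(mad_for_exponent(TEAMS_2016, 2.0), 4))
```

Setting `exp=2.0` gives the Pythagorean Theorem (1′); any other value gives a member of the family (2).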
1. Why didn’t we just average the actual errors? Because when we average them, positive and negative errors cancel out. For example, if one team wins 5% more games than (1′) predicts and another team wins 5% fewer games than (1′) predicts, the average of the errors is 0 but the average of the absolute errors is 5%. Of course, in this simple situation estimating the average error as 5% is correct, while estimating the average error as 0% is nonsensical.
N O
5 MAD
6 0.021
7 1.1 0.02812245
8 1.2 0.02617963
9 1.3 0.02441563
10 1.4 0.02289267
11 1.5 0.02160248
12 1.6 0.02069009
13 1.7 0.02014272
14 1.8 0.0199295
15 1.9 0.0201094
16 2 0.020513
17 2.1 0.02114432
18 2.2 0.02208793
19 2.3 0.02328749
20 2.4 0.02473436
21 2.5 0.02640258
22 2.6 0.02823811
23 2.7 0.03019355
24 2.8 0.03228514
25 2.9 0.03447043
26 3 0.03670606
For baseball, we will allow exp in (2) (exp is short for exponent) to vary between 1 and 3. Of course, exp = 2 reduces to the Pythagorean Theorem.

Figure 1-2 shows how the MAD changes as we vary exp between 1 and 3. This was done using the Data Table feature in Excel.² We see that exp = 1.8 yields the smallest MAD (1.99%). An exp value of 2 is almost as good (MAD of 2.05%), so for simplicity we will stick with Bill James’s view that exp = 2. Therefore, among equations of form (2), exp = 1.8 (or, almost as accurately, exp = 2) yields the best forecasts. Of course, there might be another equation that predicts winning percentage from runs scored and allowed better than the Pythagorean Theorem does. The Pythagorean Theorem is simple and intuitive, however, and does very well. After all, we are off in predicting team wins by an average of 162 × .0205 ≈ 3.3, or about three wins per team. Therefore, we see no reason to look for a more complicated (albeit slightly more accurate) model.

2. See the Chapter 1 Appendix for an explanation of how we used Data Tables to determine how MAD changes as we vary exp between 1 and 3. Additional information is available at https://support.office.com/en-us/article/calculate-multiple-results-by-using-a-data-table-e95e2487-6ca6-4413-ad12-77542a5ea50b.
3. In four playoff series the opposing teams had identical win-loss records, so
the “games won” approach could not make a prediction.
A B C D E F G H I J K L M N
1 Exp 2.370 MAD 0.051
2 Year Team Wins Losses Ties PF PA Ratio Pred W–L% Act W-L% Error
3 2015 Arizona Cardinals 13 3 0 489 313 1.56 0.742 0.813 0.071
4 2015 Atlanta Falcons 8 8 0 339 345 0.98 0.490 0.5 0.010 MAD
5 2015 Baltimore Ravens 5 11 0 328 401 0.82 0.383 0.313 0.070 Exp 0.051130558
6 2015 Buffalo Bills 8 8 0 379 359 1.06 0.532 0.5 0.032 1.5 0.087458019
7 2015 Carolina Panthers 15 1 0 500 308 1.62 0.759 0.938 0.179 1.6 0.083786393
8 2015 Chicago Bears 6 10 0 335 397 0.84 0.401 0.375 0.026 1.7 0.080410576
9 2015 Cincinnati Bengals 12 4 0 419 279 1.50 0.724 0.75 0.026 1.8 0.077291728
10 2015 Cleveland Browns 3 13 0 278 432 0.64 0.260 0.188 0.072 1.9 0.074380834
11 2015 Dallas Cowboys 4 12 0 275 374 0.74 0.325 0.25 0.075 2 0.071698879
12 2015 Denver Broncos 12 4 0 355 296 1.20 0.606 0.75 0.144 2.1 0.069282984
13 2015 Detroit Lions 7 9 0 358 400 0.90 0.435 0.438 0.003 2.2 0.067048672
14 2015 Green Bay Packers 10 6 0 368 323 1.14 0.577 0.625 0.048 2.3 0.065010818
15 2015 Houston Texans 9 7 0 339 313 1.08 0.547 0.563 0.016 2.4 0.063455288
16 2015 Indianapolis Colts 8 8 0 333 408 0.82 0.382 0.5 0.118 2.5 0.062158811
17 2015 Jacksonville Jaguars 5 11 0 376 448 0.84 0.398 0.313 0.085 2.6 0.061279631
18 2015 Kansas City Chiefs 11 5 0 405 287 1.41 0.693 0.688 0.005 2.7 0.060819271
19 2015 Miami Dolphins 6 10 0 310 389 0.80 0.369 0.375 0.006 2.8 0.060758708
20 2015 Minnesota Vikings 11 5 0 365 302 1.21 0.610 0.688 0.078 2.9 0.060941558
21 2015 New England Patriots 12 4 0 465 315 1.48 0.716 0.75 0.034 3 0.061357921
22 2015 New Orleans Saints 7 9 0 408 476 0.86 0.410 0.438 0.028 3.1 0.061891886
23 2015 New York Giants 6 10 0 420 442 0.95 0.470 0.375 0.095 3.2 0.062648637
24 2015 New York Jets 10 6 0 387 314 1.23 0.621 0.625 0.004 3.3 0.063594958
25 2015 Oakland Raiders 7 9 0 359 399 0.90 0.438 0.438 0.000 3.4 0.06474528
26 2015 Philadelphia Eagles 7 9 0 377 430 0.88 0.423 0.438 0.015 3.5 0.065955742
27 2015 Pittsburgh Steelers 10 6 0 423 319 1.33 0.661 0.625 0.036
exp = 2.37 gives the most accurate predictions for winning percentage, while for the NBA, equation (2) with exp = 13.91 gives the most accurate predictions for winning percentage. Figure 1-3 gives the predicted and actual winning percentages for the 2015 NFL, while Figure 1-4 gives the predicted and actual winning percentages for the 2015–2016 NBA. See the file Sportshw1.xls.

For the 2008–2015 NFL seasons, we found that MAD was minimized by exp = 2.8. Exp = 2.8 yielded a MAD of 6.08%, while Morey’s exp = 2.37 yielded a MAD of 6.39%. For the 2008–2016 NBA seasons, we found that exp = 14.4 best fit actual winning percentages. The MAD for these seasons was 2.84% for exp = 14.4 and 2.87% for exp = 13.91. Since Morey’s values of exp are very close in accuracy to the values we found from recent seasons, we will stick with Morey’s values of exp. See the file Sportshw1.xls.
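Formula (2) carries over to other sports simply by swapping the exponent. The short Python sketch below uses the exponents quoted above (2 for MLB, 2.37 for the NFL, 13.91 for the NBA) and checks them against four rows of the NFL and NBA tables; the dictionary and function names are hypothetical.

```python
# Exponents for formula (2) as quoted in the text (2 for MLB; Morey's 2.37 and 13.91)
SPORT_EXPONENTS = {"MLB": 2.0, "NFL": 2.37, "NBA": 13.91}


def predicted_win_pct(points_for, points_against, exp):
    """Formula (2) with R = points scored / points allowed."""
    ratio = points_for / points_against
    return ratio**exp / (ratio**exp + 1)


# (league, team, points for, points against), copied from the NFL and NBA tables above
EXAMPLES = [
    ("NFL", "2015 Arizona Cardinals", 489, 313),
    ("NFL", "2015 Carolina Panthers", 500, 308),
    ("NBA", "2015-16 Golden State Warriors", 9421, 8539),
    ("NBA", "2015-16 Philadelphia 76ers", 7988, 8827),
]

for league, team, pf, pa in EXAMPLES:
    pred = predicted_win_pct(pf, pa, SPORT_EXPONENTS[league])
    print(f"{team}: ratio {pf / pa:.2f}, predicted W-L% {pred:.3f}")
```

Because the NBA exponent is so large, a scoring ratio only slightly above 1 (about 1.10 for Golden State) already translates into a predicted winning percentage near .80.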
Assuming the errors in our forecasts are normally distributed (which turns out to be a reasonable assumption), we would
A B C D E F G H I J K L M
1 Exp 13.910 MAD 0.0287
2 Year Team Wins Losses Points Opp Points Ratio Pred W-L% Act W-L% Error
3 2015-16 Atlanta Hawks 48 34 8433 8137 1.04 0.622 0.585 0.037
4 2015-16 Boston Celtics 48 34 8669 8406 1.03 0.606 0.585 0.021
5 2015-16 Brooklyn Nets 21 61 8089 8692 0.93 0.269 0.256 0.013 Exp 0.0287
6 2015-16 Charlotte Hornets 48 34 8479 8256 1.03 0.592 0.585 0.007 12 0.0340286
7 2015-16 Chicago Bulls 42 40 8335 8456 0.99 0.450 0.512 0.062 12.2 0.0332135
8 2015-16 Cleveland Cavaliers 57 25 8555 8063 1.06 0.695 0.695 6E-05 12.4 0.0324282
9 2015-16 Dallas Mavericks 42 40 8388 8413 1 0.490 0.512 0.022 12.6 0.0317199
10 2015-16 Denver Nuggets 33 49 8355 8609 0.97 0.397 0.402 0.005 12.8 0.0310445
11 2015-16 Detroit Pistons 44 38 8361 8311 1.01 0.521 0.537 0.016 13 0.0304509
12 2015-16 Golden State Warriors 73 9 9421 8539 1.1 0.797 0.89 0.093 13.2 0.0298964
13 2015-16 Houston Rockets 41 41 8737 8721 1 0.506 0.5 0.006 13.4 0.0294269
14 2015-16 Indiana Pacers 45 37 8377 8237 1.02 0.558 0.549 0.009 13.6 0.0290408
15 2015-16 Los Angeles Clippers 53 29 8569 8218 1.04 0.641 0.646 0.005 13.8 0.0287533
16 2015-16 Los Angeles Lakers 17 65 7982 8766 0.91 0.214 0.207 0.007 14 0.0285995
17 2015-16 Memphis Grizzlies 42 40 8126 8310 0.98 0.423 0.512 0.089 14.2 0.0284997
18 2015-16 Miami Heat 48 34 8204 8069 1.02 0.557 0.585 0.028 14.4 0.0284481
19 2015-16 Milwaukee Bucks 33 49 8122 8465 0.96 0.360 0.402 0.042 14.6 0.0284727
20 2015-16 Minnesota Timberwolves 29 53 8398 8688 0.97 0.384 0.354 0.03 14.8 0.028568
21 2015-16 New Orleans Pelicans 30 52 8423 8734 0.96 0.377 0.366 0.011 15 0.0287573
22 2015-16 New York Knicks 32 50 8065 8289 0.97 0.406 0.39 0.016 15.2 0.0289692
23 2015-16 Oklahoma City Thunder 55 27 9038 8441 1.07 0.721 0.671 0.05 15.4 0.0292675
24 2015-16 Orlando Magic 35 47 8369 8502 0.98 0.445 0.427 0.018 15.6 0.0296178
25 2015-16 Philadelphia 76ers 10 72 7988 8827 0.9 0.200 0.122 0.078 15.8 0.0300081
26 2015-16 Phoenix Suns 23 59 8271 8817 0.94 0.291 0.28 0.011 16 0.0304529
27 2015-16 Portland Trail Blazers 44 38 8622 8554 1.01 0.528 0.537 0.009
CHAPTER 1 APPENDIX: DATA TABLES
The Excel Data Table feature enables us to see how a formula changes as the values of one or two cells in a spreadsheet are modified. In this appendix we show how to use a one-way data table to determine how the accuracy of (2) for predicting team winning percentage depends on the value of exp. To illustrate, we determine how varying exp from 1.1 to 3 changes our average error in predicting an MLB team’s winning percentage (see Figure 1-2).

Step 1: We begin by entering the possible values of exp (1.1, 1.2, . . . , 3) in the cell range N7:N26. To enter these values, we simply enter 1.1 in N7 and 1.2 in N8 and select the cell range N7:N8. Now we drag the cross in the lower right-hand corner of N8 down to N26.

Step 2: In cell O6 we enter the output we want to recalculate for each value of exp (here, the MAD) by entering the formula =J1. Then we select the “table range” N6:O26.

Step 3: Now we select Data Table from the What-If Analysis section of the ribbon’s Data tab.

Step 4: We leave the row input cell portion of the dialog box blank but select cell G1 (which contains the value of exp) as the column input cell. After selecting OK, we see the results shown in Figure 1-2. In effect, Excel has placed the values 1.1, 1.2, . . . , 3 into cell G1 and computed our MAD for each listed value of exp.
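Outside Excel, a one-way data table amounts to a loop: set exp to each value in N7:N26, recompute the MAD, and record the result. The minimal Python sketch below assumes that winning percentage is wins / (wins + losses) and uses only the 19 team rows printed in the 2016 table above, so its numbers need not match Figure 1-2. The names `mad`, `TEAMS`, and `data_table` are illustrative.

```python
def predicted_win_pct(runs_for, runs_against, exp):
    """Formula (2): R**exp / (R**exp + 1) with R = runs scored / runs allowed."""
    ratio = runs_for / runs_against
    return ratio**exp / (ratio**exp + 1)


def mad(teams, exp):
    """Mean absolute deviation between predicted and actual winning percentage."""
    errors = [abs(predicted_win_pct(rf, ra, exp) - w / (w + l)) for w, l, rf, ra in teams]
    return sum(errors) / len(errors)


# (wins, losses, runs scored, runs allowed): the 19 rows of the 2016 table above
TEAMS = [
    (69, 93, 752, 890), (68, 93, 649, 779), (89, 73, 744, 715), (93, 69, 878, 694),
    (103, 58, 808, 556), (78, 84, 686, 715), (68, 94, 716, 854), (94, 67, 777, 676),
    (75, 87, 845, 860), (86, 75, 750, 721), (84, 78, 724, 701), (81, 81, 675, 712),
    (74, 88, 717, 727), (91, 71, 725, 638), (79, 82, 655, 682), (73, 89, 671, 733),
    (59, 103, 722, 889), (87, 75, 671, 617), (84, 78, 680, 702),
]

# Column N: exp = 1.1, 1.2, ..., 3.0 (cells N7:N26); round() removes float drift
exponents = [round(1.1 + 0.1 * i, 1) for i in range(20)]

# Column O: the MAD re-evaluated with each exp plugged into the "column input cell"
data_table = [(exp, mad(TEAMS, exp)) for exp in exponents]

for exp, value in data_table:
    print(f"{exp:.1f}  {value:.4f}")

best_exp, best_mad = min(data_table, key=lambda row: row[1])
print(f"Smallest MAD {best_mad:.4f} at exp = {best_exp}")
```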
CHAPTER 2
At age 24, Los Angeles Angels outfielder Mike Trout won the 2016
American League Most Valuable Player award for the second time in
his career. Also at age 24, Kris Bryant of the Chicago Cubs won the
2016 National League Most Valuable Player award. Table 2-1 shows
their key statistics:
Recall that a batter’s slugging percentage is given by

$$\text{Slugging Percentage} = \frac{\text{Total Bases}}{\text{At Bats}},$$

where
We see in Table 2-1 that Trout had a higher batting average than
Bryant. However, Bryant had a slightly higher slugging percentage since he hit more doubles and home runs. Bryant also had 54
more at bats than Trout and three more hits. So, which player had
a better hitting year?
TABLE 2-1
Mike Trout and Kris Bryant 2016 Statistics
Event Trout (2016) Bryant (2016)
At Bats 549 603
Batting Average .315 .292
Slugging Percentage .550 .554
Hits 173 176
Singles 107 99
Doubles 32 35
Triples 5 3
Home Runs 29 39
Walks + Hit by Pitch 127 93
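The batting averages and slugging percentages in Table 2-1 follow directly from its counting stats. Here is a quick Python check. It assumes the standard definition Total Bases = singles + 2 × doubles + 3 × triples + 4 × home runs (the text’s own definition after “where” is not included in this excerpt), and the `PLAYERS` dictionary is just a hypothetical container for the table’s numbers.

```python
def total_bases(singles, doubles, triples, home_runs):
    """Standard total bases: one base per single, two per double, and so on."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


# Counting stats from Table 2-1
PLAYERS = {
    "Trout (2016)": {"AB": 549, "H": 173, "1B": 107, "2B": 32, "3B": 5, "HR": 29},
    "Bryant (2016)": {"AB": 603, "H": 176, "1B": 99, "2B": 35, "3B": 3, "HR": 39},
}

for name, s in PLAYERS.items():
    # Sanity check: singles + doubles + triples + home runs should equal hits
    assert s["1B"] + s["2B"] + s["3B"] + s["HR"] == s["H"]
    batting_avg = s["H"] / s["AB"]
    slugging = total_bases(s["1B"], s["2B"], s["3B"], s["HR"]) / s["AB"]
    print(f"{name}: AVG {batting_avg:.3f}, SLG {slugging:.3f}")
```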
1. Of course, this leaves out things like sacrifice hits, sacrifice flies, stolen bases, and caught stealings. See http://danagonistes.blogspot.com/2004/10/brief-history-of-run-estimation-runs.html for an excellent summary of the evolution of runs created.
A B C D E F G H I J K L M N
1 Team Runs At Bats Hits Singles 2B 3B HR BB+HBP
2 ARI 752 5665 1479 948 285 56 190 513
3 ATL 649 5514 1404 960 295 27 122 561 RC Formula =(D5+I5)*(E5+2*F5+3*G5+4*H5)/(C5+I5)
4 BAL 744 5524 1413 889 265 6 253 512
5 BOS 878 5670 1598 1022 343 25 208 601 RC 916.9805 917
6 CHC 808 5503 1409 887 293 30 199 752 Actual 878
7 CHW 686 5550 1428 950 277 33 168 508 Error –39
8 CIN 716 5487 1403 929 277 33 164 504 % Error –0.04442
9 CLE 777 5484 1435 913 308 29 185 580
10 COL 845 5614 1544 975 318 47 204 534
11 DET 750 5526 1476 983 252 30 211 546
12 HOU 724 5545 1367 849 291 29 198 601
13 KCR 675 5552 1450 1006 264 33 147 427
for each team during the 2010–2016 seasons and compares runs created to actual runs scored. We find that Runs Created was off by
an average of 21 runs per team. Since the average team scored 693
runs, this is an average error of about 3% when we try to use (1) to
predict Team Runs Scored. It is amazing that this simple, intuitively
appealing formula does such a good job of predicting runs scored by
a team. Even though more complex versions of Runs Created predict actual runs scored more accurately, the simplicity of (1) has kept this formula in wide use in the baseball community.
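The team worksheet computes Runs Created with the cell formula =(D5+I5)*(E5+2*F5+3*G5+4*H5)/(C5+I5), that is, (hits + BB+HBP) × total bases / (at bats + BB+HBP). Below is a minimal Python version applied to the 2016 Boston row (878 actual runs); the function name is our own.

```python
def runs_created(at_bats, hits, singles, doubles, triples, home_runs, bb_hbp):
    """Basic Runs Created: (H + BB+HBP) * total bases / (AB + BB+HBP)."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return (hits + bb_hbp) * total_bases / (at_bats + bb_hbp)


# 2016 BOS row: AB 5670, H 1598, 1B 1022, 2B 343, 3B 25, HR 208, BB+HBP 601
rc = runs_created(5670, 1598, 1022, 343, 25, 208, 601)
actual_runs = 878
error = actual_runs - rc  # negative: the formula overpredicts Boston

print(f"Runs Created {rc:.2f}, actual {actual_runs}, error {error:.0f} ({error / actual_runs:.1%})")
```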
The problem with any version of Runs Created is that the formula
is based on team statistics. A typical team has a batting average of
.250, hits HRs on 3% of all plate appearances, and has a walk or HBP
in around 10% of all plate appearances. Contrast these numbers to
Miguel Cabrera’s great 2013 season in which he had a batting average
of .348, hit an HR on approximately 7% of all plate appearances, and
received a walk or HBP during approximately 15% of his plate appearances. One of the first ideas we teach in business statistics courses is to
not use a relationship that is fit to a dataset to make predictions for
data that is very different from the data used to fit the relationship.
Following this logic, we should not expect a Runs Created formula based on team data to accurately predict the runs created by a superstar such as Miguel Cabrera or a very poor player. In Chapter 4 we will remedy this problem with a different type of model.
Despite this caveat, let’s plunge ahead and use (1) to compare Mike
Trout’s 2016 season to Kris Bryant’s 2016 season. For fun we also computed Runs Created for Miguel Cabrera’s great 2013 season. See the
worksheet Figure 2-2 of the workbook Chapter2mathleticsfiles.xlsx.
From our data, we calculated that Mike Trout created 134 runs
and Kris Bryant created 129 runs. Cabrera created 148 runs in 2013.
A B C D E F G H I J K L M
1 Player At Bats Hits 1B 2B 3B HR Estimated Outs Other Outs BB+HBP Runs Created Game Outs Used Runs Created / Game
2 Bryant (2016) 603 176 99 35 3 39 416.146 11 93 129.09 427.146 8.10837519
3 Trout (2016) 549 173 107 32 5 29 366.118 17 127 134.02 383.118 9.385763732
4 Cabrera (2013) 555 193 102 26 1 44 352.01 21 95 147.54 373.01 10.61264318
10 K2 Formula =(C2+J2)*(D2+2*E2+3*F2+4*G2)/(B2+J2)
11 L2 Formula =I2+H2
12 M2 Formula =K2/(L2/26.83)
This indicates that Trout had a slightly better hitting year in 2016 than Bryant. According to this Runs Created approach, Miguel Cabrera’s 2013 season was superior to both Trout’s and Bryant’s 2016 seasons.
A B C D E F G H I J K L M
6 Player At Bats Hits 1B 2B 3B HR Outs BB+HBP Runs Created Game Outs Used Runs Created / Game
7 Christian 700 190 170 10 1 9 497.4 20 66.79 497.4 3.60
8 Gregory 400 120 90 15 0 15 272.8 20 60.00 272.8 5.90
2. Since the home team does not always bat in the ninth inning and some games
go into extra innings, average outs per game is not exactly 27. For the years 2010–
2016, average outs per game was 26.83.
To see how this works let’s look at Trout’s 2016 data. See Figure 2-2
and the PlayerRC worksheet in file Teams.xlsx.
How did we compute outs? Essentially all at bats except for hits and errors result in an out. Approximately 1.8% of all at bats result in errors. Therefore, we computed outs as at bats − hits − .018(at bats). Hitters also create “extra” outs through sacrifice hits (bunts), sacrifice flies, caught stealings, and grounding into double plays. In 2016 Trout created 17 of these extra outs. As shown in cell L3, Trout “used up” 383.11 outs for the Angels. This is equivalent to 383.11/26.83 = 14.28 games. Therefore, Trout created 134.02/14.28 = 9.39 runs per game. More formally:

$$\text{Runs Created Per Game} = \frac{\text{Runs Created}}{\dfrac{.982(\text{At Bats}) - \text{Hits} + \text{GIDP} + \text{SF} + \text{SH} + \text{CS}}{26.83}} \qquad (2)$$
Equation (2) simply states that Runs Created per game is the Runs Created by a batter divided by the number of games’ worth of outs used
by the batter. Figure 2-2 shows that Miguel Cabrera created 10.61
runs per game. Figure 2-2 also makes it clear that Trout was a more
valuable hitter than Bryant in 2016. Specifically, Trout created 9.39
runs per game while Bryant created approximately 1.28 fewer runs
per game (8.11 runs). We also see that runs created per game by the notional Gregory is 2.30 runs (5.90 − 3.60) higher than that of the fictitious Christian. This resolves the problem that ordinary runs created ranked Christian ahead of Gregory.
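Equation (2) can be sketched in Python as follows. It reproduces, to two decimals, the Runs Created / Game column of Figure 2-2 and the Christian and Gregory rows, assuming (as the text does) that 1.8% of at bats end on an error and that a game contains 26.83 outs. The parameter `other_outs` stands for GIDP + SF + SH + CS, and all names are illustrative.

```python
OUTS_PER_GAME = 26.83   # average outs per game, 2010-2016 (footnote 2)
ERROR_RATE = 0.018      # share of at bats that end on an error, per the text


def runs_created(ab, h, singles, doubles, triples, hr, bb_hbp):
    """Basic Runs Created: (H + BB+HBP) * total bases / (AB + BB+HBP)."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return (h + bb_hbp) * total_bases / (ab + bb_hbp)


def runs_created_per_game(ab, h, singles, doubles, triples, hr, bb_hbp, other_outs=0):
    """Equation (2): Runs Created divided by the games' worth of outs the hitter used.

    other_outs stands for GIDP + SF + SH + CS.
    """
    outs_used = (1 - ERROR_RATE) * ab - h + other_outs
    games_used = outs_used / OUTS_PER_GAME
    return runs_created(ab, h, singles, doubles, triples, hr, bb_hbp) / games_used


# (AB, H, 1B, 2B, 3B, HR, BB+HBP, other outs) from Figure 2-2 and the Christian/Gregory rows
PLAYERS = {
    "Trout (2016)": (549, 173, 107, 32, 5, 29, 127, 17),
    "Bryant (2016)": (603, 176, 99, 35, 3, 39, 93, 11),
    "Cabrera (2013)": (555, 193, 102, 26, 1, 44, 95, 21),
    "Christian": (700, 190, 170, 10, 1, 9, 20, 0),
    "Gregory": (400, 120, 90, 15, 0, 15, 20, 0),
}

for name, stats in PLAYERS.items():
    print(f"{name}: {runs_created_per_game(*stats):.2f} runs created per game")
```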
Our estimate of runs created per game of 9.39 for Mike Trout indicates that we believe a team consisting of nine Mike Trouts would
score an average of 9.39 runs per game. Since no team consists of nine
players like Trout, a more relevant question might be, how many runs
would Mike Trout create when batting with eight “average hitters”? In
his book Win Shares (2002) Bill James came up with a more complex
version of runs created that answers this question. We will provide our
own answer to this question in Chapters 3 and 4.
CHAPTER 3
EVALUATING HITTERS
BY LINEAR WEIGHTS
In Chapter 2 we described how knowledge of a hitter’s at bats, BBs + HBPs, singles, 2Bs, 3Bs, and HRs allows us to compare hitters via the runs created metric. As we will see in this chapter, the linear weights approach can also be used to compare hitters. In business and science, we often try to predict a given variable (called Y or the dependent variable) from a set of independent variables x1, x2, . . . , xn. Usually we try to find weights B1, B2, . . . , Bn and a constant that make the quantity
Let’s see if we can use basic arithmetic to come up with a crude estimate of the value of an HR. For the years 2010–2016, in a game an average MLB team has 38 batters come to the plate and scores 4.3 runs. So roughly one out of nine batters scores. During a game the average MLB team has around 12 batters reach base. Therefore 4.3/12, or around 36%, of all runners score. If we assume an average of one runner on base when an HR is hit, then an HR creates “runs” in the following fashion:
• The batter scores all the time instead of 1/8 of the time, which
creates 7/8 of a run.
• An average of one base runner will score 100% of the time
instead of 37% of the time, which creates .63 runs.
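The two bullet items can also be recomputed directly from the game averages quoted above. In this small Python sketch, using the unrounded rates 4.3/38 and 4.3/12 (rather than rounded fractions) gives about 1.53 runs, in line with a rough value of 1.5; the one-runner-on-base assumption is the text’s, and the constant names are illustrative.

```python
RUNS_PER_GAME = 4.3          # average runs per team per game, 2010-2016
BATTERS_PER_GAME = 38        # plate appearances per team per game
BASERUNNERS_PER_GAME = 12    # batters who reach base per team per game

batter_scores = RUNS_PER_GAME / BATTERS_PER_GAME      # chance a batter eventually scores
runner_scores = RUNS_PER_GAME / BASERUNNERS_PER_GAME  # chance a baserunner scores

# An HR makes the batter score for sure and, with one runner on base on average,
# brings that runner home too; credit the HR with the increase in scoring chance.
hr_value = (1 - batter_scores) + (1 - runner_scores)

print(f"batter scores {batter_scores:.3f}, runner scores {runner_scores:.3f}, "
      f"HR worth about {hr_value:.2f} runs")
```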
1. If we did not square the prediction error for each team, we would find that
the errors for teams that scored more runs than predicted would be cancelled out by the
errors for teams that scored fewer runs than predicted.
A B C D E F G
1 SUMMARY OUTPUT
2
3 Regression Statistics
4 Multiple R 0.949366525
5 R Square 0.9012968
6 Adjusted R Square 0.897876392
7 Standard Error 22.07547927
8 Observations 210
9
10 ANOVA
11 df SS MS F Significance F
12 Regression 7 898893.5132 128413.359 263.505645 6.66264E–98
13 Residual 202 98440.01059 487.3267851
14 Total 209 997333.5238
15
16 Coefficients Standard Error t Stat P–value Lower 95% Upper 95%
17 Intercept –411.8133561 33.00675506 –12.47663866 7.3423E–27 –476.8953293 –346.731383
18 BB+HBP 0.326171191 0.026991877 12.08405016 1.1813E–25 0.272949219 0.37939316
19 1B 0.459107774 0.028209869 16.2747222 1.325E–38 0.403484193 0.51473135
20 2B 0.805141015 0.070539419 11.41405797 1.31E–23 0.666052984 0.94422905
21 3B 1.072129559 0.185083303 5.792686554 2.6244E–08 0.707186489 1.43707263
22 HR 1.428105264 0.052270693 27.32133795 9.1608E–70 1.325039094 1.53117143
23 SB 0.250044999 0.063490957 3.938277396 0.00011296 0.124854967 0.37523503
24 CS –0.254380304 0.190576335 –1.334794818 0.18344599 –0.630154411 0.1213938
weight agrees with our simple calculation of 1.5, and the fact that a double is worth more than a single but less than two singles makes sense. A single being worth more than a walk also makes sense, because singles often advance runners two bases. It is also reasonable that a triple is worth more than a double but less than an HR.
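Excel’s regression tool fits these weights by ordinary least squares, choosing the coefficients that minimize the sum of squared prediction errors. The Python sketch below illustrates the mechanics with NumPy’s `np.linalg.lstsq`. The 210 “team seasons” are simulated, not real MLB data: the stat ranges are arbitrary, and the “true” weights are rounded from the first regression output above (CS omitted for brevity). Least squares should recover weights close to the ones used to generate the data.

```python
import numpy as np

# "True" weights, rounded from the first regression output (CS left out):
# intercept, BB+HBP, 1B, 2B, 3B, HR, SB
true_weights = np.array([-411.81, 0.33, 0.46, 0.81, 1.07, 1.43, 0.25])

rng = np.random.default_rng(seed=0)
n = 210  # same number of observations as the regression output

# Simulated season totals: illustrative ranges only, NOT real MLB data
stats = np.column_stack([
    rng.uniform(400, 700, n),   # BB+HBP
    rng.uniform(850, 1050, n),  # 1B
    rng.uniform(230, 350, n),   # 2B
    rng.uniform(10, 60, n),     # 3B
    rng.uniform(100, 260, n),   # HR
    rng.uniform(40, 180, n),    # SB
])
X = np.column_stack([np.ones(n), stats])            # column of 1s gives the intercept
runs = X @ true_weights + rng.normal(0.0, 20.0, n)  # add random noise to each season

# Ordinary least squares: minimize the sum of squared prediction errors
weights, _, _, _ = np.linalg.lstsq(X, runs, rcond=None)

labels = ["Intercept", "BB+HBP", "1B", "2B", "3B", "HR", "SB"]
for label, true, fitted in zip(labels, true_weights, weights):
    print(f"{label:>9}: true {true:8.2f}  fitted {fitted:8.2f}")
```

The intercept is estimated far less precisely than the slopes, because every stat sits far from zero; the event weights are what matter for valuing hitters.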
THE MEANING OF P-VALUES
A B C D E F G
1 SUMMARY OUTPUT
2
3 Regression Statistics
4 Multiple R 0.948907909
5 R Square 0.900426219
6 Adjusted R Square 0.897483152
7 Standard Error 22.11794065
8 Observations 210
9
10 ANOVA
11 df SS MS F Significance F
12 Regression 6 898025.2542 149670.8757 305.948214 8.8346E–99
13 Residual 203 99308.26962 489.2032986
14 Total 209 997333.5238
15
16 Coefficients Standard Error t Stat P–value Lower 95% Upper 95%
17 Intercept –422.3214856 32.11582993 –13.14994775 5.654E–29 –485.6448728 –358.9980984
18 BB+HBP 0.328427033 0.026990732 12.16814092 6.1158E–26 0.275208898 0.381645169
19 1B 0.462425312 0.028154216 16.4247273 3.9961E–39 0.406913115 0.51793751
20 2B 0.809004928 0.070615562 11.45646795 9.2244E–24 0.669770893 0.948238964
21 3B 1.056646807 0.185074775 5.709296723 3.9868E–08 0.691731384 1.421562229
22 HR 1.432093994 0.052285581 27.38984579 4.1936E–70 1.329001529 1.535186459
23 SB 0.204454976 0.05362427 3.812732098 0.00018226 0.098722992 0.31018696
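Two columns of the output can be checked by hand. The t Stat is the coefficient divided by its standard error, and the P-value is the two-sided tail probability of that t Stat. Below is a Python sketch for the CS row of the first regression. It assumes a standard normal approximation to the t distribution with 202 residual degrees of freedom; Excel uses the exact t distribution, so the third decimal differs slightly.

```python
import math

# CS row of the first regression output
coefficient = -0.254380304
standard_error = 0.190576335

t_stat = coefficient / standard_error

# Two-sided p-value under a standard normal approximation:
# P(|Z| > |t|) = erfc(|t| / sqrt(2)). Excel uses the t distribution with
# 202 residual degrees of freedom, which is very close to normal here.
p_value_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t Stat = {t_stat:.3f}, approximate P-value = {p_value_approx:.3f}")
```

A CS p-value of roughly .18 is well above the conventional .05 cutoff, which is consistent with CS being absent from the second regression.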
$$\text{Predicted Runs} = -422.32 + .46(\text{Singles}) + .81(\text{2Bs}) + 1.06(\text{3Bs}) + 1.43(\text{HRs}) + .33(\text{BBs} + \text{HBPs}) + .205(\text{SBs}) \qquad (3)$$
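Equation (3) is just a weighted sum, so it is easy to evaluate for any team. Here is a Python sketch using the printed coefficients; the team totals in `HYPOTHETICAL_TEAM` are invented only to illustrate the arithmetic.

```python
# Coefficients of equation (3), as printed
INTERCEPT = -422.32
WEIGHTS = {"1B": 0.46, "2B": 0.81, "3B": 1.06, "HR": 1.43, "BB+HBP": 0.33, "SB": 0.205}


def predicted_runs(season):
    """Equation (3): intercept plus each event count times its linear weight."""
    return INTERCEPT + sum(WEIGHTS[event] * season[event] for event in WEIGHTS)


# Hypothetical team season (illustrative numbers, not a real team)
HYPOTHETICAL_TEAM = {"1B": 950, "2B": 290, "3B": 30, "HR": 180, "BB+HBP": 550, "SB": 100}

print(round(predicted_runs(HYPOTHETICAL_TEAM), 2))
```

Because the model is linear, each additional HR adds 1.43 predicted runs no matter what the rest of the stat line looks like, which is what lets the coefficients serve as linear weights.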