Showing posts with label basic. Show all posts

Sneak Peek at WP 2.0

I've just completed the development, validation, and testing of the next-generation Win Probability model. It took the better part of the past 6 months. Despite many heartaches and frustrating turns, I'm really thrilled with the results. But as excited as I am to have this new tool, I'm also somewhat humbled by how inadequate the original model is in some regards.

As a quick refresher, the WP model tells us the chance that a team will win a game in progress as a function of the game state--score, time, down, distance, etc. Although it's certainly interesting to have a good idea of how likely your favorite team is to win, the model's usefulness goes far beyond that.

WP is the ultimate measure of utility in football. As Herm once reminded us all, "You play to win the game! Hello!", and WP measures how close or far you are from that single-minded goal. Its elegance lies in its perfectly linear proportions. Having a 40% chance at winning is exactly twice as good as having a 20% chance at winning, and an 80% chance is twice as good as 40%. You get the idea.

That feature allows analysts to use the model as a decision support tool. Simply put, any decision can be assessed on the following basis: Do the thing that gives you the best chance of winning. That's hardly controversial. The tough part is figuring out what the relevant chances of winning are for the decision-maker's various options, and that's what the WP model does. Thankfully, once the model is created, only fifth grade arithmetic is required for some very practical applications of interest to team decision-makers and to fans alike.
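As a rough illustration of the arithmetic involved, here is a minimal Python sketch of comparing two options by expected WP. The function name and every probability in it are hypothetical placeholders chosen for the example, not outputs of the WP model.

```python
def expected_wp(p_success, wp_if_success, wp_if_failure):
    """Expected win probability of a risky option with two possible outcomes."""
    return p_success * wp_if_success + (1.0 - p_success) * wp_if_failure


# Hypothetical numbers for illustration only (not model output).
options = {
    "go for it": expected_wp(p_success=0.50, wp_if_success=0.40, wp_if_failure=0.20),
    "punt": 0.28,  # a safe option is just its resulting WP
}

# "Do the thing that gives you the best chance of winning."
best = max(options, key=options.get)
print(best, round(options[best], 3))
```

The hard part remains estimating the inputs; once the WP values exist, the decision rule is a weighted average and a comparison.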

Implications of a 33-Yard XP

The NFL is experimenting with a longer XP this preseason. XPs have become so automatic (close to 99.5%) that there is no longer much rationale for including them in the game. The Competition Committee's experiment is to move the line of scrimmage of each XP to the 15-yard line, making the distance of each kick 33 yards.

Over the past five seasons, attempts from that distance are successful 91.5% of the time. That should put a bit of excitement and drama into XPs, especially late in close games, which is what the NFL wants. But it might also have another effect on the game.

Currently, two-point conversions are successful at just about half that rate, somewhere north of 45%. The actual rate is somewhat nebulous because of how fakes and aborted kick attempts that turn into two-point attempts are counted.

It's likely the NFL chose the 15-yd line for a reason. The success rates for kicks from that distance are approximately twice the success rate for a 2-point attempt, making the entire extra point process "risk-neutral." In other words, going for two gives teams half the chance at twice the points.
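The "risk-neutral" claim can be checked with a few lines of arithmetic. This is a minimal sketch using the rates quoted above; the 45% two-point rate is the lower bound implied by "somewhere north of 45%", and the variable names are my own.

```python
def expected_points(success_rate, points):
    """Average points per attempt for a try worth `points`."""
    return success_rate * points


kick_old = expected_points(0.995, 1)  # current, near-automatic XP
kick_33 = expected_points(0.915, 1)   # 33-yard XP from the 15-yard line
two_point = expected_points(0.45, 2)  # two-point try at an assumed 45%

# Two-point success rate at which both options are worth the same on average.
break_even_two_point_rate = kick_33 / 2

print(kick_old, kick_33, two_point, break_even_two_point_rate)
```

With the longer kick, the two options are nearly equal in expected points, so the choice would turn mostly on game situation rather than on the averages.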

Momentum Part 5 - Series Level Analysis

This is the final part of my series on momentum in a football game. Is momentum a causative property that a team can gain or lose, or is it only something our minds project to explain streaks of outcomes that don't alternate as much as we expect? It's been a couple months since I began this series, so as a refresher, here is what I've looked at so far:

Part 1 examined the possibility that momentum exists by measuring whether teams that obtain the ball in momentum-swinging ways go on to score more frequently than teams that obtained the ball by regular means.

Part 2 looked at whether teams that gained possession following momentous plays went on to win more often than we would otherwise expect.

Part 3 focused on drive success following a turnover on downs, which is often cited by coaches and analysts as a reason not to go by the numbers when making strategic decisions.

Part 4 applied a different method of examining momentum by using the runs test to see the degree to which team performance is streakier than random, independent trials.

In this part, I'll apply the runs test at the series level, to see if teams convert first downs (or fail to convert them) more consecutively than random independence would suggest. But first, I'll tie up some loose ends left hanging from part 4. Specifically, I'll redo the play-level runs test to eliminate potential confusion caused by a team with disparate performance from their offensive and defensive squads.

The Value of a Timeout - Part 2

In the first part of this article, I made a rough first approximation of the value of a timeout. Using a selected subsample of 2nd half situations, it appeared that a timeout's value was on the order of magnitude of .05 Win Probability (WP). In other words, if a team with 3 timeouts had a .70 WP, another identical team in the same situation but with only 2 timeouts would have about a .65 WP.

In this part, I'll apply a more rigorous analysis and get a better approximation. We'll also be able to repeat the methodology and build a generalized model of timeout values for any combination of score, time, and field position.

Methodology

For my purposes here, I used a logit regression. (Do not try to build a general WP model using logit regression. It won't work. The sport is too complex to capture the interactions properly.) Logit regression is suitable in this exercise because we're only going to look at regions of the game with fairly linear WP curves. I'm also only interested in the coefficient of the timeout variables, the relative values of timeout states, and not the full prediction of the model.

I specified the model with winning {0,1} as the outcome variable, and with yard line, score difference, time remaining, and timeouts for the offense and defense as predictors. The sample was restricted to 1st downs in the 3rd quarter near midfield, with the offense ahead by 0 to 7 points.
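To make the role of the timeout coefficient concrete, here is a minimal Python sketch of how a logit coefficient translates into a change in WP. The coefficient below is not a result from this analysis; it is back-solved from the rough first approximation mentioned above (.65 WP with 2 timeouts versus .70 with 3), and the function names are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


def wp_with_extra_timeout(base_wp, timeout_coef):
    """Shift a baseline WP by one timeout's worth of log-odds."""
    return inv_logit(logit(base_wp) + timeout_coef)


# Assumed for illustration: the coefficient that would move .65 to .70.
implied_coef = logit(0.70) - logit(0.65)

print(round(implied_coef, 3))
print(round(wp_with_extra_timeout(0.65, implied_coef), 2))
print(round(wp_with_extra_timeout(0.90, implied_coef), 3))
```

Because the coefficient acts on log-odds, the same timeout is worth fewer WP points when the baseline is already near 0 or 1, which is one reason to restrict the sample to regions where the WP curve is fairly linear.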

Results

The Value of a Timeout - A First Approximation

During the NFC Championship Game the other day, we saw a familiar situation. Down by 4 with 14 minutes left in the game, the Seahawks were confronted with a decision. It was 4th and 7 on the SF 37. Should they go for it, punt, or even try a long FG to maybe make it a 1-point game? Pete Carroll ended up making what was the right decision according to the numbers, but not before calling a timeout to think it over.

As I noted in my game commentary, if you need to call a timeout to think over your options, the situation is probably not far from the point of indifference where the options are nearly equal in value. And timeouts have significant value, particularly in situations like this example--late in the game and trailing by less than a TD--because you'll very likely need to stop the clock in the end-game, either to get the ball back or during a final offensive drive. Would Carroll have been better off making a quick but sub-optimal choice, rather than making the optimal choice but burning a timeout along the way?

Here's another common situation. A team trails by one score in the third quarter. It's 3rd and 1 near midfield and the play clock is near zero. Instead of taking the delay of game penalty and facing a 3rd and 6, the head coach or QB calls a timeout. Was that the best choice, or would the team be better off facing 3rd and 6 but keeping all of its timeouts?

Both questions hinge on the value of a timeout, which has been something of a white whale of mine for a while. Knowing the value of a timeout would help coaches make better game management decisions, including clock management and replay challenges.

In this article, I'll estimate the value of a timeout by looking at how often teams win based on how many timeouts they have remaining. It's an exceptionally complex problem, so I'll simplify things by looking at a cross section of game situations--3rd quarter, one-score lead, first down near midfield. First, I'll walk through a relatively crude but common-sense analysis, then I'll report the results of a more sophisticated method and see how both approaches compare.

Momentum 4: How Streaky Are NFL Games?

This is the 4th part in my series on examining the concept of momentum in NFL games. The first part looked at whether teams that gained possession of the ball by momentum-swinging means went on to score more frequently than teams that gained possession by regular means. The second part of this series looked at whether teams that gained possession following momentous plays went on to win more often than we would otherwise expect. The third part focused on drive success following a turnover on downs, which is often cited by coaches and analysts as a reason not to go by the numbers when making strategic decisions.

This article will examine how 'streaky' NFL games tend to be. If momentum is real and it affects game outcomes, it would result in streaks of success and failure that are longer than we would expect by chance. But if consecutive plays are independent of previous success, the streaks of success and failure will tend to be no longer than expected by chance. This method of analysis does not rely on any particular definition of a precipitating momentum-swing, as it looks at entire games to measure whether success begets further success and whether failure leads to more failure.

For momentum to have a tangible effect on games, it does not require completely unbroken strings of successful or unsuccessful plays. But if success does enhance the chance of subsequent success, then the streaks of outcomes will be longer than if by chance alone.

For this analysis, I applied the Runs Test to the sequence of plays in a game. This produces a statistic indicating how streaky a string of results is compared to what would be expected by chance. For example, consider the following 3 strings of results of flipping a coin 8 times:

HTHTHTHT, HHHHTTTT, HTTTHTHH

The Runs Test works like this:
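In its standard (Wald-Wolfowitz) form, the test counts the runs in a sequence, meaning the blocks of identical consecutive outcomes, and compares that count with the number expected if the outcomes were independent. Below is a minimal Python sketch of that textbook version using the normal approximation; the function names are my own, and the details may differ from the exact procedure used in this analysis.

```python
import math


def count_runs(seq):
    """Number of maximal blocks of identical consecutive outcomes."""
    return sum(1 for i, x in enumerate(seq) if i == 0 or x != seq[i - 1])


def runs_test_z(seq):
    """Z-score of the observed run count versus independent trials.

    Negative values mean fewer runs than expected (streaky);
    positive values mean more runs than expected (alternating).
    """
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two outcome types")
    n1 = seq.count(symbols[0])
    n2 = len(seq) - n1
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    return (count_runs(seq) - expected) / math.sqrt(variance)


for flips in ("HTHTHTHT", "HHHHTTTT", "HTTTHTHH"):
    print(flips, count_runs(flips), round(runs_test_z(flips), 2))
```

For the three strings above, the run counts are 8, 2, and 5. With four heads and four tails the expected count is 5, so the first string alternates more than chance would suggest, the second is streakier, and the third looks random. Eight flips is far too few for the normal approximation to be reliable; the short strings are only there to show the mechanics.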

What Kind of Teams Are Super Bowl Winners?

What's the profile of a Super Bowl winner in the modern era? Does defense win championships? Are they predominantly elite offenses? Do they have to be above average on both sides of the ball? Are champions always dominant in the regular season? Is your team out of the mix for the Lombardi Trophy?

Here's the plot of regular season Expected Points Added (EPA) for every team from 1999-2013. The horizontal axis represents their offensive EPA per game, and the vertical axis represents their defensive EPA per game. The best teams are in the upper-right quadrant, while the worst are in the lower-left. (Click to enlarge...it's suitable for framing!)

Momentum Part 3: After Failed 4th Down Conversion Attempts

This is the third part of my look at momentum in the NFL. The first part examined whether several momentum-swinging types of events caused any increase in a team's chances of scoring on the subsequent possession. The second part compared the expected and observed Win Probability (WP) following momentum-swinging events to find out whether those events increased a team's chances of winning beyond what we would otherwise expect.

This installment cuts to the chase. From a strategic perspective, we want to understand how momentum may or may not affect the game so that coaches can make better decisions. Often, momentum is cited as a consideration to forgo strategically optimal choices for fear of losing the emotional and psychological edge thought to comprise momentum.

Here's the thinking: If a team tries to convert on 4th down but fails or unsuccessfully tries for a two-point conversion, it gives up the momentum to the other team. The implication is that failing on 4th down means that winning is now less probable than the resulting situation indicates, beyond what the numbers say. Therefore, the WP and Expected Points (EP) models used to estimate the values of the options no longer apply. In a nutshell, the analytic models underestimate the cost of failing.

[By the same token, the reverse argument should be just as valid. Wouldn't succeeding in a momentum-swinging play mean the chances of winning are even higher than the numbers indicate? For now, I'll set the 'upside' argument aside and examine only the 'downside' claim.]

Momentum Part 2: The Effect of Momentum-Swinging Events on Game Outcomes

Recently I tried to detect the existence of momentum within an NFL game. I examined drive success based on how 'momentous' the manner in which the offense gained possession. Admittedly, that analysis only measures one aspect of momentum. In this post, I'll take the analysis a step further and look at how a team's chances of winning are affected following several momentum-swinging types of events. This approach examines the potential effect of momentum on the entire remaining part of a game, not just on the subsequent drive.

Like the previous analysis, I relied on how possession was obtained as an indication of a momentum-swing. For all drives from 1999-2013 (through week 8), I compared a team's expected chances of winning (based on time, score, field position, down and distance) with how often that team actually won. I divided the data among three categories: possession obtained following a momentous play, possession obtained following a turnover on downs, and possession obtained following a non-momentous play.

Momentous obtainment includes fumble recoveries, interceptions, muffed punts, blocked kicks, and blocked field goals. I excluded missed field goals from the analysis because it was unclear to me how momentous they are. They are often thought of as big momentum changing events in close games but are too common (almost 20% of all kicks) to truly be momentous.

Momentum 1: Scoring Rates following 'Momentum-Swinging' Events

Momentum might be one of the most over-cited concepts in sports. It's an idea borrowed from physics, and is something we witness every day. We see it in rising tides, building storms, and boulders rolling downhill. But does such a concept apply to sports? Certainly, better teams will likely continue to prevail, and lesser teams will likely continue to lose. But that's not momentum. It's just better teams being better.

In this article, I'll explain why I think we see momentum when it's not really there. And to test the existence of momentum within NFL games, I'll compare the results of drives following 'momentum-swinging' events with those following non-momentum-swinging events.

For momentum to be a real thing in sports, it needs to have some connection to reality beyond the metaphysical and metaphorical. The theory is that good outcomes are emotionally uplifting, which in turn leads to better performance, which then feeds upon itself. It's understandable to believe in game momentum when we see games like this each week:

When Should the Defense Decline a Penalty After a Loss? Part 1

Let's say there's a sack or other tackle that results in a several-yard loss. And to compound the offense's woes, a flag for holding is thrown, potentially setting up a 1st and 20 situation. Should the defense accept the penalty, or decline it and force a 2nd and X? We can evaluate this question in a few ways. We'll use a simple method and a more complex method to find out when a defense should normally decline a penalty on first down.

Before you read on, what do you think the break-even yardage is? What do you think most coaches think it is?

Rest vs. Rust After Thursday Night Football

Andrew Mooney is the Co-President of the Harvard Sports Analysis Collective. He is a senior majoring in Social Studies, which is another way of saying he's an economics major. Andrew has worked as an analytics intern in the NFL for about two years, and previously wrote for the Stats Driven blog at Boston.com. He's a big fan of all Detroit sports, and he'll throw an octopus on your ice if you're not watching.

In struggling out of my bed at the witching hour of 8:00 am this morning, I had to wonder how much more equipped to tackle the day’s challenges I would be with an hour more of sleep. I then noticed that I wasn’t missing the tip of my finger, nor had I sustained a concussion the day before, incidents from which I would need significantly more than a week to recover. In this state of empathy, I couldn’t think of a more welcome time of the NFL season for a player than an extra day or two off.

Though I’m sure it provides players some much needed rest, it is not immediately clear what effect this time off has on performance. The qualitative cases for each side are pretty straightforward, and your grandfather used each of them liberally in instructing you in the wonders of sporting conventional wisdom. “Ah, they had an extra week to prepare AND get healthy,” he said knowingly after Washington’s 31-6 thrashing of the Eagles last season. “They just got rusty,” he told you after the Vikings fell to Chicago, 28-10, the following week. “What in the Sam Hill…” he muttered after the 49ers and Rams battled to a 24-24 tie.

"Thursdays are 6.3 percent less exciting."

Friend of ANS Aaron Gordon used the Excitement Index (EI) and Comeback Factor (CBF) to find out if the Thursday night games really are more boring than most games. From Aaron's article at Sports on Earth:

What NFL Network games have sorely missed are big comebacks. NFL Network games average a Comeback Factor 3.32 -- half the league average -- and only five games with a CBF of 5 or above (where the winning team had a win probability below 20 percent). By definition of the win probability model, 20 percent of the games played should feature a comeback with a CBF of 5 or above (which the larger data set confirms). For NFL Network games, its only 13 percent. For comparison, Monday night games --which often feature hand-picked matchups -- have an average CBF of 8.15, but are right about where they should be in terms of CBF games of 5 or above: 23 percent.

The Pay-Performance Linear Model

A couple months ago I posed an apparent paradox. Aaron Rodgers' new $21M/yr contract was either a solid bargain or a disastrous ripoff depending on how we analyze the data. By only flipping the x and y axes of a scatterplot, we can come to completely opposite conclusions about the value of a QB relative to what we'd expect for a given salary or for a given level of performance. Much of this post is derived from the many insightful comments in the original. Please take the time to read them, especially those from Peter, X, Phil and Steve.

By regressing salary on performance (adjusted salary cap hit on the vertical (y) axis and Expected Points Added per Game (EPA/G) on the horizontal (x) axis), Rodgers' deal is insanely expensive by conventional standards. But by regressing performance on salary, his new contract is a bargain.

Which one is correct? That depends on several considerations. First, there are generally two types of analyses. The one I do most often is normative analysis--what should a team do? The second type is descriptive analysis--what do teams actually do? The right analytic tool can depend on which question we are trying to answer.

The reason that we saw two different results by swapping the axes is that Ordinary Least Squares (OLS) regression chooses a best-fit line by minimizing the sum of the squared errors between the estimate and the actual data of the y variable only. OLS therefore produces an estimate that naturally has a shallow slope with respect to the x axis. When we swap axes, the new line is shallow with respect to the other variable, so the two fits are not mirror images of each other.
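The asymmetry is easy to reproduce on a toy data set. In this minimal Python sketch the five data points are made up purely for illustration (they are not QB salaries or EPA figures), and `ols_slope` is a hypothetical helper name.

```python
def ols_slope(x, y):
    """Slope of the OLS line that predicts y from x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Made-up values, chosen only so the arithmetic is easy to follow.
performance = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [2.0, 1.0, 4.0, 3.0, 5.0]

slope_salary_on_perf = ols_slope(performance, salary)
slope_perf_on_salary = ols_slope(salary, performance)

# The second line, redrawn on the same axes as the first (salary vertical).
second_line_on_same_axes = 1.0 / slope_perf_on_salary

print(slope_salary_on_perf, second_line_on_same_axes)
print(slope_salary_on_perf * slope_perf_on_salary)  # equals r squared
```

Plotted on the same axes, the two lines have slopes of 0.8 and 1.25 here. They would coincide only if the correlation were perfect, because the product of the two regression slopes equals r squared. A point lying between the two lines is above one and below the other, which is how the same contract can look like a bargain or a ripoff.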

Feature Enhancement: Time Calculator

The Time Calculator is a tool that will estimate the time remaining in a game that a trailing defense can expect to get the ball back if they force a stop. It considers the current time and timeouts remaining while factoring in stoppage from the two minute warning and change of possession.

The previous version of the Time Calculator could only base its estimate beginning with the time of the first down snap of a series. For the vast majority of situations that's ok, because offenses will typically only run plays that avoid stopping the clock--runs that stay in bounds. But sometimes there is a stoppage, due to an incomplete pass, a runner going out of bounds, a penalty, or some other reason.

The old calculator could account for an unexpected stoppage if you add a notional timeout to the game state. For example, say the defense began the series with 1 timeout, then used it following 1st down, and there was an unexpected stoppage after second down. This scenario would be no different than if the defense began the series with 2 timeouts rather than their actual 1.

Still, it would be easier and more straightforward to make the calculator work for any down. Now you can enter the time at the snap of any down in a series along with the number of timeouts remaining, and the calculator will estimate the time after the change of possession.

All the other options remain the same: the average duration of each play, the game-clock duration between plays, and whether the defense would prefer to trade away some time on the clock to preserve a timeout for use on offense.

Try it out. The Advanced NFL Stats Time Calculator.

The Extra Point Must Go


This week's article at the Post asks, "What's the point of the extra point?"

The extra point is something left over from gridiron football’s evolution from rugby. Originally, the ‘touchdown’ in rugby was less important than the ensuing free kick, and the points given for the touchdown and the ‘point after try’ varied during football’s early history. Today’s extra point is a vestige of football’s rugby roots. It’s football’s appendix–inconsequential, its original purpose uncertain...and safe to remove.

The Field Goal Likelihood Nexus

I'm not sure why meaningless things like this fascinate me so much. I was doing some experimentation modeling how often offenses are normally able to score given the various down/distance/yard line combinations. I always plot the results to make sure they make sense.

In this case I was looking at the probability of ending a drive with a made field goal in 2nd and 3rd down situations. (Second and third down modeling is especially challenging because there are fewer cases of each successive down. Plus there is an entire other dimension to consider--to-go distance. By comparison, first downs are almost always 10 yards to go.) After seeing the plots I thought there was clearly something wrong.

You'd expect that having fewer yards to go would lead to scoring more often, but once you think about it that's not always true when looking at only field goals. For most of the field, having fewer yards to go is better, but once a team passes a certain point, having more yards to go means it's more likely that a drive will stall inside field goal range.

Changes in the EP Curve over Time

I've written a lot about the continuing trend toward more potent offense and what it means strategically. But let’s look at things at a deeper level by examining trends in the overall NFL Expected Points (EP) curve.

As a refresher, EP is a concept of football utility. It measures the net point potential at any state of a drive, based on down, distance, and yard line. For example, a 1st and 10 at midfield represents 2 EP to the offense, meaning from that point forward it can expect, on average, a 2-point net advantage over its opponent. More details on the concept can be found here.

With offense gaining an ever firmer upper hand, the EP curve must be affected. But it can’t just be sliding up across all states. At its end-points, the curve must be bounded at slightly under 7 points at the opponent’s goal line to slightly less than -3 points inside a team’s own goal line. We would therefore expect the curve to bow slightly upward over time.

The graph below plots raw, unsmoothed EP values for 1st and 10 (or goal) states in normal football situations, when time is not yet a factor and the score is reasonably close. The blue line represents the first three seasons in my data set, 2000-03, and the red line represents the most recent three seasons, 2008-11.

How Much Time Does It Take to Get into FG Range?

You wouldn't know by reading this study. It sets up a multiple regression model that uses a kitchen-sink approach to estimating the time needed in the end-game to get to the 35-yd line, commonly accepted as FG range. It uses QB rating, time remaining at the start of the drive, number of all-pro players on the offense, time outs remaining, starting field position, home field advantage, and whether the 2-minute warning is still available. The dependent variable is the time taken to reach the 35.

There are numerous fatal problems with this study. First, the model assumes linearity of the effects of predictor variables. I can tell you from my intimate familiarity with the variables involved that they are not linear at all. The model also assumes a normally distributed outcome variable, which is not investigated, and I doubt could be possible because games are bounded by the expiration of regulation time.

The study uses 3 seasons of data, which only yields 92 example situations to analyze.

The authors find enormous multi-collinearity problems with their model, and I'm not surprised. The model specification looks like this:

time taken = constant + field position + time outs + ...a bunch of other stuff... + game time at drive start + game time when reached 35

But doesn't time taken = time at start - time when 35 reached? Of course. You can't have a regression model where the dependent variable is always an exact linear combination (here, the difference) of two of the independent variables. The model's r-squared is 0.97, because it's one giant tautology.

On Opponent Strength and Team Strength Correlation

This post at Football Outsiders caught my eye today. The IgglesBlog noticed something odd with their team rankings. I’ve noticed the same phenomenon in my own systems--that team ranking methods that adjust for opponent strength tend to produce rankings that correlate (inversely) with a team’s strength of schedule. In other words, top ranked teams appear to have weaker schedules and low ranked teams appear to have stronger schedules. The problem is, assuming that a ranking method properly adjusts for opponent strength, it ostensibly should produce no correlation between each team’s ranking and its opponents' average ranking. In fact, we might expect the opposite result because of the two “strength of schedule” games each season--last year’s 1st place teams play other 1st place teams, and so on.

In 2011 FO’s “DVOA” method correlated with opponent strength at -0.66, which is considerable. Here at ANS, Generic Win Probability correlated with Average Opponent GWP at -0.60 this season. FO notes that in other years the correlation isn’t nearly as strong, but there is an apparent tendency for negative correlations for most seasons.

This phenomenon was first pointed out to me a couple years back by a reader, and I too thought it was either a) randomness, or b) a flaw with my methodology. But I soon realized this is exactly what we should expect given the NFL’s scheduling rules. It’s neither luck nor a flaw. In fact, it's a sign the method is doing something right.

Consider a fictional four-team football league. Presume we have a perfect team ranking system that can peer omnisciently into each team’s soul to know its True Winning Probability (TWP). The Sharks, Knights, River Dogs, and Jack Rabbits have TWPs of 0.75, 0.60, 0.40, and 0.25, respectively. (Notice the TWPs average to 0.50, as they would have to.)
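One way to see where the negative correlation comes from is to compute each team's average opponent strength directly. This minimal Python sketch uses the four TWPs above and assumes a balanced round-robin schedule in which every team plays each of the other three equally often. That schedule is my assumption for the illustration, and the helper names are hypothetical.

```python
import math

twp = {"Sharks": 0.75, "Knights": 0.60, "River Dogs": 0.40, "Jack Rabbits": 0.25}


def avg_opponent_twp(team):
    """Mean TWP of every other team (assumes a balanced round robin)."""
    others = [value for name, value in twp.items() if name != team]
    return sum(others) / len(others)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


teams = list(twp)
own = [twp[t] for t in teams]
opp = [avg_opponent_twp(t) for t in teams]

for t in teams:
    print(t, twp[t], round(avg_opponent_twp(t), 3))
print(round(pearson(own, opp), 3))
```

Because a team never plays itself, the best team's opponents exclude the best team and the worst team's opponents exclude the worst. In this tiny league that alone produces a perfectly negative correlation between team strength and schedule strength, even with an omniscient ranking system.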

2011 Regular Season Play-by-Play Data

Now uploaded. Find it here. Happy crunching.

EPX

The Packers' defensive efficiency ranking is a lot lower than their rankings in terms of other advanced statistical yardsticks including Expected Points Added (EPA), Win Probability Added (WPA), and Success Rate (SR). One explanation for the discrepancy is that GB's offense is so good that the defense can afford to guard the sidelines at the end of games, allowing teams to move the ball while burning clock. Because they know teams need to throw the ball to keep up, they can create big plays, particularly interceptions and big sacks. Their very high interception rate bolsters this theory.

Trash-time distorts the relationship between true team strength and team statistics, be they conventional or advanced, total stats or per play. To determine true team strength, we need to weed out the random outcomes and discount trash-time performance.

WPA is probably the ultimate explanatory statistic. EPA is less explanatory and more predictive, because it's not subject to the leverage of time and score, but it's also subject to the random outcomes of a bouncing or tipped oblong ellipsoid.

One way to eliminate trash time from the data would be to simply throw out the fourth quarter. As it turns out, there is a lot of baby in that bathwater. A better way might be to throw out data based on Win Probability (WP). A statistic that's based on EPA, but limited to when the game is still in play, could be the answer.

There's still the problem of the bouncing ball. There are sometimes huge EPA plays--James Harrison's 99-yard TD return in the Super Bowl a couple years ago comes to mind. A play like that represents almost a 12-point swing in EP, but it's the kind of event that's so rare that it makes little sense to project future team performance on such a distorting play. Put simply, it does not have the equivalent predictive value of two solid 80-yard offensive drives.

But it's representative of something. We don't want to throw plays like that out. What we can do is limit their statistical impact. We can cap their EPA value at a certain amount, so that no single play will have more or less than a chosen value.
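The two ideas, dropping plays once the game is out of reach and capping each play's EPA, can be sketched in a few lines of Python. The cap of 3.0 points and the 5% to 95% WP window below are placeholder values I chose for illustration, not the settings used for the actual statistic, and the sample plays are invented.

```python
def epx(plays, cap=3.0, wp_low=0.05, wp_high=0.95):
    """Sum of capped EPA over plays run while the game was still in play.

    `plays` is a list of (epa, wp_before_play) pairs. The cap and the WP
    window are assumed values for this sketch.
    """
    total = 0.0
    for epa, wp in plays:
        if wp_low <= wp <= wp_high:               # skip trash time
            total += max(-cap, min(cap, epa))     # limit any single play
    return total


# Invented sample: two ordinary plays, one huge return, one trash-time play.
sample_plays = [(0.4, 0.55), (-1.2, 0.60), (11.8, 0.50), (2.0, 0.99)]

print(round(epx(sample_plays), 2))
print(round(sum(epa for epa, _ in sample_plays), 2))  # raw EPA for comparison
```

The big return still counts, but only up to the cap, and the play run at 0.99 WP is ignored entirely.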

Interceptions by Targeted Receiver

One thing I've learned to appreciate is how much interceptions can be the fault of the intended receiver. Whether it's a bad route, a misplayed tip, or a failure to fight the defender for the ball, receivers can often be as much if not more at fault than the passer.

We're always happy to list interceptions by quarterback or by defender, but rarely do we see interception stats by receiver. So I put one together. These are not interceptions I have qualitatively determined to be the receiver's fault, but simply all interceptions when the receiver was listed as the intended target.

QB Sneak vs. RB Dive

In the NO-ATL game Sunday, ATL went for a risky 4th down and 1 conversion attempt in OT with just inches to go. They elected for a RB dive play rather than a QB sneak. (By dive play, I just mean a straight RB handoff directly between the tackles.) But all '4th and 1' situations are not equal--from 1.5 yards down to an inch to go.

QB sneaks seem more successful on inches-to-go situations than RB dives. We'd like to know if the data back this up. Unfortunately, the play descriptions don't note how long the 'and 1' is, whether it's a long yard or just inches. We'd expect to see more QB sneaks on the shorter distances and more RB dives on the longer distances, which would bias the numbers because longer to-go distances would naturally be tougher to convert. Still, we may be able to draw some inferences.

The table below lists the success rates for 3rd and 4th down runs with 1 yard to go. It breaks out plays by QBs, RBs, and FBs. QB scrambles on pass plays have been removed. Kneel downs and spikes are also removed. Plays inside the 10 yd line are removed due to field compression effects.

WP: Instant Offensive Improvement

I've pored over the data, and spent the last few days doing lots of advanced math, and I think I've figured out the Redskins' problem: They don't have any good players on offense.

There isn't much that can be done about their roster at this point in the season, but here are five things the 'Skins can do to improve their offense overnight.

How Quarterbacks Age

Peyton Manning signed a $90 million contract extension that would hypothetically keep him playing through age 40. Nagging questions about his health will almost certainly plague him for the remaining years of his career. To back up the ailing Manning, the Colts brought in an even older veteran passer, Kerry Collins. In Philadelphia, Michael Vick signed a $100 million contract that will supposedly keep him playing until he's 37. What kind of performance should we expect from older passers?

It's a much more difficult question than it first seems. Averaging the performance of all the recent QBs by age doesn't work. A survivor bias ensures that only the successful QBs stay in the league long enough to have their stats in the sample. Another complication is the steady inflation of passing stats over the years. In this post, I'll try to tackle those problems to better understand how QB performance is affected by age.

ESPN's New QB Stat

I, for one, welcome our new statistical overlords.

ESPN has a talented new analytics team, and their first foray into football is their Total QB Rating. It seems the first thing anyone does when they get into advanced football stats is to create their own QB rating system. The QBR is a major improvement over the NFL's traditional passer rating, and there are a lot of things I like about it, but it's not perfect. I'll try to summarize my understanding of the stat, and then I'll list the things I like about it and the things I don't like so much. As we say in the fighter pilot business--the goods and others.

According to ESPN's own explanation, the stat is based on three primary concepts--Expected Points, Win Probability, and division of credit. As I understand it, QBR begins with a QB's Expected Points Added for each play in which he was directly involved, including both pass plays and runs. It modifies each play's EPA value according to a clutch factor, which is based on Win Probability (WP). Here, I use something similar known as Leverage Index (LI). LI is the ratio of the potential swing in WP for a play compared to the average play's potential swing in WP. For example, an LI of 3 means that a play is 3 times more critical to a game's outcome than the typical NFL play. (You can find any play's LI on the interactive WP graphs here by hovering your cursor over the graph. I still consider it a 'beta' stat because I haven't settled on a final, single definition of potential success and failure for every play.)

Full 2010 Play-by-Play Now Available

Full play-by-play data for the 2010 season, including the post-season, is now freely available here. Happy crunching.

What Proportion of Football is _______?


How much of the game is passing, running, kicking, or punting? How much of football are interceptions? Fumbles? How much do penalties impact the outcomes of games? One way to answer that is to simply add up how many of those kinds of plays occurred and divide by the total number of plays. But that’s not going to tell us much, because each kind of play tends to have a different magnitude in terms of its effect on the outcome of games.

When we talk about “how much of football is” something, we need a good definition of what football is. The essence of football, like any other sport, is about competing and winning. So when we ask how much of football is passing, we want to know the impact of passing plays on the outcomes of NFL games compared to other types of plays.

Win Probability can provide the answers. In each game, the WP of each team starts at 0.5, but must end at 1 or 0, for a net total of 0.5 WPA. But between the first kickoff and the final whistle, the WP can swing up and down, traveling far more than the net 0.5. To calculate the total movement, we just need to add up the absolute value (7th grade flashback) of each play’s Win Probability Added (WPA).
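As a rough sketch of that calculation, the snippet below sums the absolute WPA of each play by play type and converts the totals to shares. The play records and the `wpa_share_by_type` helper are hypothetical, made up purely for illustration. Real inputs would come from a play-by-play file with a WPA value attached to each play.

```python
from collections import defaultdict

# Hypothetical play records: (play_type, wpa). Values are invented for illustration.
plays = [
    ("pass", 0.04),
    ("run", -0.01),
    ("pass", -0.10),
    ("punt", 0.02),
    ("run", 0.03),
]

def wpa_share_by_type(plays):
    """Return each play type's share of the total absolute WPA."""
    totals = defaultdict(float)
    for play_type, wpa in plays:
        # Use the absolute value so swings in either direction count as movement.
        totals[play_type] += abs(wpa)
    grand_total = sum(totals.values())
    return {play_type: total / grand_total for play_type, total in totals.items()}

shares = wpa_share_by_type(plays)
for play_type, share in shares.items():
    print(f"{play_type}: {share:.0%} of total WP movement")
```

Counting plays alone would call this sample 40% passing. Weighting by absolute WPA gives passing a much larger share, which is the distinction the approach is meant to capture.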

EPA by Pass Depth and Down

In response to the recent post about the comparative success of short and deep passes, a few readers asked for a break out of the results by down. Some suggested the advantage of deep passes might only be due to unsuccessful check downs on 3rd down, which is certainly plausible. The numbers are going to be slightly different here because I neglected to exclude red zone attempts in the original post. (Please read the original post for definitions and caveats.)

Here is the break out by down. It appears that the advantage exists on all downs, and the advantage is not significantly greater for 3rd downs than for 1st or 2nd downs. But that doesn't necessarily rule out the check-down effect, as check-downs are a common tactic on all downs.

Predictivity

Success Rate (SR) is a simple measure of whether or not a play improves an offense's expected net point potential. It essentially ignores the magnitude of a play's result, and instead focuses only on whether a play was simply a good outcome or bad outcome.

Although team SR statistics ignore important information in terms of explaining past wins, they may be able to predict future outcomes better than other measures. A team's SR on run plays is particularly informative, because it is not sensitive to the low-frequency but high-impact events that are largely subject to randomness, such as long broken runs or turnovers.

Compared to simple running efficiency, run SR correlates better with winning (0.39 compared to 0.15). This is telling and helpful, but it only accounts for past outcomes. How well a statistic predicts future outcomes is not just about the parlor game of picking winners. Stats that predict future outcomes measure the signal of how good a team really is, underneath all the noise of randomness.

In other words, there is no 'right now.' There is no is. There is the known past, clouded by randomness, and there is the unknown future, clouded by uncertainty. Now is merely the ephemeral intersection between the past and future. When trying to measure team strength or player ability, the focus should be on how well a team or player is likely to play in the future.

Play-by-Play Data

I've recently completed a project to compile publicly-available NFL play-by-play data. It took a while, but now it's ready.

The resulting database comprises nearly all non-preseason games from the 2002 through [edit: 2012] seasons. I have not performed any analysis on the data, so what you'll get are only the basics--time, down, distance, yard line, play description, and score. It's almost exactly what I started with. I'll leave any analysis up to you.

Measuring Defensive Playmakers

Traditional individual defensive stats don't tell us much. There are tackles, sacks, and turnovers, and that's pretty much it. Recently, I developed "Tackle Factor," a way to make better sense of tackle statistics, at least for the front-seven defenders. It's not perfect, but I think the consensus was that it's a step forward. Still, there's much more that can be done.

Offensive stats are straightforward, but objective defensive stats are problematic. When a running back picks up a 10-yard gain, although other teammates contributed, that's obviously a good play by the ball carrier. And when a running back stumbles at the line for no gain, that's obviously bad. But looking at the same two plays from the other side of the ball is much trickier. A strong safety, say Troy Polamalu, who makes the best play he can by preventing the runner from getting past 10 yards, would be debited for that 10 yard gain. The other four or five defenders who had a chance to make the play sooner, but didn't, aren't mentioned in the play description and wouldn't be docked for the play.

On the other hand, if Polamalu is playing run support, and he reads the play and stuffs the running back at the line, that's certainly to his credit. If only there were a way to credit each defender for plays like this, and at the same time ignore the plays that really should count against his teammates.

Football Island

Back in 1997 I spent Easter Sunday with a good friend. His brother, who lived in the La Jolla area of San Diego, hosted us for dinner. On the drive up the La Jolla ridge overlooking the Pacific, my friend pointed over to the right side of the road and said, "That's Junior Seau's house." I caught a glimpse of number 55, along with what looked like a dozen family members filing out of a couple vans in front of the house. They were the largest human beings I had ever seen. The women were large, the men were unimaginably large, and even the children seemed enormous. Junior somehow appeared to be the runt of the family. I got the impression Samoans were all giants.

Last season there were 30 NFL players of Samoan descent, and 200 more playing Division IA college football. (Here's a list from 2008.) That's a lot of players from a group of people whose entire population could easily fit into an NFL stadium. Earlier this year 60 Minutes aired a profile on Samoan football, and if you missed it, it's a great story. (I've embedded the clip at the end of this article. Edit--CBS has killed the link.)

Undoubtedly, the culture and character of the Samoan people are factors in their disproportionate level of success in football. But, as my drive through La Jolla suggested, hereditary traits may also play a role. Still, how can a single small island produce so many top players?

"Tackle Factor"

I keep seeing 49ers linebacker Patrick Willis' name listed at the top of defensive player statistics the last few years. He led the league in tackles in 2009 and 2007, and was second in 2008, but does this mean that Willis is really a top player?

Most fans understand that the tackle statistic is not a very good way to measure a defender. Weaker defenses tend to give up longer drives, giving players more opportunities to make tackles. So in a perverse way, more tackles can be a bad thing. If a defensive back has a lot of tackles, it may be because he's being thrown on successfully. Plus, certain positions get more tackles by the nature of team defense. Middle and inside linebackers will naturally have the most tackles by virtue of their role and where they are at the snap. If you scan down the list of the season leaders in tackles, you're likely to see a simple list of each team's central linebacker, assuming he was healthy most of the year. So how can we tell if Patrick Willis is really that good using just tackle information?

Fumble Rates by Play Type

A lot of analysis of running vs. passing takes into account the added risk of interceptions. Almost all sources of passing stats will include team or player interceptions. But fumbles are more tricky. Stat sites will tell you team fumbles, but they usually won't tell you how many were due to rushes or due to passes.

Unlike interceptions, fumbles can happen on either type of play. But is the risk of a fumble even between runs and passes, or are fumbles more likely to occur on one or the other type of play? Further, what about fumbles lost? Are fumbles more likely to be lost on runs or passes? And how does the sack-fumble factor in?

Expected Point Values

Lately I've been using the concept of 'Expected Points' (EP) as a measure of success for football plays. It's been the foundation of much of my analysis of fourth down decisions, onside kicks, run-pass balance, and even touchbacks.

Every down-distance-field position combination has an average net point advantage. For example, when an offense has a first and goal at their opponent's 1-yard line, they can expect about a 6-point advantage over their opponent in the long run. A first and 10 at midfield is worth about 2 EP.

Expected Points on first downs are easy to compute because there are so darn many 1st and 10s compared to any other down and distance combination. Here is the EP chart for 1st and 10 (or goal) from my fourth down study earlier this season.

Does FG Accuracy Decline In Clutch Situations?

Like most other Baltimore fans, I was disappointed at the end of the most recent Ravens game when kicker Steve Hauschka missed a 44-yard field goal that would have capped a dramatic comeback. What made it worse was that I had to suffer through the usual nonsense from the local sportswriters about how Matt Stover, the popular long-time Ravens kicker until released this year, would have undoubtedly made that kick.

The jury is certainly still out on whether Hauschka is any good, but let's keep one thing in mind. NFL kickers as a whole only make kicks from that distance 70% of the time (including blocks). We simply don't remember all the missed field goals in the first quarter, or when our favorite team is already 17 points ahead or behind. The ever-clutch Stover? His career accuracy from that range was...70%.

But then I wondered whether FG kickers are affected by the game situation. Do their nerves get rattled? Are kickers less accurate in clutch situations when the game is on the line?

More on the Cost of Interceptions

In a recent post I looked at the cost of interceptions in terms of equivalent yards and expected points. In this post, I'll look at them in terms of win probability added (WPA).

In a comment on my Fifth Down post regarding the context of Jay Cutler's 2008 interceptions, Will wrote:

"I know you can calculate a change in WP for a given play; this is how you came up with the best plays of the year for last season. Can you also calculate an average change in WP for a type of play? For example, can you find the average change in WP for a Cutler interception vs. a Favre interception vs. the league average, to see who throws more bad picks? I've long felt that many of Favre's interceptions equate to punts, as he throws it up deep on a late-down, long-yardage situation. By the same token, it might be good to know which passers have the highest delta-WP per attempt, or which rushers most change their team's fortunes per rush."

You can read my response in the original post, but I'll expand on it here. Will was getting a little ahead of me because I'm planning on publishing some neat stuff on individual player WPA (win probability added) later this season.

To make things a little easier on myself, I'll cite Bronco and Jet passing game numbers from 2008, not necessarily Cutler and Favre, but I think they're identical for practical purposes. I'll also calculate the league average.

Denver's 18 INTs cost a total of -1.56 WPA (or, in a sense, lost 1.56 games). That averages to -.087 WPA/INT.

New York's 23 INTs cost a total of -2.49 WPA. That averages to -.108 WPA/INT. You could say that the Jets' interceptions were about 25% more costly than the Broncos' last year.

For reference, there were 465 INTs in the league in 2008, costing a total of -46.98 WPA. That averages to -.101 WPA/INT. So on average, an interception costs a team a 10% chance of winning. An interception equates to 3.8 points, 60 yards, or 10% WP lost.
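The per-interception averages are simple division, and a short sketch makes the arithmetic easy to check. The counts and WPA totals below are the 2008 figures quoted above. The variable names are just for illustration.

```python
# Interception counts and total WPA for 2008, as quoted in the text.
int_totals = {
    "Denver": (18, -1.56),
    "New York": (23, -2.49),
    "NFL": (465, -46.98),
}

# Average WPA lost per interception for each entry.
wpa_per_int = {
    team: total_wpa / count for team, (count, total_wpa) in int_totals.items()
}

for team, value in wpa_per_int.items():
    print(f"{team}: {value:.3f} WPA per interception")

# How much more costly the average Jets interception was than the average Broncos one.
relative_cost = wpa_per_int["New York"] / wpa_per_int["Denver"] - 1
print(f"Jets vs. Broncos: {relative_cost:.0%} more costly per interception")
```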

Interceptions are not typically similar to punts. I've read elsewhere (I think at footballcommentary.com) that interceptions are on average returned to about the line of scrimmage. My data shows something slightly different. Interceptions are on average returned to within 8.1 yds of the original line of scrimmage. Removing the 2nd and 4th quarters from the data to account for Hail Mary interceptions, it's 7.2 yds. So, I suppose you could consider a typical interception like an incomplete pass and then a really, really short punt.

But WPA for any particular interception is dependent on a number of factors. Field position and score are obviously critical, so here is a graph of interception WPA by field position, broken out by selected score differences. Despite the noise in the graph, there are some points to be made.


It's interesting how the WPA drops most steeply when a team is down by 3 points (the red line). Throwing a pick deep in one's own territory when up by 3 points is nearly as costly. The bigger the difference in score, whether ahead by a lot or down by a lot, the smaller the impact of the interception. The graph makes sense (at least to me)--it's what I'd intuitively expect.

Time is also critical. So here is the same graph, except limited to only 4th quarter interceptions. (WPA is a function of so many things--score, time, field position, down, to-go distance--that I can't show everything on a single graph.) It's a little noisier, so I grouped the field position by 20-yard chunks instead of 10.


Again, we see what we'd expect. The tighter the score, the more costly the interception. Tied or down by 3 in opponent territory is where they're the costliest.

Adjusting Adjusted Yards Per Attempt

Reader Jeff Clarke sent me an email a few weeks ago asking about the interception yardage value used for the Adjusted Yards Per Attempt (AdjYPA) passing statistic. AdjYPA is total passing yards minus 45 yds for every interception thrown, divided by total attempts. It's a really handy stat because it encapsulates passing performance as a simple, single number, and better still, it's a rate stat.
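Here is the AdjYPA definition written out as a small function, as a minimal sketch. The function name and the sample season line are hypothetical. The interception penalty is a parameter so that values other than the traditional 45 yards can be tried.

```python
def adjusted_yards_per_attempt(pass_yards, interceptions, attempts, int_penalty=45):
    """AdjYPA as described in the text: (yards - penalty * INTs) / attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return (pass_yards - int_penalty * interceptions) / attempts

# Hypothetical season line, invented for illustration only.
adj_ypa = adjusted_yards_per_attempt(pass_yards=4000, interceptions=15, attempts=550)
print(f"AdjYPA with a 45-yard penalty: {adj_ypa:.2f}")
```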

The 45 yard adjustment number comes from the 1988 book Hidden Game of Football. The authors don't fully explain how they arrived at that figure, but I gather it came from an analysis based on expected points. They do, however, make a good intuitive case for it. An interception can always be thought of as costing any chance at a first down and precluding a punt. Punts net between 35 and 40 yds, and forfeiting a chance of the first down costs an extra few yards, which together comes to about 45 yds. Perfectly reasonable...for 1988.

Fast forward 21 years, and the passing game, and offense in general, have become more potent. With offenses being more efficient, the value of having the ball is therefore greater, and turnovers would accordingly be more costly.

Jeff Clarke made a great observation. He wrote, "your own 15 yard line is the point of indifference. Holding everything else neutral, you are indifferent between having the ball at your own 15 and your opponent having it at his 15. Doesn’t this mean that the penalty for throwing an interception on first down should be 70 yards – the distance between the 15s?" (Expected Point curve below).


Jeff went on to point out that the cost of an interception would be less on 3rd and 15 than say, 2nd and 1 because the expectation of a first down is different for each situation. Jeff's analysis predicts that the true yardage equivalent would be something shy of 70 yds--the distance between the 15 yd lines. I was convinced to dig a little deeper.

The average difference between interception plays and non-interception passes is 3.81 expected points. This is the weighted average for all plays on 1st, 2nd, and 3rd downs for all yard lines. It accounts for return yards and the down & distance situation. I excluded 4th down passes, as those are often thrown in desperation situations, where high levels of risk are acceptable and the cost of the interception is not much different than a simple incomplete pass.

3.81 points equates to approximately 60 yards of field position. The graph below plots it nicely. The green line is the EP for non-interception plays, and the blue line is fit to the EP following interception plays.


EP is roughly linear away from the end zones. So if we look at the expected points graph for non-interception plays (the green line), +3.8 EP is at the opponent's 20 yd line. And the 0 EP point intersects at a team's own 20 (80 yds from the end zone on my graph). That's a difference of 60 yds.
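The conversion from points to yards can be sketched as a straight-line calculation. It assumes EP really is linear between the two 20 yd lines and uses the two points read off the green line above. The names are illustrative only.

```python
# Two points read off the non-interception EP line in the text:
# about 0 EP at a team's own 20 (80 yards from the end zone) and
# about +3.8 EP at the opponent's 20 (20 yards from the end zone).
EP_AT_OWN_20 = 0.0
EP_AT_OPP_20 = 3.8
YARDS_BETWEEN = 80 - 20

# Slope of the (assumed) linear stretch of the curve, in EP per yard.
ep_per_yard = (EP_AT_OPP_20 - EP_AT_OWN_20) / YARDS_BETWEEN

def ep_to_yards(expected_points):
    """Convert an EP difference to yards of field position on the linear stretch."""
    return expected_points / ep_per_yard

interception_cost_yards = ep_to_yards(3.81)
print(f"3.81 EP is roughly {interception_cost_yards:.0f} yards of field position")
```

The straight-line assumption breaks down near the end zones, so this is only a mid-field approximation.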

60 passing yards is the modern yardage equivalent of an interception, not 45.

The Value of a Touchback

This season will be the first that the Baltimore Ravens will start the year without place kicker Matt Stover. Stover has been a reliable fixture for the franchise for its entire existence. He's known for his reliable medium-range accuracy, but his field goal range and, possibly more importantly, his kickoff distance dwindled in recent years. Last year's kickoff specialist Steven Hauschka will now take over as the full-time field goal kicker.

Keeping a kickoff specialist on the roster has become somewhat fashionable in the NFL, but I'm not sure when the trend started or exactly how many teams do it. It's an expensive thing to do, not just in terms of salary, but in terms of a roster spot too. If you've read John Feinstein's Next Man Up, you know how precious every spot is for the coaches, and how difficult the weekly decisions are about who to dress for each game. A kick-off specialist is a costly luxury.

But maybe we're thinking about this backwards. Maybe we should ask whether it's worth it to have a field goal specialist.

Assessing FG Accuracy

It's been shown here and elsewhere that FG kickers are very hard to tell apart from one another. I have no doubt that it takes great skill and countless hours of dedication to be as good as NFL kickers are. However, almost all kickers at the professional level can be considered statistically as accurate as any other.

By "statistically accurate" I mean that accounting for small sample size, environmental variables, and attempt distance, it is virtually impossible to tell one kicker apart from another. A kicker who is highly accurate one year is not likely to be as accurate the next. A big part of this variability in accuracy can be attributed to what's known as 'sampling error.'

Typically, NFL FG kickers have between 30 and 40 attempts in a season. Think of baseball batters' averages after only 40 at bats, which would be about 8-10 games into the season. By this point some replacement-level guys are batting .500, and some future Hall of Fame sluggers are batting .100. But absolutely no one thinks the batters are truly .500 or .100 hitters. It's just a matter of a small statistical sample, which makes it impossible to really assess individual batting skill. And if batting had a wrinkle similar to FG attempt distance, it would be even harder to assess skill.
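To put a rough number on the small-sample problem, here is a minimal sketch using a simple binomial model. The 80% true make rate is an assumption for illustration, not a figure from this post, and the model ignores attempt distance and conditions entirely.

```python
import math

def fg_pct_standard_error(true_rate, attempts):
    """Standard error of an observed make rate under a simple binomial model."""
    return math.sqrt(true_rate * (1 - true_rate) / attempts)

# Assumption: a true make rate of 80% (not from the text) and 35 attempts in a season.
TRUE_RATE = 0.80
ATTEMPTS = 35

se = fg_pct_standard_error(TRUE_RATE, ATTEMPTS)
low, high = TRUE_RATE - 2 * se, TRUE_RATE + 2 * se
print(f"standard error: {se:.3f}; rough 95% range: {low:.0%} to {high:.0%}")
```

Under those assumptions, two kickers with identical ability could easily finish a season more than ten percentage points apart by luck alone.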

To me, it's absolutely laughable that some teams' kicker jobs are decided by pre-season contests based on maybe 4 or 5 attempts per kicker. I can only hope that coaches are really making these decisions based on many more attempts in practice.

The point is that we need dozens and dozens of attempts, from various distances and in various conditions, just to begin to be able to tell one FG kicker apart from another. And I bet that if we actually could tell good kickers from lesser ones, a very large part of the difference would be due to range.


So if range is important to both kinds of kicks, wouldn't a team prefer the guy with the deeper kickoffs? Plus, range is something we can actually measure. I can't definitively prove my point of view in a single post, but I can begin to look at some aspects of the value of deep kickoffs. In this post, I'll look at the value of something I think is often overlooked: the touchback.

The Value of a Touchback

About 10% of all NFL kickoffs (not including onside kicks) are touchbacks. Forcing the opponent to start at their own 20 doesn't exactly seem like a death blow, but it is modestly valuable.

The average starting position following all kickoffs (including penalties on the play) is the 30 yd line. But the average starting position for all non-touchback kickoffs is the 32. The difference between a touchback and a non-touchback is 12 yds. If the 32 seems a little far down the field to you (like it does to me), it's because the median starting field position for non-touchbacks is the 27 yd line.

Here is the distribution of starting field position for non-touchback kicks.


[A couple of interesting notes. First, the spike at the 60 (a team's own 40 yd line) is from kicks out of bounds. Second, I think it's interesting that of long returns, there are many more that make it to the opponent's 30 or 20 or so than make it only to just past midfield. Then, if a returner makes it past the 20, he's probably going to make it all the way to the end zone.]

Back to touchbacks. Using the concept of Expected Points (EP), the average point value of a first down at each field position (see graph below), we can estimate the nominal value of a touchback. The 20 yd line represents 0.1 EP, and the weighted average of the distribution of non-touchback field position is 0.9 EP. That's a value of 0.8 EP per touchback. (This includes turnovers and penalties.)

Sacks are worth 1.7 EP, so a touchback could be considered the equivalent of about half a sack.


An alternative way of thinking of those 12 yards is to think of them as one additional first down required for a team to score. It's one more first down the offense will need to either score a TD or get into FG range. The average first down conversion rate in the NFL is 67%, so a touchback turns a TD drive into a FG drive or a FG drive into a punt 33% of the time.

We can also use the concept of win probability to assess the value of a touchback. Over the past 9 seasons, non-touchback kicks average a change of 0.002 in WP. Touchback kicks average an increase of 0.01 WP. The net value of a touchback is therefore an increase of 0.008 WP, or about 1%. One percent isn't much at all, but with about 5 kickoffs per game for each team, the effect can add up.
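The touchback arithmetic in both currencies, EP and WP, fits in a few lines. This is only a sketch using the figures quoted above. It assumes both WP changes are measured from the kicking team's point of view, and the per-game figure is an upper bound that assumes every kickoff is a touchback.

```python
# Figures quoted in the text.
EP_TOUCHBACK = 0.1        # EP for a drive starting at the 20 yd line
EP_NON_TOUCHBACK = 0.9    # weighted-average EP after a non-touchback kick
WP_TOUCHBACK = 0.010      # average WP change on a touchback (assumed kicking-team view)
WP_NON_TOUCHBACK = 0.002  # average WP change on a non-touchback kick
KICKOFFS_PER_GAME = 5     # approximate kickoffs per team per game

# Net value of a touchback over a typical returned kick.
ep_value = EP_NON_TOUCHBACK - EP_TOUCHBACK
wp_value = WP_TOUCHBACK - WP_NON_TOUCHBACK

# Upper bound: the gain if every one of a team's kickoffs in a game were a touchback.
max_wp_per_game = wp_value * KICKOFFS_PER_GAME

print(f"EP value per touchback: {ep_value:.1f}")
print(f"WP value per touchback: {wp_value:.3f}")
print(f"Upper bound per game:   {max_wp_per_game:.3f} WP")
```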

The WP added (WPA) of any given kickoff depends on the leverage of its particular game situation. With the game close and time dwindling, a touchback or deep kick can make a 2-minute drill that much harder for the offense.

The biggest two touchbacks in my database (going back to 2000) were each for 0.13 WPA. Kicker Steve Lindsay, who played only two seasons in the NFL, was picked up by Denver from Jacksonville halfway through the 2000 season. With the Broncos trailing the Chargers 37-31 and 4:05 left in the 4th quarter, Lindsay boomed the touchback heard 'round the world (not exactly, but it should have been). Field position in this situation was critical. A FG by San Diego would have clinched the game, and Denver needed the ball back in as good field position as possible. As fate would have it, the Broncos went on to win the game 38-37.

Arizona kicker Neil Rackers owns the other touchback of the decade. In a 2007 game against Seattle, tied 20-20 with 4:53 remaining in the 4th, Rackers' touchback made it that much more difficult for the Seahawks to put together a game-winning drive, and would have made Arizona's own drive that much easier. What actually happened was that Seattle fumbled on a 1st and 5 from the Arizona 36, allowing the Cards to put together a FG drive to win the game. Sure, the game turned on a turnover and not field position, but had Seattle found itself on the Arizona 24 and not the 36, maybe the play call would have been a little safer. We'll never know.

Anyway, those are my two nominations for the touchback hall of fame. It's not the most glamorous play in football, but it's certainly overlooked and worthy of examination.

Ascent of the Tight End

The Tight End position appears to be increasing in prominence in the passing game. Since 2001, TEs have been accumulating an increasing number of receiving yards, both in terms of total yards and in terms of the share of the overall receiving yards by all position types.

When I was going over the Koko fantasy projections for TEs, I noticed the trend illustrated in the graph below. TEs have been getting more receiving yards each year since 2001. Not a single year saw a decline, and the increase has been over 40% since the beginning of the nine-year period.


The increase from 2001-2002 can be partially explained by the expansion of the league by the addition of the Texans. However, the increase in that year exceeds the 1/31st increase we'd expect, and the increasing trend continues to be steady.

The increase is not simply due to an increase in overall passing yards. Receiving yards by wide receivers have held very steady and receiving yards by running backs have slightly declined during that span. The graph below shows the relative change in receiving yards by position compared to the 2001 baseline.

The decline in RB receiving yards is small and does not account for the increase in TE yards. In other words, TEs are not robbing passes that used to go to RBs. The increase in TE yards is "new" yards. TEs eclipsed RBs in receiving yards for the first time in 2004, and have continued to hold an edge ever since.



This isn't too surprising given the abundance of pass catching TEs around the league. Anyone following the game for the past 20 years has witnessed the evolution of the TE position. (And I suspect I may be retreading ground others have plowed.) Forty percent over nine years, however, is a stark increase. Could you imagine an equivalent change in baseball, say a 40% increase in hits by second basemen over nine years? It would be stunning.

The increase is far too large to be explained by a handful of outliers like Gonzalez, Heap, Clark, or Gates. It's due to a change in the league as a whole. I'm not sure when the trend began as my data only goes back to 2001. The PFR guys have data back to the dawn of time, and might be able to shed some light.

Comparing Running Performance

This post follows a discussion of how to rate running back performance (or team rushing performance) that began at PFR and continued at Smart Football. I'll add my two cents here.

Yards per carry (YPC) is a useful stat, but it doesn't tell us everything we want to know. Median yards gained isn't very useful because, with rare exceptions, every RB will have a median gain of 3 yds. There are any number of suggestions for alternate measures such as yards above team median, yards above replacement, or success rate (the Hidden Game of Football system used by Football Outsiders). The comments at the Smart Football post feature a great discussion of the topic. Unfortunately, there really is no single number that can capture the full picture. In fact, what we really need is a picture.

I'll explain that in a minute, but first I want to address an age-old water cooler question that Chris discussed in his post at Smart Football. Consider two RBs, both with identical YPC averages. One however, is a boom and bust guy like Barry Sanders, and the other is a steady plodder like Jerome Bettis. Which kind of RB would you rather have on your team?

The answer is it depends. Essentially, we have a choice between a high-variance RB and a low-variance RB. When a team is an underdog, it wants high-variance intermediate outcomes to maximize its chances of winning. And when a team is a favorite, it wants low-variance outcomes. Whether those outcomes occur through play selection, through 4th down doctrine, or through RB style isn't important. If you're an otherwise below-average team, you'd want the boom and bust style RB. If you're an otherwise above-average team, you'd want the steady plodder.

The same concept applies within a game. If you're losing during a game, you have become the underdog no matter how strong your team seemed on paper before kickoff. In this case, you want to increase the risk-reward balance with high-variance plays. You'd accept the risk of a 10-yd loss in the backfield for the possibility of breaking a 40-yd run. But if your team is up by a TD, the 10-yd loss isn't so acceptable.

Further, even if the high-variance RB has a lower average YPC, we still might want him carrying the ball when we're losing. This is due to the math involved in competing probability distributions.
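A toy example shows the effect. Suppose the net points a team gains over its opponent for the rest of the game are normally distributed. Every number below is hypothetical and chosen only to illustrate the idea: a "steady" style with a higher mean and small spread, and a "volatile" style with a lower mean and large spread.

```python
import math

def prob_exceeds(threshold, mean, std_dev):
    """P(X > threshold) for X ~ Normal(mean, std_dev)."""
    z = (threshold - mean) / std_dev
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical distributions of net points gained over the rest of the game.
steady = {"mean": 0.5, "std_dev": 1.0}    # low variance, higher average
volatile = {"mean": 0.0, "std_dev": 3.0}  # high variance, lower average

# Trailing by 3: the team must outscore its opponent by more than 3 from here.
trailing_steady = prob_exceeds(3, **steady)
trailing_volatile = prob_exceeds(3, **volatile)

# Leading by 3: the team only has to avoid being outscored by 3 or more.
leading_steady = prob_exceeds(-3, **steady)
leading_volatile = prob_exceeds(-3, **volatile)

print(f"Trailing by 3: steady {trailing_steady:.3f}, volatile {trailing_volatile:.3f}")
print(f"Leading by 3:  steady {leading_steady:.3f}, volatile {leading_volatile:.3f}")
```

In this toy setup the volatile style wins far more often when trailing despite its lower average, and the steady style is the better choice when protecting a lead.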

Now back to the question on how to evaluate a RB or team rushing game. Mean, median, or even mode are handy ways of describing a central tendency. But on their own, they don't paint the whole picture. It's a bit like the proverb about several blind men each grasping a part of an elephant. We could say that LaDainian Tomlinson's 4.4 career YPC figure is good because it's above average, but it doesn't tell us much more than that. It's like grasping the elephant's trunk. Instead, we can look at the whole elephant.

Below is the distribution of Tomlinson's career gains. The horizontal axis shows the gains, and the vertical axis represents how often he got each gain. The blue line is the distribution for the NFL as a whole, and the red line is Tomlinson's distribution.


We could simplify the distribution into large bins selected for certain significance. For example, we could divide the distribution into all losses, gains of 1-4 yds, 5-9 yds, and 10 yds or more. Tomlinson might be a "10/45/35/10." This is unwieldy, but it's not much different than how the baseball guys use a similar shorthand for wOBA, BABIP, and the other stats they often bundle together.
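A minimal sketch of that binning follows. The bin edges are an assumption on my part: runs for no gain are lumped in with losses, and the third bin stops at 9 yds so that it doesn't overlap the 10+ bin. The sample carries are invented for illustration.

```python
def gain_profile(gains):
    """Return the share of carries in four bins as a 'loss/short/medium/long' string."""
    bins = [0, 0, 0, 0]
    for yards in gains:
        if yards <= 0:
            bins[0] += 1      # losses and no gain (assumed to share a bin)
        elif yards <= 4:
            bins[1] += 1      # 1-4 yards
        elif yards <= 9:
            bins[2] += 1      # 5-9 yards
        else:
            bins[3] += 1      # 10 yards or more
    total = len(gains)
    # Round each share to a whole percentage; rounded shares may not sum to exactly 100.
    return "/".join(str(round(100 * count / total)) for count in bins)

# Hypothetical carries, invented for illustration only.
sample_gains = [-2, 0, 1, 3, 3, 4, 5, 7, 12, 2]
profile = gain_profile(sample_gains)
print(profile)
```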

Not that I'd ever expect anyone to use this, but we could use a more technical shorthand. The RB gain distributions can be modeled as a gamma distribution, a bell-type curve described by 2 parameters--k and theta. For example, Tomlinson is a Gamma(11, 1.1). That's about all we'd need to know to reproduce his gain distribution. The parameters are not intuitive at all, so it's not a workable solution. (Perhaps someone out there might suggest a better type of distribution to use.)
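For anyone who wants to see what those two parameters imply, here is a small sketch of the gamma density with shape k and scale theta, written with only the standard library. One caveat: a gamma distribution only covers positive values, and a Gamma(11, 1.1) has a mean of about 12.1, so gains presumably have to be offset before fitting. How that was done isn't stated above, so the `offset` argument below, and the value 7.7 used in the demo, are purely hypothetical.

```python
import math

def gamma_pdf(x, k, theta):
    """Density of a Gamma distribution with shape k and scale theta at x."""
    if x <= 0:
        return 0.0
    return (x ** (k - 1) * math.exp(-x / theta)) / (math.gamma(k) * theta ** k)

def gamma_summary(k, theta):
    """Mean, variance, and mode implied by the two parameters."""
    return {
        "mean": k * theta,
        "variance": k * theta ** 2,
        "mode": (k - 1) * theta if k >= 1 else 0.0,
    }

def gain_density(gain, k, theta, offset):
    """Density of a gain in yards, assuming gain = gamma value - offset.

    The offset is a hypothetical device for handling negative gains.
    """
    return gamma_pdf(gain + offset, k, theta)

if __name__ == "__main__":
    print(gamma_summary(11, 1.1))
    # Hypothetical offset of 7.7 yds, picked so the mean gain lands near 4.4.
    for yards in (-3, 0, 3, 6, 10, 20):
        print(yards, round(gain_density(yards, 11, 1.1, 7.7), 4))
```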

To be honest, I was expecting a bigger difference between Tomlinson and the rest of the league. So I looked at some other RBs' distributions. I wanted to see a difference between boom-and-bust guys and plodder-types. I picked Adrian Peterson and Brian Westbrook to compare to Jerome Bettis and Jamal Lewis. Their distributions are plotted below.





What amazes me is how similar they all are to each other and to the league average. One notable exception is Jamal Lewis' peak. He has significantly more runs of between 0 and 3 yards than other backs. If you read the plot the wrong way, this might appear good, but it's definitely not. Usually, a RB needs 4 to 5 yards to just break even in terms of his team's probability of converting a first down. What we'd want to see on a RB's distribution is as much probability mass as possible to the right of 4 yards.

So if Bettis' distribution looks so much like Tomlinson's, how does Bettis have a 3.9 career YPC and Tomlinson have a 4.4 career YPC? As others have noted previously, the difference among RB YPC numbers comes primarily from big runs. It's the open field breakaway ability that separates the guys with big YPC stats from the other RBs. Of Tomlinson's runs, 1.5% were for 30 yards or more. Bettis' 30+ yd gains comprised only 0.46% of his carries. The other RBs and the league average are as follows:

NFL 0.91%
Lewis 0.88%
Westbrook 0.93%
Peterson 2.20%

Adrian Peterson's 2.2% figure is exceptional. It's interesting because it really suggests that what separates Peterson as a great runner is based on only 2% or so of his runs. Otherwise, he's practically average.
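A back-of-the-envelope sketch shows how such a small share of carries can move YPC. The 30+ yd shares are the Tomlinson and Bettis figures quoted above, but the average length of a 30+ yd run (45 yds here) is a number I'm assuming purely for illustration. It is not taken from the data, and the result scales directly with it.

```python
def long_run_share(gains, threshold=30):
    """Fraction of carries that gained at least `threshold` yards."""
    if not gains:
        raise ValueError("need at least one carry")
    return sum(1 for g in gains if g >= threshold) / len(gains)

def ypc_from_long_runs(share, avg_long_gain):
    """Yards per carry contributed by long runs alone."""
    return share * avg_long_gain

# ASSUMPTION: hypothetical average length of a 30+ yd run, not from the data.
ASSUMED_AVG_LONG_GAIN = 45.0

# Shares of carries going for 30+ yds, as quoted in the text.
tomlinson = ypc_from_long_runs(0.015, ASSUMED_AVG_LONG_GAIN)
bettis = ypc_from_long_runs(0.0046, ASSUMED_AVG_LONG_GAIN)
gap = tomlinson - bettis

if __name__ == "__main__":
    print(f"Tomlinson YPC from 30+ yd runs: {tomlinson:.2f}")
    print(f"Bettis YPC from 30+ yd runs:    {bettis:.2f}")
    print(f"Difference:                     {gap:.2f}")
```

Under that assumed 45-yd average, the long runs alone would account for a bit under half a yard per carry between the two backs, which is about the size of the 4.4 versus 3.9 gap. A different assumed average would change that figure proportionally.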

Of course, the usual caveats apply. When talking about a specific RB, we are really talking about his team's running performance when the RB has the ball. And we haven't considered game situation yet. Ideally, we'd want to plot a series of distributions, one for each typical down and distance situation--1st and 10, 2nd and long, 2nd and mid/short, and 3rd and short. But that's a far cry from a nice handy single number.