
DAT 121

Final Term Paper

Group 13

8th Dec. 2022

Evan Hayes, Matthew Magnani, Aaron Tandatnick:
486146, 509084, 486107

Introduction

At its core, professional sports serve as a way for the average person to view the highest
caliber of athletes showcasing their staggering talents. In a truly remarkable display of peak
human performance, the spectacle that these athletes create captures the attention and wallets of
billions of people worldwide, generating an extraordinary amount of revenue. Spearheading and
organizing these athletic performances are the professional sports leagues, which act as a
governing body for the teams and facilitate all public games these athletes play in. The National
Basketball Association (NBA) has widespread reach and popularity among fans domestically
and internationally, which drives its major revenue streams. The NBA also
recognizes that its players are the central figures of its product, and so players receive
roughly 50% of total league revenue. Like most other professional sports leagues, the
NBA largely acts as a meritocracy, in which a player earns a salary based on their success or
potential to succeed. Moreover, the collective bargaining agreement (CBA) between
the players and owners has produced a complex salary cap and luxury tax system that imposes
many rules on how much a specific player can earn. However, despite the
deliberate design of player salaries, no universal methodology exists to predict a player’s
salary.
Thus, the purpose of this research is to determine what statistical factors influence a
player’s salary. As a player’s salary fluctuates throughout their career, many fans of the game
hold strong opinions about what they should be receiving, with little focus given to the statistics
related to a player’s on-court production. With this study, we hope to shed light on
why certain players earn their specific salaries, and on which on-court factors have the
strongest influence on the total.
Relative to the existing literature, our study differs in two
ways: it avoids the subjective analysis that mainstream publications use to predict
the future of an abstract player’s salary, and it relies mostly on advanced metrics, as opposed to
the statistical models. The Athletic’s article NBA Salaries Keep Going Up and the Singapore Travel
Guide’s Can Statistics Be Used to Determine an NBA Player Salary both attempt to project the
future of the NBA salary cap and its effect on contracts, whereas our study analyzes a player’s
true contract value for the 2022-2023 season (See Appendix). We hypothesize that age and win
shares have the strongest impact on player salary.

Key Results

Our complete regression is displayed in Figure 1. At a significance level of 𝛂 = 0.05, we observe that Age,
3P%, PTS, WS, and BPM are statistically significant variables in predicting salary. Age, PTS,
WS, and BPM all have positive coefficients. Counterintuitively, 3P% has a negative
relationship to salary in the model; its slope coefficient is also the largest in magnitude in the
model. Age has the highest positive slope coefficient, with an $830,091.65 increase in salary for
every 1-year increase in age, holding all other independent variables constant. PTS has a
slope coefficient of similar magnitude, followed by WS and BPM. Scatter plots for Age and 3P%
are included in the graphs below. Overall,
the model is statistically significant at 𝛂 = 0.05, with an R² of 0.67. However, R² never
decreases as independent variables are added, so it can overstate fit in MLR, particularly with
ten independent variables. There is also a fair degree of multicollinearity in the model. Advanced
basketball statistics (WS, BPM) are calculated using basic basketball statistics (PTS, 3P%),
resulting in correlation between variables. The model bears this out: PTS, PER, and BPM
all have VIF values over 10.
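
Because R² cannot fall when predictors are added, adjusted R² is a common companion figure. The sketch below applies the standard adjustment to the reported R² of 0.67. It assumes all 467 players and all ten independent variables entered the fitted regression, which the paper does not state explicitly, and it works from the rounded R², so the result is approximate.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Inputs taken from the text; n_obs = 467 is an ASSUMPTION about the fitted sample.
adj = adjusted_r2(r2=0.67, n_obs=467, n_predictors=10)
print(round(adj, 3))  # 0.663
```

With a sample this large relative to the number of predictors, the adjustment is small, so the caution about R² matters less here than the multicollinearity noted above.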

Figure 1: Age

As mentioned previously, Age has the largest statistically significant positive slope coefficient in
predicting salary. For each additional year of player age, salary is expected to increase by
$830,091.65 on average, holding all other independent variables constant. This result is likely
due to two factors: player performance and contract structures. Generally, players reach their
highest performance, which is typically reflected in their salary, in the second half of their careers.
Additionally, NBA contracts are shaped by league rules. For example, the league
implemented rookie scale contracts to limit a young player’s salary and protect team executives
from draft busts. Moreover, the maximum contract that teams are allowed to offer scales upward
with player performance and years played.
Figure 2: Three-Point Percentage

Data

We sourced our data from Basketball Reference (basketball-reference.com), a widely used
basketball statistics site whose data is credible and consistent. The website contains box
scores on every NBA game since 1996, with both standard and advanced basketball statistics.
Our dataset focuses on the most recent season, 2022-2023, with observations on 467 players. We
chose variables that gave a holistic measure of player performance:
- Age: the player's age in years
- Minutes Played (MP): average minutes per game the player is on-court (out of 48)
- Field Goal Percentage (FG%) = (Field Goals Completed / Field Goals Attempted)
- 3 Point Percentage (3P%) = (3 Point Shots Completed / 3 Point Shots Attempted)
- Turnovers (TOV): Number of times a player loses possession per game
- Points (PTS): Points per game
- Player Efficiency Rating (PER): a measure of per-minute production, approximated here by the
simplified per-game efficiency formula (PTS + REB + AST + STL + BLK - Missed FG - Missed FT - TOV) / (Games played)
- Usage Percentage (USG%): an estimate of the percentage of team plays used by the
player while on the floor = ((FGA + 0.44 * FTA + TOV) * (Tm MP / 5)) / (MP * (Tm
FGA + 0.44 * Tm FTA + Tm TOV))
- Win Share (WS): an estimate of the number of wins contributed by a player
- Box Plus/Minus (BPM): a box score estimate of the points per 100 possessions that a
player contributed above a league-average player, translated to an average team
- Salary: Annual base player compensation ($USD)
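
The two formula-based definitions in the list above (3P% and USG%) can be sketched as small functions. This is an illustrative sketch, not the data provider's code: the function names and the sample box-score totals are hypothetical, the factor of 100 is added so that USG% reads as a percentage, and returning 0.0 for a player with no attempts is an assumption that mirrors the zero-filling of blank values described later in the paper.

```python
def three_point_pct(made: float, attempted: float) -> float:
    """3P% = three-point shots made / three-point shots attempted."""
    # ASSUMPTION: a player with no attempts is scored as 0.0 rather than left blank.
    return made / attempted if attempted else 0.0


def usage_pct(fga, fta, tov, mp, tm_fga, tm_fta, tm_tov, tm_mp) -> float:
    """USG% = 100 * ((FGA + 0.44*FTA + TOV) * (Tm MP / 5))
                    / (MP * (Tm FGA + 0.44*Tm FTA + Tm TOV))."""
    player_plays = fga + 0.44 * fta + tov
    team_plays = tm_fga + 0.44 * tm_fta + tm_tov
    return 100.0 * (player_plays * (tm_mp / 5)) / (mp * team_plays)


# HYPOTHETICAL single-game totals, for illustration only.
pct = three_point_pct(made=3, attempted=8)
usg = usage_pct(fga=20, fta=10, tov=3, mp=36,
                tm_fga=88, tm_fta=25, tm_tov=14, tm_mp=240)
print(pct, round(usg, 2))  # 0.375 32.33
```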
It is important to note that using conventional basketball statistics (Age, MP, 3P%, FG%, TOV,
PTS) in a model with advanced statistics (BPM, WS, USG%, PER) involves overlap between
statistics. As displayed above, PTS is an input in PER. This creates collinearity in the model.
However, we chose to include these variables together to replicate expert analysis. NBA scouts
and franchise executives typically employ a combination of basic and advanced statistical
analysis to evaluate player contribution and compensation. Our dependent variable, salary, is
continuous with a wide range of values. Descriptive statistics are provided for salary (and all
independent variables) in Figure 2. Our descriptive statistics are generally consistent with what
we would expect from the data. The average NBA salary for 2022-23 is $8,416,598, with a
range of $5,849 to $48,070,014. The mean is much larger than the median of $3,722,040, which
indicates a right-skewed distribution: a small number of very large contracts pull the average up. Some of the
variables have high levels of kurtosis and skewness, notably BPM and PER. BPM and PER are
prone to outliers when a player's sample of minutes is small. The outliers we observe across the two statistics are
low-salary players who excelled in a small number of minutes. We decided not to remove these
observations because their effects on the model are counteracted by variables that account for sample
size, such as USG%, MP, and WS.
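
The gap between the mean and the median salary is the usual signature of right skew. The sketch below shows the effect on a small, entirely hypothetical set of salaries (in millions of dollars); none of these values come from the dataset. It uses the simple moment definition of skewness, which differs slightly from the sample-adjusted figure that spreadsheet software typically reports.

```python
from statistics import mean, median


def moment_skewness(values):
    """Moment skewness m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# HYPOTHETICAL salaries in $ millions: four modest deals and one very large one.
salaries_musd = [1.0, 2.0, 3.0, 4.0, 40.0]

avg = mean(salaries_musd)      # pulled upward by the single large contract
mid = median(salaries_musd)    # unaffected by how large the top contract is
skew = moment_skewness(salaries_musd)
print(avg, mid, skew > 0)  # 10.0 3.0 True
```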

Modeling

We developed a multiple linear regression in Excel to evaluate the relationship between NBA
player salaries and ten basketball statistics. We first conducted an F-test on the overall model to
evaluate the results. The model’s Significance F is 2.05E-103, far lower than our significance
level of 𝛂 = 0.05, indicating that the model is statistically significant. Standard hypothesis tests
for MLR were developed for each variable to determine individual statistical significance. An
example is included below for Age; the process was repeated for every independent variable.

For each variable, the p-value (provided in the Excel MLR output) was compared to 𝛂 = 0.05. In cases
where the p-value is lower than 𝛂 = 0.05, we reject H0 and conclude that [independent variable]
has a statistically significant relationship with Salary. Age is the most statistically significant variable, followed by PTS, WS, 3P%,
and BPM. For all other independent variables, we fail to reject H0. From the regression, an
increase of [1 unit independent variable] is expected to increase/decrease salary by [respective
slope coefficient], holding all other independent variables constant. In the case of Age, a 1-year
increase in age is expected to increase salary by $830,091.65, holding all other independent
variables constant. The coefficients of statistically significant variables are included in Figure 1.
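
The per-variable decision rule described above can be sketched as follows. For each slope the test is H0: coefficient = 0 against Ha: coefficient ≠ 0. The Age coefficient is the one reported in the paper, but the standard error below is a purely hypothetical placeholder, since the regression output is not reproduced in the text. The p-value uses a normal approximation to the t distribution, which is reasonable only because the sample is large; Excel's output would use the exact t distribution.

```python
from statistics import NormalDist

ALPHA = 0.05


def two_sided_p_value(coef: float, std_err: float) -> float:
    """Approximate two-sided p-value for H0: coefficient = 0."""
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    t_stat = coef / std_err
    # Normal approximation to the t distribution (large-sample ASSUMPTION).
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


age_coef = 830_091.65   # slope for Age reported in the paper
age_se = 100_000.0      # HYPOTHETICAL standard error, for illustration only

p_value = two_sided_p_value(age_coef, age_se)
reject_h0 = p_value < ALPHA   # True means the variable is significant at ALPHA
print(reject_h0)
```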
It is important to consider relationships between variables when assessing the model. To assess
multicollinearity, we used the statistical software R to build a
variance inflation factor (VIF) table for all independent variables. The results are included in
Figure 5. The high variance inflation factors of several independent variables suggest a certain
level of multicollinearity in the model. Given that the ideal VIF is 1, we treat a VIF greater
than 4 as an indication of multicollinearity. MP, TOV, PTS, PER, USG%, and BPM all have VIF values
higher than 4. Similarly, the correlation table indicates multicollinearity in the model. As seen in
Figure 3, several pairs of variables have correlation coefficients above 0.7. These include PTS/MP, WS/MP,
PTS/TOV, USG%/PTS, WS/PTS, and BPM/PER. As mentioned before, this is partially the result
of overlapping statistics. For example, the number of points a player scores is part of the PER
calculation. The number of points the player’s team scores, including points scored by the
player, is part of the BPM calculation. This explains the 0.899 correlation between PER and BPM.
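
The link between the correlation table and the VIF table can be made concrete. A predictor's VIF is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. That R² is at least as large as the squared correlation with any single other predictor, so one pairwise correlation already sets a floor on the VIF. The sketch below is a hand-rolled illustration, not the R routine the paper used, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_from_r2(r2: float) -> float:
    """VIF = 1 / (1 - R^2), with R^2 from regressing one predictor on the rest."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must be in [0, 1)")
    return 1.0 / (1.0 - r2)


# Floor on the VIF implied by the 0.899 correlation reported in the text.
vif_floor = vif_from_r2(0.899 ** 2)
print(round(vif_floor, 2))  # 5.21
```

A correlation of 0.899 on its own therefore pushes both variables past the VIF threshold of 4 used here, before any other predictor is considered.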

Several outliers exist in the model. Outliers were evaluated through residual analysis in R.
Observations with standardized residuals beyond +/-2, studentized residuals beyond +/-3,
or Cook’s D larger than 0.5 were investigated. Several players exceed one or more of these
thresholds. Upon further analysis, it appears the outliers reflect the nature of NBA contract
structures rather than a failure of the model or data. For example, Russell Westbrook and John
Wall are outliers under both the studentized and standardized residuals. The results are
presented below.

Both players received massive contracts during the prime of their careers (ages 27 & 28).
The large salaries ($47.3m & $47.1m, respectively) reflected their statistics at the time of the
agreement. Under NBA rules, these contracts can run four to five guaranteed years. Now 32 and 34 years old,
the players no longer produce high statistical outputs. However, they are locked into some of the
highest-paying contracts in the league. This partially explains why Age has the highest positive slope
coefficient among statistically significant variables: as players get older, they become eligible for
more lucrative contracts. On the opposite end of this spectrum are players like Ja Morant. Morant
(-2.484 studentized residual) was in the final year of his rookie contract in 2022-2023. Rookie
contracts never exceed $15m in the final year. In July 2023, after the data was published, Morant
signed a deal with an annual salary of $34.05m. The increased salary would significantly
shrink Morant’s residual. For future models, this problem can be remedied by including data
across several years, as contracts typically span 3-5 years.
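
The three screening rules used in the outlier analysis (standardized residual beyond +/-2, studentized residual beyond +/-3, Cook's D above 0.5) can be sketched for a one-predictor regression. This is a simplification of the ten-variable model, written by hand rather than with R's diagnostics, and the data are hypothetical. It also assumes that "standardized" means the internally studentized residual and "studentized" means the externally studentized (leave-one-out) residual.

```python
from math import sqrt


def residual_diagnostics(x, y):
    """Fit y = b0 + b1*x by OLS; return (standardized, studentized, cooks_d)."""
    n, p = len(x), 2                      # p = fitted parameters (intercept, slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    # Leverage of each point in a simple linear regression.
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    standardized = [e / sqrt(mse * (1 - h)) for e, h in zip(resid, leverage)]
    studentized = [r * sqrt((n - p - 1) / (n - p - r * r)) for r in standardized]
    cooks_d = [r * r * h / (p * (1 - h)) for r, h in zip(standardized, leverage)]
    return standardized, studentized, cooks_d


def flag_outliers(standardized, studentized, cooks_d):
    """Indices that break at least one of the three screening thresholds."""
    return [i for i, (r, t, d) in enumerate(zip(standardized, studentized, cooks_d))
            if abs(r) > 2 or abs(t) > 3 or d > 0.5]


# HYPOTHETICAL data: roughly y = 2x with small noise and one inflated value.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 3, 7, 7, 25, 11, 15, 15, 19, 19]
std_res, stu_res, cook = residual_diagnostics(x, y)
flagged = flag_outliers(std_res, stu_res, cook)
print(flagged)  # [4]
```

In this toy example the inflated point is caught by the residual rules but not by Cook's D, because it sits near the middle of the x range and has little leverage. The three thresholds can disagree, which is why each is checked separately.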
Across the entire dataset, 63 of the 5,137 values (467 players × 11 variables) were missing (1.23%).
Blank data was replaced with zeroes. The missing data is unlikely to affect the model.

Summary

Our analysis provides insights into the variables that influence NBA player salaries. Age, PTS,
WS, and BPM were found to be statistically significant variables with positive relationships to
salary. 3P% is statistically significant but has a negative relationship with salary. Our findings have
several implications for general managers of NBA franchises. First, the residual analysis is
helpful in finding undervalued outliers. Players like Ja Morant, Desmond Bane, and Jordan Poole
are all outliers because their contracts do not reflect their statistical output. Managers can gain a
competitive advantage by identifying these players. On the opposite end of the spectrum, our
model can help managers avoid awarding maximum contracts to players with low statistical
output, such as Russell Westbrook and John Wall. Given the increasingly lucrative nature of NBA
contracts, such analysis can save managers tens of millions of dollars. While we cannot draw
conclusions about other datasets based on our model, it may be a useful starting point. Scouts may
use the statistically significant variables in our model (Age, 3P%, PTS, WS, and BPM) to begin
to value prospective talent in high school, college, or the NBA G League. More analysis is
necessary to determine whether these relationships hold at different levels of basketball.

Appendix

Figure 1: Complete MLR

Figure 2: Descriptive Statistics

Figure 3: Correlation Table

Figure 4: Scatter Plots

Figure 5: VIF Table

Figure 6: Residual Scatter Plots - Statistically Significant Variables

Figure 7: Box Plus/Minus Formula Explanation (Via Basketball Reference)

Relevant Literature Citation


Vorkunov, M. (2023, October 24). NBA salaries keep going up. Prepare to have your
mind blown soon. The Athletic.
https://theathletic.com/4740069/2023/08/03/nba-salary-cap-rise-jaylen-brown/

Can statistics be used to determine an NBA player salary. Singapore Travel Guide. (n.d.).
https://www.streetdirectory.com/travel_guide/41055/recreation_and_sports/can_statistics_be_used_to_determine_an_nba_player_salary.html
