Ruck Those Stats! Machine Learning As The New Coach
In this project we aim to identify which quantifiable aspects of the game of rugby
union are most critical to carrying a team to victory. To do this, we collect data from
thousands of past games and use it to train several learning algorithms. Using a
cross-validation algorithm, we pick the most relevant features with the best available model, then run the models again using these reduced features. Finally, we
shift from looking directly at the features to analyzing a team's deviation from its
past k performances, in order to understand which aspects of the game usually require the most focus and improvement.
1 Introduction
2 Data Collection
3 Feature Selection/Preprocessing
From the website we obtained 22 different statistics for each game. Most of these
were in fact multiple performance metrics in one (Rucks Won / Rucks Lost was
just listed as one on the website). We preprocessed this data so as to separate these
into individual statistics, and eliminated redundant statistics as well as performance
metrics that were too sparse. We therefore ended up with 38 team performance
measures per game. We chose these performance metrics because they are what
both fans and coaches analyze to understand whether a team had a good or a poor
performance. Indeed, we obtained game stats sheets from a former professional
rugby player, and the statistics they analyzed to evaluate their game performance
were similar to the performance metrics we included.
In line with our objective, rather than comparing the performances of opposing
teams, we only considered the performances of one team as the input, and classified
it depending on whether that team won the given match. Initially we used all of
these features for our first algorithm. Since the ranges of the performance metrics
varied greatly (a team typically runs hundreds of meters while it only scores three
or four tries per game), we decided to standardize the features (mean removal and
variance scaling) to aid our models.
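As a minimal sketch of this preprocessing step, assuming scikit-learn's StandardScaler; the three-game, three-metric feature matrix below is invented for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical raw feature matrix: rows are games, columns are
# performance metrics with very different scales
# (metres run, tries scored, tackle success rate).
X_raw = np.array([
    [412.0, 3, 0.86],
    [530.0, 5, 0.91],
    [298.0, 1, 0.78],
])

scaler = StandardScaler()          # mean removal and variance scaling
X_std = scaler.fit_transform(X_raw)

# After standardization each column has zero mean and unit variance,
# so no single metric dominates purely because of its scale.
print(X_std.mean(axis=0))
print(X_std.std(axis=0))
```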
We began our project with three distinct classes for wins, losses and draws. As soon
as we started to look at the data and the results of the first machine learning
algorithm, we realized that there was only a small number of draws (a little less
than 3% of the games) and that their presence significantly dropped the accuracy of
our models. We therefore decided to remove all draws from both our training set
and our test set. We do not believe this goes against our objective: our mission is to
gain insight into the game of rugby, and we consider that insight gained on 97% of
the games will probably generalize to the last 3%.
4 Two Approaches
We took two different approaches to creating the features we would input to a machine-learning algorithm.
Our first approach was to use the list of performance metrics for a given game and
a given team as the features of a training input, classifying that training point as a
win if the given team won the game and as a loss otherwise. We train the different
models described below on this training data. Intuitively, this first approach takes
games on an individual basis, looks at the team's performance, and tries to model
how a team has to perform with respect to each feature in order to win or lose a
game. If we have a strong model that accurately predicts whether a team won a
game given its performance, analyzing which features matter most to the model
should give us an idea of which performance metrics are most important for a team to win.
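The labeling scheme of this first approach can be sketched as follows; the stat vectors and scores below are invented, and the draw-dropping described in Section 3 is included:

```python
# Hypothetical per-game records: (team stats vector, points for, points against).
games = [
    ([3, 2, 1, 412.0], 27, 15),   # a win
    ([1, 1, 4, 298.0], 13, 20),   # a loss
    ([2, 2, 2, 350.0], 18, 18),   # a draw -- dropped, as in Section 3
]

X, y = [], []
for stats, points_for, points_against in games:
    if points_for == points_against:
        continue                           # discard draws
    X.append(stats)                        # one team's performance metrics
    y.append(1 if points_for > points_against else 0)  # win = 1, loss = 0

print(X, y)
```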
Our second approach is a little more elaborate and comes from another strong
belief among rugby professionals: to do well in a game, you simply have to make
certain performance metrics better than your usual performance.
For each performance metric we look at the performance of the team in the game
we are analyzing, and subtract from it the team's average over its previous k games.
The vector containing these differences for all of the features is our input vector for
the game. We still classify this input as we did in the first approach (a win if the
given team wins the given game). Intuitively, this approach carries more
information on how the team is performing with respect to its own standard in
recent games, and might therefore give a better idea of whether a team played well or not.
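A small sketch of this deviation computation; the metric values and the helper name are hypothetical:

```python
import numpy as np

def deviation_features(history, current, k):
    """Deviation of the current game's metrics from the mean of the
    team's previous k games (the second approach; names are our own)."""
    past = np.asarray(history[-k:], dtype=float)
    return np.asarray(current, dtype=float) - past.mean(axis=0)

history = [[3, 80], [1, 60], [2, 70]]   # e.g. [tries, rucks won] per past game
current = [4, 75]
print(deviation_features(history, current, k=3))  # [2. 5.]
```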
5.1 Initial Run, Setting the Baseline
As we explored how to proceed with this project, we needed a model that would
give us basic results and, more importantly, a realistic baseline. With this in mind
we used the Naïve Bayes algorithm. With no prior knowledge of the distribution of
the features, we assumed a Gaussian distribution and ran the corresponding Naïve
Bayes algorithm[1], with the class-conditional likelihood of each feature modeled as

p(x_j | y) = (1 / sqrt(2*pi*sigma_{y,j}^2)) * exp(-(x_j - mu_{y,j})^2 / (2*sigma_{y,j}^2)).

We input the raw feature matrix with the corresponding label vector and obtained a
score on both the training and test sets. The Gaussian model reported a success rate
of 16.7% on the training set and 24.2% on the test set. These low results were
largely due to the fact that we were considering wins, draws and losses together.
Once we realized that only 3% of the games were draws, we took them out of the
modeling. Running the new data through the Gaussian Naïve Bayes, we got a
success rate of 77.9% on the training data and 71.3% on the test data.
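The baseline run can be sketched as below, assuming scikit-learn's GaussianNB; the data here is synthetic (two Gaussian blobs standing in for wins and losses), so the scores will not match the paper's numbers:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Synthetic stand-in for the per-game feature matrix: winning teams
# (label 1) are drawn with slightly shifted means across 5 metrics.
X_loss = rng.normal(-0.5, 1.0, size=(200, 5))
X_win = rng.normal(0.5, 1.0, size=(200, 5))
X = np.vstack([X_loss, X_win])
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)       # fits per-class Gaussian per feature
print(clf.score(X_tr, y_tr), clf.score(X_te, y_te))
```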
5.3 Feature Ranking and Feature Selection
The next step is to understand how relevant the features we used are. For this we
did an individual feature ranking using an extra-trees classifier[2]. From Figure 1
we notice that the most important features were tries and conversions made, while
red cards and mauls won are not all that relevant.
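A sketch of this ranking step, assuming scikit-learn's ExtraTreesClassifier on synthetic data in which, by construction, only the first two columns matter:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
# Make the label depend mostly on the first two columns (stand-ins for
# tries and conversions made); the remaining columns are noise.
y = (2 * X[:, 0] + 1.5 * X[:, 1] + 0.1 * rng.normal(size=300) > 0).astype(int)

forest = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
# Sort features by impurity-based importance, most important first.
ranking = np.argsort(forest.feature_importances_)[::-1]
print(ranking)  # the informative columns should rank first
```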
5.2 First Approach
Given the above results, we decided to proceed using the data without draws,
preprocessed so that each feature was standardized to zero mean and unit variance.
                          No Draws (Standardized)   Reduced Features
Model                     Train      Test           Train      Test
Naïve Bayes (Gaussian)    0.779      0.713          0.775      0.740
SVM (linear)              0.901      0.775          0.841      0.785
SVC (polynomial)          0.910      0.753          0.899      0.764
SVC (RBF)                 0.901      0.775          0.890      0.770
Nearest Centroid          0.793      0.752          0.788      0.754
Random Forest             0.988      0.720          0.993      0.723

Table 1  Percentage of successful predictions for all algorithms, using the initial
features and the reduced features, on both the training and test sets.
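The model line-up of Table 1 can be reproduced in outline as below, assuming scikit-learn and the same kind of synthetic stand-in data as earlier; the exact accuracies will of course differ from the table:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestCentroid
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic two-class data standing in for standardized per-game features.
X = np.vstack([rng.normal(-0.5, 1.0, (200, 5)), rng.normal(0.5, 1.0, (200, 5))])
y = np.array([0] * 200 + [1] * 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Naive Bayes (Gaussian)": GaussianNB(),
    "SVM (linear)": SVC(kernel="linear"),
    "SVC (polynomial)": SVC(kernel="poly"),
    "SVC (RBF)": SVC(kernel="rbf"),
    "Nearest Centroid": NearestCentroid(),
    "Random Forest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"{name:24s} train {scores[name][0]:.3f}  test {scores[name][1]:.3f}")
```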
5.4 Second Approach
As mentioned earlier, this approach seeks to take into account the momentum of a
season, and to model which aspects of the game, or in this case features, need to be
improved from game to game in order to increase the chances of winning the next
game.
We modified the data as described in Part 4 and input it once more to all of our
algorithms, creating 6 new data sets. In each we respectively took the averages of
the past 1, 5, 7, 10, 12, and 14 performances. We saw a slight but clear increase in
success rate from 1 to 10 past performances, but it then declined as we considered
12 and 14. Table 2 reports the results for the past 5 and 14 performances, and
Table 3 reports the results for the past 10 performances.
Figure 3  Ranking of the 38 features using an extra-trees classifier. Tries and
conversions made rank at the top, while percentage of mauls won and red cards lag
at the bottom. Due to an issue of scaling, not all features are labeled above.

Finally, we want to see which features were the most relevant, and whether we
could settle for a set of core features and get predictions just as accurate as, or more
accurate than, those we obtained with all 38 features. Figure 4 shows the result of
the RFS run on the data with the past 10 performances.

                          5 Past Performances      14 Past Performances
Model                     Train      Test          Train      Test
Naïve Bayes (Gaussian)    0.721      0.712         0.737      0.704
SVM (linear)              0.759      0.738         0.813      0.709
SVC (polynomial)          0.910      0.711         0.960      0.690
SVC (RBF)                 0.865      0.720         0.906      0.709
Nearest Centroid          0.732      0.704         0.760      0.682
Random Forest             0.990      0.694         0.993      0.668

Table 2  Models run using data with averages of the past 5 and past 14 performances.
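The feature-selection step the paper calls RFS (presumably recursive feature selection) can be sketched with scikit-learn's RFECV, which wraps recursive feature elimination in cross-validation; the data is synthetic, with only the first three columns informative by construction:

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 8 candidate features; only the first 3 (hypothetical core metrics)
# actually drive the outcome, the rest are noise.
X = rng.normal(size=(300, 8))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)

# Recursive feature elimination with cross-validation, wrapped around a
# linear SVM; it keeps the feature subset with the best CV score.
selector = RFECV(SVC(kernel="linear"), step=1, cv=5).fit(X, y)
print(selector.n_features_)               # number of features retained
print(np.flatnonzero(selector.support_))  # indices of retained features
```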
Table 3 reflects the change when we run the trimmed data through the models once
more. We observe that the linear SVM yields the best results, with 79.9% accuracy
on the test set.

Table 3  Models run using the past 10 performances. This was the setup yielding
the best results.
As we did in the first approach, we now want to know the ranking of the individual
features, in order to get a grasp of the most relevant factors in deciding whether a
team wins or loses a game. Figure 3 shows the ranking.
6 Discussion
6.1 First Approach
For the first approach, our individual feature ranking strongly indicates that a few
features are significantly more important than the rest. Upon inspection, we see that
these correspond to the point-scoring performance metrics, such as tries scored,
conversions made and penalty kicks made. Since the winner of a game is
ultimately the team that has scored the most points, it makes sense that these
features are the most important in evaluating whether a team has most likely won
the game or not.
7 Conclusion
With 10 past games, the model is more accurate than with the first approach, both
with all of the features and with the reduced features.
By construction of our two methods, this suggests that it is more accurate to
evaluate how well a team played by considering its recent past performances than
by assuming there is some absolute performance formula for a well-played game of
rugby. Nevertheless, both approaches confirm that to be successful, a team must
excel in a variety of performance metrics, which most likely lead to success only
when combined. This optimal combination of features seems to be somewhat
stable, as the 22 optimal features chosen using 10 past games mostly overlap with
the 23 features chosen for the first approach. Furthermore, these performance
metrics are in accordance with the performance metrics professionals use to
analyze their games.
8 Future Work
Our next step would be to test whether our results still hold on games from other
tournaments, and to see how that impacts the accuracy of the models and hence the
importance of our chosen features.
Even though we deliberately took a different approach, it would also be interesting
to see what the results would look like if we combined the performances of
opposing teams. Leaving out the point-scoring metrics, we could evaluate on
which metrics a team should outperform the opposing team.
9 References
[1] P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized trees," Machine
Learning, 63(1), 3-42, 2006.
[2] S. Patel, "Parity and Predictability in the National Football League."
URL http://cs229.stanford.edu/proj2013/PatelParityAndPredictabilityInTheNationalFootballLeague.pdf