Tuesday, July 1, 2014

More on outliers and running backs.

I have not been blogging much because I am about to move to Denver.  Here is one more post on statistics of running backs.

More on the impact of outliers on running back comparisons

In a previous post, I pointed out that one of the first-choice running backs never played in the NFL due to an injury.


This post builds on that previous post by examining the impact of this outlier on the comparison of average first-choice and second-choice running back performance.

Raw data was posted at



Question:  What is the average number of carries, total career yards, yards per carry, and touchdowns for three groups: first-choice running backs, first-choice running backs excluding Larry Stegent, and second-choice running backs?

Is there a significant difference between first-choice RBs and second-choice RBs?

Are the simple t-tests for difference in means used in this post the best way to assess the value of the first running back picked compared to the second running back picked?


Discussion

The averages for the four statistics are presented below.



| Averages | First-Choice Entire Sample | First-Choice Without Larry Stegent | Second-Choice Entire Sample |
|---|---|---|---|
| Carries | 1554 | 1603 | 1286 |
| Yards | 6506 | 6709 | 5254 |
| Yards Per Carry | 3.91 | 4.03 | 3.99 |
| Touchdowns | 46.64 | 48.09 | 35.36 |


On average, first-choice running backs had more carries, more yards, and more touchdowns than second-choice running backs, even when Larry Stegent is included in the sample.  These differences appear to be non-trivial.

The difference in yards per carry between first-choice and second-choice RBs is small.
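To make the effect of one zero-valued player on an average concrete, here is a minimal sketch. The numbers are made-up toy values, not the raw data from this post, and the function name `mean_with_and_without` is hypothetical. A player who never played contributes zeros, so dropping him always raises the group average.

```python
from statistics import mean

def mean_with_and_without(values, drop_index):
    """Return (mean of all values, mean with one entry removed)."""
    kept = [v for i, v in enumerate(values) if i != drop_index]
    return mean(values), mean(kept)

# Toy career-carry totals (invented for illustration); the first
# entry stands in for a player who never played a down.
toy_carries = [0, 1200, 1800, 900]
full_mean, trimmed_mean = mean_with_and_without(toy_carries, 0)
print(full_mean, trimmed_mean)
```

With n players and one zero entry, the trimmed average is simply the full average times n/(n-1), so the shift shrinks as the sample grows.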

The p-values from t-tests on the difference in averages for the four statistics, with and without Larry Stegent, are presented in the next table.


P-value for Student's t-test on difference in means

| Statistic | Entire Sample | Sample missing 1970 for first-choice players |
|---|---|---|
| Carries | 0.317 | 0.238 |
| Yards | 0.280 | 0.256 |
| Yards Per Carry | 0.563 | 0.623 |
| Touchdowns | 0.218 | 0.200 |

Test assumes identical variance in both samples.


The tests for difference in mean performance between first-choice and second-choice running backs do not rule out the possibility that this difference is zero.

This result is not changed when Larry Stegent is excluded from the sample.
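For readers who want to see what an equal-variance (pooled) two-sample t-test computes, here is a rough sketch using only the Python standard library. The inputs are toy numbers, not the running back data, and `pooled_t_statistic` is a hypothetical helper name. It returns the t statistic and degrees of freedom; turning those into a p-value needs the t distribution, which a statistics package would supply.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) assuming equal variances in both samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / dof
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, dof

# Toy samples, invented for illustration only.
toy_first = [1, 2, 3, 4, 5]
toy_second = [2, 4, 6, 8, 10]
t_stat, dof = pooled_t_statistic(toy_first, toy_second)
print(round(t_stat, 4), dof)
```

The standard error in the denominator grows with the spread inside each sample, which is why a large gap in averages can still produce a small t statistic.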


Discussion of relevance:


The difference in averages between first-choice and second-choice running backs (carries, yards, and touchdowns) appears non-trivial.  However, the t-tests show that the difference is not significantly different from zero.

WHY?

The standard deviation across running backs is also very large.  Look at the raw data!

Some of the dispersion in running back performance occurred because in some years a lot of great running backs were available and in other years running back quality was low.  A paired t-test, which compares the first-choice and second-choice running back from the same year, would help account for these year effects.
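A paired t-test works on the year-by-year differences rather than on the two group averages. The sketch below is a minimal illustration with toy numbers (not the data from this post); `paired_t_statistic` is a hypothetical name, and the two lists are assumed to be aligned so that position i in each list is the same draft year.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # The test statistic is the mean difference over its standard error.
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Toy touchdown totals by draft year, invented for illustration only.
toy_first = [10, 12, 9, 14]
toy_second = [8, 11, 10, 10]
t_paired, dof_paired = paired_t_statistic(toy_first, toy_second)
print(round(t_paired, 4), dof_paired)
```

Because a strong or weak draft class raises or lowers both members of a pair, that shared year effect cancels out of each difference.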


It might be useful to count the number of years in which the first-choice running back outperforms the second-choice running back.
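Counting the years in which one pick beats the other is the basis of a sign test. Here is a minimal sketch, with toy numbers rather than the post's data and a hypothetical function name `sign_test`. It drops ties and computes an exact two-sided p-value from the binomial distribution with probability 0.5, under the assumption that each pick is equally likely to come out ahead if draft order carries no information.

```python
from math import comb

def sign_test(first, second):
    """Return (wins, losses, two-sided exact p-value), ignoring tied years."""
    wins = sum(1 for a, b in zip(first, second) if a > b)
    losses = sum(1 for a, b in zip(first, second) if a < b)
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0
    smaller = min(wins, losses)
    # Probability of a count this lopsided or more, doubled for two sides.
    tail = sum(comb(n, k) for k in range(smaller + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)

# Toy touchdown totals by draft year, invented for illustration only.
toy_first = [10, 12, 9, 14, 7]
toy_second = [8, 11, 10, 10, 7]
wins, losses, p_sign = sign_test(toy_first, toy_second)
print(wins, losses, p_sign)
```

The sign test ignores how large each yearly gap is, so a single extreme career cannot dominate the result.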

There is also an outlier impacting second-choice running backs.   See Emmitt Smith’s TD total.

This suggests we should be using a Wilcoxon test or some other non-parametric procedure.
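As one possible non-parametric procedure, here is a rough sketch of the Wilcoxon signed-rank test for year-matched pairs. Whether the signed-rank (paired) or rank-sum (unpaired) version is the better fit here is a judgment call; this sketch assumes the paired version. The data are toy numbers, the function names are hypothetical, and the exact p-value is found by enumerating every sign pattern, which is only practical for small samples.

```python
from itertools import product

def signed_ranks(first, second):
    """Return nonzero paired differences and average ranks of their sizes."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of the tied run
        i = j + 1
    return diffs, ranks

def wilcoxon_signed_rank(first, second):
    """Return (W+, W-, exact two-sided p-value) by enumerating sign patterns."""
    diffs, ranks = signed_ranks(first, second)
    total = sum(ranks)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    observed = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for s, r in zip(signs, ranks) if s)
        if min(wp, total - wp) <= observed:
            extreme += 1
    return w_plus, total - w_plus, extreme / 2 ** len(ranks)

# Toy touchdown totals by draft year, invented for illustration only.
toy_first = [10, 12, 9, 14]
toy_second = [8, 11, 10, 10]
w_plus, w_minus, p_wilcoxon = wilcoxon_signed_rank(toy_first, toy_second)
print(w_plus, w_minus, p_wilcoxon)
```

Because the test uses ranks of the differences instead of their raw sizes, one very large touchdown total counts only as the top rank and does not stretch the comparison the way it stretches a mean.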

 #NFL
#outliers
#NFL DRAFT




