## Monday, August 12, 2019

### More on outliers and running backs.

In this post I exclude an outlier from the running back data but continue to use a classical hypothesis test.

More on the impact of outliers on running back comparisons

In a previous post, I pointed out that one of the first-choice running backs never played in the NFL due to an injury.

This post builds on that previous post by examining the impact of this outlier on the comparison of average first-choice and second-choice running back performance.

Raw data was posted at

Question: What are the average number of carries, total career yards, yards per carry, and touchdowns for first-choice running backs, first-choice running backs excluding Larry Stegent, and second-choice running backs?

Is there a significant difference between first-choice RBs and second-choice RBs?

Are the simple t-tests for difference in means used in this post the best way to assess the value of the first running back picked compared to the second running back picked?

Discussion

The averages for the four statistics are presented below.

| Average | First-Choice, Entire Sample | First-Choice, Without Larry Stegent | Second-Choice, Entire Sample |
|---|---|---|---|
| Carries | 1554 | 1603 | 1286 |
| Yards | 6506 | 6709 | 5254 |
| Yards Per Carry | 3.91 | 4.03 | 3.99 |
| Touchdowns | 46.64 | 48.09 | 35.36 |

On average, first-choice running backs had more carries, more yards, and more touchdowns than second-choice running backs, even when Larry Stegent is included in the sample. These differences appear to be nontrivial.

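The difference in yards per carry between first-choice and second-choice RBs is small. Second-choice RBs are slightly ahead when Stegent is included (3.99 vs. 3.91), and first-choice RBs are slightly ahead when he is excluded (4.03 vs. 3.99).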

The t-tests on the difference in means for the four statistics, with and without Larry Stegent, are presented in the next table.

| Student's t-test p-value (difference in means) | Entire Sample | Sample Without the 1970 First-Choice RB |
|---|---|---|
| Carries | 0.317 | 0.238 |
| Yards | 0.280 | 0.256 |
| Yards Per Carry | 0.563 | 0.623 |
| Touchdowns | 0.218 | 0.200 |

The test assumes identical variance in both samples.

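None of the p-values is below conventional significance levels such as 0.05 or 0.10 (the smallest is 0.200). The tests for a difference in mean performance between first-choice and second-choice running backs therefore do not rule out the possibility that this difference is zero.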

This result is not changed when Larry Stegent is excluded from the sample.

Discussion of relevance:

The differences in averages between first-choice and second-choice RBs (carries, yards, and touchdowns) appear nontrivial. However, the t-tests show that these differences are not significantly different from zero.

WHY?

The standard deviation across running backs is also very large, which inflates the standard error of the difference in means. Look at the raw data!
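A minimal sketch of how the spread enters the equal-variance (Student's) test noted under the p-value table: the statistic is t = (mean1 - mean2) / (s_p * sqrt(1/n1 + 1/n2)), where s_p is the pooled standard deviation, so a larger spread shrinks t for the same gap in means. The function name `pooled_t` and all numbers below are hypothetical illustrations, not the running back data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (Student's t-test)."""
    na, nb = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    # Standard error of the difference in means
    se = sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se


# Hypothetical career-yards figures (NOT the real data): both comparisons
# have the same 1000-yard gap in means, but very different spreads.
tight_first = [6000, 6500, 7000]
tight_second = [5000, 5500, 6000]
wide_first = [1000, 6500, 12000]
wide_second = [0, 5500, 11000]

print(round(pooled_t(tight_first, tight_second), 3))  # → 2.449
print(round(pooled_t(wide_first, wide_second), 3))    # → 0.223
```

Only the dispersion differs between the two made-up comparisons. In practice, `scipy.stats.ttest_ind(a, b, equal_var=True)` computes this statistic together with its p-value.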

Some of the dispersion in running back performance occurred because in some draft years many great running backs were available, while in other years running back quality was low. A paired t-test, which compares the first-choice and second-choice RBs drafted in the same year, would help account for these year effects.
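A minimal sketch of a paired t-test, assuming the data can be lined up as one first-choice and one second-choice RB per draft year. The test works on the per-year differences, so a year effect that raises or lowers both players cancels out. The function `paired_t` and the four years of yardage are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic on per-year differences (first minus second choice)."""
    if len(first) != len(second):
        raise ValueError("need one first-choice and one second-choice RB per year")
    diffs = [f - s for f, s in zip(first, second)]
    # Mean difference divided by its standard error; anything that shifts
    # both players in the same year drops out of each difference.
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical yards for four draft years (NOT the real data): strong and weak
# years differ a lot, but the first pick beats the second pick every year.
first = [11000, 3000, 9000, 2000]
second = [10000, 2200, 8100, 1300]

print(round(paired_t(first, second), 2))  # → 13.17
```

The made-up years range from about 2,000 to 11,000 yards, yet the first pick wins each year by 700 to 1,000 yards, so the paired statistic is large even though the between-year spread is huge. `scipy.stats.ttest_rel(first, second)` is the library version that also returns a p-value.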

It might be useful to count the number of years in which the first-choice running back outperforms the second-choice running back.
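Counting the years in which the first-choice RB outperforms the second-choice RB leads naturally to a sign test: if draft order carried no information, each year would be a coin flip. A minimal exact version using only the standard library is sketched below; `sign_test` and the ten years of values are hypothetical.

```python
from math import comb


def sign_test(first, second):
    """Exact two-sided sign test on paired per-year results."""
    wins = sum(f > s for f, s in zip(first, second))
    losses = sum(f < s for f, s in zip(first, second))
    n = wins + losses  # tied years are dropped, as is conventional for the sign test
    k = min(wins, losses)
    # Under H0 each non-tied year is a fair coin flip, so wins ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, n, min(1.0, 2 * tail)


# Hypothetical performance values for ten draft years (NOT the real data).
first = [9, 7, 8, 6, 9, 5, 7, 8, 3, 4]
second = [5, 6, 4, 5, 8, 4, 6, 2, 6, 7]

print(sign_test(first, second))  # → (8, 10, 0.109375)
```

With 8 wins in 10 hypothetical years, the two-sided p-value is 7/64, or about 0.109, which shows how lopsided the count must be before a small sample reaches significance.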

There is also an outlier affecting the second-choice running backs: see Emmitt Smith’s TD total.

This suggests that a Wilcoxon test or some other nonparametric procedure, which is less sensitive to outliers, may be more appropriate.
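One candidate is the Wilcoxon signed-rank test, which fits the year-by-year pairing (the rank-sum, or Mann-Whitney, test would be the unpaired analogue). It replaces each year's difference by its rank, so an extreme value like Emmitt Smith's TD total counts only as the largest rank instead of dragging the mean. The sketch below computes the exact null distribution and, to stay short, assumes there are no zero or tied differences; `wilcoxon_signed_rank` and the touchdown numbers are hypothetical.

```python
def wilcoxon_signed_rank(first, second):
    """Exact two-sided Wilcoxon signed-rank test on paired (per-year) data.

    Simplifying assumption: no zero differences and no tied |differences|,
    so the ranks are exactly 1..n (library versions handle ties).
    """
    diffs = [f - s for f, s in zip(first, second)]
    if any(d == 0 for d in diffs) or len({abs(d) for d in diffs}) != len(diffs):
        raise ValueError("this sketch assumes no zero or tied differences")
    n = len(diffs)
    # Rank the absolute differences: the smallest |d| gets rank 1.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = {i: r + 1 for r, i in enumerate(order)}
    w_plus = sum(ranks[i] for i in range(n) if diffs[i] > 0)
    # Null distribution: each rank is equally likely to carry a + or - sign.
    # counts[w] = number of sign patterns whose positive ranks sum to w.
    max_w = n * (n + 1) // 2
    counts = [1] + [0] * max_w
    for r in range(1, n + 1):
        for w in range(max_w, r - 1, -1):
            counts[w] += counts[w - r]
    k = min(w_plus, max_w - w_plus)
    tail = sum(counts[: k + 1]) / 2 ** n
    return w_plus, min(1.0, 2 * tail)


# Hypothetical TD totals for six draft years (NOT the real data), with one
# very large second-choice value.
first = [50, 40, 45, 60, 30, 55]
second = [45, 38, 39, 52, 175, 51]

print(wilcoxon_signed_rank(first, second))  # → (15, 0.4375)
diffs = [f - s for f, s in zip(first, second)]
print(sum(diffs) / len(diffs))  # → -20.0
```

In the made-up data the first pick wins five of six years, yet the single large second-choice value pulls the mean difference to -20, while the rank-based statistic only counts that year as the largest rank. For real data with ties, `scipy.stats.wilcoxon(first, second)` handles the general case.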

