In this post I exclude an outlier from the running back data but continue to use a classical hypothesis test.
More on the impact of outliers on running back comparisons
In a previous post, I pointed out that one of the
first-choice running backs never played in the NFL due to an injury.
This post builds on that previous post by examining the
impact of this outlier on the comparison of average first-choice and
second-choice running back performance.
Raw data was posted at
Question: What is the average number of carries, total
career yards, yards per carry, and touchdowns for first-choice running backs,
first-choice running backs excluding Larry Stegent, and second-choice running
backs?
Is there a significant difference between first-choice RBs
and second-choice RBs?
Are the simple t-tests for difference in means used in this
post the best way to assess the value of the first running back picked compared
to the second running back picked?
Discussion
The averages for the four statistics are presented below.
Averages

Statistic       | First-choice, entire sample | First-choice, without Larry Stegent | Second-choice, entire sample
Carries         | 1554                        | 1603                                | 1286
Yards           | 6506                        | 6709                                | 5254
Yards per carry | 3.91                        | 4.03                                | 3.99
Touchdowns      | 46.64                       | 48.09                               | 35.36

On average, first-choice running backs had more carries,
more yards, and more touchdowns than second-choice running backs even when
Larry Stegent is included in the sample.
These differences appear to be nontrivial.
The difference in yards per carry between first-choice and
second-choice RBs is small (3.91 versus 3.99 with Larry Stegent included,
4.03 versus 3.99 without him).
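To see why a single player with no NFL statistics moves the first-choice averages, here is a minimal sketch. The numbers are made up for illustration and are not the raw data from this post, and the helper name mean_without is my own. The zero at the end of the list stands in for a player who never played.

```python
from statistics import mean

# Hypothetical career carries for five first-choice backs.
# NOT the post's raw data; the final 0 stands in for a player who never played.
first_choice_carries = [1800, 950, 2400, 1300, 0]


def mean_without(values, index):
    """Return the mean of values after dropping the element at position index."""
    kept = values[:index] + values[index + 1:]
    return mean(kept)


full_mean = mean(first_choice_carries)                 # outlier included
trimmed_mean = mean_without(first_choice_carries, 4)   # outlier excluded

print("Mean with the zero included:", full_mean)
print("Mean with the zero excluded:", trimmed_mean)
```

Dropping a zero leaves the total unchanged but divides it by one fewer player, so the average can only go up. The smaller the sample, the bigger the jump.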
The t-tests on the difference between the averages for the
four statistics, with and without Larry Stegent, are presented in the next
table.
P-value for Student's t-test on difference in means

Statistic       | Entire sample | Sample missing 1970 for first-choice players
Carries         | 0.317         | 0.238
Yards           | 0.280         | 0.256
Yards per carry | 0.563         | 0.623
Touchdowns      | 0.218         | 0.200

Test assumes identical variance in both samples.
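For readers who want to see what the equal-variance assumption does in the calculation, here is a rough sketch of the pooled-variance t statistic using only the Python standard library. The two samples are hypothetical and the function name pooled_t is my own; this is not the code used to produce the table above.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Return (t, df) for Student's two-sample t-test with a pooled variance."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    # Standard error of the difference in means under the equal-variance assumption.
    se = sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Hypothetical samples, NOT the running back data.
first = [10, 12, 14, 16]
second = [8, 9, 13, 14]

t, df = pooled_t(first, second)
print("t statistic:", round(t, 4), "degrees of freedom:", df)
```

The standard library has no t distribution, so this stops at the statistic. If SciPy is available, scipy.stats.ttest_ind(first, second, equal_var=True) should return the same statistic along with a two-sided p-value.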
The tests for difference in mean performance between
first-choice and second-choice running backs do not rule out the possibility
that this difference is zero. None of the p-values falls below 0.20.
This result is not changed when Larry Stegent is excluded
from the sample.
Discussion of relevance:
The difference in averages between first-choice and second-choice running backs
(carries, yards, and touchdowns) appears nontrivial. However, the t-test reveals that the difference is
not significantly different from zero.
WHY?
The standard deviation across running backs is also very
large. Look at the raw data!
Some of the dispersion in running back performance occurred
because in some years there were a lot of great running backs available and in other
years running back quality was low. A
paired t-test would help account for these year effects.
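A paired test works on the within-year difference between the two backs, so a strong or weak draft class cancels out. Here is a minimal sketch, assuming each position in the two lists holds the first and second back picked in the same year. The numbers are invented and the function name paired_t is my own.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired t-test on same-position pairs."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # One difference per pair (here, per draft year).
    diffs = [f - s for f, s in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical career yards by draft year, NOT the post's raw data.
first_choice = [1200, 300, 2500, 900]
second_choice = [1000, 200, 2100, 850]

t, df = paired_t(first_choice, second_choice)
print("paired t statistic:", round(t, 4), "degrees of freedom:", df)
```

In this made-up example the yearly totals range from a few hundred to a few thousand yards, but the within-year differences are much less spread out, which is exactly the situation where pairing helps. With SciPy installed, scipy.stats.ttest_rel does the same calculation and also reports a p-value.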
It might be useful to count the number of years that the
first-choice running back outperforms the second-choice running back.
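Counting the years is essentially a sign test. Here is a small sketch of that idea, again with invented numbers paired by year. Under the null hypothesis that either back is equally likely to come out ahead, the count of first-choice wins follows a binomial distribution with probability 0.5. Dropping tied years is a common convention and is an assumption on my part.

```python
from math import comb


def sign_test(first, second):
    """Return (wins, losses, two-sided exact p-value) for paired samples."""
    wins = sum(f > s for f, s in zip(first, second))
    losses = sum(f < s for f, s in zip(first, second))
    n = wins + losses  # tied pairs are dropped
    k = max(wins, losses)
    # Probability of k or more successes out of n when each has probability 0.5.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


# Hypothetical career yards by draft year, NOT the post's raw data.
first_choice = [1200, 300, 2500, 900, 700, 1500]
second_choice = [1000, 200, 2100, 850, 900, 1500]

wins, losses, p_value = sign_test(first_choice, second_choice)
print("first-choice wins:", wins, "losses:", losses, "p-value:", p_value)
```

The sign test ignores how large each yearly gap is, so one enormous career cannot dominate the result. The price is lower power when the gaps are informative.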
There is also an outlier impacting second-choice running
backs. See Emmitt Smith’s TD total.
This suggests we should be using a Wilcoxon test or some
other nonparametric procedure.
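As a rough illustration of why a rank-based test is less sensitive to one extreme value, here is a sketch of the Wilcoxon rank-sum test with a normal approximation. It uses only the standard library, applies no tie or continuity correction, and the approximation is crude for samples this small. The data and the function names midranks and rank_sum_test are hypothetical.

```python
from statistics import NormalDist


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Return (W, z, two-sided p) where W is the rank sum of sample a."""
    na, nb = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    w = sum(ranks[:na])
    mu = na * (na + nb + 1) / 2                      # mean of W under the null
    sigma = (na * nb * (na + nb + 1) / 12) ** 0.5    # sd of W under the null, no ties
    z = (w - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical touchdown totals, NOT the post's raw data.
first_choice = [30, 45, 52, 61, 70]
second_choice = [12, 25, 33, 40, 160]   # one very large value

w, z, p = rank_sum_test(first_choice, second_choice)
print("rank sum:", w, "z:", round(z, 4), "p-value:", round(p, 4))

# Making the extreme value ten times larger leaves the ranks unchanged.
w_extreme, _, _ = rank_sum_test(first_choice, [12, 25, 33, 40, 1600])
print("rank sum with a larger outlier:", w_extreme)
```

Because only the ordering matters, the largest value counts as rank 10 whether it is 160 or 1600. For real work, SciPy's scipy.stats.mannwhitneyu (unpaired) or scipy.stats.wilcoxon (paired, signed-rank) would be the safer choice.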