## Wednesday, May 25, 2016

### The impact of an outlier on Big Ten SAT Scores

Question:  Thirteen of the 14 Big Ten schools are large state universities.   One school, Northwestern, is an elite, expensive private school.

Calculate the mean, standard deviation, and 90% confidence interval for the 13 state schools in the Big Ten.

Discuss how and why the confidence interval for the 13 state schools differs from the confidence interval for the mean SAT scores for all 14 Big Ten schools.  The CI for the entire Big Ten was calculated in a previous post.

Why might the confidence interval without Northwestern be a better indicator of Big Ten SAT performance than the confidence interval based on all schools in the league?

How do Northwestern's SAT scores compare to the upper bounds of the Big Ten SAT confidence intervals?

The Data used in this Study:

Big Ten SAT Scores -- State Schools and Northwestern

| Big Ten School | Verbal SAT 25 | Math SAT 25 | Verbal SAT 75 | Math SAT 75 |
|---|---|---|---|---|
| Ohio State | 540 | 610 | 660 | 720 |
| University of Michigan | 620 | 660 | 720 | 760 |
| Michigan State | 420 | 550 | 580 | 690 |
| University of Minnesota | 550 | 620 | 690 | 740 |
| University of Iowa | 540 | 620 | 620 | 680 |
| Purdue | 520 | 560 | 630 | 690 |
| Indiana University | 520 | 540 | 630 | 660 |
| Rutgers | 520 | 570 | 640 | 690 |
| University of Maryland | 580 | 620 | 690 | 730 |
| University of Illinois | 560 | 700 | 670 | 780 |
| Penn State | 530 | 560 | 630 | 670 |
| University of Wisconsin | 530 | 630 | 650 | 750 |
| University of Nebraska | 490 | 520 | 660 | 670 |
| Northwestern | 690 | 700 | 760 | 790 |

Discussion:

The removal of Northwestern from the sample impacts the Big Ten confidence intervals in four ways.

• The mean SAT decreases.

• The standard deviation of the SAT decreases because Northwestern is an outlier fairly far from the means of the SAT variables.

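• The decrease in sample size from 14 to 13 lowers the denominator of the standard error from √14 to √13, which by itself would raise the standard error slightly.  In this sample the drop in the standard deviation dominates, so the standard error still falls.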

• The decrease in degrees of freedom from 13 to 12 increases the t-value used in the calculation of the lower and upper bounds in the confidence intervals.  The change in the t-value is fairly small: the absolute value of t at the 0.05 tail level is 1.771 for 13 degrees of freedom and 1.782 for 12 degrees of freedom.
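All four effects enter through the usual two-sided 90% t interval for a mean. The post does not write the formula out, so the notation here is an assumption ($\bar{x}$ for the sample mean, $s$ for the sample standard deviation, $n$ for the number of schools), although it is consistent with the tabulated bounds:

$$
\bar{x} \;\pm\; t_{0.05,\,n-1}\,\frac{s}{\sqrt{n}}
$$

Omitting Northwestern changes every term: $\bar{x}$ and $s$ fall, $n$ goes from 14 to 13, and $t_{0.05,\,n-1}$ goes from 1.771 to 1.782.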

Results:

SAT statistics and confidence intervals are presented below.

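Impact of Northwestern on Big Ten SAT Score Confidence Intervals

| Statistic | Verbal 25th | Math 25th | Verbal 75th | Math 75th |
|---|---|---|---|---|
| Mean, Northwestern omitted | 532.3 | 596.9 | 651.5 | 710.0 |
| Mean, all Big Ten | 543.6 | 604.3 | 659.3 | 715.7 |
| Difference (omitted - all) | -11.3 | -7.4 | -7.7 | -5.7 |
| STD, Northwestern omitted | 46.6 | 51.9 | 36.3 | 38.9 |
| STD, all Big Ten | 61.5 | 56.9 | 45.3 | 43.1 |
| Difference (omitted - all) | -14.9 | -5.1 | -9.1 | -4.2 |
| STDERR, Northwestern omitted | 12.9 | 14.4 | 10.1 | 10.8 |
| STDERR, all Big Ten | 16.4 | 15.2 | 12.1 | 11.5 |
| Difference (omitted - all) | -3.5 | -0.8 | -2.1 | -0.7 |
| t13 (all Big Ten) | 1.771 | 1.771 | 1.771 | 1.771 |
| t12 (Northwestern omitted) | 1.782 | 1.782 | 1.782 | 1.782 |
| LB, Northwestern omitted | 509.3 | 571.3 | 633.6 | 690.7 |
| UB, Northwestern omitted | 555.3 | 622.6 | 669.5 | 729.3 |
| Difference (UB - LB) | 46.0 | 51.3 | 35.8 | 38.5 |
| LB, all Big Ten | 514.5 | 577.3 | 637.8 | 695.3 |
| UB, all Big Ten | 572.7 | 631.2 | 680.7 | 736.1 |
| Difference (UB - LB) | 58.2 | 53.9 | 42.9 | 40.8 |
| Northwestern | 690 | 700 | 760 | 790 |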
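The Verbal 25th column can be reproduced with a short script. This is a minimal sketch, not necessarily how the original figures were computed: the function name `ci90` and the variable names are illustrative assumptions, and the t-values (1.771 for 13 degrees of freedom, 1.782 for 12) are hardcoded from the discussion rather than looked up from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev

# Verbal SAT 25th-percentile scores for the 13 state schools (from the data table).
verbal_25_state = [540, 620, 420, 550, 540, 520, 520, 520, 580, 560, 530, 530, 490]
northwestern = 690

# Two-sided 90% t-values quoted in the post, keyed by degrees of freedom.
T_CRIT = {12: 1.782, 13: 1.771}


def ci90(scores):
    """Return (mean, sample SD, standard error, lower bound, upper bound)."""
    n = len(scores)
    m = mean(scores)
    s = stdev(scores)      # sample SD, with n - 1 in the denominator
    se = s / sqrt(n)       # standard error of the mean
    t = T_CRIT[n - 1]      # only 12 and 13 degrees of freedom are supported here
    return m, s, se, m - t * se, m + t * se


for label, scores in [
    ("Northwestern omitted", verbal_25_state),
    ("All Big Ten", verbal_25_state + [northwestern]),
]:
    m, s, se, lb, ub = ci90(scores)
    print(f"{label}: mean={m:.1f} std={s:.1f} stderr={se:.1f} CI=({lb:.1f}, {ub:.1f})")
```

Swapping in another column of the data table gives the corresponding column of the results table, subject to rounding.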

Observations:   Northwestern has a non-trivial impact on the location and size of the Big Ten confidence intervals.

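The biggest impact is on the verbal SAT at the 25th percentile, where omitting Northwestern lowers the mean by 11.3 points and narrows the confidence interval from 58.2 points to 46.0 points.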

Final Thought:  When one uses confidence intervals to determine whether one observation in a sample differs from the rest of the sample, one has to consider that the observation in question affects the location and size of the confidence interval.   The gap between Northwestern and the upper bound of each confidence interval is larger when Northwestern is not used in the creation of the confidence interval.
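The size of that gap can be checked directly from the tabulated upper bounds. The snippet below is an illustrative sketch; the dictionary layout and the names `ub_omitted`, `ub_all`, and `gaps` are assumptions, while the numbers are copied from the results table.

```python
# Northwestern's scores and the 90% CI upper bounds, copied from the results table.
northwestern = {"Verbal 25th": 690, "Math 25th": 700, "Verbal 75th": 760, "Math 75th": 790}
ub_omitted = {"Verbal 25th": 555.3, "Math 25th": 622.6, "Verbal 75th": 669.5, "Math 75th": 729.3}
ub_all = {"Verbal 25th": 572.7, "Math 25th": 631.2, "Verbal 75th": 680.7, "Math 75th": 736.1}

# Gap between Northwestern and each upper bound: (Northwestern omitted, all Big Ten).
gaps = {
    col: (round(score - ub_omitted[col], 1), round(score - ub_all[col], 1))
    for col, score in northwestern.items()
}

for col, (gap_omitted, gap_all) in gaps.items():
    print(f"{col}: gap with Northwestern omitted = {gap_omitted}, with all schools = {gap_all}")
```

For the verbal SAT at the 25th percentile, for example, Northwestern sits 134.7 points above the upper bound when it is left out of the interval, compared with 117.3 points when it is included.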