Wednesday, May 25, 2016

The impact of an outlier on Big Ten SAT Scores


Question:  Thirteen of the 14 Big Ten schools are large state universities. One school, Northwestern, is an elite, expensive private school.

Calculate the mean, standard deviation, and 90% confidence interval for the 13 state schools in the Big Ten.

Discuss how and why the confidence interval for the 13 state schools differs from the confidence interval for the mean SAT scores for all 14 Big Ten schools.  The CI for the entire Big Ten was calculated in a previous post.


Why might the confidence interval without Northwestern be a better indicator of Big Ten SAT performance than the confidence interval based on all schools in the league?

How do Northwestern's SAT scores compare to the upper bounds of the Big Ten SAT confidence intervals?


The Data used in this Study:



Big Ten SAT Scores -- State Schools and Northwestern

| Big Ten School | Verbal SAT 25 | Math SAT 25 | Verbal SAT 75 | Math SAT 75 |
|---|---|---|---|---|
| Ohio State | 540 | 610 | 660 | 720 |
| University of Michigan | 620 | 660 | 720 | 760 |
| Michigan State | 420 | 550 | 580 | 690 |
| University of Minnesota | 550 | 620 | 690 | 740 |
| University of Iowa | 540 | 620 | 620 | 680 |
| Purdue | 520 | 560 | 630 | 690 |
| Indiana University | 520 | 540 | 630 | 660 |
| Rutgers | 520 | 570 | 640 | 690 |
| University of Maryland | 580 | 620 | 690 | 730 |
| University of Illinois | 560 | 700 | 670 | 780 |
| Penn State | 530 | 560 | 630 | 670 |
| University of Wisconsin | 530 | 630 | 650 | 750 |
| University of Nebraska | 490 | 520 | 660 | 670 |
| Northwestern | 690 | 700 | 760 | 790 |
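As a rough check on the summary statistics, the means and sample standard deviations can be recomputed from the table above. The sketch below uses only Python's standard library; the names `STATE_SCHOOLS`, `NORTHWESTERN`, `COLUMNS`, and `summarize` are illustrative and not part of the original analysis. It assumes the sample (n - 1) standard deviation.

```python
from statistics import mean, stdev

# Scores transcribed from the table above, in the order
# (Verbal 25th, Math 25th, Verbal 75th, Math 75th).
COLUMNS = ("Verbal 25th", "Math 25th", "Verbal 75th", "Math 75th")
STATE_SCHOOLS = {
    "Ohio State": (540, 610, 660, 720),
    "University of Michigan": (620, 660, 720, 760),
    "Michigan State": (420, 550, 580, 690),
    "University of Minnesota": (550, 620, 690, 740),
    "University of Iowa": (540, 620, 620, 680),
    "Purdue": (520, 560, 630, 690),
    "Indiana University": (520, 540, 630, 660),
    "Rutgers": (520, 570, 640, 690),
    "University of Maryland": (580, 620, 690, 730),
    "University of Illinois": (560, 700, 670, 780),
    "Penn State": (530, 560, 630, 670),
    "University of Wisconsin": (530, 630, 650, 750),
    "University of Nebraska": (490, 520, 660, 670),
}
NORTHWESTERN = (690, 700, 760, 790)


def summarize(rows):
    """Return {column: (mean, sample standard deviation)} for a list of score tuples."""
    result = {}
    for i, name in enumerate(COLUMNS):
        values = [row[i] for row in rows]
        result[name] = (mean(values), stdev(values))  # stdev divides by n - 1
    return result


state_only = summarize(list(STATE_SCHOOLS.values()))
all_schools = summarize(list(STATE_SCHOOLS.values()) + [NORTHWESTERN])

for name in COLUMNS:
    m13, s13 = state_only[name]
    m14, s14 = all_schools[name]
    print(f"{name}: 13 schools mean={m13:.1f} sd={s13:.1f} | "
          f"14 schools mean={m14:.1f} sd={s14:.1f}")
```

Adding Northwestern to the list raises both the mean and the standard deviation in every column, which is the pattern the rest of the post discusses.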


Discussion: 

Removing Northwestern from the sample affects the Big Ten confidence intervals in four ways.

  • The mean SAT score decreases, because Northwestern's scores are well above the means of the other schools.

  • The standard deviation of the SAT scores decreases, because Northwestern is an outlier fairly far from the mean of each SAT variable.

  • The sample size in the standard error falls from 14 to 13, so the denominator (the square root of the sample size) gets slightly smaller. On its own, this raises the standard error slightly.

  • The degrees of freedom fall from 13 to 12, which slightly increases the critical t-value used to calculate the lower and upper bounds of the confidence intervals. The change is small: the critical value for a 90% interval is 1.771 with 13 degrees of freedom and 1.782 with 12.
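The four ingredients listed above (mean, standard deviation, sample size, and t-value) combine in the usual small-sample formula: mean ± t × s / √n. The sketch below applies that formula to the Verbal SAT 25th percentile figures reported in this post. The function name `confidence_interval` is illustrative, and the critical t-values are hard-coded from the post rather than computed.

```python
from math import sqrt


def confidence_interval(mean, sd, n, t_crit):
    """Return (standard error, lower bound, upper bound) for mean +/- t * sd / sqrt(n)."""
    stderr = sd / sqrt(n)
    margin = t_crit * stderr
    return stderr, mean - margin, mean + margin


# Verbal SAT 25th percentile summary values as reported in this post.
# Critical t-values (1.782 for 12 df, 1.771 for 13 df) are also taken from the post.
se13, lb13, ub13 = confidence_interval(mean=532.3, sd=46.6, n=13, t_crit=1.782)
se14, lb14, ub14 = confidence_interval(mean=543.6, sd=61.5, n=14, t_crit=1.771)

print(f"Northwestern omitted: SE={se13:.1f}, 90% CI=({lb13:.1f}, {ub13:.1f})")
print(f"All Big Ten:          SE={se14:.1f}, 90% CI=({lb14:.1f}, {ub14:.1f})")
```

If SciPy is available, `scipy.stats.t.ppf(0.95, df)` should give the same critical values without hard-coding them.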



Results:

SAT statistics and confidence intervals are presented below.

Impact of Northwestern on Big Ten SAT Score Confidence Intervals

| Statistic | Verbal 25th | Math 25th | Verbal 75th | Math 75th |
|---|---|---|---|---|
| Mean Northwestern Omitted | 532.3 | 596.9 | 651.5 | 710.0 |
| Mean all Big Ten | 543.6 | 604.3 | 659.3 | 715.7 |
| Difference | -11.3 | -7.4 | -7.7 | -5.7 |
| STD Northwestern Omitted | 46.6 | 51.9 | 36.3 | 38.9 |
| STD All Big Ten | 61.5 | 56.9 | 45.3 | 43.1 |
| Difference | -14.9 | -5.1 | -9.1 | -4.2 |
| STDERR Northwestern Omitted | 12.9 | 14.4 | 10.1 | 10.8 |
| STDERR All Big Ten | 16.4 | 15.2 | 12.1 | 11.5 |
| Difference | -3.5 | -0.8 | -2.1 | -0.7 |
| t13 (all Big Ten) | 1.771 | 1.771 | 1.771 | 1.771 |
| t12 (Northwestern omitted) | 1.782 | 1.782 | 1.782 | 1.782 |
| LB Northwestern Omitted | 509.3 | 571.3 | 633.6 | 690.7 |
| UB Northwestern Omitted | 555.3 | 622.6 | 669.5 | 729.3 |
| Width (UB - LB) | 46.0 | 51.3 | 35.8 | 38.5 |
| LB all Big Ten | 514.5 | 577.3 | 637.8 | 695.3 |
| UB all Big Ten | 572.7 | 631.2 | 680.7 | 736.1 |
| Width (UB - LB) | 58.2 | 53.9 | 42.9 | 40.8 |
| Northwestern | 690 | 700 | 760 | 790 |


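Observations:   Northwestern has a non-trivial impact on the location and size of the Big Ten confidence intervals.

The biggest impact is on the verbal SAT at the 25th percentile, where omitting Northwestern lowers the mean by 11.3 points and narrows the confidence interval from 58.2 to 46.0 points.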

Final Thought:  When one uses a confidence interval to judge whether one observation differs from the rest of the sample, one has to consider that the observation in question itself affects the location and size of the interval. The gap between Northwestern and the upper bound of each confidence interval is larger when Northwestern is excluded from the calculation of the interval.
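The size of that gap can be worked out directly from the results table. The short sketch below copies the upper bounds and Northwestern's scores from the table; the variable names and the helper `gaps` are illustrative only.

```python
# Upper bounds of the 90% confidence intervals and Northwestern's scores,
# copied from the results table in this post.
COLUMNS = ("Verbal 25th", "Math 25th", "Verbal 75th", "Math 75th")
UB_OMITTED = (555.3, 622.6, 669.5, 729.3)   # Northwestern left out of the sample
UB_ALL = (572.7, 631.2, 680.7, 736.1)       # all 14 schools
NORTHWESTERN = (690, 700, 760, 790)


def gaps(scores, upper_bounds):
    """Distance of each score above the matching upper bound, rounded to 0.1."""
    return [round(score - ub, 1) for score, ub in zip(scores, upper_bounds)]


gap_omitted = gaps(NORTHWESTERN, UB_OMITTED)
gap_all = gaps(NORTHWESTERN, UB_ALL)

for name, g_out, g_in in zip(COLUMNS, gap_omitted, gap_all):
    print(f"{name}: gap with Northwestern omitted={g_out}, gap with all schools={g_in}")
```

In every column the gap is wider when Northwestern is left out, because the outlier no longer pulls the interval toward itself or stretches it.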

Additional Reading:

Small sample confidence intervals around means explained by Khan Academy:


