This post uses Big Ten SAT data to consider how an outlier can impact the size and location of confidence intervals.
Question: Thirteen of the 14 Big Ten schools are large state universities. One school, Northwestern, is an elite, expensive private school.
Calculate the mean, standard deviation, and 90% confidence interval for the 13 state schools in the Big Ten.
Discuss how and why the confidence interval for the 13 state schools differs from the confidence interval for the mean SAT scores for all 14 Big Ten schools. The CI for the entire Big Ten was calculated in a previous post.
Why might the confidence interval without Northwestern be a better indicator of Big Ten SAT performance than the confidence interval based on all schools in the league?
How do Northwestern's SAT scores compare to the upper bounds of the Big Ten SAT confidence intervals?
The Data Used in this Study:
Big Ten SAT Scores: State Schools and Northwestern

| Big Ten School | Verbal SAT 25th | Math SAT 25th | Verbal SAT 75th | Math SAT 75th |
|---|---|---|---|---|
| Ohio State | 540 | 610 | 660 | 720 |
| University of Michigan | 620 | 660 | 720 | 760 |
| Michigan State | 420 | 550 | 580 | 690 |
| University of Minnesota | 550 | 620 | 690 | 740 |
| University of Iowa | 540 | 620 | 620 | 680 |
| Purdue | 520 | 560 | 630 | 690 |
| Indiana University | 520 | 540 | 630 | 660 |
| Rutgers | 520 | 570 | 640 | 690 |
| University of Maryland | 580 | 620 | 690 | 730 |
| University of Illinois | 560 | 700 | 670 | 780 |
| Penn State | 530 | 560 | 630 | 670 |
| University of Wisconsin | 530 | 630 | 650 | 750 |
| University of Nebraska | 490 | 520 | 660 | 670 |
| Northwestern | 690 | 700 | 760 | 790 |
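The calculation the question asks for can be sketched in a few lines of Python. This is a minimal sketch, not necessarily the method used to build the tables in this post: it assumes a two-sided 90% t interval around the mean, takes the critical t values quoted in this post (1.782 for 12 degrees of freedom, 1.771 for 13), and uses illustrative names (STATE_SCHOOLS, NORTHWESTERN, T_CRITICAL, summarize) that do not come from the original.

```python
from math import sqrt
from statistics import mean, stdev

# Scores for the 13 state schools, in the same row order as the data table.
STATE_SCHOOLS = {
    "Verbal 25th": [540, 620, 420, 550, 540, 520, 520, 520, 580, 560, 530, 530, 490],
    "Math 25th":   [610, 660, 550, 620, 620, 560, 540, 570, 620, 700, 560, 630, 520],
    "Verbal 75th": [660, 720, 580, 690, 620, 630, 630, 640, 690, 670, 630, 650, 660],
    "Math 75th":   [720, 760, 690, 740, 680, 690, 660, 690, 730, 780, 670, 750, 670],
}
NORTHWESTERN = {"Verbal 25th": 690, "Math 25th": 700, "Verbal 75th": 760, "Math 75th": 790}

# Critical t values for a two-sided 90% interval, keyed by degrees of freedom.
# Assumption: these are the rounded values quoted in this post.
T_CRITICAL = {12: 1.782, 13: 1.771}


def summarize(scores):
    """Return the mean, sample standard deviation, standard error, and 90% CI bounds."""
    n = len(scores)
    avg = mean(scores)
    sd = stdev(scores)                 # sample standard deviation (n - 1 in the denominator)
    se = sd / sqrt(n)                  # standard error of the mean
    margin = T_CRITICAL[n - 1] * se    # half-width of the interval
    return {"mean": avg, "sd": sd, "se": se, "lower": avg - margin, "upper": avg + margin}


results = {}
for column, scores in STATE_SCHOOLS.items():
    results[column] = {
        "omitted": summarize(scores),                         # 13 state schools only
        "all": summarize(scores + [NORTHWESTERN[column]]),    # all 14 schools
    }

for column, by_sample in results.items():
    for label in ("omitted", "all"):
        s = by_sample[label]
        print(f"{column:12s} {label:8s} mean={s['mean']:.1f} sd={s['sd']:.1f} "
              f"se={s['se']:.1f} CI=({s['lower']:.1f}, {s['upper']:.1f})")
```

Because the critical values are rounded to three decimals, a bound can differ from a table entry by about 0.1 point.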

Discussion:
The removal of Northwestern from the sample impacts the Big Ten confidence intervals in four ways.
 The mean SAT decreases.
 The standard deviation of the SAT decreases because Northwestern is an outlier fairly far from the means of the SAT variables.
 The decrease in sample size decreases the denominator of the standard error from sqrt(14) to sqrt(13), which by itself makes the standard error slightly larger.
 The decrease in degrees of freedom from 13 to 12 increases the critical t value used in the calculation of the lower and upper bounds of the confidence intervals. The change in the t value is fairly small: abs(t_{13,0.05}) is 1.771, and the t value for 12 degrees of freedom is 1.782.
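How the last three effects combine can be checked with a little arithmetic. The half-width of the interval is t * s / sqrt(n), so the ratio of the two half-widths factors into a standard-deviation term, a sample-size term, and a t term. The sketch below plugs in the verbal SAT 25th-percentile values reported in this post; the variable and function names are illustrative.

```python
from math import sqrt

# Verbal SAT 25th-percentile values as reported in this post.
sd_all, sd_omitted = 61.5, 46.6      # sample standard deviations
n_all, n_omitted = 14, 13            # sample sizes
t_all, t_omitted = 1.771, 1.782      # critical t values for 13 and 12 degrees of freedom


def half_width(t, sd, n):
    """Half-width of a t confidence interval around the mean."""
    return t * sd / sqrt(n)


# Each factor is (value with Northwestern omitted) / (value with all 14 schools).
sd_factor = sd_omitted / sd_all        # below 1: dropping the outlier shrinks the spread
n_factor = sqrt(n_all / n_omitted)     # above 1: a smaller sample inflates the standard error
t_factor = t_omitted / t_all           # above 1: fewer degrees of freedom raise the t value

width_ratio = half_width(t_omitted, sd_omitted, n_omitted) / half_width(t_all, sd_all, n_all)

print(f"sd factor:   {sd_factor:.3f}")
print(f"n factor:    {n_factor:.3f}")
print(f"t factor:    {t_factor:.3f}")
print(f"width ratio: {width_ratio:.3f}")
```

For this variable the standard-deviation factor is about 0.76, while the other two factors are each only slightly above 1, so the interval narrows by roughly a fifth when Northwestern is dropped.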
Results:
SAT statistics and confidence intervals are presented below.
Impact of Northwestern on Big Ten SAT Score Confidence Intervals

| Statistic | Verbal 25th | Math 25th | Verbal 75th | Math 75th |
|---|---|---|---|---|
| Mean Northwestern Omitted | 532.3 | 596.9 | 651.5 | 710.0 |
| Mean All Big Ten | 543.6 | 604.3 | 659.3 | 715.7 |
| Difference | 11.3 | 7.4 | 7.7 | 5.7 |
| STD Northwestern Omitted | 46.6 | 51.9 | 36.3 | 38.9 |
| STD All Big Ten | 61.5 | 56.9 | 45.3 | 43.1 |
| Difference | 14.9 | 5.1 | 9.1 | 4.2 |
| STDERR Northwestern Omitted | 12.9 | 14.4 | 10.1 | 10.8 |
| STDERR All Big Ten | 16.4 | 15.2 | 12.1 | 11.5 |
| Difference | 3.5 | 0.8 | 2.1 | 0.7 |
| t13 | 1.771 | 1.771 | 1.771 | 1.771 |
| t12 | 1.782 | 1.782 | 1.782 | 1.782 |
| LB Northwestern Omitted | 509.3 | 571.3 | 633.6 | 690.7 |
| UB Northwestern Omitted | 555.3 | 622.6 | 669.5 | 729.3 |
| Difference (UB - LB) | 46.0 | 51.3 | 35.8 | 38.5 |
| LB All Big Ten | 514.5 | 577.3 | 637.8 | 695.3 |
| UB All Big Ten | 572.7 | 631.2 | 680.7 | 736.1 |
| Difference (UB - LB) | 58.2 | 53.9 | 42.9 | 40.8 |
| Northwestern | 690 | 700 | 760 | 790 |

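Observations: Northwestern has a nontrivial impact on the location and size of the Big Ten confidence intervals. The biggest impact is on the verbal SAT at the 25^{th} percentile, where including Northwestern raises the mean by 11.3 points and widens the interval from 46.0 to 58.2 points.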
Final Thought: When one uses a confidence interval to determine whether one observation in a sample differs from the rest of the sample, one has to consider that the observation in question itself impacts the location and size of the confidence interval. The gap between Northwestern and the upper bound of each confidence interval is larger when Northwestern is not used in the creation of the confidence interval.
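The size of that gap can be read straight off the results table. As a small sketch, the snippet below subtracts each upper bound reported in this post from the corresponding Northwestern score; the list names are illustrative, and the values are copied from the table, not recomputed.

```python
columns = ["Verbal 25th", "Math 25th", "Verbal 75th", "Math 75th"]
northwestern = [690, 700, 760, 790]

# Upper bounds of the 90% confidence intervals, as reported in this post.
upper_omitted = [555.3, 622.6, 669.5, 729.3]   # Northwestern omitted (13 schools)
upper_all = [572.7, 631.2, 680.7, 736.1]       # all 14 Big Ten schools

# Gap between Northwestern's score and each upper bound, rounded to one decimal.
gap_omitted = [round(nw - ub, 1) for nw, ub in zip(northwestern, upper_omitted)]
gap_all = [round(nw - ub, 1) for nw, ub in zip(northwestern, upper_all)]

for name, g_omitted, g_all in zip(columns, gap_omitted, gap_all):
    print(f"{name:12s} gap with Northwestern omitted: {g_omitted:6.1f}   "
          f"gap with all schools: {g_all:6.1f}")
```

In every column Northwestern sits above both upper bounds, and the gap is wider for the interval built from the 13 state schools alone.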
Additional Reading:
Small sample confidence intervals around means explained by Khan Academy: