Thursday, June 2, 2016

A Logistic Regression for Graduation Rates in California

SAT Scores and Graduation Rates at 21 Large California Schools


Question:   What is the relationship between graduation rate and SAT scores at large public universities in California?


Data:  Data on the graduation rate and verbal and math SAT scores for 21 large four-year universities in the state of California are reported in the table below.



SAT Scores For 21 Large Four-Year Public Universities in California

School                               Graduation Rate  Verbal 25  Verbal 75  Math 25  Math 75
Cal Poly                             71               550        650        590      690
UC Berkeley                          91               590        720        630      770
UCLA                                 91               560        680        600      760
UCSD                                 86               550        660        620      730
San Jose State                       48               440        550        470      600
UC Davis                             81               510        640        560      680
UC Irvine                            86               460        600        530      670
SDSU                                 66               480        590        500      610
Cal State Poly                       52               460        570        490      620
Cal State Sacramento                 42               410        520        430      540
UCSB                                 80               530        650        560      690
San Francisco State University       46               430        550        450      560
Cal State Chico                      57               450        550        460      570
Cal State Long Beach                 59               440        550        460      590
Cal State Fullerton                  52               450        550        470      480
Cal State Los Angeles                36               380        480        390      510
University of California Riverside   66               470        580        500      630
Cal State Northridge                 47               400        510        400      530
Cal State San Bernardino             42               390        490        400      510
Cal State Fresno                     48               400        510        410      530
UC Santa Cruz                        74               470        610        490      620


The source of the data is the College Scorecard website, as queried on June 2, 2016.

I set the search filters to state: California; degree type: four-year; school type: public; size: large.

Discussion of data:  

The graduation rate in this study is the rate of graduation after six years, at schools that offer four-year degrees, for students who were enrolled full time in their first year.


Test scores are shown for all schools that report them. The scores listed are the 25th and 75th percentiles of verbal and math SAT scores.


Descriptive Statistics Graduation Rates and SAT Scores California Schools

        Graduation Rate  Verbal 25  Verbal 75  Math 25  Math 75
Mean    62.9             467.6      581.4      495.7    613.8
STD     17.8             60.3       65.9       74.1     84.6
Min     36.0             380.0      480.0      390.0    480.0
Max     91.0             590.0      720.0      630.0    770.0
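
As a quick check on the graduation-rate column, the summary figures can be recomputed from the 21 rates in the data table. This is only a sketch: the variable names are mine, and I am assuming the STD row is the sample standard deviation (n - 1 in the denominator), which the post does not state.

```python
import statistics

# Six-year graduation rates (percent) for the 21 schools, in table order.
graduation_rates = [
    71, 91, 91, 86, 48, 81, 86, 66, 52, 42, 80,
    46, 57, 59, 52, 36, 66, 47, 42, 48, 74,
]

mean_rate = statistics.mean(graduation_rates)
# Assumption: sample standard deviation (divides by n - 1).
std_rate = statistics.stdev(graduation_rates)
min_rate = min(graduation_rates)
max_rate = max(graduation_rates)

print(f"Mean {mean_rate:.1f}  STD {std_rate:.1f}  Min {min_rate}  Max {max_rate}")
```

The same four calls can be applied to each SAT column to rebuild the rest of the table.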

Logistic Regression Results:

I estimated logistic regression models for the graduation rate variable. The dependent variable in the logistic regression model is the log of the odds of graduating, ln(p / (1 - p)), where p is the graduation rate expressed as a proportion. I estimated several models with various SAT scores as explanatory variables. The SAT variable used in the model presented below is the average of four SAT scores: the 25th and 75th percentiles of the verbal score and the 25th and 75th percentiles of the math score.



Logistic Regression of Graduation Rates on SAT Information

Variable                   Coeff.       t-stat
Average Four SAT Scores    0.0119542    11.09
Constant Term              -5.807844    -9.9

Adjusted R2: 0.8591
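
The post does not say which software or estimation routine produced these numbers. One plausible reading, given that an adjusted R2 is reported, is an ordinary least squares fit of the log-odds of the graduation rate on the average SAT score. The sketch below is written under that assumption, with variable names of my own choosing; run on the table's figures it should land close to the reported coefficient and constant, which suggests, but does not confirm, that this is the method used.

```python
import math

# (graduation rate %, verbal 25, verbal 75, math 25, math 75), in table order.
rows = [
    (71, 550, 650, 590, 690), (91, 590, 720, 630, 770), (91, 560, 680, 600, 760),
    (86, 550, 660, 620, 730), (48, 440, 550, 470, 600), (81, 510, 640, 560, 680),
    (86, 460, 600, 530, 670), (66, 480, 590, 500, 610), (52, 460, 570, 490, 620),
    (42, 410, 520, 430, 540), (80, 530, 650, 560, 690), (46, 430, 550, 450, 560),
    (57, 450, 550, 460, 570), (59, 440, 550, 460, 590), (52, 450, 550, 470, 480),
    (36, 380, 480, 390, 510), (66, 470, 580, 500, 630), (47, 400, 510, 400, 530),
    (42, 390, 490, 400, 510), (48, 400, 510, 410, 530), (74, 470, 610, 490, 620),
]

# Explanatory variable: average of the four SAT percentiles.
x = [(v25 + v75 + m25 + m75) / 4 for _, v25, v75, m25, m75 in rows]
# Dependent variable: log-odds of graduating, ln(p / (1 - p)).
y = [math.log(rate / (100 - rate)) for rate, *_ in rows]

n = len(rows)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
syy = sum((yi - mean_y) ** 2 for yi in y)

# Assumption: simple OLS of log-odds on average SAT.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual sum of squares, adjusted R2, and the slope's t-statistic.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - sse / syy
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
t_slope = slope / math.sqrt(sse / (n - 2) / sxx)

print(f"slope={slope:.7f} intercept={intercept:.6f} "
      f"t={t_slope:.2f} adj_R2={adj_r2:.4f}")
```

Swapping a single SAT column in for the average on the `x` line is an easy way to try the alternative specifications mentioned in the text.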


Observation on Results:  The average of the four SAT scores is a highly significant predictor of graduation rate at the 21 large public universities in California.  These results are not sensitive to the choice of SAT statistic used as an explanatory variable; they are in fact extremely robust.
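
To get a feel for the size of the effect, the reported coefficients can be turned back into predicted graduation rates by inverting the log-odds transform. This is an illustrative sketch only: the function name and the SAT values tried are my own choices, and the predictions are only as good as a 21-school fit.

```python
import math

B0 = -5.807844  # constant term reported in the results table
B1 = 0.0119542  # coefficient on the average of the four SAT scores

def predicted_graduation_rate(avg_sat):
    """Invert ln(p / (1 - p)) = B0 + B1 * avg_sat to recover p."""
    log_odds = B0 + B1 * avg_sat
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical average SAT scores spanning the range seen in the data.
for avg_sat in (450, 500, 550, 600, 650):
    rate = predicted_graduation_rate(avg_sat)
    print(f"average SAT {avg_sat}: predicted graduation rate {rate:.1%}")

# Multiplicative change in the odds of graduating per 10-point SAT increase.
odds_ratio_per_10_points = math.exp(B1 * 10)
print(f"odds ratio per 10 SAT points: {odds_ratio_per_10_points:.3f}")
```

Because the model is linear in the log-odds rather than in the rate itself, the same 10-point increase moves the predicted rate by different amounts at different SAT levels.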

A thought about the importance of tests:  These results suggest that colleges with smart kids, as measured by SAT performance, also have high graduation rates.    However, the ability to test well may not be the cause of the higher graduation rate.   The model needs to be expanded to hold other economic and socio-economic variables constant in order to say more about this issue. 

Further Work:  I am interested in this topic because of potential policy implications and because it is an interesting statistical problem.

I would like to extend the model to consider other issues of potential interest to both policy makers and consumers of education.

How does the impact of SAT scores differ for large schools versus small schools?

How does the relationship between graduation rate and SAT scores at large public schools in Texas compare to that at large public schools in California?

Is the SAT as important a determinant of graduation rate at private universities?

I will use these other issues to teach more about a number of statistical topics, including multicollinearity, interpreting logistic regression models, and hypothesis testing.


More work on these topics will follow.
