Saturday, August 13, 2016

Regression Analysis: Explaining Graduation Rates at Public and Private Colleges

Question:   A database on my statistical resource blog has information on the graduation rate and SAT score for 31 public universities and 20 private universities in the state of California.  Use the information in this database to determine the impact of SAT scores and private versus public university status on the on-time graduation rate for these universities.

In particular consider:

After holding SAT scores constant, do private universities have a better on-time graduation rate than public universities?

Does the impact of SAT score on on-time graduation rate differ for private and public universities?

Go to this blog on the statistical resource page to get the raw data:

Analysis:   The variable of interest in this study is a proportion, specifically the proportion of students who start at a university and graduate within six years.  Proportions are bounded between 0 and 1, so the error term of a regression with the proportion as the dependent variable is unlikely to be normally distributed.  This affects the standard errors of the regression coefficients.  Also, predictions from such a model could fall below 0 or above 1.  To avoid these problems we transform the on-time graduation proportion into the log of the odds,

$$\text{log odds} = \ln\left(\frac{p}{1-p}\right)$$

where p is the proportion of students who graduate within six years.
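As a minimal sketch of this transformation (the function names and the sample proportions are illustrative assumptions, not values from the database), the log of the odds and its inverse can be written in a few lines of Python. The inverse shows why predictions from a model fitted on the log odds always map back to a proportion strictly between 0 and 1.

```python
import math

def logit(p):
    """Log of the odds for a proportion p strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a log-odds value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical six-year graduation proportions, not values from the database.
for p in (0.25, 0.50, 0.90):
    z = logit(p)
    print(f"p = {p:.2f}  log odds = {z:+.3f}  back-transformed = {inverse_logit(z):.2f}")
```

Note that a proportion below 0.5 has negative log odds, and a proportion of exactly 0 or 1 cannot be transformed at all.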

SAT scores vary from 200 to 800.  We transform the SAT measure in the database by subtracting 200.  This transformation affects the estimation and interpretation of the constant term of the regression.  With the SAT measure shifted in this manner, the constant term is the predicted log of the odds of on-time graduation for a school whose SAT measure is 200, the lowest possible score, rather than for a hypothetical school whose SAT measure is 0.

An aside to teachers of statistics:  Put the formulas for regression coefficients on the blackboard and have your students explain why the subtraction of 200 from the SAT measure impacts the constant term in the regression but does not impact the slope of the SAT measure variable.
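For teachers who want a numerical check to accompany the blackboard argument, here is a small sketch. The data are made up for illustration and `fit_line` is a hypothetical helper, not anything from the database. The slope is built from deviations of x around its mean, which do not change when 200 is subtracted from every x. The constant is the mean of y minus the slope times the mean of x, so it rises by 200 times the slope.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope depends only on deviations from the means
    a = y_bar - b * x_bar    # constant depends on the level of x through x_bar
    return a, b

# Hypothetical SAT measures and log odds of graduating, for illustration only.
sat = [420.0, 480.0, 510.0, 560.0, 610.0, 680.0]
log_odds = [-0.9, -0.4, -0.1, 0.3, 0.9, 1.6]

a_raw, b_raw = fit_line(sat, log_odds)
a_shift, b_shift = fit_line([s - 200.0 for s in sat], log_odds)

print(f"raw SAT:       constant = {a_raw:.4f}, slope = {b_raw:.6f}")
print(f"SAT minus 200: constant = {a_shift:.4f}, slope = {b_shift:.6f}")
print(f"check: raw constant + 200 * slope = {a_raw + 200.0 * b_raw:.4f}")
```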

Back to the analysis:

Analysis of the impact of private school status on on-time graduation rates holding constant SAT scores:

Dependent variable is log of the odds of graduation on time.

Explanatory variables are SATMEAS from database minus 200 and a dummy variable set to 1 if the school is private and set to 0 otherwise.

[Table: Graduation Rate as a Function of SAT Measure and Private School Dummy. Rows include Constant Term and Adjusted R2.]


SATMEAS is a highly significant explainer of graduation rate.

Private school status does not significantly affect the graduation rate in a model that includes the SAT variable.
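For readers who want to try a model of this form themselves, the following is a rough sketch only. The eight schools are invented, the helper `ols` is hypothetical, and the coefficients it prints have nothing to do with the results reported in this post. It regresses the log of the odds on SATMEAS minus 200 and a private school dummy by solving the normal equations directly.

```python
import math

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Invented schools: (SATMEAS, private dummy, six-year graduation proportion).
schools = [
    (450, 0, 0.35), (500, 0, 0.45), (540, 0, 0.52), (600, 0, 0.66),
    (520, 1, 0.55), (580, 1, 0.63), (640, 1, 0.74), (700, 1, 0.85),
]

X = [[1.0, sat - 200.0, float(private)] for sat, private, _ in schools]
y = [math.log(p / (1.0 - p)) for _, _, p in schools]

constant, b_sat, b_private = ols(X, y)
print(f"constant = {constant:.4f}")
print(f"SATMEAS - 200 coefficient = {b_sat:.6f}")
print(f"private dummy coefficient = {b_private:.4f}")
```

The sketch stops at the coefficients. Judging significance, as the post does, also requires standard errors, which a statistics package would report.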

Questions for students:

The coefficient of the SATMEAS gives the impact of the SAT score on the log of the odds.  Can you figure out how to get the impact of SATMEAS on the on-time graduation proportion?

How might you improve this model?

Discuss the meaning of the constant term.   Are you troubled that this value is negative?
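One possible route to the first question, offered as a sketch rather than the only answer: since p = 1 / (1 + exp(-z)) where z is the log of the odds, the change in p for a one-point rise in SATMEAS is approximately the SATMEAS coefficient times p(1 - p). The effect is therefore largest near p = 0.5 and shrinks toward 0 and 1. The coefficients below are hypothetical placeholders, not the estimates from this post.

```python
import math

def inverse_logit(z):
    """Map a log-odds value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def effect_on_proportion(slope, p):
    """Approximate change in p per one-point rise in SAT: slope * p * (1 - p)."""
    return slope * p * (1.0 - p)

# Hypothetical coefficients, NOT the estimates from the regression in this post.
constant, slope = -2.0, 0.006

for sat_minus_200 in (200, 350, 500):
    p = inverse_logit(constant + slope * sat_minus_200)
    dp = effect_on_proportion(slope, p)
    print(f"SATMEAS - 200 = {sat_minus_200}: p = {p:.3f}, change in p per SAT point = {dp:.5f}")
```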

Analysis of differences in the impact of SAT scores on graduation rates in private and public schools:

To evaluate differences in the impact of SAT scores on public and private graduation rates, I estimate two separate regression models, one for private schools and the other for public schools.   Results are presented below.

[Table: Graduation Rate Models for Public and Private Universities in California. Columns: Public Schools, Private Schools. Rows include Constant Term and Adjusted R2.]


The SAT measure is a significant explainer of the on-time graduation rate for both public and private colleges in California.

However, the coefficient of SATMEAS is 25% higher for public universities than for private universities.

Concluding Question:  This post ends with a question.   Why is the impact of SAT scores on graduation rates so much higher for the public schools than for the private schools?  I suspect that other factors, perhaps student debt, socio-economic characteristics of the student population, or the percent of students attending part time, play a role.   I may collect more data and analyze these questions further.
