Impact of SAT and School Size on Graduation Rates in 31 State Universities in California
Question: The table below contains information on
graduation rates and SAT scores for large state universities and midsize state
universities in the state of California.
Estimate a regression where graduation rate is a function of SAT score
and a school size dummy variable. Are
students in midsize schools more or less likely to graduate than students in
large schools, when SAT score is held constant?
Data:
The data used to analyze the impact of school size and the SAT measure on the
odds of graduating from a public university in California are presented in the
table below.
Information on Graduation Rates, SAT Performance and School Size for Four-Year Public Universities in California

| Public University in California | Odds of Graduating On Time | SAT Measure | Large School Dummy |
|---|---|---|---|
| Cal Poly | 2.45 | 620 | 1 |
| UC Berkeley | 10.11 | 677.5 | 1 |
| UCLA | 10.11 | 650 | 1 |
| UCSD | 6.14 | 640 | 1 |
| San Jose State | 0.92 | 515 | 1 |
| UC Davis | 4.26 | 597.5 | 1 |
| UC Irvine | 6.14 | 565 | 1 |
| SDSU | 1.94 | 545 | 1 |
| Cal State Poly | 1.08 | 535 | 1 |
| Cal State Sacramento | 0.72 | 475 | 1 |
| UCSB | 4.00 | 607.5 | 1 |
| San Francisco State University | 0.85 | 497.5 | 1 |
| Cal State Chico | 1.33 | 507.5 | 1 |
| Cal State Long Beach | 1.44 | 510 | 1 |
| Cal State Fullerton | 1.08 | 487.5 | 1 |
| Cal State Los Angeles | 0.56 | 440 | 1 |
| University of California Riverside | 1.94 | 545 | 1 |
| Cal State Northridge | 0.89 | 460 | 1 |
| Cal State San Bernardino | 0.72 | 447.5 | 1 |
| Cal State Fresno | 0.92 | 462.5 | 1 |
| UC Santa Cruz | 2.85 | 547.5 | 1 |
| Cal State East Bay | 0.64 | 455 | 0 |
| Cal State San Marcos | 0.85 | 482.5 | 0 |
| Sonoma State University | 1.17 | 502.5 | 0 |
| Cal State Channel Islands | 1.04 | 477.5 | 0 |
| Cal State Bakersfield | 0.64 | 452.5 | 0 |
| Cal State Dominguez Hills | 0.39 | 425 | 0 |
| Cal State Monterey Bay | 0.61 | 485 | 0 |
| Cal State Stanislaus | 1.00 | 460 | 0 |
| Humboldt State University | 0.67 | 507.5 | 0 |
| University of California Merced | 1.33 | 510 | 0 |


The SAT measure is the average of four numbers: the 25th and 75th percentiles
of both the math and verbal SAT scores.
The large school dummy is set to 1 if the school has more than 15,000
undergraduates and to 0 if the school has between 2,000 and 15,000
undergraduates.
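The two variable definitions above can be sketched in a few lines of Python. The function names and the input numbers are hypothetical and are not taken from the table. Raising an error for schools under 2,000 undergraduates is an assumption, since the text does not say how such schools would be coded.

```python
def sat_measure(math_25, math_75, verbal_25, verbal_75):
    """Average of the 25th and 75th percentile math and verbal SAT scores."""
    return (math_25 + math_75 + verbal_25 + verbal_75) / 4


def large_dummy(undergraduates):
    """1 for more than 15,000 undergraduates, 0 for 2,000 to 15,000."""
    if undergraduates > 15000:
        return 1
    if undergraduates >= 2000:
        return 0
    # Assumption: smaller schools are outside the sample described in the text.
    raise ValueError("schools under 2,000 undergraduates are not in the sample")


# Hypothetical inputs, not taken from the table above.
print(sat_measure(560, 680, 540, 660))  # 610.0
print(large_dummy(21000))               # 1
print(large_dummy(8000))                # 0
```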
Regression Results: I ran a regression model where the dependent
variable is the log of the odds that a person graduates within six years of
entering school.
The explanatory variables used in the model are the SAT
measure and a dummy variable set to 1 if the school has more than 15,000
undergraduates and 0 otherwise.
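For readers less used to odds, here is a minimal sketch of how the odds in the data table relate to a graduation rate and to the log odds used as the dependent variable. The helper names are hypothetical, and the base of the logarithm is an assumption (base 10 is used here) because the text does not state it.

```python
import math


def odds_to_rate(odds):
    """Convert odds p / (1 - p) into the graduation rate p."""
    return odds / (1.0 + odds)


def rate_to_odds(rate):
    """Convert a graduation rate p into odds p / (1 - p)."""
    return rate / (1.0 - rate)


# Odds taken from the data table: Cal Poly 2.45, Cal State Stanislaus 1.00.
for name, odds in [("Cal Poly", 2.45), ("Cal State Stanislaus", 1.00)]:
    rate = odds_to_rate(odds)
    # Assumption: base-10 log; the text only says "log of the odds".
    print(name, round(rate, 3), round(math.log10(odds), 3))
```

Odds of 1.00 correspond to a graduation rate of 50 percent and log odds of zero, so schools with odds below 1.00 have negative values of the dependent variable.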
The regression results are laid out in the table below.
Regression Results for Graduation Rate Equation

| Variable | Coeff. | t-stat |
|---|---|---|
| SAT | 0.005 | 11.9 |
| LARGE | 0.057 | 0.95 |
| CONSTANT | -2.53 | -12.1 |
| R^2 (percent) | 86.5 | |

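A regression of this form can be estimated from the data table with ordinary least squares. The sketch below uses only the Python standard library and solves the normal equations directly. The function names are hypothetical and the base-10 log is an assumption, so the output need not match the reported table exactly.

```python
import math

# (odds of graduating, SAT measure, large-school dummy) from the data table.
DATA = [
    (2.45, 620, 1), (10.11, 677.5, 1), (10.11, 650, 1), (6.14, 640, 1),
    (0.92, 515, 1), (4.26, 597.5, 1), (6.14, 565, 1), (1.94, 545, 1),
    (1.08, 535, 1), (0.72, 475, 1), (4.00, 607.5, 1), (0.85, 497.5, 1),
    (1.33, 507.5, 1), (1.44, 510, 1), (1.08, 487.5, 1), (0.56, 440, 1),
    (1.94, 545, 1), (0.89, 460, 1), (0.72, 447.5, 1), (0.92, 462.5, 1),
    (2.85, 547.5, 1),
    (0.64, 455, 0), (0.85, 482.5, 0), (1.17, 502.5, 0), (1.04, 477.5, 0),
    (0.64, 452.5, 0), (0.39, 425, 0), (0.61, 485, 0), (1.00, 460, 0),
    (0.67, 507.5, 0), (1.33, 510, 0),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


X = [[1.0, sat, large] for _, sat, large in DATA]
y = [math.log10(odds) for odds, _, _ in DATA]  # assumption: base-10 log
const, b_sat, b_large = ols(X, y)

# R^2: share of the variation in y explained by the fitted values.
fitted = [const + b_sat * r[1] + b_large * r[2] for r in X]
mean_y = sum(y) / len(y)
ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
ss_tot = sum((a - mean_y) ** 2 for a in y)
r2 = 1 - ss_res / ss_tot
print(f"constant={const:.3f} SAT={b_sat:.4f} LARGE={b_large:.3f} R2={r2:.3f}")
```

The sketch does not compute t-statistics. Those would need the standard errors of the coefficients, which a statistics package reports directly.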

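Observations and Comments:
The SAT score is significantly related to the log of the odds of graduating
(t-statistic of 11.9). Higher SAT scores are associated with higher odds of
graduating on time.
School size is NOT significantly related to the graduation rate. The
t-statistic on the large school dummy is only 0.95, so once SAT score is held
constant there is no statistically significant evidence that students in
midsize schools are more or less likely to graduate than students in large
schools.
The constant term is highly significant and negative. The constant term is the
predicted log odds of graduating for smaller schools when the SAT measure is
zero. The SAT measure can never be zero because the minimum value of the SAT is
200. It is therefore difficult to interpret the meaning of the negative
constant term in the estimated regression.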
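To make the role of the constant term concrete, here is a small sketch that evaluates the fitted equation at a few SAT values. The coefficients are the ones reported in the regression table, with the constant taken as negative because the text describes it that way. The function name is hypothetical.

```python
# Coefficients as reported in the regression table. The sign of the constant
# follows the text, which describes it as negative (an assumption about sign).
CONSTANT = -2.53
B_SAT = 0.005
B_LARGE = 0.057


def predicted_log_odds(sat, large):
    """Fitted log odds of graduating for a given SAT measure and size dummy."""
    return CONSTANT + B_SAT * sat + B_LARGE * large


print(predicted_log_odds(0, 0))    # the constant itself, far outside the data
print(predicted_log_odds(200, 0))  # at the SAT floor of 200
print(predicted_log_odds(425, 0))  # lowest SAT measure in the sample
```

The lowest SAT measure in the sample is 425, so both the SAT of zero and the SAT floor of 200 are extrapolations well outside the observed data.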
Should I remove the constant term from the regression?
The existence of a significant constant term in this
regression suggests to me that the model is missing important variables and
that the results may not be very robust.
The literature on whether regressions should be estimated
without a constant term is mixed.
Since a significant constant term indicates to me that the
model may be misspecified, and in particular that some variables related to the
graduation rate may have been omitted, I reran the regression with the constant
term omitted. I also omitted the LARGE
variable because it was not significant in the original regression.
When I reran the model with the constant term and the size
variable omitted, I got a positive but insignificant coefficient for the SAT
variable.
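Dropping the constant forces the fitted line through the origin, which can change the slope a great deal when the data sit far from zero. The sketch below compares the two slope formulas on four rows from the data table, chosen only for illustration. The function names are hypothetical and the base-10 log is an assumption.

```python
import math


def slope_with_constant(x, y):
    """Slope of y = a + b * x: covariance of x and y over variance of x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_through_origin(x, y):
    """Slope of y = b * x with no constant: sum(x * y) / sum(x ** 2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Four rows from the data table, chosen only for illustration:
# UC Berkeley, Cal Poly, Cal State Stanislaus, Cal State Dominguez Hills.
sat = [677.5, 620, 460, 425]
odds = [10.11, 2.45, 1.00, 0.39]
log_odds = [math.log10(o) for o in odds]  # assumption: base-10 log

print(slope_with_constant(sat, log_odds))
print(slope_through_origin(sat, log_odds))
```

On this subset the slope through the origin is positive but much smaller than the slope estimated with a constant. This is only an illustration on four schools and says nothing about statistical significance.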
Concluding Thoughts: Simply glancing at the data indicates that
high SAT schools have higher graduation rates.
However, the model results are not very robust. I believe other variables are as important as
the SAT average, including (1) the socioeconomic status of the students at the
school and (2) the percentage of students who attend part time.
Also, the sample used to construct this model is really
small, at only 31 schools.
More work will follow, probably in August of 2016.
