Sports are a great way to teach statistics. This example teaches hypothesis testing on proportions using home win-loss percentages from three sports.
Home Field Advantage in Three Sports
This post attempts to assess the importance of the home field in three sports: professional basketball, football and baseball. The analysis involves an evaluation of all games played by all teams in each league. The data for baseball and football cover the 2015 regular season. The data for basketball cover the 2014-2015 regular season.
Data: The table below contains information on games
won by the home team and games won by the road team in three professional
sports.
Home Wins and Losses Across Sports

| Sport | Games Won by Home Team | Games Won by Away Team |
|---|---|---|
| Baseball | 1315 | 1114 |
| Football | 138 | 118 |
| Basketball | 707 | 523 |

Questions:
· What percent of all games played did the home team win in each sport?
· Create a 95% confidence interval for the home-win proportion in each sport.
· Explain why it is unnecessary to evaluate the home-loss percentage in each sport if one has already analyzed the home-win percentage.
· Test the hypothesis that the home-win proportion is greater than 0.5 in each sport.
· Explain how the number of games played impacts the estimated standard error used in the t-test for the hypothesis that the home-win proportion differs from 0.5.
Analysis:
What percent of all games played did the home team win in each sport? Discuss.
The point estimates suggest the home-field advantage is highest in basketball. This may occur because the fans are so close to the court. However, basketball courts and football fields have standardized dimensions, while baseball stadiums differ in dimensions. These differences could allow a baseball organization to acquire players who perform well in its home park. For example, a team with a short left field could load up on right-handed power hitters.
Home-Win Proportions in Three Sports

| Sport | Games Won by Home Team | Games Won by Away Team | Total | Home-Win Proportion |
|---|---|---|---|---|
| Baseball | 1315 | 1114 | 2429 | 0.5414 |
| Football | 138 | 118 | 256 | 0.5391 |
| Basketball | 707 | 523 | 1230 | 0.5748 |
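
The proportions in the table can be reproduced with a few lines of Python. This is only a minimal sketch: the win counts come from the table above, while the names `games` and `home_win_proportion` are illustrative choices, not part of the original analysis.

```python
# Win counts copied from the table above: (home wins, away wins).
games = {
    "Baseball": (1315, 1114),
    "Football": (138, 118),
    "Basketball": (707, 523),
}


def home_win_proportion(home_wins, away_wins):
    """Share of all games played that were won by the home team."""
    return home_wins / (home_wins + away_wins)


proportions = {
    sport: home_win_proportion(home, away) for sport, (home, away) in games.items()
}

for sport, p in proportions.items():
    print(f"{sport}: {p:.4f}")
```
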

Create a 95% confidence interval for the home-win proportion in each sport.
The z-score used to construct the 95% confidence interval is 1.96. The upper and lower bounds of the confidence interval are obtained by adding 1.96 x SE to, and subtracting it from, the estimated proportion. The SE is (p x (1 - p) / n)^0.5, where p is the estimated home-win proportion and n is the sample size.
Calculation of 95% Confidence Interval for the Home-Win Proportion in Three Sports

| Sport | Estimator of Home-Win Proportion | Sample Size | Standard Error | Lower Bound of CI | Upper Bound of CI |
|---|---|---|---|---|---|
| Baseball | 0.5414 | 2429 | 0.0101 | 0.5216 | 0.5612 |
| Football | 0.5391 | 256 | 0.0312 | 0.4780 | 0.6001 |
| Basketball | 0.5748 | 1230 | 0.0141 | 0.5472 | 0.6024 |
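
For readers who prefer code to a spreadsheet, the interval calculation might be sketched as follows. It assumes the same normal approximation and z-score of 1.96 described above. The function name `proportion_ci` and the dictionary `records` are hypothetical names introduced for illustration.

```python
import math

# Home wins and total games, taken from the tables above.
records = {
    "Baseball": (1315, 2429),
    "Football": (138, 256),
    "Basketball": (707, 1230),
}


def proportion_ci(wins, n, z=1.96):
    """Return (p, se, lower, upper) for a normal-approximation interval."""
    p = wins / n
    se = math.sqrt(p * (1 - p) / n)  # SE = (p x (1 - p) / n)^0.5
    return p, se, p - z * se, p + z * se


intervals = {sport: proportion_ci(wins, n) for sport, (wins, n) in records.items()}

for sport, (p, se, lower, upper) in intervals.items():
    print(f"{sport}: p={p:.4f} SE={se:.4f} CI=({lower:.4f}, {upper:.4f})")
```

Only the football interval contains 0.5, which foreshadows the result of the hypothesis test later in the post.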

Explain why it is unnecessary to evaluate the home-loss percentage in each sport if one has already analyzed the home-win percentage.
The events "win at home" and "lose at home" are complements. If a team wins 60% of its home games, it loses 40% of them. If 0.5 is outside the confidence interval for the home-win proportion, it is also outside the confidence interval for the home-loss proportion.
Note also that the standard error of the home-win proportion is identical to the standard error of the home-loss proportion, because p x (1 - p) is unchanged when p is replaced by 1 - p.
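
The complement argument can be checked numerically. The sketch below uses the football counts from the data table and the same 1.96 z-score. The helper name `ci` is an illustrative assumption.

```python
import math


def ci(p, n, z=1.96):
    """Return (se, lower, upper) for a normal-approximation interval."""
    se = math.sqrt(p * (1 - p) / n)
    return se, p - z * se, p + z * se


n = 256              # football games in the data table
p_win = 138 / n      # home-win proportion
p_loss = 1 - p_win   # home-loss proportion, identical to 118 / n

se_win, win_lo, win_hi = ci(p_win, n)
se_loss, loss_lo, loss_hi = ci(p_loss, n)

# The loss interval is the win interval reflected around 0.5.
print(f"win : SE={se_win:.4f} CI=({win_lo:.4f}, {win_hi:.4f})")
print(f"loss: SE={se_loss:.4f} CI=({loss_lo:.4f}, {loss_hi:.4f})")
```

Because the two intervals are mirror images around 0.5, the value 0.5 is either inside both or outside both.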
Test the hypothesis that the home-win proportion is greater than 0.5 in each sport. Use a one-tailed test at a 0.01 level of significance. Explain how the number of games played impacts the estimated standard error used in the t-test for the hypothesis that the home-win proportion differs from 0.5.
The calculations for these tests are laid out in the table below.
Test results for the hypothesis that the home-win proportion is greater than 0.5

| Sport | Estimator of Home-Win Proportion | Value of Proportion Under the Null Hypothesis | Sample Size | Standard Error | T-Statistic | Critical Value for 0.01 Level of Significance |
|---|---|---|---|---|---|---|
| Baseball | 0.5414 | 0.5 | 2429 | 0.0101 | 4.1 | 2.33 |
| Football | 0.5391 | 0.5 | 256 | 0.0313 | 1.3 | 2.33 |
| Basketball | 0.5748 | 0.5 | 1230 | 0.0143 | 5.2 | 2.33 |
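
The test statistics in the table can be reproduced with the sketch below. It assumes the 0.01 significance level shown in the table and uses the standard normal distribution for the critical value, via `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The function name `one_tailed_test` is hypothetical.

```python
import math
from statistics import NormalDist

# Home wins and total games, taken from the tables above.
records = {
    "Baseball": (1315, 2429),
    "Football": (138, 256),
    "Basketball": (707, 1230),
}


def one_tailed_test(wins, n, p0=0.5, alpha=0.01):
    """Test H0: p = p0 against H1: p > p0 using the null standard error."""
    p_hat = wins / n
    se0 = math.sqrt(p0 * (1 - p0) / n)           # standard error under H0
    statistic = (p_hat - p0) / se0
    critical = NormalDist().inv_cdf(1 - alpha)   # about 2.33 for alpha = 0.01
    return statistic, critical, statistic > critical


results = {sport: one_tailed_test(wins, n) for sport, (wins, n) in records.items()}

for sport, (statistic, critical, reject) in results.items():
    print(f"{sport}: statistic={statistic:.2f} critical={critical:.2f} reject H0={reject}")
```
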

Discussion:
· The standard error in the t-statistic is (0.5 x (1 - 0.5) / n)^0.5. Differences in the value of the standard error depend solely on the sample size n, since the null-hypothesis probability of 0.5 is used in the construction of all three standard errors. The standard error is much larger for football because this sport has far fewer games in one season.
· The critical value for the one-tailed test at a 0.01 level of significance can easily be obtained in Excel by entering =NORMSINV(0.99).
· The null hypothesis is rejected for baseball and basketball but not for football. The point estimates of the home-win proportion are fairly similar for football and baseball, but the home-win proportion is highly significant for baseball and insignificant for football. This difference is driven by sample size.
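
The dependence of the null standard error on n alone can be made concrete with a short sketch. The sample sizes come from the tables above. The function name `null_se` is an illustrative assumption.

```python
import math


def null_se(n, p0=0.5):
    """Standard error of a sample proportion when the true proportion is p0."""
    return math.sqrt(p0 * (1 - p0) / n)


sample_sizes = {"Football": 256, "Basketball": 1230, "Baseball": 2429}
standard_errors = {sport: null_se(n) for sport, n in sample_sizes.items()}

for sport, se in standard_errors.items():
    print(f"{sport}: n={sample_sizes[sport]} SE={se:.4f}")

# The SE shrinks with the square root of n: four times as many games
# cut the standard error in half.
ratio = null_se(4 * 256) / null_se(256)
print(f"SE ratio after quadrupling n: {ratio:.2f}")
```

With 0.5 fixed under the null hypothesis, the standard error is simply 0.5 divided by the square root of n, so the same gap between the estimate and 0.5 produces a much larger test statistic in baseball than in football.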
There is, of course, a small but real relationship between home field and game outcome in football. A test for football based on data pooled over multiple seasons would likely find a significant result. Sample size often matters a lot, both for these sports-math problems and for high-dollar questions like the efficacy of different health care treatments.
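
A rough sense of how much football data would be needed can be had from the sketch below. It rests on a strong and purely hypothetical assumption: that the pooled home-win proportion would stay at the single-season estimate of 138/256. No multi-season data are used here, and the function name `games_needed` is an illustrative choice.

```python
import math
from statistics import NormalDist


def games_needed(p_hat, p0=0.5, alpha=0.01):
    """Smallest n at which p_hat would exceed the one-tailed critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Solve (p_hat - p0) / sqrt(p0 * (1 - p0) / n) >= z_crit for n.
    return math.ceil((z_crit * math.sqrt(p0 * (1 - p0)) / (p_hat - p0)) ** 2)


football_p = 138 / 256                       # single-season estimate from the table
n_needed = games_needed(football_p)
seasons_needed = math.ceil(n_needed / 256)   # 256 games per regular season

print(f"Games needed: {n_needed}, whole seasons needed: {seasons_needed}")
```

Under that assumption, the required sample is several times larger than one 256-game season, which is consistent with the point that pooling seasons would change the conclusion for football.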
Concluding Thoughts: Often, when samples are small, proportions are tested with a chi-squared test rather than a t-test. It is possible to test the hypothesis that the football and baseball home-win percentages are identical by pooling data and using the chi-squared test. I am reluctant to pool the football and baseball data from a single season because the sample size for baseball is so much larger than the sample for football and the pooled sample will be dominated by the larger sample.
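
For readers who want to see the mechanics anyway, here is a minimal sketch of a chi-squared test of equal home-win proportions in baseball and football. It uses only the counts from the first table, applies no continuity correction, and computes the p-value from the fact that a chi-squared variable with one degree of freedom is a squared standard normal. The function name `chi_squared_2x2` is hypothetical, and the caveat above about the unequal sample sizes still applies.

```python
import math

# Rows are sports; columns are (home wins, away wins).
observed = [
    [1315, 1114],  # baseball
    [138, 118],    # football
]


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


chi2_stat, p_value = chi_squared_2x2(observed)
print(f"chi-squared = {chi2_stat:.4f}, p-value = {p_value:.4f}")
```

The statistic is far below 3.84, the 0.05 critical value for one degree of freedom, so this single-season data gives no evidence that the two home-win proportions differ.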
People who are interested in comparing the chi-squared and t-tests should consider the following post.
I am working on a new book on statistics and sports. People who are interested in my approach to
the world can look at my work on Kindle and on Teachers Pay Teachers.
Solving Financial Problems In Excel
Statistical Applications of Baseball
http://www.amazon.com/StatisticalApplicationsBaseballStatisticsSportsebook/dp/B006M3PQWQ
Go back to the seven homefield advantage hypothesis testing problems:
http://www.dailymathproblem.com/p/homefieldhypothesistestingproblems.html