Wednesday, January 20, 2016

Testing for home field advantage in baseball with pooled data



Previously, we tested for the existence of home field advantage for each MLB team separately during the 2015 regular season. We found that eight teams did significantly better at home than on the road.

In this post, we test whether the combined home win percentage of all major league teams is greater than 0.50.

We compare the result obtained by pooling data from all major league teams over the entire 2015 regular season to tests on each team separately. 

We then discuss the limitations of statistical analysis with pooled cross-sectional data.

Question:    Below is information on the home win-loss record for all major league teams in the 2015 regular season.   What percent of games played in the major leagues during the 2015 regular season did the home team win?

Is this percent significantly different from 0.50?  

What do we learn from this test compared to the tests for individual teams?


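Win-Loss Records in Home Games, 2015 Regular Season

| Team | Home Wins | Home Losses |
|---|---|---|
| Toronto | 53 | 28 |
| New York (AL) | 45 | 36 |
| Baltimore | 47 | 31 |
| Tampa Bay | 42 | 42 |
| Boston | 43 | 38 |
| Kansas City | 51 | 30 |
| Minnesota | 46 | 35 |
| Cleveland | 39 | 41 |
| Chicago (AL) | 40 | 41 |
| Detroit | 38 | 43 |
| Texas | 43 | 38 |
| Houston | 53 | 28 |
| Los Angeles (AL) | 49 | 32 |
| Seattle | 36 | 45 |
| Oakland | 34 | 47 |
| New York (NL) | 49 | 32 |
| Washington | 46 | 35 |
| Miami | 41 | 40 |
| Atlanta | 42 | 39 |
| Philadelphia | 37 | 44 |
| Saint Louis | 55 | 26 |
| Pittsburgh | 53 | 28 |
| Chicago (NL) | 49 | 32 |
| Milwaukee | 34 | 47 |
| Cincinnati | 34 | 47 |
| Los Angeles (NL) | 55 | 26 |
| San Francisco | 47 | 34 |
| Arizona | 39 | 42 |
| San Diego | 39 | 42 |
| Colorado | 36 | 45 |
| Total | 1315 | 1114 |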


Answer: There were 2429 games played during the 2015 MLB regular season. The home team won 1315, or 54.1%, of these games.
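A minimal Python sketch of the arithmetic behind this answer, assuming the records are typed in by hand from the table above. Because every game has exactly one home team, adding up all home wins and all home losses counts each game exactly once. The names `HOME_RECORDS` and `pooled_home_record`, and the `(AL)`/`(NL)` suffixes that keep the duplicate city names distinct, are illustrative only.

```python
# Home (wins, losses) for each team in the 2015 MLB regular season, typed in
# from the table. "(AL)"/"(NL)" tags are hypothetical labels that only keep
# the two New York, Chicago, and Los Angeles teams under distinct keys.
HOME_RECORDS = {
    "Toronto": (53, 28),
    "New York (AL)": (45, 36),
    "Baltimore": (47, 31),
    "Tampa Bay": (42, 42),
    "Boston": (43, 38),
    "Kansas City": (51, 30),
    "Minnesota": (46, 35),
    "Cleveland": (39, 41),
    "Chicago (AL)": (40, 41),
    "Detroit": (38, 43),
    "Texas": (43, 38),
    "Houston": (53, 28),
    "Los Angeles (AL)": (49, 32),
    "Seattle": (36, 45),
    "Oakland": (34, 47),
    "New York (NL)": (49, 32),
    "Washington": (46, 35),
    "Miami": (41, 40),
    "Atlanta": (42, 39),
    "Philadelphia": (37, 44),
    "Saint Louis": (55, 26),
    "Pittsburgh": (53, 28),
    "Chicago (NL)": (49, 32),
    "Milwaukee": (34, 47),
    "Cincinnati": (34, 47),
    "Los Angeles (NL)": (55, 26),
    "San Francisco": (47, 34),
    "Arizona": (39, 42),
    "San Diego": (39, 42),
    "Colorado": (36, 45),
}


def pooled_home_record(records):
    """Sum home records over all teams.

    Every game has exactly one home team, so total home wins plus total
    home losses equals the number of games played league-wide.
    """
    wins = sum(won for won, _ in records.values())
    losses = sum(lost for _, lost in records.values())
    games = wins + losses
    return games, wins, wins / games


games, home_wins, p_hat = pooled_home_record(HOME_RECORDS)
print(f"{games} games, {home_wins} home wins, home win share {p_hat:.1%}")
# prints: 2429 games, 1315 home wins, home win share 54.1%
```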


Is the home win percentage of 54.1% significantly greater than 50%?

Let’s conduct a one-tailed test with a significance level of 0.01.

The critical z value for a one-tailed test at the 0.01 significance level is 2.33.
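The 2.33 cutoff is the 99th percentile of the standard normal distribution. A minimal sketch to check it, assuming Python 3.8 or later, whose standard-library `statistics.NormalDist` provides the inverse normal CDF; the variable names are illustrative.

```python
from statistics import NormalDist  # standard library, Python 3.8+

alpha = 0.01  # significance level

# For an upper-tailed test, the cutoff is the (1 - alpha) quantile of N(0, 1).
z_crit = NormalDist().inv_cdf(1 - alpha)
print(f"critical z = {z_crit:.2f}")  # prints: critical z = 2.33
```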

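The z-statistic used to test whether the home win proportion is greater than 0.5 is

$$Z = \frac{p - 0.5}{\sqrt{0.5\,(1 - 0.5)/2429}}$$

where p is the observed home win proportion and the denominator is the standard error of p when the true home win probability is 0.5.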

With p = 1315/2429 ≈ 0.541, this test statistic is about 4.1, well above the 2.33 cutoff.
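A minimal sketch of the one-sample z-test for a proportion, assuming the standard error is computed under the null value of 0.5 (which reproduces the 4.1 reported here). It also computes the one-tailed p-value with the standard-library `statistics.NormalDist` (Python 3.8+); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

n = 2429          # games played in the 2015 regular season
home_wins = 1315  # games won by the home team
p0 = 0.5          # home win probability under the null hypothesis

p_hat = home_wins / n
se0 = sqrt(p0 * (1 - p0) / n)      # standard error of p_hat if H0 is true
z = (p_hat - p0) / se0
p_value = 1 - NormalDist().cdf(z)  # one-tailed (upper-tail) p-value

z_crit = 2.33  # one-tailed cutoff at the 0.01 level, as in the text
print(f"p_hat = {p_hat:.3f}, z = {z:.1f}, p-value = {p_value:.1e}")
print("reject H0" if z > z_crit else "fail to reject H0")
```

Using the observed proportion, p(1 − p), in the standard error instead of 0.5(1 − 0.5) also gives a z of about 4.1, so the conclusion does not hinge on that choice.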

We reject the null hypothesis that the win likelihood for a home team is 0.5 in favor of the alternative hypothesis that this probability is greater than 0.5.


Notes: The result presented here appears to suggest that the relationship between having the home field and winning is very strong.

After all, the probability of winning at home is 0.541. This means the probability of winning on the road is only 0.459.

The home win probability is clearly significantly greater than 0.50.

But let's remember that the team-specific results are not nearly as robust.



Only eight teams have a significantly higher win probability at home than on the road.   The home win probability was not significantly higher than the road-win probability for 22 of 30 MLB teams in the 2015 regular season.
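One reason the team-level tests are weaker is sample size. As a rough illustration, the sketch below applies the same one-sample test against 0.5 to each team's home record alone. This is a simplifying assumption, not the home-versus-road comparison from the previous post (road records are not in the table above), so the teams it flags need not match the eight reported there. With about 81 home games, the standard error under the null is about 0.056, more than five times the pooled value of about 0.010, so a team needs roughly 51 home wins out of 81 (about 63%) to clear the 2.33 cutoff. The names `HOME_RECORDS` and `home_win_z` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

# Home (wins, losses) typed in from the table; "(AL)"/"(NL)" tags only make keys unique.
HOME_RECORDS = {
    "Toronto": (53, 28), "New York (AL)": (45, 36), "Baltimore": (47, 31),
    "Tampa Bay": (42, 42), "Boston": (43, 38), "Kansas City": (51, 30),
    "Minnesota": (46, 35), "Cleveland": (39, 41), "Chicago (AL)": (40, 41),
    "Detroit": (38, 43), "Texas": (43, 38), "Houston": (53, 28),
    "Los Angeles (AL)": (49, 32), "Seattle": (36, 45), "Oakland": (34, 47),
    "New York (NL)": (49, 32), "Washington": (46, 35), "Miami": (41, 40),
    "Atlanta": (42, 39), "Philadelphia": (37, 44), "Saint Louis": (55, 26),
    "Pittsburgh": (53, 28), "Chicago (NL)": (49, 32), "Milwaukee": (34, 47),
    "Cincinnati": (34, 47), "Los Angeles (NL)": (55, 26), "San Francisco": (47, 34),
    "Arizona": (39, 42), "San Diego": (39, 42), "Colorado": (36, 45),
}


def home_win_z(wins, losses, p0=0.5):
    """One-sample z statistic for H0: this team's home win probability equals p0."""
    n = wins + losses
    return (wins / n - p0) / sqrt(p0 * (1 - p0) / n)


z_crit = NormalDist().inv_cdf(0.99)  # one-tailed cutoff at the 0.01 level (about 2.33)

# Standard error under H0 for one team's ~81 home games vs. all 2429 games pooled.
se_team = sqrt(0.25 / 81)
se_pooled = sqrt(0.25 / 2429)
print(f"per-team SE = {se_team:.3f}, pooled SE = {se_pooled:.3f}")

significant = [
    team for team, (won, lost) in HOME_RECORDS.items()
    if home_win_z(won, lost) > z_crit
]
print(f"{len(significant)} teams reject H0 on their own:", ", ".join(significant))
```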

This means that if you rely on results from a test or model on pooled data, you will overestimate the probability of winning at home for most teams and underestimate it for some of the teams.

Additional insight into the impact of home field advantage requires that we look at why some teams have it and others do not.

Author's Note: My book Statistical Applications of Baseball is a bit dated, but it has some interesting problems in it and is very inexpensive. Please consider buying it on Kindle.




