## Wednesday, January 20, 2016

### Testing for home field advantage in baseball with pooled data

Previously, we tested for the existence of home field advantage for all MLB teams separately during the 2015 regular season.   We found that 8 teams did significantly better at home than on the road.

In this post we test whether the home win percentage across all major league games is greater than 0.50.

We compare the result obtained by pooling data from all major league teams over the entire 2015 regular season to tests on each team separately.

We then discuss the limitations of statistical analysis with pooled cross-sectional data.

Question: Below is the home win-loss record for every major league team in the 2015 regular season. What percentage of games played in the major leagues during the 2015 regular season did the home team win?

Is this percent significantly different from 0.50?

What do we learn from this test compared to the tests for individual teams?

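Win-Loss Records in Home Games, 2015 Regular Season

| Team | Home wins (observed) | Home losses (observed) |
|---|---:|---:|
| Toronto | 53 | 28 |
| New York | 45 | 36 |
| Baltimore | 47 | 31 |
| Tampa Bay | 42 | 42 |
| Boston | 43 | 38 |
| Kansas City | 51 | 30 |
| Minnesota | 46 | 35 |
| Cleveland | 39 | 41 |
| Chicago | 40 | 41 |
| Detroit | 38 | 43 |
| Texas | 43 | 38 |
| Houston | 53 | 28 |
| Los Angeles | 49 | 32 |
| Seattle | 36 | 45 |
| Oakland | 34 | 47 |
| New York | 49 | 32 |
| Washington | 46 | 35 |
| Miami | 41 | 40 |
| Atlanta | 42 | 39 |
| Philadelphia | 37 | 44 |
| Saint Louis | 55 | 26 |
| Pittsburgh | 53 | 28 |
| Chicago | 49 | 32 |
| Milwaukee | 34 | 47 |
| Cincinnati | 34 | 47 |
| Los Angeles | 55 | 26 |
| San Francisco | 47 | 34 |
| Arizona | 39 | 42 |
| San Diego | 39 | 42 |
| Colorado | 36 | 45 |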

Answer:  There were 2429 total games played during the 2015 MLB regular season.   The home team won 1315 or 54.1% of these games.
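These totals can be checked directly from the table. The short Python sketch below transcribes the home records and sums them across all 30 teams; the list name `HOME_RECORDS` and the helper `pooled_home_record` are hypothetical names introduced only for illustration.

```python
# Home wins and losses per team, transcribed from the 2015 table above.
# (HOME_RECORDS is a hypothetical name used only for this sketch.)
HOME_RECORDS = [
    ("Toronto", 53, 28), ("New York", 45, 36), ("Baltimore", 47, 31),
    ("Tampa Bay", 42, 42), ("Boston", 43, 38), ("Kansas City", 51, 30),
    ("Minnesota", 46, 35), ("Cleveland", 39, 41), ("Chicago", 40, 41),
    ("Detroit", 38, 43), ("Texas", 43, 38), ("Houston", 53, 28),
    ("Los Angeles", 49, 32), ("Seattle", 36, 45), ("Oakland", 34, 47),
    ("New York", 49, 32), ("Washington", 46, 35), ("Miami", 41, 40),
    ("Atlanta", 42, 39), ("Philadelphia", 37, 44), ("Saint Louis", 55, 26),
    ("Pittsburgh", 53, 28), ("Chicago", 49, 32), ("Milwaukee", 34, 47),
    ("Cincinnati", 34, 47), ("Los Angeles", 55, 26), ("San Francisco", 47, 34),
    ("Arizona", 39, 42), ("San Diego", 39, 42), ("Colorado", 36, 45),
]


def pooled_home_record(records):
    """Return (total games, home wins, home win proportion) pooled over all teams."""
    wins = sum(w for _, w, _ in records)
    losses = sum(l for _, _, l in records)
    games = wins + losses  # every game has exactly one home team
    return games, wins, wins / games


games, wins, p_hat = pooled_home_record(HOME_RECORDS)
print(f"{games} games, {wins} home wins, {p_hat:.1%}")  # 2429 games, 1315 home wins, 54.1%
```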

Is the home win percentage of 54.1% significantly greater than 50%?

Let's conduct a one-tailed test with a significance level of 0.01.

The z cutoff (critical value) for a one-tailed test at the 0.01 significance level is 2.33.
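The 2.33 cutoff is simply the 99th percentile of the standard normal distribution. A minimal sketch using Python's standard-library `statistics.NormalDist` (a choice of tooling, not something the original analysis specifies; any normal table gives the same number):

```python
from statistics import NormalDist

alpha = 0.01  # significance level for the one-tailed test

# Upper-tail test: the cutoff leaves probability alpha in the right tail.
z_crit = NormalDist().inv_cdf(1 - alpha)
print(round(z_crit, 2))  # 2.33
```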

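The z-statistic used to test whether the home win proportion is greater than 0.5 is

$$
z = \frac{p - 0.5}{\sqrt{0.5\,(1 - 0.5)/2429}}
$$

where p = 1315/2429 ≈ 0.541 is the observed home win proportion and 2429 is the number of games. The value of this test statistic is about 4.1, well above the cutoff of 2.33.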

We reject the null hypothesis that the win likelihood for a home team is 0.5 in favor of the alternative hypothesis that this probability is greater than 0.5.
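The statistic and the decision can be reproduced in a few lines. This sketch assumes the standard error is computed from the null value 0.5, which matches the 4.1 reported above; plugging in p(1-p) from the sample instead gives nearly the same value. The helper `one_proportion_z` is a hypothetical name used only here.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0=0.5):
    """z statistic for H0: p = p0, using the null-hypothesis standard error."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


z = one_proportion_z(1315, 2429)          # home wins, total games
z_crit = NormalDist().inv_cdf(0.99)       # one-tailed cutoff at alpha = 0.01
p_value = 1 - NormalDist().cdf(z)         # upper-tail p-value

print(f"z = {z:.2f}, cutoff = {z_crit:.2f}, reject H0: {z > z_crit}")
# z = 4.08, cutoff = 2.33, reject H0: True
```

The p-value is tiny (on the order of 1e-5), which is another way of saying the pooled home win rate is far from 0.50 given 2429 games.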

Notes: The result presented here appears to suggest that the relationship between having home field and winning is very strong.

After all, the estimated probability of winning at home is 0.541, which means the probability of winning on the road is 0.459.

The home win probability is clearly significantly different from 0.50.

But let's remember that the team-specific results are not nearly as robust.

Only eight teams have a significantly higher win probability at home than on the road.   The home win probability was not significantly higher than the road-win probability for 22 of 30 MLB teams in the 2015 regular season.

This means that if you rely on results from a test or model fit to pooled data, you will overestimate the probability of winning at home for most teams and underestimate it for some teams.
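One rough way to see how the pooled figure can mislead is to compare each team's own home win percentage with the pooled rate of 0.541. The sketch below does only that descriptive comparison; it is not the home-versus-road significance test from the earlier post, which needs road records not shown here. `HOME_RECORDS` is a hypothetical name for the transcribed table.

```python
# Home wins and losses per team, transcribed from the 2015 table above.
HOME_RECORDS = [
    ("Toronto", 53, 28), ("New York", 45, 36), ("Baltimore", 47, 31),
    ("Tampa Bay", 42, 42), ("Boston", 43, 38), ("Kansas City", 51, 30),
    ("Minnesota", 46, 35), ("Cleveland", 39, 41), ("Chicago", 40, 41),
    ("Detroit", 38, 43), ("Texas", 43, 38), ("Houston", 53, 28),
    ("Los Angeles", 49, 32), ("Seattle", 36, 45), ("Oakland", 34, 47),
    ("New York", 49, 32), ("Washington", 46, 35), ("Miami", 41, 40),
    ("Atlanta", 42, 39), ("Philadelphia", 37, 44), ("Saint Louis", 55, 26),
    ("Pittsburgh", 53, 28), ("Chicago", 49, 32), ("Milwaukee", 34, 47),
    ("Cincinnati", 34, 47), ("Los Angeles", 55, 26), ("San Francisco", 47, 34),
    ("Arizona", 39, 42), ("San Diego", 39, 42), ("Colorado", 36, 45),
]

total_wins = sum(w for _, w, _ in HOME_RECORDS)
total_games = sum(w + l for _, w, l in HOME_RECORDS)
pooled = total_wins / total_games  # pooled home win proportion, about 0.541

# Each team's own home win proportion.
team_rates = [(team, w / (w + l)) for team, w, l in HOME_RECORDS]

below = [t for t in team_rates if t[1] < pooled]
above = [t for t in team_rates if t[1] >= pooled]
lowest = min(team_rates, key=lambda t: t[1])   # first team with the lowest rate
highest = max(team_rates, key=lambda t: t[1])  # first team with the highest rate

print(f"pooled home win rate: {pooled:.3f}")
print(f"{len(below)} teams below the pooled rate, {len(above)} at or above it")
print(f"range: {lowest[1]:.3f} ({lowest[0]}) to {highest[1]:.3f} ({highest[0]})")
```

A single pooled rate sits in the middle of a fairly wide spread of team-level home win rates, which is the gap between pooled and team-specific results described above.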

Additional insight into the impact of home field advantage requires that we look at why some teams have it and others do not.

Author's Note: My book Statistical Applications of Baseball is a bit dated, but it has some interesting problems in it and is very inexpensive. Please consider buying it on Kindle.