## Sunday, August 4, 2019

### Game-Level Batting Statistics

Baseball statisticians generally look at aggregate batting averages.  This problem shows that batting statistics summarizing game outcomes for a hitter are also useful.

Question One: The table below contains data on at bats, hits and batting averages for Tony Gwynn over 16 games at the end of 1996.

What is the aggregate batting average over these games, and what is its standard error?

What are the median, mode and average of the game-level batting averages?

What can we learn from daily batting average statistics that we cannot learn from aggregate batting statistics?

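Batting Performances of Tony Gwynn Over 16 Games in 1996

| Game # | At Bats | Hits | Game BA |
|--------|---------|------|---------|
| 1 | 5 | 2 | 0.400 |
| 2 | 4 | 1 | 0.250 |
| 3 | 4 | 1 | 0.250 |
| 4 | 4 | 1 | 0.250 |
| 5 | 4 | 4 | 1.000 |
| 6 | 5 | 1 | 0.200 |
| 7 | 3 | 0 | 0.000 |
| 8 | 2 | 2 | 1.000 |
| 9 | 4 | 2 | 0.500 |
| 10 | 3 | 0 | 0.000 |
| 11 | 4 | 0 | 0.000 |
| 12 | 4 | 1 | 0.250 |
| 13 | 5 | 3 | 0.600 |
| 14 | 4 | 3 | 0.750 |
| 15 | 5 | 1 | 0.200 |
| 16 | 4 | 0 | 0.000 |
| Total | 64 | 22 | 0.344 |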

Answer:  Tony Gwynn’s overall batting average over this 16-game period is simply total hits divided by total at bats, or 0.344 (22/64 = 0.34375).
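The question also asks for the standard error of the aggregate average. One common way to get it, sketched here as an assumption rather than as the method used in the book, is to treat each at bat as an independent trial with the same hit probability, so the standard error is sqrt(p(1 - p)/n). The function names `aggregate_ba` and `binomial_se` are made up for this illustration.

```python
import math

# At bats and hits for the 16 games, copied from the table.
at_bats = [5, 4, 4, 4, 4, 5, 3, 2, 4, 3, 4, 4, 5, 4, 5, 4]
hits = [2, 1, 1, 1, 4, 1, 0, 2, 2, 0, 0, 1, 3, 3, 1, 0]


def aggregate_ba(hits, at_bats):
    """Total hits divided by total at bats."""
    return sum(hits) / sum(at_bats)


def binomial_se(p, n):
    """Standard error of a proportion.

    Assumption: every at bat is an independent trial with the same
    hit probability p (a simple binomial model).
    """
    return math.sqrt(p * (1 - p) / n)


p = aggregate_ba(hits, at_bats)
se = binomial_se(p, sum(at_bats))
print(f"aggregate BA = {p:.3f}, standard error = {se:.3f}")
# aggregate BA = 0.344, standard error = 0.059
```

Under that binomial assumption, a standard error of roughly 0.059 on only 64 at bats means the 16-game average is a fairly noisy estimate of the hitter's underlying ability.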

The simple average of the 16 game batting averages is the sum of all 16 game batting averages divided by 16, which is 0.353.

The median of these 16 observations is obtained by sorting them and averaging the two in the middle, the eighth and ninth values, both of which are 0.250 in this data set.  Hence, Tony Gwynn’s median batting average over these 16 games is 0.250.

Over these 16 games, Tony Gwynn had a 0.000 batting average in 4 games, a 0.200 batting average in 2 games, a 0.250 batting average in 4 games, a 0.400 batting average in 1 game, a 0.500 batting average in 1 game, a 0.600 batting average in 1 game, a 0.750 batting average in 1 game and a 1.000 batting average in 2 games.  The observations with the highest frequency, 0.000 and 0.250, are the modes of this sample.  The distribution is bimodal.
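The three game-level summaries can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module (`multimode` needs Python 3.8 or later); the variable names are chosen for this illustration only.

```python
from statistics import mean, median, multimode

# At bats and hits for the 16 games, copied from the table.
at_bats = [5, 4, 4, 4, 4, 5, 3, 2, 4, 3, 4, 4, 5, 4, 5, 4]
hits = [2, 1, 1, 1, 4, 1, 0, 2, 2, 0, 0, 1, 3, 3, 1, 0]

# One batting average per game, rounded to three decimals as in the table.
game_ba = [round(h / ab, 3) for h, ab in zip(hits, at_bats)]

mean_ba = round(mean(game_ba), 3)   # simple average of the 16 game averages
median_ba = median(game_ba)         # average of the 8th and 9th sorted values
modes = sorted(multimode(game_ba))  # every value tied for the highest count

print(mean_ba)    # 0.353
print(median_ba)  # 0.25
print(modes)      # [0.0, 0.25]
```

`multimode` is used instead of `mode` because it returns every value tied for the highest frequency, which is what a bimodal sample like this one needs.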

Some observations:

The overall batting average and the average of the game batting averages are both high for this great hitter.  Both averages are pulled up by a small number of very good days, such as the 4-for-4 performance in game 5 and the 3-hit games in games 13 and 14.

The median (0.250) and the modes (0.000 and 0.250) are substantially lower than both averages, largely because these statistics are not pulled up by a few exceptional games, whether those games reflect great hitting or weak pitching.

It is not uncommon for a great hitter to go hitless in a game.  This aspect of baseball is more accurately reflected by the median and mode of the game batting averages than by the average.  The likelihood of getting shut down by solid pitching may be a better indicator of the likelihood of winning than a general batting average.
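How often the hitter was shut down can be counted directly from the hits column. The sketch below is only an illustration of that tally; the names `hitless_games` and `hitless_share` are hypothetical.

```python
# Hits in each of the 16 games, copied from the table.
hits = [2, 1, 1, 1, 4, 1, 0, 2, 2, 0, 0, 1, 3, 3, 1, 0]

# A game counts as hitless when the hitter recorded zero hits.
hitless_games = sum(1 for h in hits if h == 0)
hitless_share = hitless_games / len(hits)

print(hitless_games, hitless_share)  # 4 0.25
```

Even with an aggregate average above 0.340, one game in four in this sample ended without a hit.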

Author's Note:

This problem first appeared in my book Statistical Applications of Baseball, published in 1996.  The book is somewhat dated but still very useful.  It is available at a very low price on Kindle.