Baseball statisticians generally look at aggregate batting
averages. This problem shows that statistics summarizing a hitter's
game-by-game outcomes are also useful.
Question One: The table below contains data on at bats, hits and
batting averages for Tony Gwynn over 16 games at the end of 1996.
What is the aggregate batting average over these games, and what is the
standard error of that aggregate batting average?
What are the median, mode and average of the game-level batting averages?
What can we learn from game-level batting average statistics that we
cannot learn from aggregate batting statistics?
Batting Performances of Tony Gwynn Over 16 Games in 1996

Game #    At Bats    Hits    Game BA
1         5          2       0.400
2         4          1       0.250
3         4          1       0.250
4         4          1       0.250
5         4          4       1.000
6         5          1       0.200
7         3          0       0.000
8         2          2       1.000
9         4          2       0.500
10        3          0       0.000
11        4          0       0.000
12        4          1       0.250
13        5          3       0.600
14        4          3       0.750
15        5          1       0.200
16        4          0       0.000
Total     64         22      0.344

Answer: Tony Gwynn’s overall batting average over this 16-game
period is simply total hits divided by total at bats, or 0.344 (22/64).
The simple average of the 16 game
batting averages is the sum of all 16 game batting averages divided by 16,
which is 0.353.
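The question also asks for a standard error of the aggregate batting average. A minimal Python sketch of the arithmetic follows. The standard error formula sqrt(p(1 - p)/n) is an assumption added here, not something stated in the problem: it treats every at bat as an independent trial with the same chance of a hit. The variable names are illustrative.

```python
import math

# (at bats, hits) for each of the 16 games, copied from the table
games = [(5, 2), (4, 1), (4, 1), (4, 1), (4, 4), (5, 1), (3, 0), (2, 2),
         (4, 2), (3, 0), (4, 0), (4, 1), (5, 3), (4, 3), (5, 1), (4, 0)]

total_at_bats = sum(ab for ab, _ in games)
total_hits = sum(h for _, h in games)

# Aggregate batting average: total hits divided by total at bats
aggregate_ba = total_hits / total_at_bats

# Assumption: at bats are independent trials with a common hit probability,
# so the standard error of the proportion is sqrt(p * (1 - p) / n)
standard_error = math.sqrt(aggregate_ba * (1 - aggregate_ba) / total_at_bats)

# Simple (unweighted) average of the 16 game batting averages
game_bas = [h / ab for ab, h in games]
mean_game_ba = sum(game_bas) / len(game_bas)

print(f"aggregate BA   = {aggregate_ba:.3f}")
print(f"standard error = {standard_error:.3f}")
print(f"mean game BA   = {mean_game_ba:.3f}")
```

Under that binomial assumption the standard error comes out to roughly 0.059, so a 16-game aggregate average is a fairly noisy estimate. The two averages differ because the aggregate figure weights each game by its at bats, while the simple average gives every game equal weight.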
The median of these 16 observations is
obtained by averaging the two observations in the middle, the eighth and ninth
largest numbers, both of which are 0.250 in this dataset. Hence, Tony
Gwynn’s median batting average over these 16 games is 0.250.
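The median calculation described above can be sketched in Python. This is only an illustration using the game batting averages from the table. The by-hand step is checked against the standard library's statistics.median.

```python
import statistics

# Game batting averages for games 1 through 16, copied from the table
game_bas = [0.400, 0.250, 0.250, 0.250, 1.000, 0.200, 0.000, 1.000,
            0.500, 0.000, 0.000, 0.250, 0.600, 0.750, 0.200, 0.000]

ordered = sorted(game_bas)
n = len(ordered)

# With an even number of observations, average the two middle values
# (the eighth and ninth values when n is 16)
middle_low = ordered[n // 2 - 1]
middle_high = ordered[n // 2]
median_by_hand = (middle_low + middle_high) / 2

# Cross-check against the standard library
median_stdlib = statistics.median(game_bas)

print(median_by_hand, median_stdlib)
```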
Over these 16 games, Tony Gwynn had a
0.000 batting average in 4 games, a 0.200 batting average in 2 games, a 0.250
batting average in 4 games, a 0.400 batting average in 1 game, a 0.500 batting
average in 1 game, a 0.600 batting average in 1 game, a 0.750 batting average
in 1 game and a 1.000 batting average in 2 games. The observations with
the highest frequency, 0.000 and 0.250, are the modes of this sample. The distribution is bimodal.
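The frequency count above can be reproduced with a short Python sketch. It assumes the game batting averages are entered exactly as they appear in the table, so that equal averages compare as equal values.

```python
from collections import Counter

# Game batting averages for games 1 through 16, copied from the table
game_bas = [0.400, 0.250, 0.250, 0.250, 1.000, 0.200, 0.000, 1.000,
            0.500, 0.000, 0.000, 0.250, 0.600, 0.750, 0.200, 0.000]

# Count how many games produced each batting average
counts = Counter(game_bas)

# Every value that reaches the highest count is a mode
highest_count = max(counts.values())
modes = sorted(ba for ba, c in counts.items() if c == highest_count)

for ba in sorted(counts):
    print(f"{ba:.3f}: {counts[ba]} game(s)")
print("modes:", modes)
```

Collecting every value tied for the highest count, rather than reporting a single mode, is what reveals that the sample is bimodal.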
Some observations:
The overall batting average and the
average of the game batting averages are both high for this great hitter. Both averages are pulled up by a small number
of good days where Gwynn got 3 or 4 hits.
The median and the mode are both
substantially lower than the averages largely because these statistics are not
affected by great performances or weak pitching.
It is not uncommon for a great hitter
to go hitless in a game. This aspect of
baseball is more accurately reflected by the median and mode game batting
average performance than the average. The likelihood of getting shut down by
solid pitching may more accurately measure the likelihood of winning than a
general batting average.
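For this sample, the point about hitless games can be checked by counting them directly. The sketch below uses the hits column from the table. The cutoff of three or more hits for a "big" game is an illustrative choice, not part of the original problem.

```python
# Hits in games 1 through 16, copied from the table
hits_per_game = [2, 1, 1, 1, 4, 1, 0, 2, 2, 0, 0, 1, 3, 3, 1, 0]

# Games in which Gwynn went hitless, and their share of all games
hitless_games = sum(1 for h in hits_per_game if h == 0)
hitless_share = hitless_games / len(hits_per_game)

# Games with three or more hits (illustrative threshold for a "big" game)
big_games = sum(1 for h in hits_per_game if h >= 3)

print(f"hitless games: {hitless_games} of {len(hits_per_game)} ({hitless_share:.0%})")
print(f"games with 3+ hits: {big_games}")
```

Gwynn went hitless in a quarter of these games, a fact that the aggregate average alone does not show.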
Author's Note:
This problem first appeared in my book Statistical Applications of Baseball, published in 1996. The book is somewhat dated but still very useful. It is available at a very low price on Kindle.