Question: The post I created yesterday calculated the likelihood of a 10-game hitting streak for two hitters: one who was consistently a 300 hitter, and another who hit 200 against right-handers and 400 against left-handers. That analysis assumed that outcomes at the plate are independent and identically distributed (iid), and that hitting outcomes across games are iid as well.
The post created yesterday:
In the real world, at-bats and games are not iid. Do you anticipate that the actual likelihood of a hitting streak will be lower or higher than the estimate obtained under the assumption that outcomes of at-bats and outcomes across games are iid and binomially distributed?
What other factors could affect the actual likelihood of a 10-game streak?
Answer: Pitchers vary in quality and effectiveness. Hitters should get a disproportionately high number of hits in games where they face ineffective pitchers and a disproportionately low number of hits in games where they face relatively strong pitchers.
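To illustrate why uneven pitcher quality matters, here is a minimal sketch. The numbers are assumptions chosen to mirror the split hitter in the question: a 200 average against strong pitchers and a 400 average against weak ones, with each type faced in half of the games, 4 at-bats per game, and each game's pitcher type drawn independently. None of these figures come from real data.

```python
# Hypothetical split (assumed, not measured): .200 vs strong pitchers,
# .400 vs weak pitchers, each faced in half of the games (overall .300).
AT_BATS = 4  # assumed number of at-bats in every game


def hitless_prob(avg, at_bats=AT_BATS):
    """Probability of zero hits in a game when at-bats are iid."""
    return (1.0 - avg) ** at_bats


# Same pitcher quality every game: a flat .300 hitter.
flat_hitless = hitless_prob(0.300)

# Pitcher quality varies from game to game (50/50 mix, assumed independent).
mixed_hitless = 0.5 * hitless_prob(0.200) + 0.5 * hitless_prob(0.400)

# Probability of at least one hit in each of 10 consecutive games.
flat_streak = (1.0 - flat_hitless) ** 10
mixed_streak = (1.0 - mixed_hitless) ** 10

print(f"hitless game, flat .300:   {flat_hitless:.4f}")
print(f"hitless game, mixed:       {mixed_hitless:.4f}")
print(f"10-game streak, flat .300: {flat_streak:.4f}")
print(f"10-game streak, mixed:     {mixed_streak:.4f}")
```

Under these assumptions the overall batting average is the same in both cases, but the mixed case produces more hitless games and therefore a lower streak probability.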
I have posted some data that supports this hypothesis.
Over a 16-game stretch Tony Gwynn hit 0.347 and had four games with zero hits. We compare this observed number of 0 hit games to the expected number of zero hit games over a 16-game period.
What is the likelihood that a 347 hitter will have 0 hits in a game?
If at-bats are iid and binomially distributed, with 4 at-bats per game, this probability is (1 - 0.347)^4, or about 0.18.
The expected number of 0 hit games over a 16-game stretch (again assuming iid, binomially distributed at-bats) is 16 x 0.18, which is 2.88. The expected number of 0 hit games is less than the actual number (2.88 < 4).
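The arithmetic above can be reproduced with a short sketch. It assumes 4 at-bats per game, as in yesterday's post; the variable names are illustrative only.

```python
AVG = 0.347      # batting average over the stretch
AT_BATS = 4      # assumed at-bats per game
GAMES = 16       # length of the stretch
OBSERVED = 4     # hitless games actually observed

# Probability of going hitless in one game if at-bats are iid.
p_hitless = (1.0 - AVG) ** AT_BATS

# Expected number of hitless games over the stretch.
expected_hitless = GAMES * p_hitless

print(f"P(hitless game)        = {p_hitless:.4f}")
print(f"expected hitless games = {expected_hitless:.2f}")
print(f"observed hitless games = {OBSERVED}")
```

Without rounding the probability to 0.18 first, the expected count comes out slightly higher (about 2.91 rather than 2.88), which is still below the 4 hitless games observed.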
This suggests that the actual likelihood of a 10-game streak is lower than the likelihood that one calculates based on the assumption that at-bats and games are binomial and iid.
The likelihood calculations in the post written yesterday are based on the assumption of 4 at bats per game. In some games, the number of at bats is larger than 4 and in others less than 4. The variability in the number of at bats should result in a higher likelihood of no hits in some games and should decrease the likelihood of a streak.
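The effect of a varying number of at-bats can be sketched the same way. The distribution of at-bats below (3, 4, or 5, averaging 4) is a made-up assumption for illustration, not data from any real season.

```python
AVG = 0.347  # batting average, assumed constant in every at-bat

# Hypothetical distribution of at-bats per game (assumed; mean is 4).
AT_BAT_DIST = {3: 0.25, 4: 0.50, 5: 0.25}


def hitless_prob(avg, at_bats):
    """Probability of zero hits in a game with a given number of iid at-bats."""
    return (1.0 - avg) ** at_bats


# Exactly 4 at-bats in every game.
fixed_hitless = hitless_prob(AVG, 4)

# Number of at-bats varies: average the hitless probability over the distribution.
varied_hitless = sum(w * hitless_prob(AVG, n) for n, w in AT_BAT_DIST.items())

# Probability of at least one hit in each of 10 consecutive games.
fixed_streak = (1.0 - fixed_hitless) ** 10
varied_streak = (1.0 - varied_hitless) ** 10

print(f"hitless game, fixed 4 at-bats:  {fixed_hitless:.4f}")
print(f"hitless game, varied at-bats:   {varied_hitless:.4f}")
print(f"10-game streak, fixed:          {fixed_streak:.4f}")
print(f"10-game streak, varied:         {varied_streak:.4f}")
```

Because the hitless probability falls off more slowly as at-bats increase than it rises as they decrease, the extra risk in 3 at-bat games outweighs the benefit of 5 at-bat games, so the average hitless probability goes up and the streak probability goes down.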
Actually, only 4 games with no hits over a 16-game period is pretty good. This is probably because Tony Gwynn was a clutch hitter. A 347 batting average is astounding, but not all 347 hitters are equal. There are few players I would rather have at the plate in a situation where my team is down by a run late in the game.