## Friday, December 6, 2013

### Explaining Chebyshev's theorem with a baseball example

Use of Chebyshev’s Theorem — a baseball example.

Background: An analyst who knows both the mean and the standard deviation of a set of data can construct an interval around the mean that must contain at least a certain proportion of the data in the sample. Chebyshev’s theorem is the tool that supplies a lower bound for that proportion.

Chebyshev’s Theorem: For any set of data and any constant k greater than one, at least $1 - 1/k^2$ of the data must lie within k standard deviations of the mean.

This theorem gives a lower bound for the proportion of observations that must lie in the interval $(M - k \cdot STD,\ M + k \cdot STD)$, where M is the mean and STD is the standard deviation. That lower bound is $1 - 1/k^2$.
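As a minimal sketch of the two formulas above, the helpers below compute the lower bound $1 - 1/k^2$ and the interval $M \pm k \cdot STD$. The function names `chebyshev_lower_bound` and `chebyshev_interval` are hypothetical, chosen only for illustration.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of any data set that lies within k standard deviations of the mean."""
    if k <= 1:
        # The theorem as stated here requires k > 1; for k <= 1 the bound is trivial.
        raise ValueError("k must be greater than one")
    return 1 - 1 / k**2


def chebyshev_interval(mean: float, std: float, k: float) -> tuple[float, float]:
    """Interval (mean - k*std, mean + k*std) holding at least chebyshev_lower_bound(k) of the data."""
    return (mean - k * std, mean + k * std)


if __name__ == "__main__":
    # Print the bound for k = 2 and k = 3, rounded to three decimals.
    for k in (2, 3):
        print(k, round(chebyshev_lower_bound(k), 3))
```

Running it prints `2 0.75` and `3 0.889`, the same values worked out by hand in the next paragraph.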

Note that the exact proportion of observations in the interval will not, in general, equal this lower bound. It may be much larger but, by definition of a lower bound, it cannot be smaller. For instance, we can be sure that at least 75% of the data lies within 2 standard deviations of the mean, because $1 - 1/2^2 = 0.75$, and that at least 88.9% of the data lies within 3 standard deviations of the mean, because $1 - 1/3^2 = 8/9 \approx 0.889$.
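To see that the actual share is usually well above the bound, the sketch below counts how many observations of a data set fall within k standard deviations of its mean and compares that with $1 - 1/k^2$. It assumes the population standard deviation (dividing by n, via `statistics.pstdev`); a sample standard deviation would only widen the interval. The helper name `share_within_k_std` is hypothetical, and the skewed exponential data is synthetic, not baseball data.

```python
import random
import statistics


def share_within_k_std(data: list[float], k: float) -> float:
    """Fraction of observations within k population standard deviations of the mean."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    inside = sum(1 for x in data if abs(x - mean) <= k * std)
    return inside / len(data)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic, deliberately skewed data (exponential draws), for illustration only.
    data = [random.expovariate(1.0) for _ in range(10_000)]
    for k in (2, 3):
        bound = 1 - 1 / k**2
        share = share_within_k_std(data, k)
        print(f"k={k}: Chebyshev bound {bound:.3f}, actual share {share:.3f}")
```

Whatever data you feed it, the actual share should never fall below the bound; for most real data sets it is noticeably higher.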

Example 4.1: The mean and standard deviation of the number of hits by the 14 American League and 14 National League teams in the 1996 regular season are given below. For each league, construct an interval that contains at least 50% of the team hit totals.

| | AL | NL |
|---|---|---|
| Mean hits | 1575.5 | 1456.9 |
| Standard deviation of hits | 62 | 82.2 |

Answer to Example 4.1: To find the value of k that guarantees at least 50% of the observations, we set $0.50 = 1 - 1/k^2$ and solve for k. Multiplying both sides of this equation by $k^2$ gives $0.5k^2 = k^2 - 1$. Rearranging gives $0.5k^2 = 1$, which is the same as $k^2 = 2$, so $k = \sqrt{2} \approx 1.414$. (Rounding k down to 1.4 would give a bound of only $1 - 1/1.96 \approx 0.49$, so the extra digit matters.) Plugging in $k \approx 1.414$ gives an interval of $1575.5 \pm 1.414 \times 62$, or (1487.8 to 1663.2), for the American League and $1456.9 \pm 1.414 \times 82.2$, or (1340.7 to 1573.1), for the National League. We know that at least 50% of the teams in each league, that is, at least 7 of the 14, have total hits inside their league's interval.
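The same calculation can be sketched in a few lines: solving $p = 1 - 1/k^2$ for k gives $k = 1/\sqrt{1 - p}$, and the intervals follow from the 1996 means and standard deviations in Example 4.1. The names `LEAGUES` and `k_for_proportion` are hypothetical, introduced only for this illustration.

```python
import math

# 1996 team-hit figures from Example 4.1: (mean, standard deviation).
LEAGUES = {
    "AL": (1575.5, 62.0),
    "NL": (1456.9, 82.2),
}


def k_for_proportion(p: float) -> float:
    """Smallest k with 1 - 1/k**2 >= p, i.e. k = 1 / sqrt(1 - p)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1 / math.sqrt(1 - p)


if __name__ == "__main__":
    k = k_for_proportion(0.50)  # sqrt(2), about 1.414
    for league, (mean, std) in LEAGUES.items():
        low, high = mean - k * std, mean + k * std
        print(f"{league}: k={k:.3f}, interval ({low:.1f} to {high:.1f})")
```

Running it prints `AL: k=1.414, interval (1487.8 to 1663.2)` and `NL: k=1.414, interval (1340.7 to 1573.1)`. Computing k from the closed form, rather than rounding it by hand, keeps the guaranteed share at 50% or more.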