Wednesday, September 28, 2016

Estimating percentiles under the assumption of normality





This post shows how to estimate the 5th and 95th percentiles of a sample from the sample mean and the sample standard deviation, under the assumption that the data is normally distributed around the mean. The data set used here consists of 60 observations on the price of the Vanguard small-cap ETF (VB). The same data was previously used to illustrate a confidence interval around the mean of the price data.

Previous Post on Confidence Interval:

That earlier post is also useful because it explains how to create a confidence interval and how to use the confidence function in Excel.

Question:  The table below has the sample mean and sample standard deviation from 60 observations on Vanguard fund (VB).   The table also contains the values of the 5th percentile and the 95th percentile in the sample.

Use the data on the sample mean and sample standard deviation to estimate the value of the 5th and 95th percentile under the assumption that the data is normally distributed.  

How do the estimates of the 5th and 95th percentiles compare to the actual values of the 5th and 95th percentile obtained from the sample?

Based on this comparison, do you believe the data is positively or negatively skewed?



Sample Mean and Sample Standard Deviation for 60 Observations of Vanguard Fund VB

Statistic                  Value
Mean                       120.897
Std                        1.995
Actual 5th Percentile      116.0
Actual 95th Percentile     123.5


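Analysis: The estimated value of each percentile is obtained by multiplying the appropriate value of the standard normal distribution by the sample standard deviation and adding the result to the sample mean (estimate = mean + Z x standard deviation). We use the norm.inv function, evaluated for the standard normal distribution (mean 0, standard deviation 1), and find Z0.05 is -1.645 and Z0.95 is 1.645.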

We estimate the 5th percentile at 117.6 and the 95th percentile at 124.2.
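The calculation can be reproduced outside Excel. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` as a stand-in for norm.inv; the variable names are illustrative and not part of the original spreadsheet. Because it uses the unrounded quantile rather than 1.645, the third decimal place can differ slightly from the table.

```python
from statistics import NormalDist

# Sample statistics reported for the 60 VB observations
mean = 120.897
std = 1.995

# Standard normal quantiles (the role norm.inv plays in the post)
z05 = NormalDist().inv_cdf(0.05)   # about -1.645
z95 = NormalDist().inv_cdf(0.95)   # about  1.645

# Percentile estimate = mean + z * standard deviation
est_p05 = mean + z05 * std
est_p95 = mean + z95 * std

print(round(est_p05, 1), round(est_p95, 1))  # 117.6 124.2
```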





Estimates of the 5th and 95th Percentiles Under the Assumption of Normality

Quantity                       Value
norm.inv(0.05)                 -1.645
norm.inv(0.95)                 1.645
Standard Deviation             1.995
Average                        120.897
Estimate of 5th percentile     117.615
Estimate of 95th percentile    124.179

Actual Values of 5th and 95th Percentiles

Percentile                     Value
5th                            116.0
95th                           123.5


The actual value of the 5th percentile is 116.0, around 1.6 points lower than the estimated value under the assumption of normality.


The actual value of the 95th percentile is 123.5, around 0.7 points lower than the estimated value under the assumption of normality.


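These deviations between estimated and actual percentiles suggest that the data is negatively skewed compared to the normal distribution: the lower tail reaches further below the mean than the normal model predicts, while the upper tail falls short of it.


The skew estimate obtained from Excel is -1.3, which confirms the negative skew in this data.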
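One way to see the asymmetry is to compare how far each actual percentile sits from the mean with the distance the normal model implies. This is only a rough sketch using the figures reported above; the variable names are made up for illustration, and it does not recompute the Excel skew statistic, which would require the 60 raw observations.

```python
mean = 120.897
std = 1.995
actual_p05 = 116.0
actual_p95 = 123.5

# Under normality both percentiles lie the same distance from the mean
normal_dist = 1.645 * std

# Actual distances from the mean to each tail percentile
left_dist = mean - actual_p05    # lower tail
right_dist = actual_p95 - mean   # upper tail

print("normal model:", round(normal_dist, 3))
print("actual lower tail:", round(left_dist, 3))
print("actual upper tail:", round(right_dist, 3))

# A lower tail longer than the upper tail points to negative skew
print("negative skew suggested:", left_dist > right_dist)
```

The lower tail distance exceeds the symmetric normal distance, and the upper tail distance falls below it, which is the pattern expected from a negatively skewed sample.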


Something for students to think about: In what way does the calculation of estimates of the 5th and 95th percentiles of this sample, based on the assumption of normality, differ from the calculation of a confidence interval around the mean, also based on the assumption of normality?
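As a starting point for that comparison, the sketch below places the two calculations side by side. It assumes a 90% confidence level so that both use the same Z value of about 1.645; the confidence level used in the earlier post is not stated here, so treat that choice, and the variable names, as assumptions.

```python
import math
from statistics import NormalDist

mean = 120.897
std = 1.995
n = 60

z = NormalDist().inv_cdf(0.95)  # about 1.645

# Percentile estimates describe the spread of individual observations:
# they scale z by the standard deviation itself.
percentiles = (mean - z * std, mean + z * std)

# A confidence interval describes uncertainty about the mean:
# it scales z by the standard error, std / sqrt(n).
std_error = std / math.sqrt(n)
conf_interval = (mean - z * std_error, mean + z * std_error)

print("5th to 95th percentile:", [round(x, 3) for x in percentiles])
print("90% CI for the mean:   ", [round(x, 3) for x in conf_interval])
```

The only difference in the arithmetic is the division by the square root of the sample size, which makes the confidence interval much narrower than the range between the two percentiles.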

Comments:

  1. .. And how much does the answer differ if one does not assume Normality, and what's a good way of calculating that? (Hint: Consider resampling.)