Thursday, December 8, 2016

Sample Size and Tests on Proportions


The data used in this post is identical to the data used in a health care post comparing out-of-pocket expenses for working age Americans in 2013 to out-of-pocket expenses in 2014.



The post found that the opening of state exchanges reduced out-of-pocket health care expenses and the number of households spending more than $5,000 in a year on out-of-pocket health care costs.

This post considers some statistical issues in comparing how much the share of people with large health care bills fell among low-income and high-income people.

I am interested in how sample size limits the ability of survey data to answer questions about subsets of the population.

Question:  The table below reports the number of people with out-of-pocket expenses less than or equal to $5,000 and greater than $5,000 for three groups: (1) working-age people in households with income less than 200% of the federal poverty line (FPL), (2) working-age people in households with income at or above 200% of the FPL, and (3) all working-age people.  I have classified all people between the ages of 20 and 64 as working-age people.

For each group calculate the percent of people with more than $5,000 in out-of-pocket expenses in 2013 and in 2014.

Did the opening of state exchanges result in a larger reduction in out-of-pocket expenses for the lower-income group or the higher-income group?

For each of the three groups, conduct a hypothesis test on the difference between 2013 and 2014 in the proportion of people with more than $5,000 of out-of-pocket expenses.

Discuss how sample sizes affect the results of hypothesis tests.

The Data:  Tabulations on the proportion of people with more than $5,000 of out-of-pocket health care expenses were obtained from the 2013 and 2014 MEPS consolidated annual surveys.  

MEPS 2014 File:

MEPS 2013 File:


The tabulations for the three groups are presented here.


Changes in Proportion of Working-Age People with Large Out-of-Pocket Expenses

| Household Income | Out-of-Pocket Health Care Expenses | 2013 | 2014 |
|---|---|---|---|
| <200% FPL | <=$5,000 | 9,065 | 8,331 |
| <200% FPL | >$5,000 | 63 | 47 |
| <200% FPL | Total | 9,128 | 8,378 |
| >=200% FPL | <=$5,000 | 11,993 | 11,466 |
| >=200% FPL | >$5,000 | 171 | 135 |
| >=200% FPL | Total | 12,164 | 11,601 |
| All Working-Age People Regardless of Income | <=$5,000 | 21,058 | 19,797 |
| All Working-Age People Regardless of Income | >$5,000 | 234 | 182 |
| All Working-Age People Regardless of Income | Total | 21,292 | 19,979 |
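
The percentages asked for in the question follow directly from these counts. Here is a minimal Python sketch of that arithmetic. The counts are copied from the tabulation above; the dictionary layout and the function name are my own illustrative choices and are not part of the MEPS files.

```python
# Counts of working-age people with out-of-pocket expenses above $5,000 (x)
# and group totals (n), copied from the tabulation above.
# The group labels and data layout are illustrative, not MEPS variable names.
COUNTS = {
    "low income (<200% FPL)": {2013: (63, 9128), 2014: (47, 8378)},
    "high income (>=200% FPL)": {2013: (171, 12164), 2014: (135, 11601)},
    "all working age": {2013: (234, 21292), 2014: (182, 19979)},
}


def share_over_5000(group, year):
    """Return the proportion of the group with expenses above $5,000."""
    x, n = COUNTS[group][year]
    return x / n


for group in COUNTS:
    p13 = share_over_5000(group, 2013)
    p14 = share_over_5000(group, 2014)
    drop = p13 - p14            # decrease, as a proportion
    relative_drop = drop / p13  # decrease as a share of the 2013 level
    print(f"{group}: 2013 {p13:.2%}, 2014 {p14:.2%}, "
          f"drop {100 * drop:.2f} points ({relative_drop:.1%} of 2013)")
```

Keeping the raw counts rather than the rounded percentages matters here, because the differences being tested are only a fraction of a percentage point.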


Analysis:  The two key pieces of information to focus on in the table below are, for each of the three groups, the difference in the percent of people with more than $5,000 in out-of-pocket health care expenses and the p-value for the null hypothesis that this difference is zero.








Change in Proportions and Hypothesis Test Results

| | Low-Income Working-Age People | High-Income Working-Age People | All Working-Age People |
|---|---|---|---|
| Proportion of people with expenses >$5,000 in 2013 | 0.69% | 1.41% | 1.10% |
| Proportion of people with expenses >$5,000 in 2014 | 0.56% | 1.16% | 0.91% |
| Difference, 2013 minus 2014 (percentage points) | 0.13 | 0.24 | 0.19 |
| Difference as percent of 2013 | 18.7% | 17.2% | 17.1% |
| Z score for null hypothesis of no difference in proportion | 1.08 | 1.64 | 1.90 |
| p-value | 0.281 | 0.100 | 0.057 |
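
For readers who want to reproduce the test statistics, here is a minimal sketch of a two-sample z-test on proportions. It assumes a pooled standard error and a two-sided p-value, which is the common textbook form; the post does not say which variant was used for the table, so treat that choice and the function name as my assumptions.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test of H0: p1 == p2.

    Returns (z, two-sided p-value). The pooled standard error is an
    assumption; an unpooled version gives slightly different numbers.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# (x in 2013, n in 2013, x in 2014, n in 2014), counts from the tabulation.
groups = {
    "low income": (63, 9128, 47, 8378),
    "high income": (171, 12164, 135, 11601),
    "all working age": (234, 21292, 182, 19979),
}
for name, counts in groups.items():
    z, p = two_proportion_z_test(*counts)
    print(f"{name}: z = {z:.2f}, p = {p:.3f}")
```

Run on the counts above, this lands close to the z scores and p-values in the table. Small differences in the second decimal place may reflect rounding or a different standard-error formula.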



Observations:

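The drop in the proportion of people with large health care expenses is larger in absolute terms for high-income people (0.24 percentage points) than for low-income people (0.13 percentage points).  As a percent of the 2013 level the two drops are similar (17.2% and 18.7%).

The result for the low-income group is not close to significant at conventional levels (p = 0.281), so the null hypothesis of no difference cannot be rejected.

The result for the high-income group is at the cusp of significance at the 10% level (p = 0.100).

The null hypothesis for the total sample is rejected at the 10% level and narrowly misses rejection at the 5% level (p = 0.057).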


Comments:


It seems odd that we can reject the null hypothesis for the test conducted on the entire sample but cannot do so for the part of the population where differences in the proportions are largest.

Even though the difference in proportions is largest for the high-income sample, adding people from the low-income sample, where the difference is smaller, increases the sample size enough to produce a smaller p-value.

I rightly excluded children and people age 65 and over from the analysis of changes in the number of people with large health care bills because the group most affected by the ACA was working-age adults.  Including other groups would almost certainly have produced a more significant result.  The difference in proportions would likely be smaller, but the larger sample size would make it more significant.
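
To see how strongly sample size alone drives significance, here is a small hypothetical sketch. It holds the low-income proportions fixed at their observed values and asks what the pooled z statistic would be if both yearly samples were k times larger. The scaling factor k and the function name are illustrative assumptions; no such larger sample exists in the data.

```python
import math


def z_at_scale(k, p1=63 / 9128, p2=47 / 8378, n1=9128, n2=8378):
    """Pooled z statistic if both samples were k times larger.

    The proportions are held fixed at the observed low-income values,
    so only the sample size changes (a hypothetical exercise).
    """
    m1, m2 = k * n1, k * n2
    pooled = (p1 * m1 + p2 * m2) / (m1 + m2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / m1 + 1 / m2))
    return (p1 - p2) / se


for k in (1, 2, 4, 8):
    print(f"sample x{k}: z = {z_at_scale(k):.2f}")
```

With the proportions fixed, the z statistic grows with the square root of k, so quadrupling the sample doubles z. The same observed difference can therefore be insignificant in a small subgroup and significant in a larger one.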

I could have conducted the analysis on five different income groups.  I strongly suspect that the null hypothesis of no difference in the proportion of people with large health care bills would not have been rejected for any of the five groups.

The key lesson from this post is that sample size matters, and even surveys that appear large aren't large enough to answer certain questions for subsets of the population.  This problem is especially severe when one studies events that occur relatively infrequently.  The decrease in frequency from 1.41% to 1.16% is only 0.24 percentage points, but this is over 17% of the initial base.
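
One way to put a number on "not large enough" is a standard sample-size approximation for comparing two proportions. The sketch below assumes a two-sided test at the 5% level with 80% power and equal sample sizes in each year. Those settings and the function name are conventional choices of mine and do not come from the analysis above.

```python
import math
from statistics import NormalDist


def n_per_year(p1, p2, alpha=0.05, power=0.80):
    """Approximate people needed in EACH year to detect p1 vs p2.

    Normal-approximation formula for a two-sided two-proportion z-test
    with equal group sizes; alpha and power are assumed, not sourced.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Observed high-income proportions for 2013 and 2014.
needed = n_per_year(171 / 12164, 135 / 11601)
print(f"people needed per year: {needed:,}")
```

Under these assumptions the required sample per year comes out well above the roughly 12,000 high-income working-age people actually surveyed each year. This is consistent with the borderline p-value for that group.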


Author's Note:  I am working very hard on my health care blog to address some of the issues raised by Trump's plan to repeal and replace the ACA.  Go to the following post for a list of my questions.


                                                                                                                          
