Sample Size and Tests on Proportions
The data used in this post is identical to the data used in
a health care post comparing out-of-pocket expenses for working-age Americans
in 2013 to out-of-pocket expenses in 2014.
The post found that the opening of state exchanges reduced
out-of-pocket health care expenses and the number of households spending more
than $5,000 in a year on out-of-pocket health care costs.
This post considers some statistical issues related to
differences in the reduction of people with large health care bills for low-income
and high-income people.
I am interested in the limitations of survey data, due to sample
size, for answering questions about subsets of the population.
Question: The tables below have information on the
number of people with out-of-pocket expenses less than or equal to $5,000 and
out-of-pocket expenses greater than $5,000 for three groups: (1) working-age people in households with income
less than 200% of the federal poverty line (FPL), (2) working-age people in
households with income at or above 200% FPL, and (3) all working-age
people. I have classified all people
between the ages of 20 and 64 as working-age people.
For each group, calculate the percent of people with more
than $5,000 in out-of-pocket expenses in 2013 and in 2014.
Did the opening of state exchanges result in a larger
reduction in out-of-pocket expenses for the lower-income group or the higher-income
group?
Conduct a hypothesis test on the difference in the proportion of
people with more than $5,000 of out-of-pocket expenses for each of the three groups.
Discuss how sample sizes affect the results of hypothesis
tests.
The Data: Tabulations on the proportion of people with
more than $5,000 of out-of-pocket health care expenses were obtained from the
2013 and 2014 MEPS consolidated annual surveys.
MEPS 2014 File:
MEPS 2013 File:
The tabulations for the three groups are presented here.
Changes in Proportion of Working-Age People with Large Out-of-Pocket Expenses

Household Income <200% FPL

Out-of-Pocket Health Care Expenses        2013        2014
<=$5,000                                 9,065       8,331
>$5,000                                     63          47
Total                                    9,128       8,378

Household Income >=200% FPL

Out-of-Pocket Health Care Expenses        2013        2014
<=$5,000                                11,993      11,466
>$5,000                                    171         135
Total                                   12,164      11,601

All Working-Age People Regardless of Income

Out-of-Pocket Health Care Expenses        2013        2014
<=$5,000                                21,058      19,797
>$5,000                                    234         182
Total                                   21,292      19,979
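The percentages asked for in the question follow directly from these counts. Here is a minimal Python sketch, with the counts copied from the tables above; the dictionary layout and the function name are my own and purely illustrative.

```python
# Counts copied from the tabulations above: (people with >$5,000, total people).
counts = {
    "low income (<200% FPL)": {2013: (63, 9128), 2014: (47, 8378)},
    "high income (>=200% FPL)": {2013: (171, 12164), 2014: (135, 11601)},
    "all working age": {2013: (234, 21292), 2014: (182, 19979)},
}


def share_over_5000(group, year):
    """Return the proportion of people with more than $5,000 in expenses."""
    large, total = counts[group][year]
    return large / total


for group in counts:
    p13 = share_over_5000(group, 2013)
    p14 = share_over_5000(group, 2014)
    change = p14 - p13  # negative means fewer people with large bills in 2014
    print(f"{group}: 2013 {p13:.2%}, 2014 {p14:.2%}, "
          f"change {change * 100:+.2f} points ({change / p13:+.1%} of 2013)")
```

The change is reported both in percentage points and as a share of the 2013 level, since the two measures rank the income groups differently.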

Analysis: The two key pieces of information that I want
you to focus on in the table below are the difference in the percent of people
with more than $5,000 in out-of-pocket health care expenses and the p-value for
the hypothesis that this difference is zero, for each of the three groups.
Change in Proportions and Hypothesis Test Results

                                                              Low Income High Income         All
                                                             Working-Age Working-Age Working-Age
                                                                  People      People      People
Proportion of people with expenses >$5,000 in 2013                 0.69%       1.41%       1.10%
Proportion of people with expenses >$5,000 in 2014                 0.56%       1.16%       0.91%
Difference, 2014 - 2013 (percentage points)                        -0.13       -0.24       -0.19
Difference as percent of 2013                                     -18.7%      -17.2%      -17.1%
Z score (absolute value), null hypothesis of no difference          1.08        1.64        1.90
p-value                                                            0.281       0.100       0.057
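For readers who want to check the test statistics, here is a minimal sketch of a two-sided two-proportion z test in Python. It assumes a pooled standard error, which is my assumption since the method behind the table is not stated, so the output should land close to the table's values but may differ in the second decimal place. The function name is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z test for H0: p1 == p2, returning (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# (large bills 2013, total 2013, large bills 2014, total 2014) from the tables.
groups = {
    "low income": (63, 9128, 47, 8378),
    "high income": (171, 12164, 135, 11601),
    "all working age": (234, 21292, 182, 19979),
}

for name, (x13, n13, x14, n14) in groups.items():
    z, p = two_proportion_z_test(x13, n13, x14, n14)
    print(f"{name}: z = {z:.2f}, p-value = {p:.3f}")
```

Because the difference is taken as 2014 minus 2013, the z values come out negative; the p-value depends only on the magnitude.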

Observations:
The change in the proportion of people with large health
care expenses is larger in percentage points for high-income people than for low-income people
(0.24 versus 0.13 points), although the declines relative to 2013 are similar.
The difference for the low-income group is not close to
being significant at conventional levels of significance (p-value of 0.281).
The difference for the high-income group is basically
at the cusp of significance at the 10% level (p-value of 0.100).
The null hypothesis for the total sample is rejected at the 10% level
and narrowly misses rejection at the 5% level (p-value of 0.057).
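The pattern in these observations follows from how the z statistic depends on sample size: holding both proportions fixed, multiplying both sample sizes by k multiplies z by the square root of k. The sketch below illustrates this with the low-income proportions from the tables above. The scale factors are hypothetical, and the pooled standard error is my assumption.

```python
import math


def pooled_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic from proportions and sample sizes."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


# Low-income proportions and sample sizes from the tables above.
p13, n13 = 63 / 9128, 9128
p14, n14 = 47 / 8378, 8378

base_z = pooled_z(p13, n13, p14, n14)
for k in (1, 2, 4):  # hypothetical multiples of the actual sample size
    z = pooled_z(p13, k * n13, p14, k * n14)
    print(f"sample x{k}: z = {z:.2f} (ratio to actual = {z / base_z:.2f})")
```

The same low-income proportions that fall well short of significance with the actual sample would clear the 5% threshold if the sample were four times as large.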
Comments:
It seems odd that we can reject the null hypothesis for the test
conducted on the entire sample but cannot do so for the part of the population where
differences in the proportions are largest.
Even though the difference in proportions is largest for the
high-income sample, the inclusion of people from the low-income sample creates a
more significant result, because the larger combined sample reduces the standard
error of the difference.
I rightly excluded children and people age 65 and older from the
analysis of changes in people with large health care bills because the group
most affected by the ACA was working-age adults. The inclusion of other groups would almost
certainly have created a more significant result. The difference in proportions would likely be
smaller but more significant because of the larger sample size.
I could have conducted the analysis on five different income
groups. I strongly suspect that the
null hypothesis of no difference in the proportion of people with large health
care bills would not have been rejected for any of the five groups.
The key lesson from this post is that sample size matters,
and even surveys that appear large aren't large enough to answer certain
questions for subsets of the population. This problem is especially severe when one studies events that occur relatively infrequently. For the high-income group, the decrease in frequency from 1.41% to 1.16% is only 0.24 percentage points, but this is over 17% of the initial base.
Author's Note: I am working
very hard on my health care blog to address some of the issues raised by
Trump’s plan to repeal and replace the ACA.
Go to the following post for a list of my questions.