Wednesday, October 12, 2016

Unweighted Two-Way Association Tests




A recent post on my health care blog looked at some statistics on differences in the age profile of people obtaining private health coverage through state exchanges versus people obtaining private health coverage through their employer.   

A post on the economic implications of the different age profiles of state exchanges and employment-based insurance:



This post uses those data to demonstrate how to conduct an unweighted two-way association test, that is, the Pearson chi-square test for a two-way contingency table.

Question One:  The table below shows the ages of people with a private health insurance plan and whether the plan was obtained from a state exchange or from some other source.   (The primary source of private health insurance for households with working-age people is the person’s employer.)   Test whether there is an association between the age category variable and the plan venue variable.  Use the Pearson chi-square statistic to test the null hypothesis of no association.




Contingency Table for Private Plan Type and Age Category

age_cat       Exchange Plan   Not Exchange Plan             Total
                              (primarily employer-based)
<=21                    645              14,889            15,534
21<age<=26              217               3,711             3,928
26<age<=35              497               7,100             7,597
35<age<=45              559               8,211             8,770
45<age<=55              713               8,984             9,697
55<age<=65              734               8,031             8,765
Total                 3,365              50,926            54,291


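Answer:  The test for an association between two categorical variables compares the observed cell counts to the counts that would be expected if the two variables were independent.   The observed values are above.   Each expected value equals the product of the cell's row total and column total divided by the total sample size.  For example, the expected number of exchange-plan members age 21 or younger is 15,534 x 3,365 / 54,291 = 962.8.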



Expected Plan Types by Age Category

age_cat       Exchange Plan   Not Exchange Plan
<=21                  962.8             14571.2
21<age<=26            243.5              3684.5
26<age<=35            470.9              7126.1
35<age<=45            543.6              8226.4
45<age<=55            601.0              9096.0
55<age<=65            543.3              8221.7
Total                3365.0             50926.0
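The expected counts can be reproduced with a few lines of Python. This is only a sketch using the standard library; the names `observed` and `expected_counts` are illustrative and do not come from the original analysis.

```python
# Observed counts from the contingency table above:
# (exchange plan, not exchange plan) for each age category.
observed = {
    "<=21":       (645, 14889),
    "21<age<=26": (217, 3711),
    "26<age<=35": (497, 7100),
    "35<age<=45": (559, 8211),
    "45<age<=55": (713, 8984),
    "55<age<=65": (734, 8031),
}


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    grand_total = sum(col_totals)
    return {
        label: tuple(sum(row) * col / grand_total for col in col_totals)
        for label, row in table.items()
    }


expected = expected_counts(observed)
for label, row in expected.items():
    # One line per age category, expected counts rounded to one decimal place.
    print(f"{label:<12}" + "".join(f"{value:>12.1f}" for value in row))
```

Because each expected count is built from the row and column totals, the expected counts in each column add up to the same column totals as the observed counts.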


Some observations: 


In the exchange plan column, the observed counts for the two youngest cohorts (age 26 or younger) are lower than the expected counts.   The observed counts for the four older age groups are higher than expected.

In the not-exchange plan column the pattern reverses.   Observed counts exceed expected counts for the two youngest groups and fall below expected counts for the four older cohorts.

The younger age profile of employer-sponsored plans is not entirely the result of young people going without coverage.   A disproportionate number of kids and young adults get insurance through a parent's employer-based plan.


The calculation of the chi-square test statistic:

The chi-square test statistic used in this problem is $\chi^2 = \sum (O - E)^2 / E$, where O is the observed cell count, E is the expected cell count, and the sum runs over all cells of the table.

See the table below for the calculation of the chi-square statistic.  (The top rows are values for state exchange and the bottom rows are values for not state exchanges.)


Calculation of Chi-Squared Statistic

age_cat       Observed   Expected   (O-E)^2/E
<=21               645        963       104.9
21<age<=26         217        243         2.9
26<age<=35         497        471         1.5
35<age<=45         559        544         0.4
45<age<=55         713        601        20.9
55<age<=65         734        543        67.0

<=21             14889      14571         6.9
21<age<=26        3711       3685         0.2
26<age<=35        7100       7126         0.1
35<age<=45        8211       8226         0.0
45<age<=55        8984       9096         1.4
55<age<=65        8031       8222         4.4

Chi Square                              210.5
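As a check on the arithmetic, the statistic can be recomputed directly from the observed counts. The sketch below uses only the Python standard library, and the function name `pearson_chi_square` is illustrative. It works from unrounded expected counts, so individual terms may differ slightly from the rounded figures in the table.

```python
# Observed counts: one (exchange plan, not exchange plan) pair per age category,
# in the same order as the contingency table.
observed = [
    (645, 14889),
    (217, 3711),
    (497, 7100),
    (559, 8211),
    (713, 8984),
    (734, 8031),
]


def pearson_chi_square(table):
    """Sum of (O - E)^2 / E over all cells, with E = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


chi_square = pearson_chi_square(observed)
print(f"chi-square = {chi_square:.2f}")
```

The result should land very close to the 210.5 shown in the table.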


There are five degrees of freedom in this problem: (r-1) x (c-1) = (6-1) x (2-1) = 5, where r is the number of rows and c is the number of columns in the contingency table.


The p-value for this test is far below 0.00001.  
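The degrees of freedom and the p-value can be checked with a short script. This is a sketch rather than the method used to produce the figures above. In practice a statistics package (for example `scipy.stats.chi2.sf`) would do this, but to stay self-contained the helper `chi2_sf` below, a hypothetical name, evaluates the chi-square upper-tail probability for integer degrees of freedom using a standard recurrence for the regularized incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


rows, cols = 6, 2             # six age categories, two plan venues
df = (rows - 1) * (cols - 1)  # degrees of freedom
chi_square = 210.5            # test statistic from the table above
p_value = chi2_sf(chi_square, df)
print(f"df = {df}, p-value = {p_value:.3g}")
```

With a statistic this large on five degrees of freedom, the p-value should come out many orders of magnitude below 0.00001, so the null hypothesis is rejected at any conventional significance level.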

We reject the null hypothesis that there is no association between the row and column variables. 


Resource:  

A great resource for this type of statistical problem:



A future post will discuss how to calculate this chi-square test when the un-weighted survey data is not representative of the entire population.









