## Saturday, September 3, 2016

### Paired Difference Versus Standard t-tests -- an election example


A previous post considered differences in poll results when the respondents are asked to choose between Clinton and Trump and poll results when respondents are allowed to select a third-party candidate.

In the previous post, I employed a paired difference test and a non-parametric sign test on pairs and concluded that the Clinton-Trump margin was significantly smaller when polls included options on third-party candidates.

This post considers whether a standard t-test on means would lead to the same result. I also discuss which test (the paired difference test or the comparison of means) is more appropriate.

Question: Below is information on the sample averages, sample standard deviations, and sample sizes for two types of polls.

Test the hypothesis that the variance of the poll results from polls including only the two main candidates is equal to the variance of poll results when third-party candidates are included.

Test the hypothesis that the mean Clinton-Trump margin is identical for the two polling techniques.

Discuss whether the hypothesis test presented here is superior or inferior to the paired difference technique used in the previous post.

Clinton-Trump Margin for Two Different Poll Types

| | Two Main Candidates | Third-Party Candidates Included |
|---|---|---|
| Average | 6.25 | 4.17 |
| Standard Deviation | 2.49 | 1.85 |
| Sample Size | 12 | 12 |

Analysis: The F-statistic for the equality of the variances is F = (2.49/1.85)² = 1.81. The two-tailed p-value for this F-statistic is 0.3387. The two-tailed test is appropriate because I don’t have a clear prior as to which variance is larger.
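The F-statistic and its two-tailed p-value can be reproduced from the summary table alone. The sketch below is one way to do it using only the Python standard library: it integrates the F density numerically with Simpson's rule, which is an assumption made here for self-containment and not necessarily how the figures in this post were produced. The helper names `simpson` and `f_pdf` are illustrative, and the degrees of freedom are taken as n - 1 = 11 for each sample.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f_pdf(x, d1, d2):
    """Density of the F distribution (finite at x = 0 only when d1 > 2)."""
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    return (math.exp(-log_beta) * (d1 / d2) ** (d1 / 2)
            * x ** (d1 / 2 - 1) * (1 + d1 * x / d2) ** (-(d1 + d2) / 2))

# Summary statistics from the table (rounded values)
sd_two, n_two = 2.49, 12      # polls with the two main candidates only
sd_third, n_third = 1.85, 12  # polls that include third-party candidates

F = (sd_two / sd_third) ** 2  # ratio of sample variances
cdf = simpson(lambda x: f_pdf(x, n_two - 1, n_third - 1), 0.0, F)
p_two_tailed = 2 * min(cdf, 1 - cdf)

print(f"F = {F:.2f}, two-tailed p = {p_two_tailed:.4f}")
```

Because the inputs are the rounded table values, the result may differ from the reported p-value in the last decimal place.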

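With a p-value this large, I cannot reject the hypothesis that the variances are equal.

I nonetheless conduct the t-test on equal means without assuming equal variances, which is the more cautious choice. Because the two samples are the same size, the t-statistic is the same under either assumption; only the degrees of freedom differ. I get a t-value of 2.33, which is associated with a p-value of 0.0305.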
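The t-test can also be approximated from the summary table. The sketch below assumes the unequal-variance (Welch) form of the test with Welch-Satterthwaite degrees of freedom, and again uses numerical integration from the standard library instead of a statistics package. The helper names are illustrative. Since the table values are rounded, the script gives a t-value closer to 2.32 than to the 2.33 reported above, with a correspondingly slightly different p-value.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_pdf(t, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

# Summary statistics from the table (rounded values)
mean_two, sd_two, n_two = 6.25, 2.49, 12
mean_third, sd_third, n_third = 4.17, 1.85, 12

# Squared standard error contributed by each sample
v_two = sd_two ** 2 / n_two
v_third = sd_third ** 2 / n_third

se = math.sqrt(v_two + v_third)
t_stat = (mean_two - mean_third) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df = (v_two + v_third) ** 2 / (
    v_two ** 2 / (n_two - 1) + v_third ** 2 / (n_third - 1)
)

# Two-tailed p-value: 1 minus the central area between -|t| and +|t|
p_value = 1 - 2 * simpson(lambda x: t_pdf(x, df), 0.0, abs(t_stat))

print(f"t = {t_stat:.2f}, df = {df:.1f}, p = {p_value:.4f}")
```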

Whether or not you will reject the null hypothesis of no differences in means depends on the level of significance that you choose for the test.  If you choose a level of significance of 0.01 you will not reject the null hypothesis.   If you choose the level of significance of 0.05 you reject the null hypothesis.

The p-value in the previous post that used the paired t-test was 0.0023. The paired t-test unambiguously rejects the null hypothesis of no difference in means at conventional significance levels.

Which Test Is Better: For this data set the paired difference test provides an unambiguous conclusion: reject the null hypothesis that the mean difference in poll results is zero in favor of the hypothesis that the mean difference is not zero. Results from the standard comparison of means are less conclusive.

A strong case can be made for the use of the paired-difference test in this example.   The pairs in this study occur naturally.  The pairs involve the two polling techniques on the same day by the same company.   By evaluating paired differences in this manner we are getting rid of extraneous variability caused by factors that are unrelated to the inclusion or exclusion of the third-party option.

One source of variability involves changes in the general election victory likelihoods over time. Movements in the net Clinton-Trump margin over time from +10 to +2 or, in the future, even lower, have nothing to do with whether the inclusion of questions on third-party candidates changes results. Similarly, differences in poll characteristics like sample size and whether the polls use both cell phones and landlines are irrelevant to the question of whether the inclusion of the third-party question impacts results.

The standard error from the unpaired test that simply compares means is inflated because it also picks up variability in poll results from these other sources.
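The effect of pairing on the standard error can be written out explicitly. In the notation assumed here (not used elsewhere in this post), s₁ and s₂ are the two sample standard deviations, n = 12 is the number of pairs, and r is the sample correlation between the two results within a pair:

$$
SE_{\text{unpaired}} = \sqrt{\frac{s_1^2 + s_2^2}{n}} = \sqrt{\frac{2.49^2 + 1.85^2}{12}} \approx 0.90
$$

$$
SE_{\text{paired}} = \sqrt{\frac{s_1^2 + s_2^2 - 2\,r\,s_1 s_2}{n}}
$$

The correlation r is not reported in this post, so the paired standard error cannot be computed from the table alone. The formula does show the mechanism: when the two questions in the same poll move together (r close to 1), the subtracted term removes most of the shared variability.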

Some polling companies ask only one of the two questions: either Trump versus Clinton, or Trump, Clinton, or a third-party candidate. I have excluded all poll results that do not include both questions. Many statisticians would argue that I should not be throwing away so much data.

It is possible to pair polls using different samples on the same date. This would reduce variability based on changes in the electoral mood. However, in my view it makes little sense to compare results from a poll of 400 people based exclusively on landlines to results from a poll of 1000 people that uses both landlines and cell phones.

It is possible to handle this problem in a regression framework where the poll result (the Clinton-Trump margin) is the dependent variable. The key explanatory variable in the model would be whether the poll allows for a third-party option, but the model would also include controls for the date of the poll and other polling characteristics. In this framework, a negative coefficient on the third-party dummy variable would indicate that including third-party candidates narrows the Clinton-Trump margin, which appears to favor Trump.
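One possible specification of such a model is sketched below. The variable names are hypothetical: ThirdParty is the dummy for whether poll question i offers a third-party option, Date is the date of the poll, and X stands for the other poll characteristics such as sample size and phone type.

$$
\text{Margin}_i = \beta_0 + \beta_1\,\text{ThirdParty}_i + \beta_2\,\text{Date}_i + \gamma' X_i + \varepsilon_i
$$

Under this sketch, the coefficient of interest is β₁, and the hypothesis of no third-party effect corresponds to β₁ = 0.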

The main advantage of the regression framework is that it allows for the use of both polls that ask only one question and polls that ask both question types.  The regression method also allows for the statistician to estimate the impact of third-party support on the Clinton-Trump margin.

Final Thoughts: Often statisticians should use paired difference tests rather than a standard comparison of two means. For example, a comparison of hourly sales at a Caribou Coffee versus a Starbucks should pair the observations across hours because the change in sales over the day is irrelevant to whether one location dominates the other. Similarly, a test comparing the number of people going to movies to the number of people watching TV should pair days because changes in habits, which vary from day to day and month to month, are not related to the long-term differential between the number of people watching television and the number of people watching movies.

I am planning more work on the election.  Please subscribe to my blog to get these posts.