Thursday, September 1, 2016

The impact of third party candidates on the presidential election



Question: The table below contains results from 12 polling reports on the 2016 presidential election. Each polling report provides two estimates of the difference between the percentage of people voting for Hillary Clinton and the percentage voting for Donald Trump. One estimate gives the Clinton-Trump differential when Clinton and Trump are the only allowable options. The other gives the differential when third-party candidates are included as additional options.

The difference between the two paired estimates from each poll can be interpreted as the impact of the third-party candidates on the Clinton-Trump margin.

How does the inclusion of third party candidates impact the estimated Clinton minus Trump differential?

Provide and discuss descriptive statistics on the Clinton/Trump differential for the two different polling techniques.

Test the null hypothesis that the inclusion of the third party candidates does not impact polling results.  Discuss the test results.

The poll outcomes are rounded to an integer percentage difference.   Discuss problems caused by rounding error.

The Data:  Data from 12 polling efforts are presented below.   Each polling effort contains two estimates of the Clinton-Trump differential – one with only the two main candidates and the other with third party candidates included.



Clinton Minus Trump Percentage Points for National Polls With and Without Third-Party Candidates

Date     Poll Name            Top Two Candidates Only   Third-Party Candidates Included   Difference
31-Aug   Fox News                        6                            2                        4
31-Aug   Reuters                         1                            2                       -1
31-Aug   Economist/YouGov                5                            5                        0
29-Aug   Monmouth                        7                            7                        0
25-Aug   Quinnipiac                     10                            7                        3
25-Aug   Reuters                         7                            3                        4
24-Aug   Economist/YouGov                4                            3                        1
18-Aug   Reuters/Ipsos                   5                            2                        3
17-Aug   Gravis                          8                            4                        4
11-Aug   Reuters                         6                            5                        1
10-Aug   Bloomberg                       6                            4                        2
9-Aug    NBC News                       10                            6                        4


The source of this data is Real Clear Politics.


Descriptive Statistics:

Below are descriptive statistics for the Clinton Minus Trump differential with and without third party candidates included and for the difference in the two estimates.


Descriptive Statistics for the Clinton Minus Trump Differential

Statistic               Two Main Candidates Only   Third-Party Candidates Included   Difference
Average                           6.25                         4.17                     2.08
Standard Deviation                2.49                         1.85                     1.83
Median                            6.00                         4.00                     2.50
25th Percentile                   5.00                         2.75                     0.75
75th Percentile                   7.25                         5.25                     4.00
Skew                             -0.34                         0.33                    -0.36
Kurtosis (excess)                 0.86                        -1.20                    -1.43


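Note: The statistics in the Difference column are computed from the 12 poll-by-poll differences, not by subtracting the statistics in the other two columns. For example, the median of the 12 differences is 2.50, which is not equal to 6.00 - 4.00. The average is the one exception: the mean of the differences always equals the difference of the means (6.25 - 4.17 = 2.08).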
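The descriptive statistics can be reproduced with a short Python sketch that uses only the standard library. It assumes the sample (n-1) standard deviation and the adjusted skewness and excess-kurtosis formulas used by spreadsheet functions such as Excel's SKEW and KURT, and it computes percentiles by linear interpolation (the convention of Excel's PERCENTILE.INC and NumPy's default). Other percentile conventions give slightly different quartiles for a sample of only 12 polls. The helper names (`percentile_inc`, `describe`, and so on) are my own and purely illustrative.

```python
import statistics

# Clinton-minus-Trump differentials (percentage points), transcribed from the data table.
top_two = [6, 1, 5, 7, 10, 7, 4, 5, 8, 6, 6, 10]
with_third = [2, 2, 5, 7, 7, 3, 3, 2, 4, 5, 4, 6]
diff = [a - b for a, b in zip(top_two, with_third)]


def percentile_inc(xs, p):
    """Linear-interpolation percentile (Excel PERCENTILE.INC / NumPy default)."""
    s = sorted(xs)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def sample_skew(xs):
    """Adjusted sample skewness (the formula behind Excel's SKEW)."""
    n, m, s = len(xs), statistics.mean(xs), statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def sample_excess_kurtosis(xs):
    """Adjusted sample excess kurtosis (the formula behind Excel's KURT)."""
    n, m, s = len(xs), statistics.mean(xs), statistics.stdev(xs)
    z4 = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def describe(xs):
    """Collect the statistics reported in the descriptive-statistics table."""
    return {
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),
        "median": statistics.median(xs),
        "p25": percentile_inc(xs, 0.25),
        "p75": percentile_inc(xs, 0.75),
        "skew": sample_skew(xs),
        "kurtosis": sample_excess_kurtosis(xs),
    }


for name, xs in [("top two", top_two), ("third party", with_third), ("difference", diff)]:
    print(name, {k: round(v, 2) for k, v in describe(xs).items()})
```

Swapping `percentile_inc` for another quartile rule is the only change needed to see how sensitive the 25th and 75th percentiles are to the convention.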

Observations:

Every poll in the raw data shows Clinton leading Trump. There is not a single negative Clinton-minus-Trump differential, whether third-party candidates are excluded or included.

The race is much closer in polls that include third-party candidates than in polls where respondents must choose between the two main candidates only. The median differential is 4 points when third-party candidates are included, compared with 6 points when respondents must choose between Clinton and Trump.

The standard deviation of the Clinton-Trump differential is about 35% higher for polls that limit the choice to the two main candidates (2.49) than for polls that also include third-party candidates (1.85).

The skew of the Clinton-Trump differential is slightly negative for polls that only include two candidates and slightly positive for polls that include third-party candidates. 

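The kurtosis statistics suggest that the distribution is much flatter when third-party candidates are included (excess kurtosis of -1.20) and more peaked when third-party candidates are excluded (0.86). With only 12 observations, however, skew and kurtosis estimates are imprecise and should be interpreted cautiously.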

Hypothesis Tests:

Each of the 12 polls provides a pair of observations from the same sample: one estimate when respondents must choose between the two main candidates and one when respondents may also choose a third-party candidate. Because the two estimates in each pair come from the same respondents, it makes sense to conduct a paired t-test on the 12 differences.

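The mean of the 12 differences is 2.08333. The standard error (the standard deviation of the differences divided by the square root of 12) is 0.52884. Dividing the mean by the standard error gives a t-statistic of 3.939. Under the null hypothesis, this statistic follows a t-distribution with 11 (n-1) degrees of freedom. I am treating this situation as a two-tailed test. The p-value for t = 3.939 with 11 degrees of freedom is 0.0023, so the null hypothesis that including third-party candidates does not affect the Clinton-Trump differential is rejected at the 1% significance level.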
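The paired t-test can be sketched in Python without third-party packages. To stay within the standard library, the p-value here comes from integrating the Student's t density numerically with Simpson's rule; that integration approach is my own choice, not something the original analysis specifies. With SciPy installed, `scipy.stats.ttest_rel(top_two, with_third)` runs the same two-sided paired test in one call.

```python
import math
import statistics

# Clinton-minus-Trump differentials (percentage points), transcribed from the data table.
top_two = [6, 1, 5, 7, 10, 7, 4, 5, 8, 6, 6, 10]
with_third = [2, 2, 5, 7, 7, 3, 3, 2, 4, 5, 4, 6]
diff = [a - b for a, b in zip(top_two, with_third)]


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), via Simpson's rule on the density over [0, |t|] (steps must be even)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * area   # symmetry of the t-distribution


n = len(diff)
mean_d = statistics.mean(diff)
se = statistics.stdev(diff) / math.sqrt(n)  # standard error of the mean difference
t_stat = mean_d / se
df = n - 1
p_value = two_tailed_p(t_stat, df)
print(f"mean={mean_d:.5f} se={se:.5f} t={t_stat:.3f} df={df} p={p_value:.4f}")
```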

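Since the sample is small and there is some evidence of non-normality, it also makes sense to conduct a nonparametric test. The sign test looks at the sign (positive, negative, or zero) of each of the 12 paired differences and compares the observed numbers of positive, negative, and zero differences with the numbers expected if third-party candidates had no effect. Under that null hypothesis, the 10 nonzero differences are equally likely to be positive or negative, so 5 of each are expected.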


Observed and Expected Signs from Difference in Poll Results

Sign        Observed   Expected
Positive        9          5
Negative        1          5
Zero            2          2
All            12         12


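For the two-tailed test the p-value is 0.0215. This value corresponds to the exact binomial sign test: setting aside the two zero differences, 9 of the remaining 10 differences are positive, and under the null hypothesis the probability of a split at least this lopsided in either direction is 2 × 11/1024 ≈ 0.0215.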
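A minimal sketch of the exact sign test on the 12 differences follows. It drops the two zero differences and treats the count of positive signs among the remaining 10 as Binomial(10, 0.5) under the null hypothesis, which reproduces the 0.0215 p-value reported above. With SciPy 1.7 or later, `scipy.stats.binomtest(9, 10, 0.5).pvalue` should give the same two-sided value.

```python
from math import comb

# Clinton-minus-Trump differences (top two only minus third-party included), from the data table.
diff = [4, -1, 0, 0, 3, 4, 1, 3, 4, 1, 2, 4]

positives = sum(d > 0 for d in diff)
negatives = sum(d < 0 for d in diff)
zeros = len(diff) - positives - negatives
n = positives + negatives  # ties (zero differences) are set aside

# Under H0 each nonzero difference is positive with probability 1/2, so the
# count of the more common sign is compared with Binomial(n, 0.5).
k = max(positives, negatives)
upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
p_two_tailed = min(1.0, 2 * upper_tail)

print(f"positive={positives} negative={negatives} zero={zeros} p={p_two_tailed:.4f}")
```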


Issues Related to Rounding: Real Clear Politics, the source of the data in this post, rounds the differential between Clinton and Trump to an integer percentage. This rounding adds measurement error to each observation and increases the variability of the point estimates. Each reported differential can be off by as much as 0.5 percentage points, so each paired difference can be off by as much as 1 point, which is large relative to an average difference of 2.08 points. Rounding may therefore have altered the estimated statistics substantially.
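One way to gauge the effect of rounding is a simple sensitivity simulation. The sketch below assumes (and this is only an assumption) that each true, unrounded differential lies uniformly within ±0.5 points of the reported integer and that rounding errors are independent across polls and question types. Under that assumption the rounding noise in each paired difference has variance 1/12 + 1/12 = 1/6 (a standard deviation of about 0.41 points), compared with an observed standard deviation of 1.83 points for the differences. The simulation redraws plausible unrounded values many times and recomputes the mean difference and the paired t-statistic; the function names and the seed are illustrative.

```python
import math
import random
import statistics

# Reported (rounded) Clinton-minus-Trump differentials, transcribed from the data table.
top_two = [6, 1, 5, 7, 10, 7, 4, 5, 8, 6, 6, 10]
with_third = [2, 2, 5, 7, 7, 3, 3, 2, 4, 5, 4, 6]


def paired_t(xs, ys):
    """Paired t-statistic: mean of the differences over its standard error."""
    d = [x - y for x, y in zip(xs, ys)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def unround(values, rng):
    """ASSUMPTION: each true value lies uniformly within +/-0.5 of the reported integer."""
    return [v + rng.uniform(-0.5, 0.5) for v in values]


rng = random.Random(2016)  # fixed seed so the sketch is reproducible
mean_diffs, t_values = [], []
for _ in range(5000):
    xs, ys = unround(top_two, rng), unround(with_third, rng)
    mean_diffs.append(statistics.mean([x - y for x, y in zip(xs, ys)]))
    t_values.append(paired_t(xs, ys))

print(f"mean difference: {min(mean_diffs):.2f} to {max(mean_diffs):.2f}")
cuts = statistics.quantiles(t_values, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"t-statistic, 5th to 95th percentile: {cuts[0]:.2f} to {cuts[-1]:.2f}")
```

Because every simulated error in a paired difference lies between -1 and 1, the simulated mean difference can never fall below about 1.08 points, so the sign of the estimated effect cannot flip under this assumption; the spread of the t-statistics shows how much the test's strength could move.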

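I would also like to examine the relationship between support for Gary Johnson, the Libertarian candidate, and the difference in the Clinton-Trump margin between the two polling techniques. Because rounding adds measurement error to all of the variables, it will reduce the reliability of results from this type of model; in particular, measurement error in an explanatory variable tends to bias an estimated regression slope toward zero.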

Concluding Thoughts: The difference between polls that include third-party candidates and polls that exclude them suggests that, right now, the third-party candidates help Trump more than Clinton: including them narrows Clinton's lead by about 2 points on average. However, this relationship might not hold if Johnson takes off in the polls and makes the debate stage. Statistical relationships that hold over one range of an explanatory variable often do not hold outside that range.

I will continue to track the disparities in polls and will soon analyze the question of how the third party candidates might affect this race with a more advanced regression model.

Readers of this post should also look at my post that uses hypothesis testing to examine Senate polls.

http://dailymathproblem.blogspot.com/2016/08/hypothesis-testing-problem-for-senate.html







