Question: The data in the table below contains
information on poll results for 12 polling reports on the 2016 Presidential
election. Each polling report involves
two estimates on the difference in the percentage of people voting for Hillary
Clinton and the percentage of people voting for Donald Trump. One estimate provides the differential
between Clinton and Trump when Clinton and Trump are the only allowable
options. The other estimate provides
the differential between Clinton and Trump when third-party candidates are
included as extra options.
The paired differences between the two estimates can be interpreted
as the impact of the third-party candidates on the
Clinton-Trump margin.
How does the inclusion of third party candidates impact the
estimated Clinton minus Trump differential?
Provide and discuss descriptive statistics on the
Clinton/Trump differential for the two different polling techniques.
Test the null hypothesis that the inclusion of the third
party candidates does not impact polling results. Discuss the test results.
The poll outcomes are rounded to an integer percentage
difference. Discuss problems caused by
rounding error.
The Data: Data from 12 polling efforts are presented
below. Each polling effort contains two
estimates of the Clinton-Trump differential: one with only the two main
candidates and the other with third party candidates included.
Clinton Minus Trump Percentage for National Polls With and Without Third Party Candidates

| Date | Poll Name | Top Two Candidates Only | Third Party Candidates Included | Difference |
|---|---|---|---|---|
| 31-Aug | Fox News | 6 | 2 | 4 |
| 31-Aug | Reuters | 1 | 2 | -1 |
| 31-Aug | Economist/YouGov | 5 | 5 | 0 |
| 29-Aug | Monmouth | 7 | 7 | 0 |
| 25-Aug | Quinnipiac | 10 | 7 | 3 |
| 25-Aug | Reuters | 7 | 3 | 4 |
| 24-Aug | Economist/YouGov | 4 | 3 | 1 |
| 18-Aug | Reuters/Ipsos | 5 | 2 | 3 |
| 17-Aug | Gravis | 8 | 4 | 4 |
| 11-Aug | Reuters | 6 | 5 | 1 |
| 10-Aug | Bloomberg | 6 | 4 | 2 |
| 9-Aug | NBC News | 10 | 6 | 4 |

The source of this data is Real Clear Politics.
Descriptive Statistics:
Below are descriptive statistics for the Clinton Minus Trump
differential with and without third party candidates included and for the
difference in the two estimates.
Descriptive Statistics for the Clinton Minus Trump Differential

| Statistic | Two Main Candidates Only | Third-Party Candidates Included | Difference |
|---|---|---|---|
| Average | 6.25 | 4.17 | 2.08 |
| Standard Deviation | 2.49 | 1.85 | 1.83 |
| Median | 6.00 | 4.00 | 2.50 |
| 25th Percentile | 5.00 | 2.25 | 0.25 |
| 75th Percentile | 7.25 | 5.25 | 4.00 |
| Skew | -0.34 | 0.33 | -0.36 |
| Kurtosis | 0.86 | -1.20 | -1.43 |

Note: The Difference column reports statistics of the 12 paired
differences, not the difference between the statistics for the two poll types. For
example, the median of the 12 differenced values is 2.50,
which is not equal to 6 - 4 = 2.
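The average, standard deviation, and median rows can be checked with a short Python sketch. It uses only the standard `statistics` module; the variable and function names are illustrative and not from the original analysis. The percentile rows are left out because they depend on which interpolation convention is used, and the post does not say.

```python
from statistics import mean, median, stdev

# Clinton minus Trump margins (points), transcribed from the poll table.
two_way = [6, 1, 5, 7, 10, 7, 4, 5, 8, 6, 6, 10]
with_third = [2, 2, 5, 7, 7, 3, 3, 2, 4, 5, 4, 6]

# One paired difference per poll: two-way margin minus margin with third parties.
diff = [a - b for a, b in zip(two_way, with_third)]


def summarize(values):
    """Return (mean, sample standard deviation, median), rounded to 2 places."""
    return (round(mean(values), 2), round(stdev(values), 2), round(median(values), 2))


for label, series in [
    ("two main candidates only", two_way),
    ("third parties included", with_third),
    ("paired difference", diff),
]:
    print(label, summarize(series))

# The median of the differences is not the difference of the medians.
print(median(diff), median(two_way) - median(with_third))
```

The last line prints the two quantities side by side, which is the point the note above makes about the Difference column.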
Observations:
Looking at the raw data, it is apparent that all polls have Clinton
leading Trump. There is not a single
negative differential for polls regardless of the exclusion or inclusion of
third-party candidates.
The race is much closer when evaluations are based on polls
with third party candidates included than when the respondents are forced to
choose the main two candidates only.
The median difference is 4 points when third party candidates are
included compared to 6 points when the respondent must choose between only
Clinton and Trump.
The standard deviation of the Clinton-Trump differential is
around 35% higher (2.49 versus 1.85) for the polls that limit choices to the two main candidates than
for the polls that also include the third-party candidates.
The skew of the Clinton-Trump differential is slightly
negative for polls that only include two candidates and slightly positive for
polls that include third-party candidates.
The kurtosis statistics suggest the distribution is a lot
flatter when third-party candidates are included and a lot more peaked when
third party candidates are excluded.
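The shape statistics behind these observations can be recomputed as a check. The sketch below assumes the bias-adjusted sample formulas for skewness and excess kurtosis (the same ones Excel's SKEW and KURT functions use); the post does not say which formulas produced its table, so treat that choice and the function names as assumptions.

```python
from math import sqrt

# Clinton minus Trump margins (points), transcribed from the poll table.
two_way = [6, 1, 5, 7, 10, 7, 4, 5, 8, 6, 6, 10]
with_third = [2, 2, 5, 7, 7, 3, 3, 2, 4, 5, 4, 6]
diff = [a - b for a, b in zip(two_way, with_third)]


def sample_sd(x):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(x)
    m = sum(x) / n
    return sqrt(sum((v - m) ** 2 for v in x) / (n - 1))


def standardized_power_sum(x, power):
    """Sum of ((value - mean) / sd) ** power over the sample."""
    m = sum(x) / len(x)
    s = sample_sd(x)
    return sum(((v - m) / s) ** power for v in x)


def sample_skew(x):
    """Bias-adjusted sample skewness (assumed formula)."""
    n = len(x)
    return n / ((n - 1) * (n - 2)) * standardized_power_sum(x, 3)


def sample_excess_kurtosis(x):
    """Bias-adjusted sample excess kurtosis (assumed formula)."""
    n = len(x)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    offset = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * standardized_power_sum(x, 4) - offset


sd_ratio = sample_sd(two_way) / sample_sd(with_third)
print("SD ratio, two-way over third-party:", round(sd_ratio, 2))
for label, series in [("two-way", two_way), ("third-party", with_third), ("difference", diff)]:
    print(label, round(sample_skew(series), 2), round(sample_excess_kurtosis(series), 2))
```

Under these formulas, a negative excess kurtosis corresponds to the flatter distribution described above and a positive value to the more peaked one.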
Hypothesis Tests:
The situation presented here involves pairs of observations
for all polling groups. Each polling group provides an estimate for their
sample when respondents are forced to choose between two candidates and when
respondents may also consider a third-party candidate. It therefore makes sense to conduct a paired
t-test.
The mean of the 12 differences is 2.08333. The standard error (the standard deviation
divided by the square root of 12) is 0.52884.
Based on these two numbers we get a t-statistic of 3.939 (2.08333 / 0.52884). Under the null hypothesis the statistic follows the t-distribution with
11 (n - 1) degrees of freedom. The p-value
for t = 3.939 with 11 degrees of freedom is equal to 0.0023. I am treating this situation as a two-tailed
test. Since the p-value is well below 0.05, the null hypothesis that third-party candidates do not affect the
Clinton-Trump differential is rejected.
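The paired t-test arithmetic can be reproduced without a statistics package. The sketch below computes the mean difference, standard error, and t-statistic directly, then gets a two-tailed p-value by numerically integrating the t density with Simpson's rule. The integration approach and all names are my own choices for illustration; a statistics library would normally supply the p-value.

```python
from math import gamma, pi, sqrt

# Clinton minus Trump margins (points), transcribed from the poll table.
two_way = [6, 1, 5, 7, 10, 7, 4, 5, 8, 6, 6, 10]
with_third = [2, 2, 5, 7, 7, 3, 3, 2, 4, 5, 4, 6]
diff = [a - b for a, b in zip(two_way, with_third)]

n = len(diff)
mean_diff = sum(diff) / n
sd_diff = sqrt(sum((d - mean_diff) ** 2 for d in diff) / (n - 1))
std_err = sd_diff / sqrt(n)      # standard deviation over sqrt(12)
t_stat = mean_diff / std_err     # paired t-statistic
dof = n - 1                      # degrees of freedom


def t_pdf(x, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    const = gamma((dof + 1) / 2) / (sqrt(dof * pi) * gamma(dof / 2))
    return const * (1 + x * x / dof) ** (-(dof + 1) / 2)


def two_tailed_p(t, dof, steps=2000):
    """Two-tailed p-value: 1 - 2 * P(0 < T < |t|), via Simpson's rule.

    `steps` must be even for Simpson's rule.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 1 - 2 * (total * h / 3)


p_value = two_tailed_p(t_stat, dof)
print(f"mean = {mean_diff:.5f}, SE = {std_err:.5f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```

Dividing 2.08333 by 0.52884 gives a t-statistic of roughly 3.94, and the p-value rounds to the 0.0023 reported above.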
Since the sample is small and there is some evidence of
non-normality, it makes sense to conduct a nonparametric test. The sign test looks at the sign of the difference
in poll results from the two polling questions. It compares the observed number of positive,
negative and zero differences with the number expected if the two polling techniques were equally likely to produce the larger margin.
Observed and Expected Signs from Difference in Poll Results

| Sign | Observed | Expected |
|---|---|---|
| Positive | 9 | 5 |
| Negative | 1 | 5 |
| Zero | 2 | 2 |
| All | 12 | 12 |


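The p-value for the two-tailed test is
0.0215. This value matches the exact binomial probability of seeing 9 or more differences with the same sign
among the 10 nonzero differences when positive and negative signs are equally likely.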
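The sign counts and the two-tailed p-value can be checked with an exact binomial calculation. The sketch below assumes that zero differences are dropped and that, under the null hypothesis, each remaining difference is positive or negative with probability one half. The names are illustrative.

```python
from math import comb

# Clinton minus Trump margins (points), transcribed from the poll table.
two_way = [6, 1, 5, 7, 10, 7, 4, 5, 8, 6, 6, 10]
with_third = [2, 2, 5, 7, 7, 3, 3, 2, 4, 5, 4, 6]
diff = [a - b for a, b in zip(two_way, with_third)]

positive = sum(d > 0 for d in diff)
negative = sum(d < 0 for d in diff)
zero = sum(d == 0 for d in diff)

# Assumption: ties (zero differences) carry no sign information and are dropped.
n_nonzero = positive + negative
k = max(positive, negative)

# One-sided tail: probability of k or more like signs out of n_nonzero fair coin flips.
tail = sum(comb(n_nonzero, j) for j in range(k, n_nonzero + 1)) / 2 ** n_nonzero

# Two-tailed p-value, capped at 1.
p_two_sided = min(1.0, 2 * tail)

print(positive, negative, zero)
print(round(p_two_sided, 4))
```

With 9 positive and 1 negative difference, the tail is 11/1024, so the two-tailed value is 22/1024, or about 0.0215.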
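Issues Related to
Rounding: The poll results presented
at Real Clear Politics, the source of the data in this post, round the
differential between Clinton and Trump to an integer percentage. This rounding adds measurement error to each
observation and increases the variability of the point estimates. Each rounded differential can be off by up to half a point,
so each paired difference can be off by up to a full point, which is large relative to an average difference of 2.08. Rounding may therefore have altered the estimated
statistics by a great deal.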
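One rough way to gauge how much rounding could matter is a small simulation. The sketch below assumes each published margin is a true margin rounded to the nearest integer, with the true value uniformly distributed within half a point of the published one and independent across polls. Those assumptions, the seed, and the function names are mine and are not part of the original analysis.

```python
import random
from math import sqrt

# Published (rounded) Clinton minus Trump margins, transcribed from the poll table.
two_way = [6, 1, 5, 7, 10, 7, 4, 5, 8, 6, 6, 10]
with_third = [2, 2, 5, 7, 7, 3, 3, 2, 4, 5, 4, 6]


def paired_t(diffs):
    """Return (mean difference, paired t-statistic) for a list of differences."""
    n = len(diffs)
    m = sum(diffs) / n
    sd = sqrt(sum((d - m) ** 2 for d in diffs) / (n - 1))
    return m, m / (sd / sqrt(n))


def simulate(n_sims=10_000, seed=2016):
    """Redraw plausible unrounded margins and recompute the paired test each time."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        # Assumption: the true margin lies uniformly within +/- 0.5 of the published one.
        diffs = [
            (a + rng.uniform(-0.5, 0.5)) - (b + rng.uniform(-0.5, 0.5))
            for a, b in zip(two_way, with_third)
        ]
        results.append(paired_t(diffs))
    return results


results = simulate()
means = [m for m, _ in results]
t_stats = [t for _, t in results]

print("mean difference range:", round(min(means), 2), "to", round(max(means), 2))
print("t-statistic range:", round(min(t_stats), 2), "to", round(max(t_stats), 2))
```

The spread of the simulated mean differences and t-statistics gives a sense of how sensitive the paired test is to the unobserved decimals, under the stated assumptions only.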
Also, I would like to examine the relationship between votes
for Johnson and the difference in the Clinton-Trump margin between the two polling
techniques. Rounding
increases measurement error in all variables, which will reduce the reliability
of results from this type of model.
Concluding Thoughts: The difference between polls that include
third-party candidates and polls that exclude such candidates suggests that
right now the third party candidates help Trump more than Clinton. However, this relationship might not hold if
Johnson takes off in the polls and makes the debate stage. Statistical relationships that hold over one
range of an explanatory variable often do not hold outside that range.
I will continue to track the disparities in polls and will
soon analyze the question of how the third party candidates might affect this
race with a more advanced regression model.
Readers of this post should also look at my post that uses hypothesis testing to examine Senate polls.
http://dailymathproblem.blogspot.com/2016/08/hypothesistestingproblemforsenate.html