Thursday, February 18, 2016

Applying Median-Median Lines to Polling Data

This post uses the median-median line approach to organize data from multiple polls over time.  

My first attempt using this technique involves an analysis of polling data on the Sanders-Clinton margin in the New Hampshire primary.

The median-median line approach involves rearranging data into three groups based on an explanatory variable.   In this case, the polling data is sorted into three chronological groups.   Group one occurred first, group two second and group three occurred most recently.  

The slope of the line connecting the median point of group one to the median point of group three is calculated.

That slope is applied to the point (X, Y), where X is the average of the three group-median poll dates and Y is the average of the three group-median poll margins.

This median-median line is used to obtain an estimate of the election outcome, which incorporates the trend change in polls over time.
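Here is a minimal sketch in Python of the procedure just described. The function name `median_median_line` is my own, and the sketch assumes the number of points is a multiple of three so that the groups are equal in size; uneven group sizes would need a splitting convention that this post does not specify.

```python
from statistics import median


def median_median_line(xs, ys):
    """Return (slope, intercept) of a median-median line.

    Assumes len(xs) is a multiple of three (equal group sizes).
    """
    pairs = sorted(zip(xs, ys))  # order by the explanatory variable
    n = len(pairs) // 3
    groups = [pairs[:n], pairs[n:2 * n], pairs[2 * n:]]

    # Median point of each group: median x and median y taken separately.
    meds = [(median(x for x, _ in g), median(y for _, y in g)) for g in groups]

    # Slope comes from the first and third median points only.
    (x1, y1), _, (x3, y3) = meds
    slope = (y3 - y1) / (x3 - x1)

    # The line passes through the average of the three median points.
    x_avg = sum(x for x, _ in meds) / 3
    y_avg = sum(y for _, y in meds) / 3
    intercept = y_avg - slope * x_avg
    return slope, intercept


# Toy check on points that lie exactly on y = 2x + 1.
toy_x = list(range(1, 10))
toy_y = [2 * x + 1 for x in toy_x]
slope, intercept = median_median_line(toy_x, toy_y)
```

On the toy data the sketch recovers the line the points were generated from, which is a quick sanity check before using it on noisy polls.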

Background on median-median lines:

I was unaware of this technique until I looked at some material in my son’s on-line algebra course from Johns Hopkins University.

I then found some material on the web that uses this method.



Question:   The table below has data on the Sanders versus Clinton margin for 33 polls taken before the New Hampshire primary.   Poll number 33 is the poll closest to the election.   Poll number 1 is the earliest of the selected polls.


Calculate the median-median line for the Sanders-Clinton margin for these 33 polls.




New Hampshire Democratic Primary Poll Results

Poll                      Poll #   Sanders-Clinton Margin in NH
ARG                       33       9
UMass Lowell/7News        32       16
CNN/WMUR                  31       26
Emerson                   30       12
ARG (Tracking)            29       12
Monmouth                  28       10
ARG (Tracking)            27       11
UMass/7News (Tracking)    26       17
CNN/WMUR                  25       23
Boston Herald/FPU         24       7
ARG (Tracking)            23       12
UMass/7News (Tracking)    22       14
ARG (Tracking)            21       16
UMass/7News (Tracking)    20       15
Boston Globe/Suffolk      19       9
CNN/WMUR                  18       31
WBUR/MassINC              17       15
Gravis                    16       16
NBC/WSJ/Marist            15       20
UMass/7News (Tracking)    14       22
ARG (Tracking)            13       16
UMass/7News (Tracking)    12       29
UMass Amherst/WBZ         11       23
UMass/7News (Tracking)    10       33
ARG                       9        6
UMass/7News (Tracking)    8        31
CNN/WMUR                  7        23
Boston Herald/FPU         6        20
Emerson                   5        8
ARG                       4        7
Boston Herald/FPU         3        16
NBC/WSJ/Marist            2        19
Suffolk                   1        9

Calculation of the Median-Median Line:

Step One:  Divide the data into three groups of equal or nearly equal size.   Take the median of X (poll number) and Y (poll margin) for each of the three groups.   With 33 polls, each group has 11 polls: polls 1-11, polls 12-22 and polls 23-33.


Median Dates and Sanders-Clinton Margins for Three Groups

Group   Median Date (poll #)   Median Margin
M1      6                      19
M2      17                     16
M3      28                     12


The table above presents the medians for the three groups, with M3 closest to the election and M1 furthest from it.   These numbers give Sanders a healthy margin but suggest that there was some tightening near the election.   We know now that there was no tightening.
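The three median points can be reproduced with a few lines of Python. This is only a sketch: the margins are typed in from the poll table in poll-number order, and the variable names are mine.

```python
from statistics import median

# Sanders-Clinton margins in poll order, poll 1 (earliest) to poll 33 (latest).
margins = [9, 19, 16, 7, 8, 20, 23, 31, 6, 33, 23,
           29, 16, 22, 20, 16, 15, 31, 9, 15, 16, 14,
           12, 7, 23, 17, 11, 10, 12, 12, 26, 16, 9]
polls = list(range(1, len(margins) + 1))

size = len(margins) // 3  # 11 polls per group
medians = []
for start in range(0, len(margins), size):
    xs = polls[start:start + size]
    ys = margins[start:start + size]
    # Median poll number and median margin are taken separately.
    medians.append((median(xs), median(ys)))

print(medians)  # [(6, 19), (17, 16), (28, 12)]
```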

Step Two:  Get the slope of the median-median line.

The slope of the line is the slope defined by M1 and M3.   This slope is -7/22, or about -0.31818.

The calculation for the slope is (12-19)/(28-6).
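As a quick check on the sign and size of the slope, the same calculation can be done with exact fractions. The tuples `m1` and `m3` below are just my labels for the M1 and M3 median points.

```python
from fractions import Fraction

m1 = (6, 19)   # (median poll number, median margin) for the earliest group
m3 = (28, 12)  # same for the group closest to the election

# Rise over run between the two outer median points.
slope = Fraction(m3[1] - m1[1], m3[0] - m1[0])
print(slope, round(float(slope), 5))
```

The slope is negative, so the fitted margin falls by roughly a third of a point per poll.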

Step Three:  Get the equation for the line.

The line goes through the point MAVG, the average of the three median points.

        Date (poll #)   Poll Margin
MAVG    17              15.67
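The MAVG point is simply the average of the three median points from Step One. A short sketch, with the medians entered by hand:

```python
# (median poll number, median margin) for M1, M2 and M3.
medians = [(6, 19), (17, 16), (28, 12)]

x_avg = sum(x for x, _ in medians) / len(medians)
y_avg = sum(y for _, y in medians) / len(medians)

print(x_avg, round(y_avg, 2))  # 17.0 15.67
```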


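Get the equation from the point-slope form.

(y - 15.67) = -0.31818 * (x - 17)

Because the slope is negative, expanding gives an intercept of 15.67 + 0.31818 * 17 = 21.08:

y = 21.08 - 0.31818 * x

Evaluated at the last poll (x = 33), the line gives a margin of about 10.6 points. This median-median line says that the margin between Sanders and Clinton in NH should have narrowed from about 21 points at the first poll to roughly 10 points by the election.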
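As a check on the arithmetic, the point-slope form through MAVG can be expanded with exact fractions. This is a sketch under the assumption that x is the poll number, as in the tables above; the helper `fitted_margin` is a name I made up for illustration.

```python
from fractions import Fraction

slope = Fraction(-7, 22)
x_avg, y_avg = Fraction(17), Fraction(47, 3)  # the MAVG point, 47/3 = 15.67

# y - y_avg = slope * (x - x_avg)  =>  y = (y_avg - slope * x_avg) + slope * x
intercept = y_avg - slope * x_avg


def fitted_margin(poll_number):
    """Margin predicted by the median-median line at a given poll number."""
    return float(intercept + slope * poll_number)


print(round(float(intercept), 2))   # intercept of the line
print(round(fitted_margin(1), 2))   # fitted margin at the earliest poll
print(round(fitted_margin(33), 2))  # fitted margin at the latest poll
```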


Concluding Thoughts:

The polls taken close to the election that indicated some narrowing of the margin were clearly incorrect.

I need to apply this method to some other primaries. Nate Silver rates polls based on their perceived reliability.  I might be better off looking at a smaller number of more reliable polls.

More experiments with getting rid of the noise in polling data will follow.


