Applying Median-Median Lines to Polling Data
This post uses the median-median line approach to organize
data from multiple polls over time.
My first attempt at using this technique is an analysis
of polling data on the Sanders-Clinton margin in the New Hampshire primary.
The median-median line approach involves dividing the data
into three groups based on an explanatory variable. In this case, the polling data is sorted
into three chronological groups: group one occurred first, group two second, and group three
most recently.
The slope of the line is calculated from the median points of group one and
group three.
That slope is then applied to the point (X, Y), where X is the average of the
three groups' median time periods and Y is the average of the three groups'
median poll margins.
This median-median line is used to obtain an estimate of the
election outcome that incorporates the trend in polls over time.
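The procedure described above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name median_median_line is my own, and the rule for splitting data whose size is not a multiple of three (one leftover point goes to the middle group, two leftover points go to the outer groups) is a common convention that I am assuming here. It needs at least three data points.

```python
from statistics import median


def median_median_line(xs, ys):
    """Return (slope, intercept) of a median-median line through the data."""
    pts = sorted(zip(xs, ys))  # order the points by the explanatory variable
    base, extra = divmod(len(pts), 3)

    # Assumed convention: one leftover point goes to the middle group,
    # two leftover points go to the two outer groups.
    sizes = [base, base, base]
    if extra == 1:
        sizes[1] += 1
    elif extra == 2:
        sizes[0] += 1
        sizes[2] += 1

    groups, start = [], 0
    for size in sizes:
        groups.append(pts[start:start + size])
        start += size

    # Summary point of each group: (median of x, median of y).
    (x1, y1), (x2, y2), (x3, y3) = [
        (median(x for x, _ in g), median(y for _, y in g)) for g in groups
    ]

    # Slope comes from the outer summary points; the line passes through
    # the average of the three summary points.
    slope = (y3 - y1) / (x3 - x1)
    x_avg = (x1 + x2 + x3) / 3
    y_avg = (y1 + y2 + y3) / 3
    return slope, y_avg - slope * x_avg


# Sanity check on made-up data that lies exactly on y = 2x + 1.
toy_x = list(range(1, 10))
toy_y = [2 * x + 1 for x in toy_x]
print(median_median_line(toy_x, toy_y))  # (2.0, 1.0)
```

Because the medians of each group are used rather than every point, a single outlying poll inside a group has little effect on the fitted line.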
Background on median-median lines:
I was unaware of this technique until I looked at some material
in my son’s online algebra course from Johns Hopkins University.
I then found some material on the web that uses this method.
Question: The table below has data on the Sanders
versus Clinton margin for 33 polls prior to the New Hampshire primary. Poll number 33 is the poll closest to the
election. Poll number 1 is the earliest
of the selected polls.
Calculate the median-median line for the Sanders-Clinton
margin for these 33 polls.
New Hampshire Democratic Primary Poll Results

Poll                      Poll #   Sanders-Clinton Margin in NH
ARG                       33       9
UMass Lowell/7News        32       16
CNN/WMUR                  31       26
Emerson                   30       12
ARG (Tracking)            29       12
Monmouth                  28       10
ARG (Tracking)            27       11
UMass/7News (Tracking)    26       17
CNN/WMUR                  25       23
Boston Herald/FPU         24       7
ARG (Tracking)            23       12
UMass/7News (Tracking)    22       14
ARG (Tracking)            21       16
UMass/7News (Tracking)    20       15
Boston Globe/Suffolk      19       9
CNN/WMUR                  18       31
WBUR/MassINC              17       15
Gravis                    16       16
NBC/WSJ/Marist            15       20
UMass/7News (Tracking)    14       22
ARG (Tracking)            13       16
UMass/7News (Tracking)    12       29
UMass Amherst/WBZ         11       23
UMass/7News (Tracking)    10       33
ARG                       9        6
UMass/7News (Tracking)    8        31
CNN/WMUR                  7        23
Boston Herald/FPU         6        20
Emerson                   5        8
ARG                       4        7
Boston Herald/FPU         3        16
NBC/WSJ/Marist            2        19
Suffolk                   1        9

Calculation of the Median-Median Line:
Step One: Divide the data into three groups of equal or
nearly equal size. With 33 polls, each group has 11 polls. Take the median of X (poll
number) and Y (poll result) for each of the three groups.
Median Dates and Sanders-Clinton Margins for Three Groups

       Median Date (poll #)   Median Margin
M1     6                      19
M2     17                     16
M3     28                     12


The table above gives the medians for the three groups, with M3 being closest to the
election and M1 being furthest from the election. These numbers give Sanders a
healthy margin but suggest that there was some tightening near the
election. We now know that there was no tightening.
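The Step One medians can be reproduced with a short script. This is a sketch under the assumption that the margins are entered in order from poll 1 (earliest) to poll 33 (latest), as read from the poll table; the variable names are mine.

```python
from statistics import median

# Sanders-Clinton margins from the poll table, poll 1 (earliest) to poll 33 (latest).
margins = [9, 19, 16, 7, 8, 20, 23, 31, 6, 33, 23,     # polls 1-11
           29, 16, 22, 20, 16, 15, 31, 9, 15, 16, 14,  # polls 12-22
           12, 7, 23, 17, 11, 10, 12, 12, 26, 16, 9]   # polls 23-33
poll_numbers = list(range(1, len(margins) + 1))

# 33 polls split evenly into three chronological groups of 11.
group_size = len(margins) // 3
summary_points = []
for g in range(3):
    lo, hi = g * group_size, (g + 1) * group_size
    summary_points.append((median(poll_numbers[lo:hi]), median(margins[lo:hi])))

for name, (x, y) in zip(("M1", "M2", "M3"), summary_points):
    print(name, x, y)
# M1 6 19
# M2 17 16
# M3 28 12
```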
Step Two: Get the slope of the median-median line.
The slope of the line is the slope defined by M1 and
M3. The calculation for the slope is (12 - 19)/(28 - 6).
This slope is -7/22, or -0.31818.
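As a quick check of the sign and size of the slope, the calculation can be done with exact fractions. This is only a sketch; m1 and m3 are my names for the (poll number, margin) medians of the first and third groups.

```python
from fractions import Fraction

m1 = (6, 19)   # (median poll number, median margin) for the earliest group
m3 = (28, 12)  # same for the group closest to the election

# Rise over run between the two outer summary points.
slope = Fraction(m3[1] - m1[1], m3[0] - m1[0])
print(slope)                    # -7/22
print(round(float(slope), 5))   # -0.31818
```

The negative sign reflects a margin that shrinks as the poll number increases.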
Step Three: Get the equation for the line.
The line goes through the point MAVG, the average of the three median points
M1, M2 and M3.

         Date (poll #)   Poll Margin
MAVG     17              15.67


Get the equation from the point-slope form.
(y - 15.67) = -0.31818 * (x - 17)
y = 21.08 - 0.31818 * x
This median-median line says that the margin between Sanders
and Clinton in NH should have narrowed to about 10.6 points by poll 33,
the last poll before the election.
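The point-slope arithmetic in Step Three can be checked numerically. This sketch starts from the three median points in Step One and uses the negative slope from Step Two; predicted_margin is a hypothetical helper name of my own, and evaluating the line at poll 33 is my choice of where to read off the estimate.

```python
# (median poll number, median margin) for groups M1, M2, M3.
medians = [(6, 19), (17, 16), (28, 12)]

# Slope from the outer groups, M1 and M3.
slope = (medians[2][1] - medians[0][1]) / (medians[2][0] - medians[0][0])

# MAVG: the average of the three median points.
x_avg = sum(x for x, _ in medians) / 3
y_avg = sum(y for _, y in medians) / 3

# Point-slope form, y - y_avg = slope * (x - x_avg), solved for the intercept.
intercept = y_avg - slope * x_avg


def predicted_margin(poll_number):
    """Margin implied by the median-median line at a given poll number."""
    return intercept + slope * poll_number


print(round(intercept, 2))             # 21.08
print(round(predicted_margin(33), 2))  # 10.58
```

Because the slope is negative, subtracting slope * x_avg raises the intercept above the 15.67 average margin rather than lowering it.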
Concluding Thoughts:
The polls nearing the election indicating some narrowing of
the margin were clearly incorrect.
I need to apply this method to some other primaries. Nate
Silver rates polls based on their perceived reliability. I might be better off looking at a smaller
number of more reliable polls.
More experiments with getting rid of the noise in polling
data will follow.
