On cherry picking


I often read articles in the media where polling is casually misrepresented or misinterpreted and ponder writing something... but usually pass it by. While it annoys me, it's not really enough to make a big fuss about. It does not warrant the much sought-after "UKPR Crap media reporting of polls award".

On the other hand, it bothers me that these things seep into the public consciousness. If they sneak past once, then other people quote them, and before you know it they become accepted wisdom. My initial reason for starting blogging about polls in 2004 was that media reporting of them was often so slapdash and shoddy, and I should perhaps try to chase up more of this.

For that reason, here are two that I've spotted in the last few days. I hasten to add that they are not particularly unusual in these errors... they just happen to be the culprits I've spotted most recently!

In an article in the Telegraph, Christina Odone writes that the polling figures prove right her belief that the Conservative party is not struggling amongst women, citing an article by Nadine Dorries on ConHome that claims "However, according to a You Gov survey the number of women intending to vote for the Conservative Party today in comparison to during the general election is down by just one point. The greater dip is in fact with men with a figure of minus three."

As it happens, YouGov don't show that at all (which should be slightly embarrassing for Odone, as it suggests she took what Nadine Dorries said without actually looking at the polling). Polling that shows the Conservatives down 1 point amongst women but 3 points amongst men does exist, but it is actually aggregated data from Ipsos MORI's political monitors this year.

Of course, it really would be a bit petty to haul someone over the coals just for attributing poll findings to the wrong company. The problem here though is that YouGov data looking at differences between men and women also exists, and shows a different picture to MORI. While MORI have the Conservatives having lost marginally more support amongst men than women, YouGov show the Conservatives having a significant advantage amongst women back in 2010, which has subsequently disappeared. I looked at the two sets of data in more detail here.

Two lessons here. First, don't believe what people writing about polls claim they show; go and look at the figures yourself. Second, polls may have conflicting messages, so don't assume the first thing you find tells the whole story. Keep looking (and certainly, if there are conflicting messages, don't cherry pick the one that agrees with you!).

The second case is this story from PoliticalScrapbook, which claims that YouGov are saying the "Liberal Democrats slump to three percent with 18-24 year olds". In one of YouGov's daily polls last week, the Liberal Democrats were indeed at 3 points in the 18 to 24 cross break, but this was not typical. YouGov's daily polls typically include around 120 people under the age of 25. A sample size this low has a margin of error of plus or minus 9 points. In the case of the poll that had the Lib Dems on 3%, the sample had only 87 people under 25, giving a margin of error of plus or minus 11 points. Suffice to say, one should not give much weight to any individual cross break with a sample size this low.
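For anyone who wants to check those figures, here is a minimal sketch in Python of the usual textbook margin of error calculation. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst case of 50% support. The function name `margin_of_error` is my own choice for illustration, not anything a pollster publishes, and since real polls are weighted their true margins will tend to be somewhat wider than this.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error in percentage points.

    Assumes a simple random sample of size n, a true proportion p
    (0.5 is the worst case) and a z-score of 1.96 for 95% confidence.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 100 * z * math.sqrt(p * (1 - p) / n)

# The two cross-break sizes mentioned in the post.
for n in (120, 87):
    print(f"n={n}: +/- {margin_of_error(n):.1f} points")
# n=120: +/- 8.9 points
# n=87: +/- 10.5 points
```

Rounded to whole numbers, those are the 9 and 11 points quoted above.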

A brief glance over the last couple of weeks of YouGov polls shows the level of Lib Dem support amongst under 25s varying between that 3 points at the low end and 16 points at the high end... exactly as one would expect to find in a cross-break with a small sample size and a high margin of error. To take the 3 points as typical is misleading at best, though of course that's not to detract from the fact that the Lib Dems have lost a LOT of support from younger people.
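To get a feel for how much a cross-break of this size can bounce around through sampling noise alone, here is a rough sketch using the exact binomial distribution. The "true" level of support of 9.5% is purely an assumption for illustration (it is simply the midpoint of the 3 to 16 range above, not a figure from any poll), and the helper names `binomial_pmf` and `central_range` are hypothetical. It also ignores weighting, which would widen the spread further.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k supporters in a sample of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def central_range(n, p, coverage=0.95):
    """Range of counts left after trimming at most (1 - coverage) / 2
    of probability from each tail of the binomial distribution."""
    tail = (1 - coverage) / 2
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

    # Walk up from zero until the lower tail would exceed its share.
    cumulative, low = 0.0, 0
    for k in range(n + 1):
        if cumulative + pmf[k] > tail:
            low = k
            break
        cumulative += pmf[k]

    # Walk down from n until the upper tail would exceed its share.
    cumulative, high = 0.0, n
    for k in range(n, -1, -1):
        if cumulative + pmf[k] > tail:
            high = k
            break
        cumulative += pmf[k]

    return low, high

n = 120                  # typical under-25 cross-break size from the post
assumed_support = 0.095  # ASSUMPTION: illustrative value only
low, high = central_range(n, assumed_support)
print(f"Most samples fall between {100 * low / n:.0f}% and {100 * high / n:.0f}%")
```

The printed range shows how far an individual cross-break can drift from the underlying figure by chance, before any real change in opinion is involved.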

The first lesson here is that one should give due scepticism to figures from cross-breaks in polls, and pay particular attention to the sample sizes. A poll of 1000 people may have a margin of error of 3 points overall, but small cross breaks have much larger margins of error and are much more volatile. The second is to be careful of cherry picking data. In any stream of data there will be outliers at either end, and there is a temptation for those looking for sensational stories to pick out and highlight those outliers as if they are deeply meaningful. They aren't.

UPDATE: Tonight's YouGov poll for the Sunday Times has topline figures of CON 35%, LAB 41%, LDEM 9%. As usual, I'll update properly tomorrow.