A reminder about crossbreaks

I wrote last November about the dangers of cherry-picking figures from crossbreaks to come up with sensationalist stories that don't actually reflect the truth, and I spend an inordinate amount of time nagging people not to pay too much attention to regional crossbreaks. Nevertheless, such stories never seem to go away.

On Friday, for example, the New Statesman was getting overexcited about the crossbreak for under 25s in the most recent YouGov poll, which showed Labour 47 points ahead of the Conservatives amongst young people. The figures were based on a sample of only 71 people, so the margin of error was about plus or minus 12 points. (In fact, given that the figures were re-percentaged to exclude don't knows and won't votes, the effective sample was even smaller: only 45 people under 25 actually gave a voting intention, giving a margin of error of plus or minus 15 points.)
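For anyone who wants to check that arithmetic, here is a minimal sketch of the usual 95% margin of error calculation. It assumes a simple random sample and the worst case of a 50% share; the function name `margin_of_error` is just my own label, and real polls (which are weighted) will, if anything, have somewhat wider margins than this suggests.

```python
import math

def margin_of_error(n, share=0.5, z=1.96):
    """Approximate 95% margin of error in percentage points.

    Assumes a simple random sample of n people; share=0.5 is the
    worst case, z=1.96 is the normal multiplier for 95% confidence.
    """
    return 100 * z * math.sqrt(share * (1 - share) / n)

# Sample sizes mentioned in the post
for n in (71, 45):
    print(n, round(margin_of_error(n)))
```

Run on 71 and 45 respondents this gives roughly 12 and 15 points respectively, matching the figures above.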

If the New Statesman had taken the time to look at other recent cross-breaks for young people it would have become clear that (a) the figures were very volatile, as you'd expect from such a small sub-sample, and (b) this one was an outlier. The average figure for the rest of the last week was CON 24%, LAB 49%, a lead a little over half of Friday's (this is still a very large Labour lead of course, but not surprising given they have a 12 point lead nationally and there tends to be a correlation between age and voting intention, with young people more Labour and older people more Conservative).

Another example this week was David Skelton at Platform 10, citing regional cross-breaks from Populus polling to demonstrate that support for gay marriage isn't just amongst a metropolitan elite, but is actually higher in blue-collar Northern areas. Now, while I suspect David's ultimate argument is correct (after all, it's not like only Southern middle class people are gay or get married), the evidence he cites doesn't really hold up. 81% of respondents in the North East did indeed tell Populus that they supported gay marriage... but it was on a sample size of 45 people, giving a margin of error of 15 points and meaning support for gay marriage in the North East was not actually significantly different to that in London.

Here's what to remember about cross-breaks:

1) Cross-breaks often have small sample sizes and are not internally weighted. They are hence very volatile and imprecise, especially for things like age and region where some sample sizes are below 100, and very little weight should be given to them. For a sample size of 200 the margin of error rises to plus or minus 7 points; for 100 it rises to plus or minus 10 points.

2) Where you have a regular tracker such as the YouGov daily poll, the sheer volume of data means it is inevitable that volatile crossbreaks with large margins of error will sometimes produce results that look extreme. However odd these look, unless there is a sustained pattern they are not meaningful. If the actual figure is 50%, but you've only got 70 respondents, then you ARE sometimes going to get results showing 62% or 38%... purely from random variation.

3) All this goes double or triple for voting intention polls! For most polls the precise figures don't matter - it is much the same story if 30% of people support a policy as if 40% do. In contrast, there is a world of difference between Labour being at 30% and Labour being at 40%. When it comes to voting intention, crossbreaks in a single poll should basically be ignored.
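To put a rough number on point 2, the sketch below works out how often a 70-person crossbreak would show a figure 12 or more points away from a true value of 50% purely by chance, and how likely that is to happen at least once across a run of polls. It assumes a simple random sample (so it uses the plain binomial distribution), and the 20 polls in a month is only an illustrative figure of mine, as is the function name.

```python
from math import comb

def prob_at_least_this_extreme(n, true_share, deviation):
    """Chance that a sample of n shows a share at least `deviation`
    away from true_share (in either direction), using the exact
    binomial distribution for a simple random sample."""
    total = 0.0
    for k in range(n + 1):
        # Small tolerance guards against floating point rounding
        if abs(k / n - true_share) >= deviation - 1e-12:
            total += comb(n, k) * true_share**k * (1 - true_share)**(n - k)
    return total

# One poll: 70 respondents, true figure 50%, result of 62%+ or 38%-
p_single = prob_at_least_this_extreme(70, 0.5, 0.12)

# Assumed, purely for illustration: 20 such polls in a month
polls_per_month = 20
p_month = 1 - (1 - p_single) ** polls_per_month

print(f"single poll: {p_single:.3f}, at least once in a month: {p_month:.3f}")
```

The chance in any single poll is only a few percent, but across a month of daily polling an "extreme" crossbreak becomes more likely than not, which is exactly why an isolated odd figure tells you nothing.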

A couple of months ago Lewis Baston asked me an interesting question on Twitter. Given that regional cross-breaks on polls are so consistently misrepresented and misunderstood, should pollsters publish them at all? It does make me ponder. My starting point is always that it is good for pollsters to be as transparent as possible: unless there is a good reason not to be open, we should be.

Some crossbreaks are very useful in understanding and interpreting polls - think, for example, of how much voting intention cross-breaks help our understanding of leader approval ratings, best PM figures or my bete noire of "would policy X make you more likely to vote Y" questions. Sometimes they do show interesting things (look, for example, at the huge gender contrast you find in polls on nuclear power or nuclear weapons, or the many issues where there is a clear correlation with age). Regional cross-breaks are, admittedly, less obviously useful, but there are many instances when cross-breaks are extremely beneficial to our understanding of polls if looked at as crude indicators of trends and correlations, rather than taken out of context.

I wouldn't want to see pollsters stop giving out data, even data of limited use, for fear of it being misunderstood. The solution is really for political journalists to better understand polling and statistics. Some people will always misunderstand or misrepresent polls...but political journalists shouldn't, they are too important a part of politics today.