YouGov and the Telegraph
Personally I try not to get involved in knocking down some of the more outré polling conspiracy theories, especially since recently lots of them have been YouGov related. On UKPR I have always tried to educate about how polls work and what the differences between them are, rather than campaign for the methodologies I personally rate. Certainly I try to cover all of them fairly, rather than acting as the defender of my own employer, YouGov.
However, when they start turning up in the mainstream media I suppose I need to discuss them. Earlier on I linked to a post from Peter Kellner rebutting what looks set to be a hatchet job against YouGov in the Telegraph tomorrow. From the questions they asked Peter, I'm really not hopeful of anything sensible emerging.
In terms of the recent results, YouGov have been showing some of the narrower leads, but actually the polls have not been as contradictory as they sometimes seemed. The more established pollsters have been showing pretty much the same story - YouGov was showing a 4 point Tory lead last week, and it seems to have moved up to around 7 now. Ipsos-MORI's last poll had a 5 point lead. ComRes did not poll during the period of the really narrow leads; their only poll in March was post-budget and showed a 7 point lead, the same as YouGov. TNS BMRB have, if anything, been showing smaller leads than YouGov.
ICM have tended to show slightly higher Tory leads than YouGov, but not by much, and nothing that can't be explained by their different approaches to likelihood to vote, and the long standing contrast between the two companies' Lib Dem scores.
The significant contrast has been with the newer online companies - Harris, Angus Reid and Opinium - who have all tended to show double-digit Conservative leads, though a methodology change brought the Tory lead in Angus Reid's most recent poll down into single figures. For all we know the newer companies may turn out to know something that the more established pollsters do not, but given that they have little or no UK track record, the burden of proof should probably lie with them.
Secondly we come to the subject of weighting - this is a bit long, I'm afraid. It is very easy to paint weighting negatively; to someone outside the industry it can easily be made to sound dodgy: the raw data showed X, but then the pollsters "weighted" it to show Y. In fact a lot of polling is counter-intuitive - asking just 1,000 people and then weighting them? For anyone with nefarious intent it is always easy to make it sound suspicious.
One of my oldest pieces of advice here is do not fall in love with raw numbers. Unweighted data is not pure and unsullied - it is unrepresentative. The reason it is weighted is because pollsters can see its make up does not match the known demographics of the population, and weighting makes it representative. To give a simple example, we know that 52% of the adult population is female, so if a sample was only 50% female, all the women would be weighted upwards by 1.04 (i.e. rather than counting as one person, each female respondent would count as 1.04 people). The total weighted sample would then be 52% female, and it would be far more representative than the unweighted one.
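The gender example above is just a ratio of the known population share to the sample share. A minimal sketch (the dictionary names are mine, the figures are the ones from the example):

```python
# Demographic weighting sketch: each respondent's weight is their group's
# share of the population divided by that group's share of the raw sample.

population_share = {"female": 0.52, "male": 0.48}  # known demographics
sample_share = {"female": 0.50, "male": 0.50}      # what the raw sample looked like

weights = {group: population_share[group] / sample_share[group]
           for group in population_share}

print(weights["female"])  # 1.04 - each woman counts as 1.04 people
print(weights["male"])    # 0.96 - each man counts as 0.96 people
```

The weighted sample is then 52% female, matching the population, without a single extra interview being conducted.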
There is a rather odd comment from Robert Winnett in his email to Peter about some Experian data not needing weighting because of its size. At the risk of boring regular readers who will have heard the tale many times before, size does not imply representativeness. Famously, in 1936 the Literary Digest carried out a poll with a sample size of millions, while George Gallup carried out a poll of a few thousand... but used proper representative sampling techniques. Gallup correctly called the 1936 Roosevelt landslide, while the Literary Digest confidently predicted a victory for Alf Landon. What makes a poll useful is how representative it is - an unrepresentative poll, an unweighted poll, however large, is worthless.
A separate issue is the degree of weighting. To fully understand that, we need to go back to basics about how pollsters draw their samples and why they need to weight. Most phone pollsters use quasi-random sampling - they rely upon the laws of probability to generate a representative sample. If a pollster could obtain a genuinely random sample it might not actually need very much weighting; in practice, though, samples are not actually random because of huge non-response rates, meaning that much chunkier weighting is necessary. For example, in ICM's last News of the World poll they needed to weight 2005 Lib Dem voters up by 1.41 and 2005 Labour voters down by 0.74.
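Past-vote weighting works the same way as demographic weighting: target share divided by raw share. The shares below are invented for illustration (ICM publish their actual targets with their tables), chosen only so the resulting weights come out at roughly the magnitude quoted above:

```python
# Hypothetical past-vote weighting sketch. The target and raw recall shares
# here are made up purely to reproduce weights of about 1.41 and 0.74.

target_recall = {"Lab": 0.296, "LD": 0.155}  # what 2005 vote recall "should" be
raw_recall = {"Lab": 0.400, "LD": 0.110}     # what the raw sample showed

weights = {party: target_recall[party] / raw_recall[party]
           for party in target_recall}

print(round(weights["LD"], 2))   # 1.41 - 2005 Lib Dems weighted up
print(round(weights["Lab"], 2))  # 0.74 - 2005 Labour voters weighted down
</antml>```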
The alternative to quasi-random sampling is quota sampling - this basically equates to constructing a representative sample, rather than relying on probability to provide one. We know that 52% of adults are women and 48% are men, so you go out and interview 520 women and 480 men. Similarly, pollsters using quota sampling will try to interview the correct proportions of people in each social class and age group. This can result in very, very low levels of weighting. When MORI used to do face-to-face polling their weighting made hardly any difference at all, they didn't need to weight much by gender for example, because their interviewers deliberately chose the right proportions of men and women to start with. With quota sampling, a lot of the work of getting a representative sample goes into drawing up and filling the quotas rather than in the weighting.
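The quota arithmetic is trivial but worth making explicit - the work happens before any interviews take place:

```python
# Quota sampling sketch: fix the sample's composition up front, so post-hoc
# weights come out at (or very near) 1.0.

targets = {"female": 0.52, "male": 0.48}  # known population shares
sample_size = 1000

quotas = {group: round(share * sample_size) for group, share in targets.items()}
print(quotas)  # {'female': 520, 'male': 480} - interviewers fill these counts
```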
With YouGov's method of sampling things are not as simple as the old face-to-face polling, or indeed as the way YouGov used to do things. Until last autumn YouGov used to invite people to specific surveys, so if they invited the correct number of people in each demographic group, not a lot of weighting was needed - all the work was done in defining and filling the quotas. Last autumn YouGov switched to inviting people to come and complete an unspecified survey, thus eliminating the problem of fast and slow respondents and allowing fast turnaround polls. The downside was that it allowed less tight control of who was invited to surveys, so more of the work needed to be done by weighting. In the months since then YouGov have been able to tweak the proportions of invites sent in order to meet the quotas more accurately in the first place and reduce the amount of weighting needed, so it is now less than it was back in January and February.
As a general rule, the less weighting you have to do the better, but what matters is that polls are weighted to the right targets, not how much weighting is needed to get there. The problem with large weightings is not that they skew samples, but that they reduce the effective sample size, leading to more volatility. In practice, however, YouGov tends to be one of the less volatile pollsters, so this does not appear to be a problem.
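The "effective sample size" idea can be made concrete with Kish's commonly used approximation (this is a standard survey-research formula, not anything specific to the pollsters discussed here):

```python
# Kish's approximation of effective sample size: heavy weighting shrinks
# the effective n even though the number of interviews is unchanged.

def effective_sample_size(weights):
    return sum(weights) ** 2 / sum(w * w for w in weights)

# 1,000 respondents all weighted equally: nothing is lost
print(effective_sample_size([1.0] * 1000))  # 1000.0

# 1,000 respondents, half weighted 1.41 and half 0.74 (the ICM-sized
# weights from earlier): the effective n drops to roughly 911
mixed = [1.41] * 500 + [0.74] * 500
print(round(effective_sample_size(mixed)))
```

A smaller effective sample means a wider real margin of error, hence the extra volatility.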
Moving on, the Telegraph ask a very odd question about whether Peter admits something was a rogue poll and whether he regrets publishing it, which suggests something of a misunderstanding of what a rogue poll is. A rogue poll means something specific; too often it is used as a term of abuse, and this is wrong. When polls quote a margin of error (with a poll of 1,000 people it is normally quoted as 3%), what they mean is that statistically, 95% of the time, the figures in the poll will be within 3 points of the "real" value. A rogue poll refers to the 5% of occasions when a figure falls outside that margin of error.
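Where the familiar "plus or minus 3%" comes from is simple arithmetic, assuming a simple random sample (which, as discussed above, real polls only approximate):

```python
import math

# Margin of error at 95% confidence for a proportion p from a simple
# random sample of size n; 1.96 is the usual 95% z-value.
def margin_of_error(p, n, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

# The worst case, p = 0.5, with n = 1000 gives the familiar figure
print(round(margin_of_error(0.5, 1000), 3))  # 0.031, i.e. about 3 points
```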
This is an immutable part of statistical theory. 5% of ICM's polls will be rogue polls. 5% of Populus's polls will be rogue polls, 5% of MORI's will, 5% of Angus Reid's will, 5% of TNS's will and 5% of YouGov's will. No pollster is capable of defeating the laws of mathematics. Criticising a pollster for producing rogue polls is like criticising the lottery when it picks number 1 or number 49: it is the unavoidable functioning of the laws of mathematics.
The implication that pollsters shouldn't release polls that they think might turn out to be rogues is even more bonkers. If a poll produces an odd result you certainly double check everything for possible human error and anything weird or wacky in the sample, but if everything checks out you must publish. Can you imagine the response if it was discovered that a pollster had refused to release a poll because they thought the figures were too nice to the Conservatives, or too nice to Labour?
Peter has pretty much addressed the rest, though there is one thing worth noting. As I've mentioned here several times, YouGov ran daily polls for several weeks before the Sun started publishing them in order to test out the systems and parallel test it with old style polling. These polls were never intended to be published. However, since Peter has confirmed it in his post I'm now free to say there was one instance during the test polls, straight after the "Tears for Piers" interview when it showed the Conservative lead right down to a single point.
UPDATE: And here is the article; it is very light indeed. It is pretty much the sort of thing I described above, presenting "weighting" in ominous terms that would make the naive or uneducated think something fishy was going on (including putting weighting in inverted commas!). YouGov also apparently use an undisclosed formula to weight data - in the sense that they regularly "undisclose" it in the list of weighting targets at the end of every published table.