Sampling error’s more dangerous friend

As the UK election results roll in, one of the big shocks is the discrepancy between the pre-election polls and the results. All the pollsters agreed that it would be incredibly close, and they were all wrong. What gives?

Some essential psych 101 concepts come in useful here. Polls rely on sampling – the basic idea being that you don’t have to ask everyone to get a rough idea of how things are going to go. How rough that idea is depends on how many people you ask. This is the issue of sampling error. We understand sampling error: you can estimate it. So as well as being able to reduce this error by taking larger samples, we have principled ways of working out when you’ve asked enough people to get a reliable estimate (which is why polls of a country with a population of 70 million can still be accurate with samples in the thousands).
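To make the "principled ways" a little more concrete, here is a minimal sketch of the textbook margin-of-error calculation for a proportion. It assumes a simple random sample, the normal approximation, a 95% confidence level and the worst case of a 50/50 split. The function names are my own, chosen for illustration, and real pollsters apply weighting and design corrections that are not shown here.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (assumed here)

def margin_of_error(n, p=0.5, z=Z_95):
    """Approximate margin of error for a proportion p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

def sample_size_for(margin, p=0.5, z=Z_95):
    """Smallest n whose margin of error is at most `margin`."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# The margin shrinks with the square root of the sample size.
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: +/- {margin_of_error(n) * 100:.1f} points")

# Respondents needed for a margin of about 3 points.
print(sample_size_for(0.03))
```

Notice that the size of the population never appears in the formula. As long as the sample is a small fraction of the population, only the number of people asked matters, which is why about a thousand respondents gives a margin of roughly three points whether the country has 7 million people or 70 million.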

But, as Tim Harford points out in this excellent article on sampling problems in big data, every sample has two sources of unreliability: sampling error, as I’ve mentioned, but also sampling bias.

sampling error has a far more dangerous friend: sampling bias. Sampling error is when a randomly chosen sample doesn’t reflect the underlying population purely by chance; sampling bias is when the sample isn’t randomly chosen at all.

The problem with sampling bias is that, when you don’t know the ground truth, there is no principled way of knowing if your sample is biased. If your sample has some systematic bias in it, you can make a reliable estimate (minimising sampling error), but you are still left with the sampling bias, and you can’t know how big that bias is until you find out the truth. That’s my guess at what happened with the UK election. The polls converged, minimising the error, but the bias remained – a ‘shy Tory’ effect where many voters were not admitting (or not aware) that they would end up voting for the Conservative party.
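A toy simulation shows why converging polls are no comfort. This is only a sketch with made-up numbers: the true vote share (37%) and the chance that a supporter admits their intention (90%) are hypothetical values picked for illustration, not estimates from the actual election, and the simple "shy voter" mechanism is just one way a systematic bias could arise.

```python
import random

def run_poll(n, true_share, admit_prob, rng):
    """Return the share of n respondents who say they back the party.

    Each respondent truly supports the party with probability true_share,
    but a supporter only admits it with probability admit_prob.
    """
    yes = 0
    for _ in range(n):
        supports = rng.random() < true_share
        if supports and rng.random() < admit_prob:
            yes += 1
    return yes / n

TRUE_SHARE = 0.37  # hypothetical true vote share
ADMIT_PROB = 0.90  # hypothetical chance a supporter says so to a pollster

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (100, 1_000, 10_000, 100_000):
    honest = run_poll(n, TRUE_SHARE, 1.0, rng)
    shy = run_poll(n, TRUE_SHARE, ADMIT_PROB, rng)
    print(f"n = {n:>7}: unbiased poll {honest:.3f}, biased poll {shy:.3f}")
```

As the sample grows, the unbiased poll settles near the true 0.37, while the biased poll settles just as confidently near 0.37 × 0.90 = 0.333. More respondents shrink the sampling error in both cases, but nothing inside the biased poll reveals that it is converging on the wrong number.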

The exit polls predicted the real result with surprising accuracy not because they minimised sampling error, but because they avoided the sampling bias. By asking the people who actually turned up to vote how they actually voted, their sample lacked the bias of the pre-election polls.

2 thoughts on “Sampling error’s more dangerous friend”

We get the same here – many journalists (including British ones) will only cite Pew Research polls for views about opinions or voting in the US. But Pew for the longest time only dialed landlines, and mostly in the afternoon. Even now that they boast that they call cell phones, what busy young professional is going to answer such a call when they (presumably) see an 800 number? So the results look like a sample of retired folks sitting home by their phones early in the afternoon. Wouldn’t media want a cross-section of many survey organizations with varying methods?

The sampling bias issue you describe exists potentially in every poll in every election in every country, yet this particular instance seems to be more off than usual. It also requires the different pollsters to have similar biases, as the mean of uncorrelated biases tends towards no bias.

What I’d like to know is how people would have answered the exit poll question “have you changed your mind during the last 24 hours”, grouped by actual vote. Did any exit pollster ask?