As the UK election results roll in, one of the big shocks is the discrepancy between the pre-election polls and the results. All the pollsters agreed that it would be incredibly close, and they were all wrong. What gives?
Some essential psych 101 concepts come in useful here. Polls rely on sampling – the basic idea being that you don’t have to ask everyone to get a rough idea of how things are going to go. How rough that idea is depends on how many people you ask. This is the issue of sampling error. We understand sampling error: you can estimate it, so as well as reducing it by taking larger samples, there are principled ways of working out when you’ve asked enough people to get a reliable estimate (which is why polls of a country with a population of 70 million can still be accurate with samples in the thousands).
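To make the "principled ways" concrete, here is a minimal sketch of the standard textbook margin-of-error formula for a proportion. It assumes a simple random sample from a population much larger than the sample, and a 95% confidence level (z = 1.96). The function names are my own, chosen for illustration; real polls use weighting schemes that this ignores.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from a
    simple random sample. z=1.96 corresponds to 95% confidence;
    proportion=0.5 is the worst case (widest margin)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

def sample_size_for(margin, proportion=0.5, z=1.96):
    """Smallest sample size whose margin of error is at most `margin`."""
    return math.ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)

# The population size never appears in the formula, only the sample size.
for n in (100, 1000, 10000):
    print(f"n = {n:>6}: +/- {100 * margin_of_error(n):.1f} percentage points")

print("Sample needed for +/- 3 points:", sample_size_for(0.03))
```

Under these assumptions a sample of about a thousand people gives a margin of roughly three percentage points, whether the country has 7 million people or 70 million.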
But, as Tim Harford points out in this excellent article on sampling problems in big data, every sample has two sources of unreliability: sampling error, as I’ve mentioned, and sampling bias.
“sampling error has a far more dangerous friend: sampling bias. Sampling error is when a randomly chosen sample doesn’t reflect the underlying population purely by chance; sampling bias is when the sample isn’t randomly chosen at all.”
The problem with sampling bias is that, when you don’t know the ground truth, there is no principled way of knowing if your sample is biased. If your sample has some systematic bias in it, you can make a precise estimate (minimising sampling error) and still be left with the sampling bias, whose size you can’t know until you find out the truth. That’s my guess at what happened with the UK election. The polls converged, minimising the error, but the bias remained – a ‘shy Tory’ effect where many voters were not admitting (or not aware) that they would end up voting for the Conservative party.
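A toy simulation shows why bigger samples don't rescue a biased poll. This is only a sketch: the 40% true support and the 10% of supporters who don't admit their intention are made-up numbers for illustration, not estimates from the actual election, and `run_poll` is a hypothetical helper.

```python
import random

def run_poll(sample_size, true_support, shy_rate, rng):
    """Simulate one poll. Each respondent supports the party with
    probability `true_support`; a supporter fails to say so with
    probability `shy_rate`. Returns the share who admit support."""
    admitted = 0
    for _ in range(sample_size):
        is_supporter = rng.random() < true_support
        if is_supporter and rng.random() >= shy_rate:
            admitted += 1
    return admitted / sample_size

TRUE_SUPPORT = 0.40  # illustrative assumption, not real data
SHY_RATE = 0.10      # illustrative assumption, not real data

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (100, 1000, 10000, 100000):
    estimate = run_poll(n, TRUE_SUPPORT, SHY_RATE, rng)
    print(f"n = {n:>6}: polled support = {estimate:.3f} (true = {TRUE_SUPPORT})")
```

As the sample grows, the estimates settle down, but they settle around 0.36 (that is, 0.40 × 0.90) rather than the true 0.40. The sampling error shrinks; the bias stays exactly where it was, and nothing inside the poll data tells you it is there.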
The exit polls predicted the real result with surprising accuracy not because they minimised sampling error, but because they avoided the sampling bias. By asking the people who actually turned up to vote how they actually voted, their sample lacked the bias of the pre-election polls.