There is a cliché in media stories where figures for a disease or condition are quoted, followed by a statement that “the true figures may be higher”. Sampling error means that initial figures are as likely to be under-estimates as over-estimates, but we only ever seem to be told that the condition is under-detected.
For example, this is from a recent (actually pretty good) New Scientist article about gender identity disorder (GID) in children, a condition where children who are biologically male feel female and vice versa:
It is unclear how common GID is among children, but many transsexual adults say they felt they were “in the wrong body” from an early age. The incidence of adult transsexualism has been estimated at about 1 in 12,000 for male-to-females, and around 1 in 30,000 for female-to-males, although transsexual lobby groups say the true figures may be far higher.
These estimates are usually drawn from prevalence studies where maybe a few hundred or a few thousand people are tested. The researchers extrapolate from the number of cases in the sample to estimate how many people in the population as a whole will have the condition.
These estimates come with a margin of error, typically expressed as a confidence interval: a range around the quoted figure within which the real value is likely to fall. That range extends both above and below the quoted figure, so the real value could as easily be lower as higher.
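To make the margin of error concrete, here is a minimal sketch in Python of how a confidence interval around a prevalence estimate might be calculated. The study numbers (20 cases in a sample of 2,000) are made up purely for illustration, the function name is my own, and the simple normal approximation used here is only one of several methods; real studies of rare conditions often use more careful ones.

```python
import math


def prevalence_interval(cases, sample_size, z=1.96):
    """Return (estimate, lower, upper) for a sample proportion.

    Uses the simple normal approximation; z=1.96 corresponds to a
    95% confidence interval. This is an illustrative sketch, not the
    method of any particular study.
    """
    estimate = cases / sample_size
    standard_error = math.sqrt(estimate * (1 - estimate) / sample_size)
    margin = z * standard_error
    # Clamp to the valid range for a proportion.
    lower = max(0.0, estimate - margin)
    upper = min(1.0, estimate + margin)
    return estimate, lower, upper


# Hypothetical study: 20 cases found among 2,000 people tested.
estimate, lower, upper = prevalence_interval(cases=20, sample_size=2000)
print(f"estimate: {estimate:.4f}, 95% interval: {lower:.4f} to {upper:.4f}")
```

With this method the interval reaches exactly as far below the estimate as above it, which is the point: nothing in the arithmetic favours "higher" over "lower".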
For any individual study you can validly say that you think the estimate is too low, or indeed, too high, and give reasons for that. For instance, you might say that your sample was mainly young people who tend to be healthier than the general public, or maybe that the diagnostic tools are known to miss some true cases.
But when we look at reporting as a whole, it almost always says the condition is likely to be much more common than the estimate.
For example, have a look at the results of this Google search:
“the true number may be higher” 20,300 hits
“the true number may be lower” 3 hits
You can try variations on the phrasing and see the same sort of pattern emerge. I’m curious as to why this bias occurs, or whether there’s another explanation for it.