The Reproducibility Project, the giant study to re-run experiments reported in three top psychology journals, has just published its results and it’s either a disaster, a triumph or both for psychology.
You can’t do better than the coverage in The Atlantic, not least as it’s written by Ed Yong, the science journalist who has been key in reporting on, and occasionally appearing in, psychology’s great replication debates.
Two important things have come out of the Reproducibility Project. The first is that psychologist, project leader and now experienced cat-herder Brian Nosek deserves some sort of medal, and his 270-odd collaborators should be given shoulder massages by grateful colleagues.
It’s been psychology’s equivalent of the Large Hadron Collider but without the need to dig up half of Switzerland.
The second is that no one quite knows what it means for psychology. 36% of the replications had statistically significant results, and 47% had effect sizes in a comparable range, although the effect sizes were typically about 50% smaller than the originals.
When looking at replication by subject area, studies on cognitive psychology were more likely to reproduce than studies from social psychology.
Is this good? Is this bad? What would be a reasonable number to expect? No one’s really sure, because there are perfectly acceptable reasons why more positive results would be published in top journals but not replicate as well, alongside lots of not so acceptable reasons.
The not-so-acceptable reasons have been well-publicised: p-hacking, publication bias and at the darker end of the spectrum, fraud.
But on the flip side, effects like regression to the mean and ‘surprisingness’ are just part of the normal routine of science.
‘Regression to the mean’ is the effect whereby, if the first measurement of an effect is unusually large, subsequent measurements or replications are likely to be closer to the average, because part of that extreme first result was probably down to chance, and chance doesn’t repeat itself. This is not a psychological effect, it happens everywhere.
Imagine you record a high level of cosmic rays from an area of space during an experiment and you publish the results. These results are more likely to merit your attention and the attention of journals because they are surprising.
But subsequent experiments, even if they back up the general effect of high readings, are less likely to find such extreme values, because, by definition, it was the statistically surprising nature of the original readings that got them published in the first place.
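A quick simulation makes this selection effect concrete. This is only a sketch with made-up numbers (a hypothetical true effect, noise level, and publication threshold), not anything drawn from the Reproducibility Project’s data: studies whose noisy results cross a “surprisingness” threshold get published, and their unselected replications come out much closer to the true effect.

```python
import random

random.seed(42)

TRUE_EFFECT = 0.2   # hypothetical true effect size (assumed)
NOISE = 1.0         # per-study sampling noise, std dev (assumed)
THRESHOLD = 1.5     # only results at least this "surprising" get published

def run_study():
    """One noisy measurement of the true effect."""
    return random.gauss(TRUE_EFFECT, NOISE)

# Original studies: keep only those extreme enough to attract a journal.
published = [e for e in (run_study() for _ in range(10_000)) if e > THRESHOLD]

# Replications of the published studies: fresh noise, no selection.
replications = [run_study() for _ in published]

mean_published = sum(published) / len(published)
mean_replication = sum(replications) / len(replications)

print(f"mean published effect:   {mean_published:.2f}")
print(f"mean replication effect: {mean_replication:.2f}")
```

The published studies overstate the effect simply because they were selected for being extreme; the replications, facing no such filter, regress back towards the (much smaller) true effect.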
The same may well be happening here. Top psychology journals currently specialise in surprising findings. The editors have shaped these journals by trading off surprisingness against stability of findings, and currently the balance is tipped far towards surprisingness. Probably unhealthily so.
This is exactly what the Reproducibility Project found. More initially surprising results were less likely to replicate.
But it’s an open question as to what’s the “right balance” of surprisingness to reliability for any particular journal or, indeed, field.
There’s also a question about reliability versus boundedness. Just because you don’t replicate the results of a particular experiment it doesn’t necessarily mean the originally reported effect was a false positive. It may mean the effect is sensitive to a particular context that isn’t clear yet. Working this out is basically the grunt work of science.
Some news outlets have wrongly reported that this study shows that ‘about two thirds of studies in psychology are not reliable’ but the Reproducibility Project didn’t sample widely enough across publications to be able to say this.
Similarly, it only looked at initially positive findings. You could easily imagine a ‘Reverse Reproducibility Project’ where a whole load of original studies that found no effect are replicated to see which subsequently do show an effect.
We know publication bias tends to favour positive results, but that doesn’t mean that all negative findings should be automatically accepted as the final answer either.
The main take home messages are that findings published in leading journals are not a good guide to invariant aspects of human nature. And stop with the journal worship. And let’s get more pre-registration on the go. Plus science is hard.
What is also clear, however, is that the folks from the Reproducibility Project deserve our thanks. And if you find one who still needs that shoulder massage, limber up your hands and make a start.
Link to full text of scientific paper in Science.
Link to coverage in The Atlantic.
7 thoughts on “Don’t call it a comeback”
What does invariate mean? It’s not turning up on online dictionaries.
It means I left a typo in the text. Now fixed. Thanks!
Reading your observation that certain variables have, perhaps, not yet been identified and controlled for, reminded me of a quote that I came across in the Winnower’s advice for surviving in academia. And guess what? It was from Brian Nosek himself! “All theories are wrong in some important way, so don’t get caught up in defending yours. The best person to take down your theory and replace it with something better is you.”
“In writing, lead with the evidence, follow with the explanation. Explanations will change over time; evidence will always persist.”
“Getting a positive result may be the key incentive in the present academic culture, but you may learn more from your negative results. Innovation blossoms when our expectations are violated, not when they are confirmed.”
“Find ways to share all your results, positive or negative, beautiful or ugly, and how you obtained them. Someone, perhaps your future self, will thank you later.” He’s a believer in the grunt work of science. 🙂
I am academically trained in science and technology, not psychology. However, I read widely and from that reading, the recent findings from the Reproducibility Project do not surprise me. This is not a new story. One of my formative books was Martin L. Gross “The Psychological Society: The impact — and the failure — of psychiatry, psychotherapy, psychoanalysis and the psychological revolution.” [Random House, 1978]
Extensively researched and referenced, Gross’s work amply demonstrates what has been known for 60 years: applying the then-available psychological therapies to moderate psychological distress improves no more people than placing them on a six-month wait list for residential facility treatment does. I strongly doubt that elaborated theories of psychological healing have much improved on that record. No matter what type of therapy you select, about 50% of all candidates will say they have improved six months later — just like those who don’t go through therapy.
Psychiatry and psychology are now undergoing a cultural crisis of confidence. The evidence is strong that a wholesale reconsideration of the evidence and revision of fundamental theory is desperately needed from the ground up. We need to burn this edifice of mythologies and shamanism to the ground and start over.
Richard, with your permission, I’ll put the link to the book you cited here:
Interesting, and I am looking forward to “The End of Sanity” too. Thanks for sharing.
No permission required, Marc. The book you have linked appears to be a second edition of the one I read, but should cover the same ground. Also useful reading is the more current treatise “Psychiatry Under the Influence – Institutional Corruption, Social Harm, and Prescriptions for Change” by Robert Whitaker and Dr. Lisa Cosgrove. The latter book assembles the extensive evidence that both academic and practicing psychiatrists have been corrupted by pharmaceutical company money, often with the help of the APA.
“Getting a positive result may be the key incentive in the present academic culture, but you may learn more from your negative results.”
Here we have the fundamental tragedy of our academic incentive system. Work that you learn from is not the same as work that advances your career. In other words, we’re rewarding work that is advertised well, rather than work that we can actually learn from.