The reproducibility of psychological science

The results of the Reproducibility Project, a massive, collaborative, ‘Open Science’ attempt to replicate 100 psychology experiments published in leading psychology journals, have just been published in Science. The results are sure to be widely debated – the biggest result being that many published results were not replicated. There’s an article in the New York Times about the study here: Many Psychology Findings Not as Strong as Claimed, Study Says

This is a landmark in meta-science: researchers collaborating to inspect how psychological science is carried out, how reliable it is, and what that means for how we should change what we do in the future. But it is also an illustration of the process of Open Science. All the materials from the project, including the raw data and analysis code, can be downloaded from the OSF webpage. That means that if you have a question about the results, you can check it for yourself. So, by way of example, here’s a quick analysis I ran this morning: does the number of citations of a paper predict how large the effect size of its replication in the Reproducibility Project will be? Answer: not so much.

Figure (cites_vs_effectR): citations of the original paper against effect size of the replication

That horizontal string of dots along the bottom shows replications with close to zero effect size, and high citations for the original paper (nearly all of which reported non-zero and statistically significant effects). Draw your own conclusions!
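For anyone who wants to try this kind of check themselves, here is a minimal sketch of one way to do it: a rank (Spearman) correlation between citation counts and replication effect sizes. This is an assumption about how such a check could be done, not the script used for the graph, and the function names are made up for illustration. The two input lists would have to come from the project's data file; none of the real data is reproduced here.

```python
from math import sqrt


def rank(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(citations, replication_effects):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(rank(citations), rank(replication_effects))


# Sanity check on obviously synthetic sequences (NOT project data):
# a perfectly increasing relationship gives a correlation of 1.
print(round(spearman([1, 2, 3, 4], [10, 20, 30, 40]), 6))
```

A rank correlation is used here because citation counts tend to be heavily skewed, so a few very highly cited papers would otherwise dominate the result. A value near zero would match the "not so much" answer above.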

Link: Reproducibility OSF project page

Link: my code for making this graph (in python)

Intuitions about free will and the brain

Libet’s classic experiment on the neuroscience of free will tells us more about our intuition than about our actual freedom

It is perhaps the most famous experiment in neuroscience. In 1983, Benjamin Libet sparked controversy with his demonstration that our sense of free will may be an illusion, a controversy that has only increased ever since.

Libet’s experiment has three vital components: a choice, a measure of brain activity and a clock.

The choice is to move either your left or right arm. In the original version of the experiment this is by flicking your wrist; in some versions of the experiment it is to raise your left or right finger. Libet’s participants were instructed to “let the urge [to move] appear on its own at any time without any pre-planning or concentration on when to act”. The precise time at which you move is recorded from the muscles of your arm.

The measure of brain activity is taken via electrodes on the scalp. When the electrodes are placed over the motor cortex (roughly along the middle of the head), a different electrical signal appears between right and left as you plan and execute a movement on either the left or right.

The clock is specially designed to allow participants to discern sub-second changes. This clock has a single dot, which travels around the face of the clock every 2.56 seconds. This means that by reporting position you are reporting time. If we assume you can report position accurately to within a 5 degree angle, that means you can use this clock to report time to within 36 milliseconds – that’s 36 thousandths of a second.
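The arithmetic behind that figure is easy to check. Here is a minimal sketch; the function name is just for illustration, and the 5 degree accuracy is the assumption made above rather than a measured value.

```python
ROTATION_PERIOD_S = 2.56  # time for the dot to travel once around the clock face


def angle_to_ms(angle_deg, period_s=ROTATION_PERIOD_S):
    """Convert an angular error on the clock face into a timing error in ms."""
    fraction_of_rotation = angle_deg / 360.0
    return fraction_of_rotation * period_s * 1000.0


# A 5 degree error corresponds to roughly 36 milliseconds.
print(round(angle_to_ms(5), 1))  # 35.6
```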

Putting these ingredients together, Libet took one extra vital measurement. He asked participants to report, using the clock, exactly the point when they made the decision to move.

Physiologists had known for decades that a fraction of a second before you actually move the electrical signals in your brain change. So it was in Libet’s experiment: a fraction of a second before participants moved, a reliable change could be recorded using the electrodes. But the explosive result was when participants reported deciding to move. This occurred in between the electric change in the brain and the actual movement. This means, as sure as effect follows cause, that the feeling of deciding couldn’t be a timely report of whatever was causing the movement. The electrode recording showed that the decision had – in some sense – already been made before the participants were aware of having made it. The brain signals were changing before the subjective experience of taking a decision occurred.

Had participants’ brains already made the decision? Was the feeling of choosing just an illusion? Controversy has raged ever since. There is far more to the discussion about neuroscience and free will than this one experiment, but its simplicity has allowed it to capture the imagination of many who think our status as biological creatures limits our free will, as well as those who argue that free will survives the challenge of our minds being firmly grounded in our biological brains.

Part of the appeal of the Libet experiment is due to two pervasive intuitions we have about the mind. Without these intuitions the experiment doesn’t seem so surprising.

The first intuition is the feeling that our minds are a separate thing from our physical selves – a natural dualism that pushes us to believe that the mind is a pure, abstract place, free from biological constraints. A moment’s thought about the last time you were grumpy because you were hungry shatters this illusion, but I’d argue that it is still a persistent theme in our thinking. Why else would we be in the least surprised that it is possible to find neural correlates of mental events? If we really believed, in our heart of hearts, that the mind is based in the brain, then we would know that every mental change must have a corresponding change in the brain.

The second pervasive intuition, which makes us surprised by the Libet experiment, is the belief that we know our own minds. This is the belief that our subjective experience of making decisions is an accurate report of how that decision is made. The mind is like a machine – as long as it runs right, we are happily ignorant of how it works. It is only when mistakes or contradictions arise that we’re drawn to look under the hood: Why didn’t I notice that exit? How could I forget that person’s name? Why does the feeling of deciding come after the brain changes associated with decision making?

There’s no reason to think that we are reliable reporters of every aspect of our minds. Psychology, in fact, gives us lots of examples of where we often get things wrong. The feeling of deciding in the Libet experiment may be a complete illusion – maybe the real decision really is made ‘by our brains’ somehow – or maybe it is just that the feeling of deciding is delayed from our actual deciding. Just because we erroneously report the timing of the decision doesn’t mean we weren’t intimately involved in it, in whatever meaningful sense that can be.

More is written about the Libet experiment every year. It has spawned an academic industry investigating the neuroscience of free will. There are many criticisms and rebuttals, with debate raging about how and if the experiment is relevant to the freedom of our everyday choices. Even supporters of Libet have to admit that the situation used in the experiment may be too artificial to be a direct model of real everyday choices. But the basic experiment continues to inspire discussion and provoke new thoughts about the way our freedom is rooted in our brains. And that, I’d argue, is due to the way it helps us confront our intuitions about the way the mind works, and to see that things are more complex than we instinctively imagine.

This is my latest column for BBC Future. The original is here. You may also enjoy this recent post on mindhacks.com: Critical strategies for free will experiments

Critical strategies for free will experiments

Benjamin Libet’s experiment on the neuroscience of free will needs little introduction. (If you do need an introduction, it’s the topic of my latest column for BBC Future). His reports that the subjective feeling of making a choice only comes after the brain signals indicating a choice has been made are famous, and have produced controversy ever since they were published in the 1980s.

For a simple experiment, Libet’s paradigm admits to a large number of interpretations, which I think is an important lesson. Here are some common, and less common, critiques of the experiment:

The Disconnect Criticism

The choice required from Libet’s participants was trivial and inconsequential. Moreover, they were specifically told to make the choice without any reason (“let the urge [to move] appear on its own at any time without any pre-planning or concentration on when to act”). A common criticism is that this kind of choice has little to tell us about everyday choices which are considered, consequential or which we are actively trying to involve ourselves in.

The timing criticism(s)

Dennett discusses how the original interpretation of the experiment assumes that the choosing self exists at a particular point and at a particular time – so, for example, maybe in some central ‘Cartesian Theatre’ in which information from motor cortex and visual cortex comes together, but crucially, does not have direct report of (say) the information about timing gathered by the visual cortex. Even in a freely choosing self, there will be timing delays as information on the clock time is ‘connected up’ with information on when the movement decision was made. These, Dennett argues, could produce the result Libet saw without indicating a fatal compromise for free choice.

My spin on this is that the Libet result shows, minimally, that we don’t accurately know the timing of our decisions, but inaccurate judgements about the timing of decisions don’t mean that we don’t actually make the consequential decisions themselves.

Spontaneous activity

Aaron Schurger and colleagues have a nice paper in which they argue that Libet’s results can be explained by variations in spontaneous activity before actions are taken. They argue that the movement system is constantly experiencing sub-threshold variation in activity, so that at any particular point in time you are more or less close to performing any particular act. Participants in the Libet paradigm, asked to make a spontaneous act, take advantage of this variability – effectively lowering their threshold for action and waiting until the covert fluctuations are large enough to trigger a movement. Importantly, this reading weakens the link between the ‘onset’ of movements and the delayed subjective experience of making a movement. If the movement is triggered by random fluctuations (observable in the rise of the electrode signal) then there isn’t a distinct ‘decision to act’ in the motor system, so we can’t say that the subjective decision to act reliably comes afterwards.
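To make the argument concrete, here is a minimal sketch of the idea, assuming a leaky accumulator driven by random noise that triggers a "movement" the first time it crosses a threshold. All parameter values are arbitrary illustrative choices, not values from the paper, and the function names are made up. Averaging many such trials backwards from the moment of crossing gives a gradual rise before the movement, even though no single trial contains a distinct decision event.

```python
import random


def simulate_trial(rng, drift=0.1, leak=0.5, noise=0.1,
                   threshold=0.3, dt=0.005, max_steps=200_000):
    """Run one noisy, leaky accumulator until it first crosses the threshold.

    Returns the activity trace up to and including the crossing,
    or None if the threshold is never reached within max_steps.
    """
    x = 0.0
    trace = [x]
    for _ in range(max_steps):
        # Constant input, leak back towards zero, and random fluctuation.
        x += (drift - leak * x) * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        trace.append(x)
        if x >= threshold:
            return trace
    return None


def average_before_threshold(trials, window):
    """Average the last `window` samples of each trial, aligned on the crossing."""
    kept = [t[-window:] for t in trials if t is not None and len(t) >= window]
    if not kept:
        raise ValueError("no trial is long enough for the requested window")
    return [sum(column) / len(kept) for column in zip(*kept)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = [simulate_trial(rng) for _ in range(50)]
averaged = average_before_threshold(trials, window=200)  # last 1 second

# The average rises towards the threshold in the run-up to the "movement".
print(averaged[0] < averaged[-1])  # True
```

The rise in the average is partly a product of the alignment itself: every trial is, by construction, at or above threshold at the final sample and below it at every earlier one.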

The ‘only deterministic on average’ criticism

The specific electrode signal which is used to time the decision to move in the brain is called the readiness potential (RP). Electrode readings are highly variable, so the onset of the RP is a statistical artefact, produced by averaging over many trials (40 in Libet’s case). This means we lose the ability to detect, trial-by-trial, the relation between the brain activity related to movement and the subjective experience. Libet reports this in his original paper [1] (‘only the average RP for the whole series could be meaningfully recorded’, p634). On occasion the subjective decision time (which Libet calls W) comes before the time of even the average RP, not after (p635: “instances in which individual W time preceded onset time of averaged RP numbered zero in 26 series [out of 36]” – which means that 28% of series saw at least one instance of W occurring before the RP).
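The 28% figure follows directly from the numbers Libet gives, as this quick check shows. The last line goes beyond the source: it assumes the trial-to-trial noise is independent, in which case averaging 40 trials shrinks the noise by a factor of about the square root of 40. That is a standard statistical rule of thumb, not something reported in the paper.

```python
from math import sqrt

TOTAL_SERIES = 36            # series in Libet's report
SERIES_WITH_NO_EARLY_W = 26  # series where W never preceded the averaged RP onset
TRIALS_AVERAGED = 40         # trials averaged to obtain each RP


def percent_with_early_w(total, none_early):
    """Percentage of series with at least one W before the averaged RP onset."""
    return (total - none_early) / total * 100


print(round(percent_with_early_w(TOTAL_SERIES, SERIES_WITH_NO_EARLY_W), 1))  # 27.8

# Assumption: independent noise across trials, so noise shrinks by sqrt(n).
print(round(sqrt(TRIALS_AVERAGED), 2))  # 6.32
```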

The experiment showed strong reliability, but not complete reliability (the difference is described by Libet as ‘generally’ occurring and as being ‘fairly consistent’, p636). What happened next to Libet’s result is a common trick of psychologists. A statistical pattern is discovered and then reality is described as if the pattern is the complete description: “The brain change occurs before the choice”.

Although such generalities are very useful, they are misleading if we forget that they are only true on average, not always true. I don’t think Libet’s experiment would have the imaginative hold it does if the result was summarised as “The brain change usually occurs before the choice”.

A consistent, but not universal, pattern in the brain before a choice has the flavour of a prediction, rather than a compulsion. Sure, before we make a choice there are antecedents in the brain – it would be weird if there weren’t – but since these don’t have any necessary consequence for what we choose, so what?

To my mind the demonstration that you can use fMRI to reproduce the Libet effect but with brain signals changing up to 10 seconds before the movement (and an above chance accuracy at predicting the movement made), only reinforces this point. We all believe that the mind has something to do with the brain, so finding patterns in the brain at one point which predict actions in the mind at a later point isn’t surprising. The fMRI result, and perhaps Libet’s experiment, rely as much on our false intuition about dualism as on conclusively demonstrating anything new about free will.

Link: my column Why do we intuitively believe we have free will?

Laughter as a window on the infant mind

What makes a baby laugh? The answer might reveal a lot about the making of our minds, says Tom Stafford.

What makes babies laugh? It sounds like one of the most fun questions a researcher could investigate, but there’s a serious scientific reason why Caspar Addyman wants to find out.

He’s not the first to ask this question. Darwin studied laughter in his infant son, and Freud formed a theory that our tendency to laugh originates in a sense of superiority. So we take pleasure at seeing another’s suffering – slapstick style pratfalls and accidents being good examples – because it isn’t us.

The great psychologist of human development, Jean Piaget, thought that babies’ laughter could be used to see into their minds. If you laugh, you must ‘get the joke’ to some degree – a good joke is balanced in between being completely unexpected and confusing and being predictable and boring. Studying when babies laugh might therefore be a great way of gaining insight into how they understand the world, he reasoned. But although he proposed this in the 1940s, this idea remains to be properly tested. Despite the fact that some very famous investigators have studied the topic, it has been neglected by modern psychology.

Addyman, of Birkbeck, University of London, is out to change that. He believes we can use laughter to get at exactly how infants understand the world. He’s completed the world’s largest and most comprehensive survey of what makes babies laugh, presenting his initial results at the International Conference on Infant Studies, Berlin, last year. Via his website he surveyed more than 1000 parents from around the world, asking them questions about when, where and why their babies laugh.

The results are – like the research topic – heart-warming. A baby’s first smile comes at about six weeks, their first laugh at about three and a half months (although some took three times as long to laugh, so don’t worry if your baby hasn’t cracked its first cackle just yet). Peekaboo is a sure-fire favourite for making babies laugh (for a variety of reasons I’ve written about here), but tickling is the single most reported reason that babies laugh.

Importantly, from the very first chuckle, the survey responses show that babies are laughing with other people, and at what they do. The mere physical sensation of something being ticklish isn’t enough. Nor is it enough to see something disappear or appear suddenly. It’s only funny when an adult makes these things happen for the baby. This shows that way before babies walk, or talk, they – and their laughter – are social. If you tickle a baby they apparently laugh because you are tickling them, not just because they are being tickled.

What’s more, babies don’t tend to laugh at people falling over. They are far more likely to laugh when they fall over, rather than someone else, or when other people are happy, rather than when they are sad or unpleasantly surprised. From these results, Freud’s theory (which, in any case, was developed based on clinical interviews with adults, rather than any rigorous formal study of actual children) looks dead wrong.

Although parents report that boy babies laugh slightly more than girl babies, both genders find mummy and daddy equally funny.

Addyman continues to collect data, and hopes that as the results become clearer he’ll be able to use his analysis to show how laughter tracks babies’ developing understanding of the world – how surprise gives way to anticipation, for example, as their ability to remember objects comes online.

Despite the scientific potential, baby laughter is, as a research topic, “strangely neglected”, according to Addyman. Part of the reason is the difficulty of making babies laugh reliably in the lab, although he plans to tackle this in the next phase of the project. But partly the topic has been neglected, he says, because it isn’t viewed as a subject for ‘proper’ science to look into. This is a prejudice Addyman hopes to overturn – for him, the study of laughter is certainly no joke.

This is my BBC Future column from Tuesday. The original is here. If you are a parent you can contribute to the science of how babies develop at Dr Addyman’s babylaughter.net (specialising in laughter) or at babylovesscience.com (which covers humour as well as other topics).

Are online experiment participants paying attention?

Online testing is sure to play a large part in the future of Psychology. Using Mechanical Turk or other crowdsourcing sites for research, psychologists can quickly and easily gather data for any study where the responses can be provided online. One concern, however, is that online samples may be less motivated to pay attention to the tasks they are participating in. Not only is nobody watching how they do these online experiments, the whole experience is framed as a work-for-cash gig, so there is pressure to complete any activity as quickly and with as low effort as possible. To the extent that the online participants are satisficing or skimping on their attention, can we trust the data?

A newly submitted paper uses data from the Many Labs 3 project, which recruited over 3000 participants from both online and University campus samples, to test the idea that online samples are different from the traditional offline samples used by academic psychologists.

The findings strike a note of optimism, if you’re into online testing (perhaps less so if you use traditional university samples):

Mechanical Turk workers report paying more attention and exerting more effort than undergraduate students. Mechanical Turk workers were also more likely to pass an instructional manipulation check than undergraduate students. Based on these results, it appears that concerns over participant inattentiveness may be more applicable to samples recruited from traditional university participant pools than from Mechanical Turk

This fits with previous reports showing high consistency when classic effects are tested online, and with reports that satisficing may have been very high in offline samples; we just weren’t testing for it.

However, an issue I haven’t seen discussed is whether, because of the relatively small pool of participants taking experiments on MTurk, online participants have an opportunity to get familiar with typical instructional manipulation checks (AKA ‘catch questions’, which are designed to check if you are paying attention). If online participants adapt to our manipulation checks, then the very experiments which set out to test if they are paying more attention may not be reliable.

Link: new paper Graduating from Undergrads: Are Mechanical Turk Workers More Attentive than Undergraduate Participants?

This paper provides a useful overview: Conducting perception research over the internet: a tutorial review

Conspiracy theory as character flaw

Philosophy professor Quassim Cassam has a piece in Aeon arguing that conspiracy theorists should be understood in terms of the intellectual vices. It is a dead-end, he says, to try to understand the reasons someone gives for believing a conspiracy theory. Consider someone called Oliver who believes that 9/11 was an inside job:

Usually, when philosophers try to explain why someone believes things (weird or otherwise), they focus on that person’s reasons rather than their character traits. On this view, the way to explain why Oliver believes that 9/11 was an inside job is to identify his reasons for believing this, and the person who is in the best position to tell you his reasons is Oliver. When you explain Oliver’s belief by giving his reasons, you are giving a ‘rationalising explanation’ of his belief.

The problem with this is that rationalising explanations take you only so far. If you ask Oliver why he believes 9/11 was an inside job he will, of course, be only too pleased to give you his reasons: it had to be an inside job, he insists, because aircraft impacts couldn’t have brought down the towers. He is wrong about that, but at any rate that’s his story and he is sticking to it. What he has done, in effect, is to explain one of his questionable beliefs by reference to another no less questionable belief.

So the problem is not the person’s beliefs as such, but why they came to have the whole set of (misguided) beliefs in the first place. The way to understand conspiracists is in terms of their intellectual character, Cassam argues, the vices and virtues that guide us as thinking beings.

A problem with this account is that – looking at the current evidence – character flaws don’t seem that strong a predictor of conspiracist beliefs. The contrast is with the factors that have demonstrable influence on people’s unusual beliefs. For example, we know that social influence and common cognitive biases have a large, and measurable, effect on what we believe. The evidence isn’t so good on how intellectual character traits such as closed/open-mindedness, skepticism/gullibility are constituted and might affect conspiracist beliefs. That could be because the personality/character trait approach is inherently limited, or just that there is more work to do. One thing is certain, whatever the intellectual vices are that lead to conspiracy theory beliefs, they are not uncommon. One study suggested that 50% of the public endorse at least one conspiracy theory.

Link: Bad Thinkers by Quassim Cassam

Paper on personality and conspiracy theories: Unanswered questions: A preliminary investigation of personality and individual difference predictors of 9/11 conspiracist beliefs

Paper on widespread endorsement of conspiracy theories: Conspiracy Theories and the Paranoid Style(s) of Mass Opinion

Previously on Mindhacks.com: That’s what they want you to believe

And a side note: this view that the problem with conspiracy theorists isn’t the beliefs helps explain why throwing facts at them doesn’t help. It is better to highlight the fallacies in how they are thinking.

For argument’s sake

I have (self) published an ebook For argument’s sake: evidence that reason can change minds. It is a collection of two essays that were originally published on Contributoria and The Conversation. I have revised and expanded these, and added a guide to further reading on the topic. There are bespoke illustrations inspired by Goya (of owls), and I’ve added an introduction about why I think psychologists and journalists both love stories that we’re irrational creatures incapable of responding to reasoned argument. Here’s something from the book description:

Are we irrational creatures, swayed by emotion and entrenched biases? Modern psychology and neuroscience are often reported as showing that we can’t overcome our prejudices and selfish motivations. Challenging this view, cognitive scientist Tom Stafford looks at the actual evidence. Re-analysing classic experiments on persuasion, as well as summarising more recent research into how arguments change minds, he shows why persuasion by reason alone can be a powerful force.

All in, it’s close to 7000 words and available from Amazon now