Serendipity in psychological research

Dorothy Bishop has an excellent post ‘Ten serendipitous findings in psychology’, in which she lists ten celebrated discoveries which occurred by happy accident.

Each discovery is interesting in itself, but Prof Bishop puts the discoveries in the context of the recent discussion about preregistration (declaring in advance what you are looking for and how you’ll look). Does preregistration hinder serendipity? Absolutely not, says Bishop – not least because the context of ‘discovery’ is never a one-off experiment.

Note that, in all cases, having made the initial unexpected observation – either from unstructured exploratory research, or in the course of investigating something else – the researchers went on to shore up the findings with further, hypothesis-driven experiments. What they did not do is to report just the initial observation, embellished with statistics, and then move on, as if the presence of a low p-value guaranteed the truth of the result.

(It’s hard not to read into these comments a criticism of some academic journals which seem happy to publish single experiments reporting surprising findings.)

Bishop’s list contains three findings from electrophysiology (recording brain cell activity directly with electrodes), which I think is notable. In these cases neural recording acts in place of a microscope, allowing fairly direct observation of the system the scientist is investigating, at a level of detail hitherto unavailable. It isn’t surprising to me that, given a new tool of observation, the prepared mind of the scientist will make serendipitous discoveries. The catch is whether such observational tools exist for the rest of psychology. Many psychologists use their intuition to decide where to look, and experiments to test whether their intuition is correct. The important serendipitous discoveries from electrophysiology suggest that measures which offer new ways of observing, rather than merely testing ideas, must also be important for psychological discovery. Do such observational measures exist?

Good tests make children fail – here’s why

Many parents and teachers are critical of the Standardised Assessment Tests (SATs) that have recently been taken by primary school children. One common complaint is that they are too hard. Teachers at my son’s school sent children home with example questions to quiz their parents on, hoping to show that getting full marks is next to impossible.

Invariably, when parents try out these tests, they focus on the most difficult or confusing items. Some parents and teachers can be heard complaining on social media that if they get questions wrong, surely the tests are too hard for ten-year-olds.

But how hard should tests for children be?

As a psychologist, I know we have some well-developed principles that can help us address the question. If we look at the SATs as measures of some kind of underlying ability, then we can turn to one of the oldest branches of psychology – “psychometrics” – for some guidance.

Getting it just right

A good test shouldn’t be too hard. If most people get most questions wrong, then you have what is called a “floor effect”. The result is that you can’t tell any difference in ability between the people taking the test.

If we started the school sports day high jump with the bar at two metres high (close to the world record), then we’d finish sports day with everybody getting the same – zero successful jumps – and no information about how good anyone is at the high jump.

But at the same time, a good test shouldn’t be too easy. If most people get everything right, then the effect is, as you might expect, called a “ceiling effect”. If everybody gets everything right, then again we don’t get any information from the test.

The key idea is that tests must discriminate. In psychometric terms, the value of a test depends on how well the difficulty of its items is matched to the ability of the people taking it. If I wanted to gauge maths ability in six-year-olds and I gave them all an A-Level paper, we can presume that nearly everyone would score zero. However good the A-Level paper might be as a test, it is completely uninformative when it is this badly matched to the ability of the people taking it.

Here’s the rub: for a test to be sensitive to differences in ability, it must contain items which people get wrong. Actually, there’s a precise answer to the proportion that you should get wrong – in the most sensitive test it should be half of the items. Questions which you are 50% likely to get right are the ones which are most informative.
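
To see why, note that the information a single pass/fail item carries is proportional to p(1 − p) – the variance of a right/wrong outcome – which is largest when p = 0.5 and shrinks to zero at the floor (p = 0) and the ceiling (p = 1). A minimal sketch in Python (illustrative numbers only, not drawn from any real test):

```python
def item_information(p):
    """Fisher information of a pass/fail item under a Rasch-style model
    is p * (1 - p): zero when everyone fails (floor) or everyone passes
    (ceiling), and maximal when the chance of success is exactly 50%."""
    return p * (1 - p)

for p in [0.0, 0.1, 0.5, 0.9, 1.0]:
    print(f"P(correct) = {p:.1f}  ->  information = {item_information(p):.2f}")
# 0.5 gives the maximum (0.25); 0.0 and 1.0 give 0.00 - no discrimination.
```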

How we feel about measuring and labelling children according to their skill at taking these tests is a big issue, but it is important that we recognise that this is what tests do. A well-designed test will make all children get some items wrong – it is inherent in the design. It is up to us how we conceptualise that: whether tests are an unnecessary distraction from true education, or a necessary challenge we all need to be exposed to.

Better tests?

If you adopt this psychometric perspective, it becomes clear that the tests we use are an inefficient way of measuring any individual child’s ability. Most children will be asked a bunch of questions which are too easy for them before they get to the informative ones at the edge of their ability, and will then go on to attempt a bunch of questions which are far too hard. And pity the people for whom the test is poorly matched to their ability and consists mostly of questions they’ll get wrong – which is both uninformative in psychometric terms, and dispiriting emotionally.

A hundred years ago, when we began our modern fixation with testing and measuring, this waste was hard to avoid: many uninformative and potentially depressing questions had to be asked, simply because all children took the same exam paper.

Nowadays, however, examiners can administer tests via computer, and algorithmically identify the most informative questions for each child’s ability – making the tests shorter, more accurate, and less focused on the experience of failure. You could throw in enough easy questions that no child would ever have the experience of getting most of the questions wrong. But still there’s no getting around the fact that an informative test has to contain questions most people sitting it will get wrong.
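
As a sketch of how such an adaptive algorithm might work (hypothetical code, not any exam board’s actual system): repeatedly pick the unused question closest in difficulty to the current estimate of the child’s ability, then nudge the estimate up or down according to the answer.

```python
import math
import random

def adaptive_test(true_ability, item_difficulties, n_questions=10):
    """Toy computerised adaptive test: a simple staircase, not a real
    exam algorithm. Ability and difficulty share one scale."""
    estimate, step = 0.0, 1.0
    remaining = list(item_difficulties)
    for _ in range(n_questions):
        # Ask the most informative remaining question: the one whose
        # difficulty sits closest to the current ability estimate.
        item = min(remaining, key=lambda d: abs(d - estimate))
        remaining.remove(item)
        # Chance of success follows a logistic (Rasch-style) model:
        # high for items well below the child's ability, low above it.
        p_correct = 1 / (1 + math.exp(item - true_ability))
        correct = random.random() < p_correct
        # Nudge the estimate towards the evidence, in shrinking steps.
        estimate += step if correct else -step
        step *= 0.8
    return estimate

random.seed(1)
items = [i / 2 for i in range(-10, 11)]  # difficulties from -5 to +5
print(adaptive_test(true_ability=1.5, item_difficulties=items))
```

Because each question is chosen near the current estimate, most of what the child sees is pitched at the edge of their ability – exactly the informative 50% zone described above.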

Even a good test can measure an educationally irrelevant ability (such as merely the ability to do the test, or to memorise abstract grammar rules), or be used in ways that harm children. But having difficult items isn’t a problem with the SATs in particular – it is inherent in all informative tests.

This article was originally published on The Conversation. Read the original article.

The search for the terrorist ‘type’

BBC World Service has an excellent radio documentary on the history and practice of terrorist profiling.

Unlike many pieces on the psychology of terrorism, which tend to take a Hollywood view of the problem, it’s an insightful, critical and genuinely enlightening piece on the false promises and possibilities of applied psychology in the service of stopping terrorists.

Crucially, it looks at how the practice developed over time and how it’s been affected by the ‘war on terror’.

For decades researchers, academics and psychologists have wanted to know what kind of person becomes a terrorist. If there are pre-existing traits which make someone more likely to kill for their beliefs – well, that would be worth knowing… It’s a story which begins decades ago. But, with the threat from killers acting for so-called Islamic State, finding an answer has never felt more pressing.

Link to programme webpage, streaming and mp3.

A brief hallucinatory twilight

I’ve got an article in The Atlantic on the hypnagogic state – the brief hallucinatory period between wakefulness and sleep – and how it is being increasingly used as a tool to make sense of consciousness.

There is a brief time, between waking and sleep, when reality begins to warp. Rigid conscious thought starts to dissolve into the gently lapping waves of early stage dreaming and the world becomes a little more hallucinatory, your thoughts a little more untethered. Known as the hypnagogic state, it has received only erratic attention from researchers over the years, but a recent series of studies have renewed interest in this twilight period, with the hope it can reveal something fundamental about consciousness itself.

The hypnagogic state has been better dealt with by artists and writers over the years – Coleridge’s poem Kubla Khan apparently emerged out of hypnagogic reverie, albeit fuelled by opium.

It has received only occasional attention from scientists, however. More recently, a spate of studies has shown genuine mainstream interest in understanding hypnagogia as a source of information about how consciousness is deconstructed as we enter sleep.

Link to article in The Atlantic on the hypnagogic state.

Genetics is rarely just about genes

If you want a crystal clear introduction to the role genetics can play in human nature, you can’t do much better than an article in The Guardian’s Sifting the Evidence blog by epidemiologist Marcus Munafo.

It’s been given a slightly distracting title – but ignore that – and just read the main text.

Are we shaped more by our genes or our environment – the age-old question of nature and nurture? This is really a false dichotomy; few, if any, scientists working in the area of human behaviour would adhere to either an extreme nature or extreme nurture position. But what do we mean when we say that our behaviours are influenced by genetic factors? And how do we know?

It will be one of the most useful 20 minutes you’ll spend this week.

Link to excellent introduction to genetics and human behaviour.

3 salvoes in the reproducibility crisis

The reproducibility crisis in psychology rumbles on. For the uninitiated, this is the general brouhaha we’re having over how reliable published psychological research is. I wrote a piece on this in 2013, which now sounds a little complacent, and unnecessarily focussed on just one area of psychology, given the extent of the problems since uncovered in the way research is manufactured (or maybe not – see below). Anyway, in the last week or so there have been three interesting developments.

Michael Inzlicht blogged his ruminations on the state of the field of social psychology, and they’re not rosy: “We erred, and we erred badly“, he writes. It is a profound testament to the depth of the current concerns about the reliability of psychology when such a senior scientist begins to doubt the reality of some of the phenomena he has built his career investigating.

As someone who has been doing research for nearly twenty years, I now can’t help but wonder if the topics I chose to study are in fact real and robust. Have I been chasing puffs of smoke for all these years?

Don’t panic!

But not everyone is worried. A team of Harvard A-listers, including Timothy Wilson and Daniel Gilbert, has released a press release announcing a commentary on the “Reproducibility Project: Psychology”. This was an attempt to estimate the reliability of a large sample of phenomena from the psychology literature (short introduction in Nature here). The paper from this project was picked as one of the most important of 2015 by the journal Science.

The project is a huge effort, and one open to multiple interpretations. The Harvard team’s press release is headlined “No evidence of a replicability crisis in psychological science” and claims that the “reproducibility of psychological science is indistinguishable from 100%”, as well as calling for effort to be put into repairing the damage done to the reputation of psychological research. I’d link to the press release, but it looks like between me learning of it yesterday and coming to write about it today the material has been pulled from the internet. The commentary it announced is due to be released on March the 4th, so we wait with bated breath for the good news about why we don’t need to worry about the reliability of psychology research. Come on boys, we need some good news.

UPDATE 3rd March: The website is back! No Evidence for a Replicability Crisis in Psychological Science. Commentary here, and response

…But whatever you do, optimally weight evidence

Speaking of the Reproducibility Project, Alexander Etz produced a great Bayesian reanalysis of the data from that project (possible because it is all open access, via the Open Science Framework). This take on the project is a great example of how open science allows people to build more easily on your results, as well as being a vital complement to the original report – not least because it stops you naively accepting any simple statistical summary of what the reproducibility project ‘means’ (e.g. “30% of studies do not replicate”). Etz and Joachim Vandekerckhove have now upgraded the analysis to a paper, which is available (open access, natch) in PLoS One: “A Bayesian Perspective on the Reproducibility Project: Psychology“. And their interpretation of the reliability of psychology, as informed by the Reproducibility Project?

Overall, 75% of studies gave qualitatively similar results in terms of the amount of evidence provided. However, the evidence was often weak …The majority of the studies (64%) did not provide strong evidence for either the null or the alternative hypothesis in either the original or the replication…We conclude that the apparent failure of the Reproducibility Project to replicate many target effects can be adequately explained by overestimation of effect sizes (or overestimation of evidence against the null hypothesis) due to small sample sizes and publication bias in the psychological literature
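
Their point about weighing evidence rather than counting significant results can be illustrated with a toy calculation – my sketch, not Etz and Vandekerckhove’s actual model. With a normal prior on the effect size and a normal likelihood, the Bayes factor comparing ‘no effect’ with ‘some effect’ has a closed form, and a small, noisy study often barely favours either hypothesis:

```python
from scipy.stats import norm

def bf01(effect_estimate, standard_error, prior_sd=1.0):
    """Bayes factor for H0 (effect = 0) over H1 (effect ~ Normal(0, prior_sd)).
    Under H1 the observed estimate is marginally Normal(0, se^2 + prior_sd^2)."""
    marginal_h0 = norm.pdf(effect_estimate, loc=0, scale=standard_error)
    marginal_h1 = norm.pdf(effect_estimate, loc=0,
                           scale=(standard_error**2 + prior_sd**2) ** 0.5)
    return marginal_h0 / marginal_h1

# A just-significant original study (estimate 0.5, standard error 0.25):
print(bf01(0.5, 0.25))   # ~0.63: only weak evidence for an effect, despite p < .05
# A null-ish replication (estimate 0.05, standard error 0.15):
print(bf01(0.05, 0.15))  # ~6.4: moderate evidence favouring the null
```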

Psychotherapies and the space between us

There’s an in-depth article at The Guardian revisiting an old debate about cognitive behavioural therapy (CBT) versus psychoanalysis that falls into the trap of asking some rather clichéd questions.

For those not familiar with the world of psychotherapy, CBT is a time-limited treatment based on understanding how interpretations, behaviour and emotions become unhelpfully connected to maintain psychological problems, while psychoanalysis is a Freudian psychotherapy based on the exploration and interpretation of unhelpful processes in the unconscious mind that remain from unresolved conflicts in earlier life.

I won’t go into the comparisons the article makes about the evidence for CBT vs psychoanalysis, except to say that when comparing the impact of treatments, both the amount and the quality of evidence are key. As when comparing football teams, pointing to individual ‘wins’ tells us little. In terms of randomised controlled trials, or RCTs, psychoanalysis has simply played far fewer matches at the highest level of competition.

The treatments are often compared because they aim to treat some of the same problems, but the comparison is usually unhelpfully shallow.

Here’s how the cliché goes: CBT is evidence-based but superficial, the scientific method applied for a quick fix that promises happiness but brings only light relief. The flip-side of this cliché says that psychoanalysis is based on apprenticeship and practice, handed down through generations. It lacks a scientific seal of approval but examines the root of life’s struggles through a form of deep artisanal self-examination.

Pitching these two clichés against each other, and suggesting the ‘old style craftsmanship is now being recognised as superior’ is one of the great tropes in mental health – and, as it happens, 21st Century consumerism – and there is more than a touch of marketing about this debate.

Which do you think is portrayed as commercial, mass produced, and popular, and which is expensive, individually tailored, and only available to an exclusive clientèle? Even mental health has its luxury goods.

More widely discussed (or perhaps, admitted to) are the differing models of the mind that each therapy is based on. Yet even here simple comparisons fall flat, because many of the concepts don’t easily translate.

One of the central tropes is that psychoanalysis deals with the ‘root’ of the psychological problem while CBT only deals with its surface effects. The problem with this contrast is that psychoanalysis can only be seen to deal with the ‘root of the problem’ if you buy into the psychoanalytic view of where problems are rooted.

Is your social anxiety caused by the projection of unacceptable feelings of hatred based in unresolved conflicts from your earliest childhood relationships – as psychoanalysis might claim? Or is your social anxiety caused by the continuation of a normal fear response to a difficult situation that has been maintained due to maladaptive coping – as CBT might posit?

These views of the internal world are, in many ways, the non-overlapping magisteria of psychology.

Another common claim is that psychoanalysis assumes an unconscious whereas CBT does not. This assertion collapses on simple examination, but the two models of the unconscious are so radically different that it is hard to see how they could easily translate.

Psychoanalysis suggests that the unconscious can be understood in terms of objects, drives, conflicts and defence mechanisms that, despite being masked in symbolism, can ultimately be understood at the level of personal meaning. In contrast, CBT draws on its endowment from cognitive psychology and claims that the unconscious can often only be understood at the sub-personal level because meaning as we would understand it consciously is unevenly distributed across actions, reactions and interpretations rather than being embedded within them.

But despite this, there are also some areas of shared common ground that most critics miss. CBT equally cites deep structures of meaning, acquired through early experience, that lie below the surface and influence conscious experience – it just calls them core beliefs or schemas rather than complexes.

Perhaps the most annoying aspect of the CBT vs psychoanalysis debate is it tends to ask ‘which is best’ in a general and over-vague manner rather than examining the strengths and weaknesses of each approach for specific problems.

For example, one of the areas where psychoanalysis excels is in conceptualising the therapeutic relationship as a dynamic interplay between the perceptions and emotions of therapist and patient – something that can be a source of insight and change in itself.

Notably, this is the core aspect that’s maintained in its less purist and, quite frankly, more sensible version, psychodynamic psychotherapy.

CBT’s approach to the therapeutic relationship is essentially ‘be friendly and aim for cooperation’ – the civil service model of psychotherapy if you will – which works wonderfully except for people whose central problem is itself cooperation and the management of personal interactions.

It’s no accident that most extensions of CBT (schema therapy, DBT and so on) add value by paying additional attention to the therapeutic relationship as a tool for change for people with complex interpersonal difficulties.

Because each therapy assumes a slightly different model of the mind, it’s easy to think that they are somehow battling over what it means to be human, and this is where the dramatic tension in most of these debates comes from.

Mostly though, models of the mind are just maps that help us get places. All are necessarily stylised in some way to accentuate different aspects of human nature. As long as they sufficiently reflect the territory, this highlighting helps us focus on what we most need to change.