Serendipity in psychological research

Dorothy Bishop has an excellent post, ‘Ten serendipitous findings in psychology’, in which she lists ten celebrated discoveries which occurred by happy accident.

Each discovery is interesting in itself, but Prof Bishop puts the discoveries in the context of the recent discussion about preregistration (declaring in advance what you are looking for and how you’ll look). Does preregistration hinder serendipity? Absolutely not, says Bishop, not least because the context of ‘discovery’ is never a one-off experiment.

Note that, in all cases, having made the initial unexpected observation – either from unstructured exploratory research, or in the course of investigating something else – the researchers went on to shore up the findings with further, hypothesis-driven experiments. What they did not do is to report just the initial observation, embellished with statistics, and then move on, as if the presence of a low p-value guaranteed the truth of the result.

(It’s hard not to read into these comments a criticism of some academic journals which seem happy to publish single experiments reporting surprising findings.)

Bishop’s list contains three findings from electrophysiology (recording brain cell activity directly with electrodes), which I think is notable. In these cases neural recording acts in place of a microscope, allowing fairly direct observation of the system the scientist is investigating at a level of detail hitherto unavailable. It isn’t surprising to me that, given a new tool of observation, the prepared minds of scientists will make serendipitous discoveries. The catch is whether, for the rest of psychology, such observational tools exist. Many psychologists use their intuition to decide where to look, and experiments to test whether their intuition is correct. The important serendipitous discoveries from electrophysiology suggest that measures which are new ways of observing, rather than merely tests of ideas, must also be important for psychological discoveries. Do such observational measures exist?

Good tests make children fail – here’s why

Many parents and teachers are critical of the Standardised Assessment Tests (SATs) that have recently been taken by primary school children. One common complaint is that they are too hard. Teachers at my son’s school sent children home with example questions to quiz their parents on, hoping to show that getting full marks is next to impossible.

Invariably, when parents try out these tests, they focus on the most difficult or confusing items. Some parents and teachers can be heard complaining on social media that if they themselves get questions wrong, the tests must surely be too hard for ten-year-olds.

But how hard should tests for children be?

As a psychologist, I know we have some well-developed principles that can help us address the question. If we look at the SATs as measures of some kind of underlying ability, then we can turn to one of the oldest branches of psychology – “psychometrics” – for some guidance.

Getting it just right

A good test shouldn’t be too hard. If most people get most questions wrong, then you have what is called a “floor effect”. The result is that you can’t tell any difference in ability between the people taking the test.

If we started the school sports day high jump with the bar at two metres high (close to the world record), then we’d finish sports day with everybody getting the same – zero successful jumps – and no information about how good anyone is at the high jump.

But at the same time, a good test shouldn’t be too easy. If most people get everything right, then the effect is, as you might expect, called a “ceiling effect”. If everybody gets everything right then again we don’t get any information from the test.

The key idea is that tests must discriminate. In psychometric terms, the value of a test is about the match between the thing it is supposed to measure and the difficulty of the items on the test. If I wanted to gauge maths ability in six-year-olds and gave them all an A-Level paper, I could presume that nearly everyone would score zero. Although the A-Level paper might be a good test, it is completely uninformative when it is badly matched to the ability of the people taking it.

Here’s the rub: for a test to be sensitive to differences in ability, it must contain items which people get wrong. Actually, there’s a precise answer to the proportion that you should get wrong – in the most sensitive test it should be half of the items. Questions which you are 50% likely to get right are the ones which are most informative.
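The 50% claim can be checked with a minimal sketch, assuming the simplest item response model (the Rasch, or one-parameter logistic, model), which the article itself does not name. Under that assumption the chance of getting an item right depends only on the gap between the child’s ability and the item’s difficulty, and the information an item carries about ability is p × (1 − p), which peaks at 0.25 when p = 0.5. The ability and difficulty values below are illustrative, not taken from any real SATs data.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (one-parameter logistic) probability of answering an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_information(ability: float, difficulty: float) -> float:
    """Fisher information an item carries about ability under the Rasch model: p * (1 - p)."""
    p = p_correct(ability, difficulty)
    return p * (1.0 - p)


# A child of ability 0 facing items ranging from very easy (-4) to very hard (+4).
for difficulty in [-4, -2, 0, 2, 4]:
    p = p_correct(0.0, difficulty)
    info = item_information(0.0, difficulty)
    print(f"difficulty {difficulty:+d}: P(correct) = {p:.2f}, information = {info:.3f}")
```

Items the child is almost sure to get right (or wrong) contribute very little; the item they have an even chance on contributes the most.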

How we feel about measuring and labelling children according to their skill at taking these tests is a big issue, but it is important that we recognise that this is what tests do. A well designed test will make all children get some items wrong – it is inherent in their design. It is up to us how we conceptualise that: whether tests are an unnecessary distraction from true education, or a necessary challenge we all need to be exposed to.

Better tests?

If you adopt this psychometric perspective, it becomes clear that the tests we use are an inefficient way of measuring any individual child’s particular ability to do the test. Most children will be asked a bunch of questions which are too easy for them, before they get to the informative ones which are at the edge of their ability. Then they will go on to attempt a bunch of questions which are far too hard. And pity the children for whom the test is poorly matched to their ability, so that it consists mostly of questions they’ll get wrong – which is both uninformative in psychometric terms and dispiriting emotionally.

A hundred years ago, when we began our modern fixation with testing and measuring, it was hard to avoid the waste where many uninformative and potentially depressing questions were asked. This was simply because all children had to take the same exam paper.

Nowadays, however, examiners can administer tests via computer, and algorithmically identify the most informative questions for each child’s ability – making the tests shorter, more accurate, and less focused on the experience of failure. You could throw in enough easy questions that no child would ever have the experience of getting most of the questions wrong. But still there’s no getting around the fact that an informative test has to contain questions most people sitting it will get wrong.
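Here is a toy sketch of the kind of algorithm such computerised tests might use, again assuming a Rasch model. The item bank, the simulated child’s “true” ability, and the grid-based estimate with a standard normal prior are all hypothetical choices for illustration, not a description of any real testing system. After each answer it re-estimates ability and picks the unused item whose difficulty is closest to that estimate, which under this model is the item the child is nearest to 50% likely to get right.

```python
import math
import random


def p_correct(ability, difficulty):
    """Rasch probability of a correct answer (an assumed model, for illustration)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


GRID = [g / 10 for g in range(-40, 41)]  # candidate abilities from -4.0 to +4.0


def estimate_ability(responses):
    """Expected a posteriori ability estimate on GRID with a standard normal prior.

    responses: list of (difficulty, correct) pairs.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-theta * theta / 2)  # prior weight (unnormalised)
        for difficulty, correct in responses:
            p = p_correct(theta, difficulty)
            w *= p if correct else (1.0 - p)  # likelihood of each observed answer
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(GRID, weights)) / total


def adaptive_test(true_ability, item_bank, n_items, rng):
    """Simulate an adaptive test; n_items must not exceed len(item_bank)."""
    estimate = 0.0
    responses = []
    remaining = list(item_bank)
    for _ in range(n_items):
        # Under the Rasch model the most informative item has difficulty closest to the estimate.
        item = min(remaining, key=lambda b: abs(b - estimate))
        remaining.remove(item)
        correct = rng.random() < p_correct(true_ability, item)  # simulated answer
        responses.append((item, correct))
        estimate = estimate_ability(responses)
    return estimate, responses


rng = random.Random(1)
bank = [b / 4 for b in range(-16, 17)]  # 33 hypothetical items, difficulty -4 to +4
est, resp = adaptive_test(true_ability=1.5, item_bank=bank, n_items=15, rng=rng)
print(f"estimated ability: {est:.2f} after {len(resp)} items")
print(f"proportion correct: {sum(c for _, c in resp) / len(resp):.2f}")
```

Because each new item is pitched near the current estimate, the simulated child tends to get a substantial share of items wrong whatever their ability, which is exactly the trade-off the article describes.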

Even a good test can measure an educationally irrelevant ability (such as merely the ability to do the test, or to memorise abstract grammar rules), or be used in ways that harm children. But having difficult items isn’t a problem with the SATs, it’s a problem with all tests.


This article was originally published on The Conversation. Read the original article.

Information theory and psychology

I have read a good deal more about information theory and psychology than I can or care to remember. Much of it was a mere association of new terms with old and vague ideas. Presumably the hope was that a stirring in of new terms would clarify the old ideas by a sort of sympathetic magic.

From: John R. Pierce’s 1961 An introduction to information theory: symbols, signals and noise. Plus ça change.

Pierce’s book is really quite wonderful and contains lots of chatty asides and examples, such as:

Gottlob Burmann, a German poet who lived from 1737 to 1805, wrote 130 poems, including a total of 20,000 words, without once using the letter R. Further, during the last seventeen years of his life, Burmann even omitted the letter from his daily conversation.

The two word games that trick almost everyone

Playing two classic schoolyard games can help us understand everything from sexism to the power of advertising.

There’s a word game we used to play at my school, or a sort of trick, and it works like this. You tell someone they have to answer some questions as quickly as possible, and then you rush at them the following:

“What’s one plus four?!”
“What’s five plus two?!”
“What’s seven take away three?!”
“Name a vegetable?!”

Nine times out of 10 people answer the last question with “Carrot”.

Now I don’t think the magic is in the maths questions. Probably they just warm your respondent up to answering questions rapidly. What is happening is that, for most people, most of the time, in all sorts of circumstances, carrot is simply the first vegetable that comes to mind.

This seemingly banal fact reveals something about how our minds organise information. There are dozens of vegetables, and depending on your love of fresh food you might recognise a good proportion. If you had to list them you’d probably forget a few you know, easily reaching a dozen and then slowing down. And when you’re pressured to name just one as quickly as possible, you forget even more and just reach for the most obvious vegetable you can think of – and often that’s a carrot.

In cognitive science, we say the carrot is “prototypical” – for our idea of a vegetable, it occupies the centre of the web of associations which defines the concept. You can test prototypicality directly by timing how long it takes someone to answer whether the object in question belongs to a particular category. We take longer to answer “yes” if asked “is a penguin a bird?” than if asked “is a robin a bird?”, for instance. Even when we know penguins are birds, the idea of penguins takes longer to connect to the category “bird” than more typical species.

So, something about our experience of school dinners, being told they’ll help us see in the dark, the 37 million tons of carrots the world consumes each year, and cartoon characters from Bugs Bunny to Olaf the Snowman, has helped carrots work their way into our minds as the prime example of a vegetable.

The benefit to this system of mental organisation is that the ideas which are most likely to be associated are also the ones which spring to mind when you need them. If I ask you to imagine a costumed superhero, you know they have a cape, can probably fly and there’s definitely a star-shaped bubble when they punch someone. Prototypes organise our experience of the world, telling us what to expect, whether it is a superhero or a job interview. Life would be impossible without them.

The drawback is that the things which connect together because of familiarity aren’t always the ones which should connect together because of logic. Another game we used to play proves this point. You ask someone to play along again and this time you ask them to say “Milk” 20 times as fast as they can. Then you challenge them to snap-respond to the question “What do cows drink?”. The fun is in seeing how many people answer “milk”. A surprising number do, allowing you to crow “Cows drink water, stupid!”. We drink milk, and the concept is closely connected to the idea of cows, so it is natural to accidentally pull out the answer “milk” when we’re fishing for the first thing that comes to mind in response to the ideas “drink” and “cow”.

Having a mind which supplies ready answers based on association is better than a mind which never supplies ready answers, but it can also produce blunders that are much more damaging than claiming cows drink milk. Every time we assume the doctor is a man and the nurse is a woman, we’re falling victim to the ready answers of our mental prototypes of those professions. Such prototypes, however mistaken, may also underlie our readiness to assume a man will be a better CEO, or a philosophy professor won’t be a woman. If you let them guide how the world should be, rather than what it might be, you get into trouble pretty quickly.

Advertisers know the power of prototypes too, of course, which is why so much advertising appears to be style over substance. Their job isn’t to deliver a persuasive message, as such. They don’t want you to actively believe anything about their product being provably fun, tasty or healthy. Instead, they just want fun, taste or health to spring to mind when you think of their product (and the reverse). Worming their way into our mental associations is worth billions of dollars to the advertising industry, and it is based on a principle no more complicated than a childhood game which tries to trick you into saying “carrots”.

This is my BBC Future column from last week. The original is here. And, yes, I know that baby cows actually do drink milk.

The memory trap

I had a piece in the Guardian on Saturday, ‘The way you’re revising may let you down in exams – and here’s why’. In it I talk about a pervasive feature of our memories: we tend to overestimate how much of a memory is ‘ours’, and to underestimate how much is actually shared with other people or held in the environment (see also the illusion of explanatory depth). This memory trap can combine with our instinct to make things easy for ourselves, leaving us thinking we are learning when really we’re just flattering our feeling of familiarity with a topic.

Here’s the start of the piece:

Even the most dedicated study plan can be undone by a failure to understand how human memory works. Only when you’re aware of the trap set for us by overconfidence, can you most effectively deploy the study skills you already know about.
… even the best [study] advice can be useless if you don’t realise why it works. Understanding one fundamental principle of human memory can help you avoid wasting time studying the wrong way.

I go on to give four evidence-based pieces of revision advice, all of which – I hope – use psychology to show that some of our intuitions about how to study can’t be trusted.

Link: The way you’re revising may let you down in exams – and here’s why

Previously at the Guardian by me:

The science of learning: five classic studies

Five secrets to revising that can improve your grades

The Devil’s Wager: when a wrong choice isn’t an error

The Devil looks you in the eyes and offers you a bet. Pick a number, and if you successfully guess the total he’ll roll on two dice, you get to keep your soul. If any other number comes up, you go to burn in eternal hellfire.

You call “7” and the Devil rolls the dice.

A two and a four, so the total is 6 — that’s bad news.

But let’s not dwell on the incandescent pain of your infinite and inescapable future, let’s think about your choice immediately before the dice were rolled.

Did you make a mistake? Was choosing “7” an error?

In one sense, obviously yes. You should have chosen 6.

But in another important sense you made the right choice. There are more combinations of dice outcomes that add to 7 than to any other number. The chances of winning if you bet 7 are higher than for any other single number.
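The dice arithmetic is easy to check by brute force. This short sketch enumerates all 36 equally likely outcomes of two fair dice and counts how many give each total: 7 comes up six ways (a 1 in 6 chance), while 6 comes up only five ways (5/36).

```python
from collections import Counter
from itertools import product

# Count how many of the 36 equally likely outcomes of two fair dice give each total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for total in range(2, 13):
    print(f"{total:2d}: {ways[total]} ways, P(win) = {ways[total]}/36")
```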

The distinction is between a particular choice which happens to be wrong, and a choice strategy which is actually as good as you can do in the circumstances. If we replace the Devil’s Wager with the situations the world presents you, and your choice of number with your actions in response, then we have a handle on what psychologists mean when they talk about “cognitive error” or “bias”.

In psychology, the interesting errors are not decisions that just happen to turn out wrong. The interesting errors are decisions which people systematically get wrong, and get wrong in a particular way. As well as being predictable, these errors are interesting because they must be happening for a reason.

If you met a group of people who always bet “6” when gambling with the Devil, you’d be an incurious person if you assumed they were simply idiots. That judgement doesn’t lead anywhere. Instead, you’d want to find out what they believe that makes them think that’s the right choice strategy. Similarly, when psychologists find that people will pay more to keep something than they’d pay to obtain it, or are influenced by irrelevant information in their judgements of risk, there’s no profit to labelling this “irrationality” and leaving it at that. The interesting question is why these choices seem common to so many people. What is it about our minds that disposes us to make these same errors, to have in common the same choice strategies?

You can get traction on the shape of possible answers from the Devil’s Wager example. In this scenario, why would you bet “6” rather than “7”? Here are three possible general reasons, each explained in terms of the Devil’s Wager and illustrated with a real example.

 

1. Strategy is optimised for a different environment

If you expected the Devil to roll a single loaded die, rather than a fair pair of dice, then calling “6” would be the best strategy, rather than a sub-optimal one.
Analogously, you can understand a psychological bias by understanding which environment it is intended to match. If I love sugary foods so much it makes me fat, part of the explanation may be that my sugar cravings evolved at a point in human history when starvation was a bigger risk than obesity.

 

2. Strategy is designed for a bundle of choices

If you know you’ll only get to pick one number to cover multiple bets, your best strategy is to pick a number which works best over all bets. So if the Devil is going to give you best of ten, and most of the time he’ll roll a single loaded die, and only sometimes roll two fair dice, then “6” will give you the best total score, even though it is less likely to win for the two-fair-dice wager.
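A minimal sketch of this “bundle of choices” argument, with made-up numbers: the hypothetical loaded die below lands on 6 half the time, and eight of the ten rounds use it. Neither figure comes from the original example; they are assumptions chosen only to make “most of the time” concrete. One called number has to cover every round, so what matters is expected wins across the whole bundle rather than the chance of winning any single round.

```python
from fractions import Fraction
from itertools import product

# Hypothetical loaded die: lands on 6 half the time, each other face 1/10 of the time.
LOADED_DIE = {face: Fraction(1, 10) for face in range(1, 6)}
LOADED_DIE[6] = Fraction(1, 2)


def p_win_loaded(call):
    """Chance a called number matches a single roll of the loaded die."""
    return LOADED_DIE.get(call, Fraction(0))


def p_win_two_fair(call):
    """Chance a called number matches the total of two fair dice."""
    hits = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == call)
    return Fraction(hits, 36)


def expected_wins(call, rounds=10, loaded_rounds=8):
    """Expected wins over the bundle when one number must cover every round."""
    fair_rounds = rounds - loaded_rounds
    return loaded_rounds * p_win_loaded(call) + fair_rounds * p_win_two_fair(call)


for call in (6, 7):
    print(f"call {call}: two-dice P(win) = {float(p_win_two_fair(call)):.3f}, "
          f"expected wins over bundle = {float(expected_wins(call)):.2f}")

best = max(range(1, 13), key=expected_wins)  # best single call for the whole bundle
```

Under these assumptions, calling 6 gives 77/18 (about 4.3) expected wins against 1/3 for calling 7, even though 7 remains the better call for any single two-dice roll.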

In general, what looks like a poor choice may be the result of a strategy which treats a class of decisions as the same, and produces a good answer for that whole set. It is premature to call our decision making irrational if we look at a single choice, which is the focus of the psychologist’s experiment, and not at the related set of choices of which it is part.

An example from the literature may be the Mere Exposure Effect, where we favour something we’ve seen before merely because we’ve seen it before. In experiments, this preference looks truly arbitrary, because the experimenter decided which stimuli to expose us to and which to withhold, but in everyday life our familiarity with things tracks important variables such as how common, safe or sought out things are. The Mere Exposure Effect may result from a feature of our minds that assumes, all other things being equal, that familiar things are preferable, and that’s probably a good general strategy.

 

3. Strategy uses a different cost/benefit analysis

Obviously, we’re assuming everyone wants to save their soul and avoid damnation. If you felt like you didn’t deserve heaven, harps and angel wings, or that hellfire sounds comfortably warm, then you might avoid making the bet-winning optimal choice.

By extension, we should only call a choice irrational or suboptimal if we know what people are trying to optimise. For example, it looks like people systematically under-explore new ways of doing things when learning skills. Is this reliance on habit, similar to confirmation bias when exploring competing hypotheses, irrational? Well, in the sense that it slows your learning down, it isn’t optimal, but if it exists because exploration carries a risk (you might get the action catastrophically wrong, you might hurt yourself), or because the important thing is to minimise the cost of acting (and habitual movements require less energy), then it may in fact be better than reckless exploration.

 

So if we see a perplexing behaviour, we might reach for one of these explanations: the behaviour is right for a different environment, a wider set of choices, or a different cost/benefit analysis. Only when we are confident that we understand the environment (either evolutionary, or of training) which drives the behaviour, and the general class of choices of which it is part, and that we know which cost-benefit function the people making the choices are using, should we confidently say a choice is an error. Even then it is pretty unprofitable to call such behaviour irrational – we’d want to know why people make the error. Are they unable to calculate the right response? Mis-perceiving the situation?

A seemingly irrational behaviour is a good place to start investigating the psychology of decision making, but labelling behaviour irrational is a terrible place to stop. The topic really starts to get interesting when we start to ask why particular behaviours exist, and try to understand their rationality.

 

Previously/elsewhere:

Irrational? Decisions and decision making in context
My ebook: For argument’s sake: evidence that reason can change minds, which explores our over-enthusiasm for evidence that we’re irrational.

Irrational? Decisions and decision making in context

Nassim Nicholas Taleb, author of Fooled by Randomness:

Finally put my finger on what is wrong with the common belief in psychological findings that people “irrationally” overestimate tail probabilities, calling it a “bias”. Simply, these experimenters assume that people make a single decision in their lifetime! The entire field of psychology of decisions missed the point.

His argument seems to be that risks look different if you view them from a lifetime perspective, where you might make choices about the same risk again and again, rather than considering them as one-offs. What might be a mistake for a one-off risk could be a sensible strategy for the same risk repeated in a larger set.
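One way to see why a risk can look different from a lifetime perspective is the arithmetic of repetition: if an event has probability p on each independent exposure, the chance it happens at least once in n exposures is 1 − (1 − p)^n. The sketch below uses a hypothetical one-in-a-thousand risk; independence between exposures is an assumption, and this illustrates the general argument rather than reproducing any calculation of Taleb’s.

```python
def p_at_least_once(p_single: float, repetitions: int) -> float:
    """Chance an event with per-exposure probability p_single happens at least once
    across independent repetitions (independence is an assumption)."""
    return 1.0 - (1.0 - p_single) ** repetitions


# A hypothetical one-in-a-thousand risk, faced once versus many times.
for n in (1, 10, 100, 1000):
    print(f"{n:5d} exposures: {p_at_least_once(0.001, n):.3f}")
```

A risk that is negligible as a one-off rises to roughly 0.63 over a thousand exposures.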

He goes on to take a swipe at ‘Nudges’, the idea that you can base policies around various phenomena from the psychology of decision making. “Clearly”, he adds, “psychologists do not know how to use ‘probability’”.

This is maddeningly ignorant, but does have a grain of truth to it. The major part of the psychology of decision making is understanding why things that look like bias or error exist. If a phenomenon, such as overestimating low probability events, is pervasive, it must be for a reason. A choice that looks irrational when considered on its own might be the result of a sensible strategy when considered over a lifetime, or even over evolutionary time.

Some great research in decision making tries to go beyond simple bias phenomena and ask what underlying choice is being optimised by our cognitive architecture. This approach gives us the Simple Heuristics That Make Us Smart of Gerd Gigerenzer (which Taleb definitely knows about since he was a visiting fellow in Gigerenzer’s lab), as well as work which shows that people estimate risks differently if they experience the outcomes rather than being told about them, work which shows that our perceptual-motor system (which is often characterised as an optimal decision maker) has the same amount of bias as our more cognitive decisions, and work which shows that other animals, with less cognitive/representational capacity, make analogues of many classic decision making errors. This is where the interesting work in decision making is happening, and it all very much takes account of the wider context of individual decisions. So saying that the entire field missed the point seems…odd.

But the grain of truth in the accusation is that the psychology of decision making has been popularised in a way that focusses on one-off decisions. The nudges of behavioural economics tend to be dramatic examples of small interventions which have large effects in one-off measures, such as the finding that giving people smaller plates makes them eat less. The problem with these interventions is that even if they work in the lab, they tend not to work long-term outside the lab. People are often doing what they do for a reason – and if you don’t affect the reasons, you get the old behaviour reasserting itself as people simply adapt to any nudge you’ve introduced. Although the British government is noted for introducing a ‘Nudge Unit’ to apply behavioural science in government policies, less well known is a House of Lords Science and Technology Committee report, ‘Behaviour Change’, which highlights the limitations of this approach (and is well worth reading to get an idea of the importance of ideas beyond ‘nudging’ in behavioural change).

Taleb is right that we need to drop the idea that biases in decision making automatically attest to our irrationality. As often as not they reflect a deeper rationality in how our minds deal with risk, choice and reward. What’s sad is that he doesn’t recognise how much work on how to better understand bias already exists.