Fifty psychological terms to just, well, be aware of

Frontiers in Psychology has just published an article on ‘Fifty psychological and psychiatric terms to avoid’. These sorts of “here’s how to talk about it” articles are popular but can themselves be misleading, and the same applies to this one.

The article lists 50 supposedly “inaccurate, misleading, misused, ambiguous, and logically confused words and phrases”.

The first thing to say is that by recommending that people avoid certain words or phrases, the article is violating its own recommendations. That may seem like a trivial point but it isn’t when you’re giving advice about how to use language in scientific discussion.

It’s fine to use even plainly wrong terms to discuss how they’re used and the multiple meanings and misconceptions behind them. In fact, a lot of scientific writing does exactly this. When there are misconceptions that may cloud people’s understanding, it’s best to address them head on rather than avoid them.

Sometimes following the recommendations for ‘phrases to avoid’ would actually hinder this process.

For example, the piece recommends you avoid the term ‘autism epidemic’ as there is no good evidence that there is an actual epidemic. But this is not advice about language, it’s just an empirical point. According to this list, all the research that has used the term to discuss how the actual evidence runs contrary to the popular idea should have avoided it, and presumably referred to it as ‘the concept that shall not be named’.

The article also recommends against using ‘ambiguous’ words but this recommendation would basically kill the English language as many words have multiple meanings – like the word ‘meaning’ for example – but that doesn’t mean you should avoid them.

If you’re a fan of pedantry you may want to go through the article and highlight where the authors have used other ambiguous psychological phrases (starter for 10, “memory”) and post it to some obscure corner of the internet.

Many of the recommendations also rely on you agreeing with the narrow definition and limits of use that the authors premise their argument on. Do you agree that “antidepressant medication” means that the medication has a selective and specific effect on depression and no other conditions – as the authors suggest? Or do you think this just describes a property of the medication? This is exactly how medication description works throughout medicine. Aspirin is an analgesic medication and an anti-inflammatory medication, as well as having other properties. No banning needed here.

And in fact, this sort of naming is just a property of language. If you talk about an ‘off-road vehicle’, and someone pipes up to tell you “actually, off-road vehicles can also go on-road so I recommend you avoid that description” I recommend you ignore them.

The same applies to many of the definitions in this list. The ‘chemical imbalance’ theory of depression is not empirically supported, so don’t claim it is, but feel free to use the phrase if you want to discuss this misconception. Some conditions genuinely do involve a chemical imbalance though – like the accumulation of copper in Wilson’s disease – so you can use the phrase accurately in this case, while being aware of how it’s misused in other contexts. Don’t avoid it, just use it clearly.

With ‘Lie detector test’, it’s true that no accurate test has ever been devised to detect lies. But you may be writing about research which is trying to develop one, or research that has tested the idea. ‘No difference between groups’ is fine if there is genuinely no difference in your measure between the groups (i.e. they both score exactly the same).

Some of the recommendations are essentially based on the premise that you shouldn’t use a term except as it was first defined, or as defined by whatever the authors consider the authoritative source. This is just daft advice. Terms evolve over time. Definitions shift and change. The article recommends against using ‘Fetish’ except in its DSM-5 definition, despite the fact that this is different to how it’s used commonly and how it’s widely used in other academic literature. ‘Splitting’ is now widely used to mean ‘team splitting’, which the article says is ‘wrong’. It isn’t wrong – the term has just evolved.

I think philosophers would be surprised to hear ‘reductionism’ is a term to be avoided – given the massive literature on reductionism. Similarly, sociologists might be a bit baffled by ‘medical model’ being a banned phrase, given the debates over it and, unsurprisingly, its meaning.

Some of the advice is just plain wrong. Don’t use “Prevalence of trait X”, says the article, because apparently prevalence only applies to things that are either present or absent and “not dimensionally distributed in the population, such as personality traits and intelligence”. But many traits are defined by cut-off scores along dimensionally defined constructs, making them categorical. If you couldn’t talk about prevalence in this way, we’d be unable to talk about the prevalence of intellectual disability (widely defined as involving an IQ of less than 70) or dementia – which is diagnosed by a cut-off score on dimensionally varying neuropsychological test performance.

Some of the recommended terms to avoid are probably best avoided in most contexts (“hard-wired”, “love molecule”) and some are inherently self-contradictory (“Observable symptom”, “Hierarchical stepwise regression”) but again, use them if you want to discuss how they’re used.

I have to say, the piece reminds me of Steven Pinker’s criticism of ‘language mavens’ who have come up with rules for their particular version of English which they decide others must follow.

To be honest, I think the Frontiers in Psychology article is well worth reading. It’s a great guide to how some concepts are used in different ways, but it’s not good advice for things to avoid.

The best advice is probably: communicate clearly, bearing in mind that terms and concepts can have multiple meanings and your audience may not be aware of which you want to communicate, so make an effort to clarify where needed.
 

Link to Frontiers in Psychology article.

Are online experiment participants paying attention?

Online testing is sure to play a large part in the future of Psychology. Using Mechanical Turk or other crowdsourcing sites for research, psychologists can quickly and easily gather data for any study where the responses can be provided online. One concern, however, is that online samples may be less motivated to pay attention to the tasks they are participating in. Not only is nobody watching how they do these online experiments, the whole experience is framed as a work-for-cash gig, so there is pressure to complete any activity as quickly and with as little effort as possible. To the extent that online participants are satisficing or skimping on their attention, can we trust the data?

A newly submitted paper uses data from the Many Labs 3 project, which recruited over 3000 participants from both online and university campus samples, to test the idea that online samples are different from the traditional offline samples used by academic psychologists.

The findings strike a note of optimism, if you’re into online testing (perhaps less so if you use traditional university samples):

Mechanical Turk workers report paying more attention and exerting more effort than undergraduate students. Mechanical Turk workers were also more likely to pass an instructional manipulation check than undergraduate students. Based on these results, it appears that concerns over participant inattentiveness may be more applicable to samples recruited from traditional university participant pools than from Mechanical Turk

This fits with previous reports showing high consistency when classic effects are tested online, and with reports that satisficing may have been very high in offline samples all along – we just weren’t testing for it.

However, an issue I haven’t seen discussed is whether, because of the relatively small pool of participants taking experiments on MTurk, online participants have an opportunity to get familiar with typical instructional manipulation checks (AKA ‘catch questions’, which are designed to check if you are paying attention). If online participants adapt to our manipulation checks, then the very experiments which set out to test if they are paying more attention may not be reliable.

Link to the new paper: Graduating from Undergrads: Are Mechanical Turk Workers More Attentive than Undergraduate Participants?

This paper provides a useful overview: Conducting perception research over the internet: a tutorial review

Computation is a lens

“Face It,” says psychologist Gary Marcus in The New York Times, “Your Brain is a Computer”. The op-ed argues for understanding the brain in terms of computation, which opens up the interesting question – what does it mean for a brain to compute?

Marcus is careful to distinguish the claim that the brain is built along the same lines as modern computer hardware, which is clearly false, from the claim that its purpose is to calculate and compute. “The sooner we can figure out what kind of computer the brain is,” he says, “the better.”

In this line of thinking, the mind is considered to be the brain’s computations at work and should be able to be described in terms of formal mathematics.

The idea that the mind and brain can be described in terms of information processing is the central contention of cognitive science, but this raises a key and rarely asked question – is the brain a computer, or is computation just a convenient way of describing its function?

Here’s an example if the distinction isn’t clear. If you throw a stone you can describe its trajectory using calculus. Here we could ask a similar question: is the stone ‘computing’ the answer to a calculus equation that describes its flight, or is calculus just a convenient way of describing its trajectory?

In one sense the stone is ‘computing’. The physical properties of the stone and its interaction with gravity produce the same outcome as the equation. But in another sense, it isn’t, because we don’t really see the stone as inherently ‘computing’ anything.
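To make the ‘description’ side of that distinction concrete, here is a minimal sketch (in R, with arbitrary made-up numbers) of the kind of equation that describes the stone’s flight – a calculation the stone itself never explicitly performs:

```r
# A minimal sketch: the standard kinematic description of a thrown stone's height over time.
# The launch speed and angle are arbitrary, purely for illustration.
v0    <- 10                     # launch speed, m/s
theta <- 45 * pi / 180          # launch angle, radians
g     <- 9.81                   # gravitational acceleration, m/s^2
t     <- seq(0, 1.4, by = 0.2)  # time points, seconds

height <- v0 * sin(theta) * t - 0.5 * g * t^2
round(height, 2)                # the trajectory the stone traces out just by being thrown
```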

This may seem like a trivial example but there is in fact a whole family of analog computers that use the physical properties of one system to give the answer to an entirely different problem. If analog computers are ‘really’ computing, why not our stone?

If this is the case, what makes brains any more or less of a computer than flying rocks, chemical reactions, or the path of radio waves? Here the question just dissolves into dust. Brains may be computers but then so is everything, so asking the question doesn’t tell us anything specific about the nature of brains.

One counterpoint to this is to say that brains need to algorithmically adjust to a changing environment to aid survival, which is why neurons encode properties (such as patterns of light stimulation) in another form (such as neuronal firing) – and perhaps that makes them a computer in a way that flying stones aren’t.

But this definition would also include plants, which encode physical properties through chemical signalling to allow them to adapt to their environment.

It is worth noting that there are other philosophical objections to the idea that brains are computers, largely based on the hard problem of consciousness (in brief – could maths ever feel?).

And then there are arguments based on the boundaries of computation. If the brain is a computer based on its physical properties and the blood is part of that system, does the blood also compute? Does the body compute? Does the ecosystem?

Psychologists drawing on the tradition of ecological psychology and JJ Gibson suggest that much of what is thought of as ‘information processing’ is actually done through the evolutionary adaptation of the body to the environment.

So are brains computers? They can be if you want them to be. The concept of computation is a tool. Probably the most useful one we have, but if you say the brain is a computer and nothing else, you may be limiting the way you can understand it.
 

Link to ‘Face It, Your Brain Is a Computer’ in The NYT.

Power analysis of a typical psychology experiment

Understanding statistical power is essential if you want to avoid wasting your time in psychology. The power of an experiment is its sensitivity – the likelihood that, if the effect tested for is real, your experiment will be able to detect it.

Statistical power is determined by the type of statistical test you are doing, the number of people you test and the effect size. The effect size is, in turn, determined by the reliability of the thing you are measuring, and how much it is pushed around by whatever you are manipulating.
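To make ‘effect size’ concrete, here’s a minimal sketch of Cohen’s d – the standardised difference between two group means – using purely illustrative numbers:

```r
# A minimal sketch of Cohen's d from summary statistics (illustrative numbers only)
mean_a <- 105; sd_a <- 15; n_a <- 30   # group A
mean_b <- 100; sd_b <- 15; n_b <- 30   # group B

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(((n_a - 1) * sd_a^2 + (n_b - 1) * sd_b^2) / (n_a + n_b - 2))

d <- (mean_a - mean_b) / pooled_sd     # here d = 5 / 15 = 0.33
d
```

A noisier measure inflates the pooled standard deviation and shrinks d; a stronger manipulation widens the gap between the means and raises it.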

Since it is a common test, I’ve been doing a power analysis for a two-sample (two-sided) t-test, for small, medium and large effects (as conventionally defined). The results should worry you.

[Graph: number of participants needed in each group to achieve 80% power, for small, medium and large effect sizes]

This graph shows you how many people you need in each group for your test to have 80% power (a standard desirable level of power – meaning that if your effect is real you’ve an 80% chance of detecting it).

Things to note:

  • even for a large effect (0.8) you need close to 30 people in each group (total n = 60) to have 80% power
  • for a medium effect (0.5) this is more like 70 people in each group (total n = 140)
  • the required sample size increases dramatically as effect size drops
  • for small effects (0.2), the sample required for 80% power is around 400 in each group (total n = 800).
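If you want to check these figures yourself, here’s a minimal sketch using R’s pwr package (presumably similar in spirit to the code mentioned in the technical note below); the exact outputs differ a little from the rounded figures above:

```r
# A minimal sketch: n per group for 80% power, two-sample two-sided t-test at alpha = .05
library(pwr)

for (d in c(0.2, 0.5, 0.8)) {
  res <- pwr.t.test(d = d, power = 0.8, sig.level = 0.05,
                    type = "two.sample", alternative = "two.sided")
  cat("d =", d, "-> n per group =", ceiling(res$n), "\n")
}
# Roughly 394, 64 and 26 per group respectively - the same ballpark as the figures above.
```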

What this means is that if you don’t have a large effect, studies with a between-groups analysis and an n of less than 60 aren’t worth running. Even if you are studying a real phenomenon you aren’t using a statistical lens with enough sensitivity to be able to tell. You’ll get to the end and won’t know if the phenomenon you are looking for isn’t real or if you just got unlucky with who you tested.

Implications for anyone planning an experiment:

  • Is your effect very strong? If so, you may rely on a smaller sample (For illustrative purposes, the effect size of the male-female height difference is ~1.7, so large enough to detect with a small sample. But if your effect is this obvious, why do you need an experiment?)
  • You really should prefer within-sample analysis, whenever possible (power analysis of this left as an exercise)
  • You can get away with smaller samples if you make your measure more reliable, or if you make your manipulation more impactful. Both of these will increase your effect size, the first by narrowing the variance within each group, the second by increasing the distance between them

Technical note: I did this cribbing code from Rob Kabacoff’s helpful page on power analysis. Code for the graph shown here is here. I use and recommend Rstudio.

Cross-posted from www.tomstafford.staff.shef.ac.uk where I irregularly blog things I think will be useful for undergraduate Psychology students.

Irregularities in Science

A paper in the high-profile journal Science has been alleged to be based on fraudulent data, with the PI calling for it to be retracted. The original paper purported to use survey data to show that people being asked about gay marriage changed their attitudes if they were asked the survey questions by someone who was gay themselves. That may still be true, but the work of a team that set out to replicate the original study seems to show that the data reported in that paper was never collected in the way reported, and was at least partly fabricated.

The document containing these accusations is interesting for a number of reasons. It contains a detailed timeline showing how the authors were originally impressed with the study and set out to replicate it, gradually uncovering more and more elements that concerned them and led them to investigate how the original data was generated. The document also reports the exemplary way in which they shared their concerns with the authors of the original paper, and the way the senior author responded. The speed of all this is notable – the investigators only started work on this paper in January, and did most of the analysis substantiating their concerns this month.

As we examined the study’s data in planning our own studies, two features surprised us: voters’ survey responses exhibit much higher test-retest reliabilities than we have observed in any other panel survey data, and the response and reinterview rates of the panel survey were significantly higher than we expected. We set aside our doubts about the study and awaited the launch of our pilot extension to see if we could manage the same parameters. LaCour and Green were both responsive to requests for advice about design details when queried.

So on the one hand this is a triumph for open science, and for self-correction in scholarship. The irony is that any dishonesty that led to publication in a high-impact journal also attracted people with the desire and smarts to check whether what was reported holds up. But the tragedy is the circumstances that led the junior author of the original study, himself a graduate student at the time, to do what he did. No statement from him is available at this point, as far as I’m aware.

The original: When contact changes minds: An experiment on transmission of support for gay equality

The accusations and retraction request: Irregularities in LaCour (2014)

Sampling error’s more dangerous friend

As the UK election results roll in, one of the big shocks is the discrepancy between the pre-election polls and the results. All the pollsters agreed that it would be incredibly close, and they were all wrong. What gives?

Some essential psych 101 concepts come in useful here. Polls rely on sampling – the basic idea being that you don’t have to ask everyone to get a rough idea of how things are going to go. How rough that idea is depends on how many you ask. This is the issue of sampling error. We understand sampling error – you can estimate it, so as well as reducing this error by taking larger samples there are also principled ways of working out when you’ve asked enough people to get a reliable estimate (which is why polls of a country with a population of 70 million can still be accurate with samples in the thousands).
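For a poll-style proportion, the arithmetic behind ‘samples in the thousands are enough’ looks roughly like this – a sketch under textbook assumptions (a simple random sample, a 95% confidence level):

```r
# A minimal sketch: margin of error for a proportion estimated from a simple random sample
p   <- 0.5                      # worst-case proportion (maximises the standard error)
n   <- 1000                     # sample size
se  <- sqrt(p * (1 - p) / n)    # standard error of the estimated proportion
moe <- 1.96 * se                # 95% margin of error
round(moe, 3)                   # about 0.031, i.e. roughly +/- 3 percentage points
# Note that the population size (70 million or otherwise) doesn't appear anywhere.
```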

But, as Tim Harford points out in this excellent article on sampling problems in big data, with every sample there are two sources of unreliability. Sampling error, as I’ve mentioned, but also sampling bias.

sampling error has a far more dangerous friend: sampling bias. Sampling error is when a randomly chosen sample doesn’t reflect the underlying population purely by chance; sampling bias is when the sample isn’t randomly chosen at all.

The problem with sampling bias is that, when you don’t know the ground truth, there is no principled way of knowing whether your sample is biased. If your sample has some systematic bias in it, you can still make a reliable estimate (minimising sampling error), but you are left with the bias – and you won’t know how big it is until you find out the truth. That’s my guess at what happened with the UK election. The polls converged, minimising the error, but the bias remained – a ‘shy Tory’ effect where many voters were not admitting (or not aware) that they would end up voting for the Conservative party.
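A quick simulation makes the point (all numbers hypothetical): once the sampling process systematically misses one group’s supporters, a bigger sample just gives a more precise estimate of the wrong number:

```r
# A minimal sketch (hypothetical numbers): sampling error shrinks with n, sampling bias doesn't
set.seed(42)
true_support     <- 0.38   # hypothetical true vote share
observed_support <- 0.33   # what a biased sampling process actually picks up ('shy' voters)

for (n in c(500, 2000, 10000)) {
  estimate <- mean(rbinom(n, 1, observed_support))   # simulate a poll of size n
  cat("n =", n, " estimate =", round(estimate, 3), "\n")
}
# The estimates settle ever more tightly around 0.33, never the true 0.38.
```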

The exit polls predicted the real result with surprising accuracy not because they minimised sampling error, but because they avoided the sampling bias. By asking the people who actually turned up to vote how they actually voted, their sample lacked the bias of the pre-election polls.

Trauma is more complex than we think

I’ve got an article in The Observer about how the official definition of trauma keeps changing and how the concept is discussed as if it were entirely intuitive and clear-cut, when it’s actually much more complex.

I’ve become fascinated by how the concept of ‘trauma’ is used in public debate about mental health and the tension that arises between the clinical and rhetorical meanings of trauma.

One unresolved issue, which tests mental health professionals to this day, is whether ‘traumatic’ should be defined in terms of events or reactions.

Some of the confusion arises when we talk about “being traumatised”. Let’s take a typically horrifying experience – being caught in a war zone as a civilian. This is often described as a traumatic experience, but we know that most people who experience the horrors of war won’t develop post-traumatic stress disorder or PTSD – the diagnosis designed to capture the modern meaning of trauma. Despite the fact that these sorts of awful experiences increase the chances of acquiring a range of mental health problems – depression is actually a more common outcome than PTSD – it is still the case that most people won’t develop them. Have you experienced trauma if you have no recognisable “scar in the psyche”? This is where the concept starts to become fuzzy.

We have the official diagnosis of posttraumatic stress disorder or PTSD, but actually lots of mental health problems can appear after awful events, and yet there are no ‘posttraumatic depression’ or ‘posttraumatic social phobia’ diagnoses.

To be clear, it’s not that trauma doesn’t exist but that it’s less fully developed as a concept than people think and, as a result, often over-simplified during debates.

Full article at the link below.
 

Link to Observer article on the shifting sands of trauma.