After the methods crisis, the theory crisis

This thread, started by Ekaterina Damer, has prompted many recommendations from psychologists on Twitter.

Here are most of the recommendations, with their recommender in brackets. I haven’t read these, but wanted to collate them in one place. Comments are open if you have your own suggestions.

(Iris van Rooij)
“How does it work?” vs. “What are the laws?” Two conceptions of psychological explanation. Robert Cummins

(Ed Orehek)
Theory Construction in Social Personality Psychology: Personal Experiences and Lessons Learned: A Special Issue of Personality and Social Psychology Review

(Djouria Ghilani)
Personal Reflections on Theory and Psychology
Gerd Gigerenzer

Selected Works of Barry N. Markovsky

(pretty much everyone, but Tal Yarkoni put it like this)
“Meehl said most of what there is to say about this”

  • Theory-testing in psychology and physics: A methodological paradox
  • Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it
  • Why summaries of research on psychological theories are often uninterpretable
  • (Which reminds me, PsychBrief has been reading Meehl and provides extensive summaries here: Paul Meehl on philosophy of science: video lectures and papers)

    (Burak Tunca)
    What Theory is Not by Robert I. Sutton & Barry M. Staw

    (Joshua Skewes)
    Valerie Gray Hardcastle’s “How to build a theory in cognitive science”.

    (Randy McCarthy)
    Chapter 1 of Gawronski, B., & Bodenhausen, G. V. (2015). Theory and explanation in social psychology. Guilford Publications.

    (Kimberly Quinn)
    McGuire, W. J. (1997). Creative hypothesis generating in psychology: Some useful heuristics. Annual review of psychology, 48(1), 1-30.

    (Daniël Lakens)
    Jaccard, J., & Jacoby, J. (2010). Theory Construction and Model-building Skills: A Practical Guide for Social Scientists. Guilford Press.

    Fiedler, K. (2004). Tools, toys, truisms, and theories: Some thoughts on the creative cycle of theory formation. Personality and Social Psychology Review, 8(2), 123–131.

    (Tom Stafford)
    Roberts and Pashler (2000). How persuasive is a good fit? A comment on theory testing

    From the discussion it is clear that the theory crisis will be every bit as rich and full of dissent as the methods crisis.

    Updates 16 August 2018

    (Richard Prather)
    Simmering et al (2010). To Model or Not to Model? A Dialogue on the Role of Computational Modeling in Developmental Science

    (Brett Buttliere: we made a Facebook group to talk about theory)
    Psychological Theory Discussion Group

    (Eric Morris)
    Wilson, K. G. (2001). Some notes on theoretical constructs: types and validation from a contextual behavioral perspective

    (Michael P. Grosz)
    Theoretical Amnesia by Denny Borsboom

    (Ivan Grahek)
    Fiedler (2017). What Constitutes Strong Psychological Science? The (Neglected) Role of Diagnosticity and A Priori Theorizing

    (Iris van Rooij)
More suggestions in these two threads (one, two)

    Open Science Essentials: Preprints

    Open science essentials in 2 minutes, part 4

Before a research article is published in a journal, you can make it freely available for anyone to read. You could do this on your own website, but you can also use a preprint server such as psyarxiv.com, where other researchers share their preprints too. PsyArXiv is supported by the OSF, so it will be around for a while, and it makes it easy to find other people’s research.

    Preprint servers have been used for decades in physics, but are now becoming more common across academia. Preprints allow rapid dissemination of your research, which is especially important for early career researchers. Preprints can be cited and indexing services like Google Scholar will join your preprint citations with the record of your eventual journal publication.

Preprints also mean that work can be reviewed (and errors caught) before final publication.

    What happens when my paper is published?

Your work is still available in preprint form, which means there is a non-paywalled version, so more people will read and cite it. If you upload a version of the manuscript after it has been accepted for publication, that is called a postprint.

    What about copyright?

In most cases, journals own the formatted, typeset version of your published manuscript. This is why you often aren’t allowed to upload the publisher’s PDF to your own website or a preprint server, but there’s nothing stopping you uploading a version with the same text (the formatting will be different, but the information is the same).

    Will journals refuse my paper if it is already “published” via a preprint?

Most journals allow, or even encourage, preprints. A diminishing minority don’t. If you’re interested, you can search for specific journal policies here.

    Will I get scooped?

Preprints allow you to timestamp your work before publication, so they can establish priority for a finding, which is protection against being scooped. Of course, if you have a project where you don’t want to let anyone know you are working in that area until the work is published, preprints may not be suitable.

    When should I upload a preprint?

    Upload a preprint at the point of submission to a journal, and for each further submission and upon acceptance (making it a postprint).

    What’s to stop people uploading rubbish to a preprint server?

There’s nothing to stop this, but since your reputation for doing quality work is one of the most important things a scholar has, I don’t recommend it.

    Useful links:

    Part of a series:

    1. Pre-registration
    2. The Open Science Framework
    3. Reproducibility

    Open Science Essentials: Reproducibility

    Open science essentials in 2 minutes, part 3

    Let’s define it this way: reproducibility is when your experiment or data analysis can be reliably repeated. It isn’t replicability, which we can define as reproducing an experiment and subsequent analysis and getting qualitatively similar results with the new data. (These aren’t universally accepted definitions, but they are common, and enough to get us started).

    Reproducibility is a bedrock of science – we all know that our methods section should contain enough detail to allow an independent researcher to repeat our experiment. With the increasing use of computational methods in psychology, there’s increasing need – and increasing ability – for us to share more than just a description of our experiment or analysis.

    Reproducible methods

Using sites like the Open Science Framework you can share stimuli and other materials. If you use open-source experiment software like PsychoPy or Tatool, you can easily share the full scripts that run your experiment, so that people on different platforms, and without your software licenses, can still run it.
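By way of illustration, here is a minimal sketch of the kind of self-contained experiment script you could share (assuming PsychoPy is installed; the stimulus and single-trial design are invented for the example):

```python
# Minimal PsychoPy sketch: show a stimulus, wait for a keypress, record the reaction time.
from psychopy import visual, core, event

win = visual.Window(size=(800, 600), color="grey", units="pix")
stimulus = visual.TextStim(win, text="press any key", color="white")

stimulus.draw()
win.flip()                                 # put the stimulus on screen
timer = core.Clock()                       # start timing from stimulus onset
keys = event.waitKeys(timeStamped=timer)   # list of (key, reaction time) pairs

print("response:", keys[0])
win.close()
core.quit()
```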

    Reproducible analysis

    Equally important is making your analysis reproducible. You’d think that with the same data, another person – or even you in the future – would get the same results. Not so! Most analyses include thousands of small choices. A mis-step in any of these small choices – lost participants, copy/paste errors, mis-labeled cases, unclear exclusion criteria – can derail an analysis, meaning you get different results each time (and different results from what you’ve published).

    Fortunately a solution is at hand! You need to use analysis software that allows you to write a script to convert your raw data into your final output. That means no more Excel sheets (no history of what you’ve done = very bad – don’t be these guys) and no more point-and-click SPSS analysis.

    Bottom line: You must script your analysis – trust me on this one
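To make that concrete, here is a minimal sketch of a scripted analysis (the file name, column names and exclusion criteria are invented for the example): everything from raw data to the reported statistic lives in one file that can be re-run at any time.

```python
# Sketch of a fully scripted analysis: raw data in, final statistics out.
# The file name, column names and exclusion rules are hypothetical placeholders.
import pandas as pd
from scipy import stats

raw = pd.read_csv("raw_data.csv")              # never edit the raw file by hand

# Every processing choice is recorded in code, not in memory.
clean = raw.dropna(subset=["rt"])              # drop trials with missing reaction times
clean = clean[clean["rt"].between(0.2, 3.0)]   # exclusion criterion: implausible RTs

condition_a = clean.loc[clean["condition"] == "A", "rt"]
condition_b = clean.loc[clean["condition"] == "B", "rt"]
t, p = stats.ttest_ind(condition_a, condition_b)

print(clean.groupby("condition")["rt"].describe())
print(f"t = {t:.2f}, p = {p:.3f}")
```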

    Open data + code

    You need to share and document your data and your analysis code. All this is harder work than just writing down the final result of an analysis once you’ve managed to obtain it, but it makes for more robust analysis, and allows someone else to reproduce your analysis easily in the future.

The most likely beneficiary is you – your most likely collaborator in the future is Past You, and Past You doesn’t answer email. Every analysis I’ve ever done I’ve had to repeat, sometimes years later. It saves time in the long run to invest in making a reproducible analysis first time around.

    Further Reading

    Nick Barnes: Publish your computer code: it is good enough

    British Ecological Society: Guide to Reproducible Code

    Gael Varoquaux : Computational practices for reproducible science

    Advanced

    Reproducible Computational Workflows with Continuous Analysis

    Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research

    Part of a series for graduate students in psychology.
    Part 1: pre-registration.
    Part 2: the Open-Science Framework.

    Part 3: Reproducibility

    Open Science Essentials: The Open Science Framework

    Open science essentials in 2 minutes, part 2

The Open Science Framework (osf.io) is a website designed for the complete life-cycle of your research project – designing projects; collaborating; collecting, storing and sharing data; sharing analysis scripts, stimuli and results; and publishing.

    You can read more about the rationale for the site here.

    Open Science is fast becoming the new standard for science. As I see it, there are two major drivers of this:

    1. Distributing your results via a slim journal article dates from the 17th century. Constraints on the timing, speed and volume of scholarly communication no longer apply. In short, now there is no reason not to share your full materials, data, and analysis scripts.

    2. The Replicability crisis means that how people interpret research is changing. Obviously sharing your work doesn’t automatically make it reliable, but since it is a costly signal, it is a good sign that you take the reliability of your work seriously.

You could share aspects of your work in many ways, but the OSF has many benefits:

    • the OSF is backed by serious money & institutional support, so the online side of your project will be live many years after you publish the link
• It integrates with various other platforms (GitHub, Dropbox, the PsyArXiv preprint server)
    • Totally free, run for scientists by scientists as a non-profit

All this, and the OSF also makes things like version control and pre-registration easy.

    Good science is open science. And the fringe benefit is that making materials open forces you to properly document everything, which makes you a better collaborator with your number one research partner – your future self.

    Cross-posted at tomstafford.staff.shef.ac.uk.  Part of a series aimed at graduate students in psychology. Part 1: pre-registration.

     

    Open Science Essentials: pre-registration

    Open Science essentials in 2 minutes, part 1

    The Problem

    As a scholarly community we allowed ourselves to forget the distinction between exploratory vs confirmatory research, presenting exploratory results as confirmatory, presenting post-hoc rationales as predictions. As well as being dishonest, this makes for unreliable science.

Flexibility in how you analyse your data (“researcher degrees of freedom”) can invalidate statistical inferences.

Importantly, you can employ questionable research practices like this (“p-hacking”) without knowing you are doing it. Decide to stop an analysis because the results are significant? Measure three dependent variables and use the one that “works”? Exclude participants who don’t respond to your manipulation? All are justifiable in exploratory research, but they mean you are exploring a garden of forking paths in the space of possible analyses – when you arrive at a significant result, you can’t be sure whether you got there because of the data or because of your choices.
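To see how quickly the forking paths add up, here is a small simulation sketch (the three-DV design is invented for illustration): with no true effect anywhere, testing three dependent variables and reporting whichever one “works” pushes the false-positive rate well above the nominal 5%.

```python
# Simulation sketch: three dependent variables, no true effects, keep whichever "works".
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n_per_group, n_dvs = 5000, 30, 3
false_positives = 0

for _ in range(n_experiments):
    # The null is true for every DV: both groups come from the same distribution.
    group_a = rng.normal(size=(n_per_group, n_dvs))
    group_b = rng.normal(size=(n_per_group, n_dvs))
    p_values = [stats.ttest_ind(group_a[:, i], group_b[:, i]).pvalue for i in range(n_dvs)]
    if min(p_values) < 0.05:        # report the DV that "worked"
        false_positives += 1

print(f"false-positive rate: {false_positives / n_experiments:.3f}")   # ~0.14 rather than 0.05
```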

    The solution

    There is a solution – pre-registration. Declare in advance the details of your method and your analysis: sample size, exclusion conditions, dependent variables, directional predictions.

    You can do this

    Pre-registration is easy. There is no single, universally accepted, way to do it.

    • you could write your data collection and analysis plan down and post it on your blog.
    • you can use the Open Science Framework to timestamp and archive a pre-registration, so you can prove you made a prediction ahead of time.
• you can visit AsPredicted.org, which provides a form to complete that helps you structure your pre-registration (making sure you include all the relevant information).
• “Registered Reports”: more and more journals are committing to publishing pre-registered studies. They review the method and analysis plan before data collection and agree to publish once the results are in (however they turn out).

    You should do this

    Why do this?

    • credibility – other researchers (and journals) will know you predicted the results before you got them.
• you can still do exploratory analysis – pre-registration just makes it clear which is which.
    • forces you to think about the analysis before collecting the data (a great benefit).
    • more confidence in your results.

    Further reading

     

    Addendum 14/11/17

    As luck would have it, I stumbled across a bunch of useful extra resources in the days after publishing this post

Cross-posted at tomstafford.staff.shef.ac.uk. Part of a series aimed at graduate students in psychology. Part 2: The Open Science Framework

    Why we need to get better at critiquing psychiatric diagnosis

    This piece is based on my talk to the UCL conference ‘The Role of Diagnosis in Clinical Psychology’. It was aimed at an audience of clinical psychologists but should be of interest more widely.

I’ve been a long-term critic of psychiatric diagnoses but I’ve become increasingly frustrated by the myths and over-generalisations that get repeated and recycled in the diagnosis debate.

    So, in this post, I want to tackle some of these before going on to suggest how we can critique diagnosis more effectively. I’m going to be referencing the DSM-5 but the examples I mention apply more widely.

    “There are no biological tests for psychiatric diagnoses”

“The failure of decades of basic science research to reveal any specific biological or psychological marker that identifies a psychiatric diagnosis is well recognised” wrote Sami Timimi in the International Journal of Clinical and Health Psychology. “Scientists have not identified a biological cause of, or even a reliable biomarker for, any mental disorder” claimed Brett Deacon in Clinical Psychology Review. “Indeed”, he continued, “not one biological test appears as a diagnostic criterion in the current DSM-IV-TR or in the proposed criteria sets for the forthcoming DSM-5”. Jay Watts, writing in The Guardian, states that “These categories cannot be verified with objective tests”.

    Actually there are very few DSM diagnoses for which biological tests are entirely irrelevant. Most use medical tests for differential diagnosis (excluding other causes), some DSM diagnoses require them as one of a number of criteria, and a handful are entirely based on biological tests. You can see this for yourself if you take the radical scientific step of opening the DSM-5 and reading what it actually says.

There are some DSM diagnoses (the minority) for which biological tests are entirely irrelevant. Body dysmorphic disorder (p242), for example, a diagnosis applied when people become overwhelmed by the idea that a part of their body is misshapen or unattractive, is based purely on reported experiences and behaviour. No other criteria are required or relevant.

For most common DSM diagnoses, biological tests are relevant but for the purpose of excluding other causes. For example, many DSM diagnoses carry a general exclusion that the symptoms must not be attributable to the physiological effects of a substance or another medical condition (this appears in schizophrenia, OCD, generalized anxiety disorder and many, many others). On occasion, very specific biological tests are mentioned. For example, to make a confident diagnosis of panic disorder (p208), the DSM-5 recommends testing serum calcium levels to exclude hyperparathyroidism – which can produce similar symptoms.

Additionally, there are a range of DSM diagnoses for which biomedical tests make up one or more of the formally listed criteria but aren’t essential to make the diagnosis. The DSM diagnosis of narcolepsy (p372) is one example, which has two such criteria: “Hypocretin deficiency, as measured by cerebrospinal fluid (CSF) hypocretin-1 immunoreactivity values of one-third or less of those obtained in healthy subjects using the same assay, or 110 pg/mL or less” and polysomnography showing REM sleep latency of 15 minutes or less. Several other diagnoses work along these lines – where biomedical test results are listed but are not necessary to make the diagnosis: the substance/medication-induced mental disorders, delirium, neuroleptic malignant syndrome, neurocognitive disorders, and so on.

    There are also a range of DSM diagnoses that are not solely based on biomedical tests but for which positive test results are necessary for the diagnosis. Anorexia nervosa (p338) is the most obvious, which requires the person to have a BMI of less than 17, but this applies to various sleep disorders (e.g. REM sleep disorder which requires a positive polysomnography or actigraphy finding) and some disorders due to other medical conditions. For example, neurocognitive disorder due to prion disease (p634) requires a brain scan or blood test.

    There are some DSM diagnoses which are based exclusively on biological test results. These are a number of sleep disorders (obstructive sleep apnea hypopnea, central sleep apnea and sleep-related hypoventilation, all diagnosed with polysomnography).

    “Psychiatric diagnoses ‘label distress'”

The DSM, wrote Peter Kinderman and colleagues in Evidence-Based Mental Health, is a “franchise for the classification and diagnosis of human distress”. The “ICD is based on exactly the same principles as the DSM”, argued Lucy Johnstone; “Both systems are about describing people’s distress in terms of medical diagnosis”.

    In reality, some psychiatric diagnoses do classify distress, some don’t.

Here is a common criterion in many DSM diagnoses: “The symptoms cause clinically significant distress or impairment in social, occupational or other important areas of functioning”.

The theory behind this is that some experiences or behaviours are not considered of medical interest unless they cause you problems, which is defined as distress or impairment. Note, however, that it is one or the other. It is still possible to be diagnosed if you’re not distressed but find that these experiences or behaviours get in the way of everyday life.

    However, there are a whole range of DSM diagnoses for which distress plays no part in making the diagnosis.

    Here is a non-exhaustive list: Schizophrenia, Tic Disorders, Delusional Disorder, Developmental Coordination Disorder, Brief Psychotic Disorder, Schizophreniform Disorder, Manic Episode, Hypomanic Episode, Schizoid Personality Disorder, Antisocial Personality Disorder, and so on. There are many more.

    Does the DSM ‘label distress’? Sometimes. Do all psychiatric diagnoses? No they don’t.

    “Psychiatric diagnoses are not reliable”

The graph below shows the inter-rater reliability results from the DSM-5 field trial study. They use a statistic called Cohen’s kappa to measure how well two independent psychiatrists, assessing the same individual through an open interview, agree on a particular diagnosis. A score above 0.8 is usually considered the gold standard; anything above 0.6 is rated as being in the acceptable range.

    The results are atrocious. This graph is often touted as evidence that psychiatric diagnoses can’t be made reliably.

    However, here are the results from a study that tested diagnostic agreement on a range of DSM-5 diagnoses when psychiatrists used a structured interview assessment. Look down the ‘κ’ column for the reliability results. Suddenly they are much better and are all within the acceptable to excellent range.

    This is well-known in mental health and medicine as a whole. If you want consistency, you have to use a structured assessment method.
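For reference, Cohen’s kappa is just observed agreement corrected for the agreement you’d expect by chance. A quick sketch, with made-up ratings from two hypothetical clinicians:

```python
# Cohen's kappa sketch: agreement between two raters, corrected for chance agreement.
from sklearn.metrics import cohen_kappa_score

rater_1 = ["MDD", "MDD", "GAD", "PTSD", "MDD", "GAD", "PTSD", "MDD"]
rater_2 = ["MDD", "GAD", "GAD", "PTSD", "MDD", "GAD", "MDD",  "MDD"]

# Raw agreement is 6/8 = 0.75, but kappa discounts chance agreement, giving 0.6 here.
print(cohen_kappa_score(rater_1, rater_2))
```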

While we’re here, let’s tackle an implicit assumption that underlies many of these critiques: supposedly, psychiatric diagnoses are fuzzy and unreliable, whereas the rest of medicine makes cut-and-dried diagnoses based on unequivocal medical test results.

    This is a myth based on ignorance about how medical diagnoses are made – almost all involve human judgement. Just look at the between-doctor agreement results for some diagnoses in the rest of medicine (which include the use of biomedical tests):

    Diagnosis of infection at the site of surgery (0.44), features of spinal tumours (0.19 – 0.59), bone fractures in children (0.71), rectal bleeding (0.42), paediatric stroke (0.61), osteoarthritis in the hand (0.60 – 0.82). There are many more examples in the medical literature which you can see for yourself.

    The reliability of DSM-5 diagnoses is typically poor for ‘off the top of the head’ diagnosis but this can be markedly improved by using a formal diagnostic assessment. This doesn’t seem to be any different from the rest of medicine.

    “Psychiatric diagnoses are not valid because they are decided by a committee”

    I’m sorry to break it to you, but all medical diagnoses are decided by committee.

    These committees shift the boundaries, revise, reject and resurrect diagnoses across medicine. The European Society of Cardiology revise the diagnostic criteria for heart failure and related problems on a yearly basis. The International League Against Epilepsy revise their diagnoses of different epilepsies frequently – they just published their revised manual earlier this year. In 2014 they broadened the diagnostic criteria for epilepsy meaning more people are now classified as having epilepsy. Nothing changed in people’s brains, they just made a group decision.

    In fact, if you look at the medical literature, it’s abuzz with committees deciding, revising and rejecting diagnostic criteria for medical problems across the board.

Humans are not cut and dried. Neither are most illnesses, diseases and injuries, and decisions about what a particular diagnosis should include are always a trade-off between measurement accuracy, suffering, outcome, and the potential benefits of intervention. This gets revised by a committee who examine the best evidence and come to a consensus on what should count as a medically-relevant problem.

These committees aren’t perfect. They sometimes suffer from fads and groupthink, and pharmaceutical industry conflicts of interest are a constant concern. I would argue that psychiatry is more prone to fads and pressure from pharmaceutical company interests than some other areas of medicine, although it’s probably not the worst (surgery is notoriously bad in this regard). But the fact that a diagnosis is decided by a committee does not make it invalid. Actually, on balance, it’s probably the least worst way of doing it.

    “Psychiatric diagnoses are not valid because they’re based on experience, behaviour or value judgements”

We’ve discussed above how DSM diagnoses rely on medical tests to varying degrees. But the flip side of this is that there are many non-psychiatric diagnoses which are also based only on classifying experience and/or behaviour. If you think this makes a diagnosis invalid or ‘not a real illness’, I look forward to your forthcoming campaign to remove the diagnoses of tinnitus, sensory loss, many pain syndromes, headache, vertigo and the primary dystonias, for example.

    To complicate things further, we know some diseases have a clear basis in terms of tissue damage but the diagnosis is purely based on experience and/or behaviour. The diagnosis of Parkinson’s disease, for example, is made this way and there are no biomedical tests that confirm the condition, despite the fact that studies have shown it occurs due to a breakdown of dopamine neurons in the nigrostriatal pathway of the brain.

    At this point, someone usually says “but no one doubts that HIV or tuberculosis are diseases, whereas psychiatric diagnosis involves arbitrary decisions about what is considered pathological”. Cranks aside, the first part is true. It’s widely accepted – rightly so – that HIV and tuberculosis are diseases. However, it’s interesting how many critics of psychiatric diagnosis seem to have infectious diseases as their comparison for what constitutes a ‘genuine medical condition’ when infectious diseases are only a small minority of the diagnoses in medicine.

    Even here though, subjectivity still plays a part. Rather than focusing on a single viral or bacterial infection, think of all viruses and bacteria. Now ask, which should be classified as diseases? This is not as cut-and-dry as you might think because humans are awash with viruses and bacteria, some helpful, some unhelpful, some irrelevant to our well-being. Ed Yong’s book I Contain Multitudes is brilliant on this if you want to know more about the massive complexity of our microbiome and how it relates to our well-being.

So the question for infectious disease experts is at what point an unhelpful virus or bacterium becomes a disease. This involves making judgements about what should be considered a ‘negative effect’. Some are easy calls to make – mortality statistics are a fairly good yardstick. No one’s argued over the status of Ebola as a disease. But some cases are not so clear. In fact, the question of what constitutes a disease, formally discussed as how to classify the pathogenicity of microorganisms, is the subject of a lively debate in the medical literature.

So all diagnoses in medicine involve a consensus judgement about what counts as ‘bad for us’. There is no biological test that can answer this question in all cases. Value judgements are certainly more common in psychiatry than in infectious diseases, though probably less common than in plastic surgery, but no diagnosis is value-free.

    “Psychiatric diagnosis isn’t valid because of the following reasons…”

    Debating the validity of diagnoses is a good thing. In fact, it’s essential we do it. Lots of DSM diagnoses, as I’ve argued before, poorly predict outcome, and sometimes barely hang together conceptually. But there is no general criticism that applies to all psychiatric diagnoses. Rather than going through all the diagnoses in detail, look at the following list of DSM-5 diagnoses and ask yourself whether the same commonly made criticisms about ‘psychiatric diagnosis’ could be applied to them all:

Tourette’s syndrome, Insomnia, Erectile Disorder, Schizophrenia, Bipolar, Autism, Dyslexia, Stuttering, Enuresis, Catatonia, PTSD, Pica, Sleep Apnea, Pyromania, Medication-Induced Acute Dystonia, Intermittent Explosive Disorder

    Does psychiatric diagnosis medicalise distress arising from social hardship? Hard to see how this applies to stuttering and Tourette’s syndrome. Is psychiatric diagnosis used to oppress people who behave differently? If this applies to sleep apnea, I must have missed the protests. Does psychiatric diagnosis privilege biomedical explanations? I’m not sure this applies to PTSD.

There are many good critiques of the validity of specific psychiatric diagnoses, but it’s impossible to see how they apply to all diagnoses.

    How can we criticise psychiatric diagnosis better?

    I want to make clear here that I’m not a ‘defender’ of psychiatric diagnosis. On a personal basis, I’m happy for people to use whatever framework they find useful to understand their own experiences. On a scientific basis, some diagnoses seem reasonable but many are a really poor guide to human nature and its challenges. For example, I would agree with other psychosis researchers that the days of schizophrenia being a useful diagnosis are numbered. By the way, this is not a particularly radical position – it has been one of the major pillars of the science of cognitive neuropsychiatry since it was founded.

    However, I would like to think I am a defender of actually engaging with what you’re criticising. So here’s how I think we could move the diagnosis debate on.

    Firstly, RTFM. Read the fucking manual. I’m sorry, but I’ve got no time for criticisms that can be refuted simply by looking at the thing you’re criticising. Saying there are no biological tests for DSM diagnoses is embarrassing when some are listed in the manual. Saying the DSM is about ‘labelling distress’ when many DSM diagnoses do not will get nothing more than an eye roll from me.

Secondly, we need to be explicit about what we’re criticising. If someone is criticising ‘psychiatric diagnosis’ as a whole, they’re almost certainly talking nonsense because it’s a massively diverse field. Our criticisms about medicalisation, poor predictive validity and biomedical privilege may apply very well to schizophrenia, but they make little sense when we’re talking about sleep apnea or stuttering. Diagnosis can really only be coherently criticised on a case-by-case basis, or where you have demonstrated that a particular group of diagnoses share particular characteristics – but you have to establish this first.

As an aside, restricting our criticisms to ‘functional psychiatric diagnosis’ will not suddenly make these arguments coherent. ‘Functional psychiatric diagnoses’ include Tourette’s syndrome, stuttering, dyslexia, erectile disorder, enuresis, pica and insomnia to name but a few. Throwing them in front of the same critical cross-hairs as borderline personality disorder makes no sense. I did a whole talk on this if you want to check it out.

Thirdly, let’s stop pretending this isn’t about power and inter-professional rivalries. Many people have written very lucidly about how diagnosis is one of the supporting pillars in the power structure of psychiatry. This is true. The whole point of structural analysis is that concept, practice and power are intertwined. When we criticise diagnosis, we are attacking the social power of psychiatry. This is not a reason to avoid doing it, and it doesn’t mean this is the primary motivation, but we need to be aware of what we’re doing. Pretending we’re criticising diagnosis but not taking a swing at psychiatry is like calling someone ugly but saying it’s nothing against them personally. We should be working for a better and more equitable approach to mental health – and that includes respectful and conscious awareness of the wider implications of our actions.

Also, let’s not pretend psychology isn’t full of classifications. Just because they’re not published by the APA doesn’t mean they’re any more valid or have the potential to be any more damaging (or indeed, the potential to be any more liberating). And if you are really against classifying experience and behaviour in any way, I recommend you stop using language, because it relies on exactly this.

    Most importantly though, this really isn’t about us as professionals. The people most affected by these debates are ultimately people with mental health problems, often with the least power to make a difference to what’s happening. This needs to change and we need to respect and include a diversity of opinion and lived experience concerning the value of diagnosis. Some people say that having a psychiatric diagnosis is like someone holding their head below water, others say it’s the only thing that keeps their head above water. We need a system that supports everyone.

    Finally, I think we’d be better off if we treated diagnoses more like tools, and less like ideologies. They may be more or less helpful in different situations, and at different times, and for different people, and we should strive to ensure a range of options are available to people who need them, both diagnostic and non-diagnostic. Each tested and refined with science, meaning, lived experience, and ethics.

    Serendipity in psychological research

Dorothy Bishop has an excellent post, ‘Ten serendipitous findings in psychology’, in which she lists ten celebrated discoveries which occurred by happy accident.

Each discovery is interesting in itself, but Prof Bishop puts the discoveries in the context of the recent discussion about preregistration (declaring in advance what you are looking for and how you’ll look). Does preregistration hinder serendipity? Absolutely not, says Bishop, not least because the context of ‘discovery’ is never a one-off experiment.

    Note that, in all cases, having made the initial unexpected observation – either from unstructured exploratory research, or in the course of investigating something else – the researchers went on to shore up the findings with further, hypothesis-driven experiments. What they did not do is to report just the initial observation, embellished with statistics, and then move on, as if the presence of a low p-value guaranteed the truth of the result.

    (It’s hard not to read into these comments a criticism of some academic journals which seem happy to publish single experiments reporting surprising findings.)

Bishop’s list contains three findings from electrophysiology (recording brain cell activity directly with electrodes), which I think is notable. In these cases neural recording acts in the place of a microscope, allowing fairly direct observation of the system the scientist is investigating at a level of detail hitherto unavailable. It isn’t surprising to me that, given a new tool of observation, the prepared minds of scientists will make serendipitous discoveries. The catch is whether, for the rest of psychology, such observational tools exist. Many psychologists use their intuition to decide where to look, and experiments to test whether their intuition is correct. The important serendipitous discoveries from electrophysiology suggest that measures which are new ways of observing, rather than merely tests of ideas, must also be important for psychological discoveries. Do such observational measures exist?

    Good tests make children fail – here’s why

    Many parents and teachers are critical of the Standardised Assessment Tests (SATs) that have recently been taken by primary school children. One common complaint is that they are too hard. Teachers at my son’s school sent children home with example questions to quiz their parents on, hoping to show that getting full marks is next to impossible.

    Invariably, when parents try out these tests, they focus on the most difficult or confusing items. Some parents and teachers can be heard complaining on social media that if they get questions wrong, surely the tests are too hard for ten-year-olds.

    But how hard should tests for children be?

    As a psychologist, I know we have some well-developed principles that can help us address the question. If we look at the SATs as measures of some kind of underlying ability, then we can turn to one of the oldest branches of psychology – “psychometrics” – for some guidance.

    Getting it just right

    A good test shouldn’t be too hard. If most people get most questions wrong, then you have what is called a “floor effect”. The result is that you can’t tell any difference in ability between the people taking the test.

    If we started the school sports day high jump with the bar at two metres high (close to the world record), then we’d finish sports day with everybody getting the same – zero successful jumps – and no information about how good anyone is at the high jump.

But at the same time, a good test shouldn’t be too easy. If most people get everything right, then the effect is, as you might expect, called a “ceiling effect”. If everybody gets everything right then again we don’t get any information from the test.

    The key idea is that tests must discriminate. In psychometric terms, the value of a test is about the match between the thing it is supposed to measure and the difficulty of the items on the test. If I wanted to gauge maths ability in six-year-olds and I gave them all an A-Level paper, we can presume that nearly everyone would score zero. Although the A-Level paper might be a good test, it is completely uninformative if it is badly matched to the ability of the people taking the test.

    Here’s the rub: for a test to be sensitive to differences in ability, it must contain items which people get wrong. Actually, there’s a precise answer to the proportion that you should get wrong – in the most sensitive test it should be half of the items. Questions which you are 50% likely to get right are the ones which are most informative.
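You can see why from a toy calculation: in the simplest psychometric models, the information an item provides about a test-taker is proportional to p(1 − p), where p is the probability they answer it correctly, and that product peaks at p = 0.5. A quick sketch:

```python
# Item information is largest for questions a test-taker has a 50% chance of getting right.
import numpy as np

p = np.linspace(0.01, 0.99, 99)   # probability of a correct answer
information = p * (1 - p)         # Bernoulli variance, proportional to Rasch-style item information
print(f"most informative p = {p[np.argmax(information)]:.2f}")   # 0.50
```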

    How we feel about measuring and labelling children according to their skill at taking these tests is a big issue, but it is important that we recognise that this is what tests do. A well designed test will make all children get some items wrong – it is inherent in their design. It is up to us how we conceptualise that: whether tests are an unnecessary distraction from true education, or a necessary challenge we all need to be exposed to.

    Better tests?

If you adopt this psychometric perspective, it becomes clear that the tests we use are an inefficient way of measuring any individual child’s particular ability to do the test. Most children will be asked a bunch of questions which are too easy for them, before they get to the informative ones which are at the edge of their ability. Then they will go on to attempt a bunch of questions which are far too hard. And pity the people for whom the test is poorly matched to their ability and consists mostly of questions they’ll get wrong – which is both uninformative in psychometric terms and dispiriting emotionally.

    A hundred years ago, when we began our modern fixation with testing and measuring, it was hard to avoid the waste where many uninformative and potentially depressing questions were asked. This was simply because all children had to take the same exam paper.

    Nowadays, however, examiners can administer tests via computer, and algorithmically identify the most informative questions for each child’s ability – making the tests shorter, more accurate, and less focused on the experience of failure. You could throw in enough easy questions that no child would ever have the experience of getting most of the questions wrong. But still there’s no getting around the fact that an informative test has to contain questions most people sitting it will get wrong.
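Here is a minimal sketch of that adaptive idea (a toy item bank and a deliberately crude ability update, not any particular operational testing system): after each answer, ask the unused question whose difficulty sits closest to the current ability estimate.

```python
# Toy computerised adaptive testing sketch: always ask the item nearest the current ability estimate.
import random

item_bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]   # item difficulties on an ability scale
ability = 0.0                                        # start from an average-ability guess
step = 0.5

for _ in range(5):
    item = min(item_bank, key=lambda d: abs(d - ability))   # most informative remaining item
    item_bank.remove(item)
    correct = random.random() < 0.5                          # stand-in for the child's answer
    ability += step if correct else -step                    # crude update; real systems use IRT
    print(f"asked difficulty {item:+.1f}, correct={correct}, ability estimate {ability:+.1f}")
```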

    Even a good test can measure an educationally irrelevant ability (such as merely the ability to do the test, or memorise abstract grammar rules), or be used in ways that harm children. But having difficult items isn’t a problem with the SATs, it’s a problem with all tests.

    The Conversation

    This article was originally published on The Conversation. Read the original article.

    The search for the terrorist ‘type’

    BBC World Service has an excellent radio documentary on the history and practice of terrorist profiling.

    Unlike many pieces on the psychology of terrorism, which tend to take a Hollywood view of the problem, it’s an insightful, critical and genuinely enlightening piece on the false promises and possibilities of applied psychology in the service of stopping terrorists.

    Crucially, it looks at how the practice developed over time and how it’s been affected by the ‘war on terror’.

    For decades researchers, academics and psychologists have wanted to know what kind of person becomes a terrorist. If there are pre-existing traits which make someone more likely to kill for their beliefs – well, that would be worth knowing… It’s a story which begins decades ago. But, with the threat from killers acting for so-called Islamic State, finding an answer has never felt more pressing.

    Recommended.
     
    Link to programme webpage, streaming and mp3.

    A brief hallucinatory twilight

I’ve got an article in The Atlantic on the hypnagogic state – the brief hallucinatory period between wakefulness and sleep – and how it is increasingly being used as a tool to make sense of consciousness.

    There is a brief time, between waking and sleep, when reality begins to warp. Rigid conscious thought starts to dissolve into the gently lapping waves of early stage dreaming and the world becomes a little more hallucinatory, your thoughts a little more untethered. Known as the hypnagogic state, it has received only erratic attention from researchers over the years, but a recent series of studies have renewed interest in this twilight period, with the hope it can reveal something fundamental about consciousness itself.

The hypnagogic state has been better dealt with by artists and writers over the years – Coleridge’s poem Kubla Khan apparently emerged out of hypnagogic reverie, albeit fuelled by opium.

    It has received only occasional attention from scientists, however. More recently, a spate of studies has come out showing some genuine mainstream interest in understanding hypnagogia as an interesting source of information about how consciousness is deconstructed as we enter sleep.

     

    Link to article in The Atlantic on the hypnagogic state.

    Genetics is rarely just about genes

    If you want a crystal clear introduction to the role genetics can play in human nature, you can’t do much better than an article in The Guardian’s Sifting the Evidence blog by epidemiologist Marcus Munafo.

It’s been given a slightly distracting title – but ignore that and just read the main text.

    Are we shaped more by our genes or our environment – the age-old question of nature and nurture? This is really a false dichotomy; few, if any, scientists working in the area of human behaviour would adhere to either an extreme nature or extreme nurture position. But what do we mean when we say that our behaviours are influenced by genetic factors? And how do we know?

    It will be one of the most useful 20 minutes you’ll spend this week.
     

    Link to excellent introduction to genetics and human behaviour.

    3 salvoes in the reproducibility crisis

The reproducibility crisis in psychology rumbles on. For the uninitiated, this is the general brouhaha we’re having over how reliable published psychological research is. I wrote a piece on this in 2013, which now sounds a little complacent, and unnecessarily focussed on just one area of psychology, given the extent of the problems since uncovered in the way research is manufactured (or maybe not, see below). Anyway, in the last week or so there have been three interesting developments.

    Despair

Michael Inzlicht blogged his ruminations on the state of the field of social psychology, and they’re not rosy: “We erred, and we erred badly”, he writes. It is a profound testament to the depth of the current concerns about the reliability of psychology when such a senior scientist begins to doubt the reality of some of the phenomena he has built his career investigating.

    As someone who has been doing research for nearly twenty years, I now can’t help but wonder if the topics I chose to study are in fact real and robust. Have I been chasing puffs of smoke for all these years?

    Don’t panic!

But not everyone is worried. A team of Harvard A-listers, including Timothy Wilson and Daniel Gilbert, have released a press release announcing a commentary on the “Reproducibility Project: Psychology”. This was an attempt to estimate the reliability of a large sample of phenomena from the psychology literature (short introduction in Nature here). The paper from this project was picked as one of the most important of 2015 by the journal Science.

The project is a huge effort, which is open to multiple interpretations. The Harvard team’s press release is headlined “No evidence of a replicability crisis in psychological science” and claims “reproducibility of psychological science is indistinguishable from 100%”, as well as calling for effort to be put into repairing the damage done to the reputation of psychological research. I’d link to the press release, but it looks like between me learning of it yesterday and coming to write about it today this material has been pulled from the internet. The commentary announced was due to be released on March the 4th, so we wait with bated breath for the good news about why we don’t need to worry about the reliability of psychology research. Come on boys, we need some good news.

    UPDATE 3rd March: The website is back! No Evidence for a Replicability Crisis in Psychological Science. Commentary here, and response

    …But whatever you do, optimally weight evidence

Speaking of the Reproducibility Project, Alexander Etz produced a great Bayesian reanalysis of the data from that project (possible because it is all open access, via the Open Science Framework). This take on the project is a great example of how open science allows people to more easily build on your results, as well as being a vital complement to the original report – not least because it stops you naively accepting any simple statistical summary of what the reproducibility project ‘means’ (e.g. “30% of studies do not replicate”, etc.). Etz and Joachim Vandekerckhove have now upgraded the analysis to a paper, which is available (open access, natch) in PLoS One: “A Bayesian Perspective on the Reproducibility Project: Psychology”. And their interpretation of the reliability of psychology, as informed by the reproducibility project?

    Overall, 75% of studies gave qualitatively similar results in terms of the amount of evidence provided. However, the evidence was often weak …The majority of the studies (64%) did not provide strong evidence for either the null or the alternative hypothesis in either the original or the replication…We conclude that the apparent failure of the Reproducibility Project to replicate many target effects can be adequately explained by overestimation of effect sizes (or overestimation of evidence against the null hypothesis) due to small sample sizes and publication bias in the psychological literature

    Psychotherapies and the space between us

There’s an in-depth article at The Guardian revisiting an old debate about cognitive behavioural therapy (CBT) versus psychoanalysis that falls into the trap of asking some rather clichéd questions.

    For those not familiar with the world of psychotherapy, CBT is a time-limited treatment based on understanding how interpretations, behaviour and emotions become unhelpfully connected to maintain psychological problems while psychoanalysis is a Freudian psychotherapy based on the exploration and interpretation of unhelpful processes in the unconscious mind that remain from unresolved conflicts in earlier life.

    I won’t go into the comparisons the article makes about the evidence for CBT vs psychoanalysis except to say that in comparing the impact of treatments, both the amount and quality of evidence are key. Like when comparing teams using football matches, pointing to individual ‘wins’ will tell us little. In terms of randomised controlled trials or RCTs, psychoanalysis has simply played far fewer matches at the highest level of competition.

But the treatments are often compared because they aim to treat some of the same problems. However, the comparison is usually unhelpfully shallow.

    Here’s how the cliché goes: CBT is evidence-based but superficial, the scientific method applied for a quick fix that promises happiness but brings only light relief. The flip-side of this cliché says that psychoanalysis is based on apprenticeship and practice, handed down through generations. It lacks a scientific seal of approval but examines the root of life’s struggles through a form of deep artisanal self-examination.

    Pitching these two clichés against each other, and suggesting the ‘old style craftsmanship is now being recognised as superior’ is one of the great tropes in mental health – and, as it happens, 21st Century consumerism – and there is more than a touch of marketing about this debate.

    Which do you think is portrayed as commercial, mass produced, and popular, and which is expensive, individually tailored, and only available to an exclusive clientèle? Even mental health has its luxury goods.

But more widely discussed (or perhaps, admitted to) are the differing models of the mind that each therapy is based on. Even here, though, simple comparisons fall flat because many of the concepts don’t easily translate.

One of the central tropes is that psychoanalysis deals with the ‘root’ of the psychological problem while CBT only deals with its surface effects. The problem with this contrast is that psychoanalysis can only be seen to deal with the ‘root of the problem’ if you buy into the psychoanalytic view of where problems are rooted.

    Is your social anxiety caused by the projection of unacceptable feelings of hatred based in unresolved conflicts from your earliest childhood relationships – as psychoanalysis might claim? Or is your social anxiety caused by the continuation of a normal fear response to a difficult situation that has been maintained due to maladaptive coping – as CBT might posit?

These views of the internal world are, in many ways, the non-overlapping magisteria of psychology.

    Another common claim is that psychoanalysis assumes an unconscious whereas CBT does not. This assertion collapses on simple examination but the models of the unconscious are so radically different that it is hard to see how they easily translate.

    Psychoanalysis suggests that the unconscious can be understood in terms of objects, drives, conflicts and defence mechanisms that, despite being masked in symbolism, can ultimately be understood at the level of personal meaning. In contrast, CBT draws on its endowment from cognitive psychology and claims that the unconscious can often only be understood at the sub-personal level because meaning as we would understand it consciously is unevenly distributed across actions, reactions and interpretations rather than being embedded within them.

    But despite this, there are also some areas of shared common ground that most critics miss. CBT equally cites deep structures of meaning acquired through early experience that lie below the surface to influence conscious experience – but calls them core beliefs or schemas – rather than complexes.

    Perhaps the most annoying aspect of the CBT vs psychoanalysis debate is it tends to ask ‘which is best’ in a general and over-vague manner rather than examining the strengths and weaknesses of each approach for specific problems.

    For example, one of the central areas that psychoanalysis excels at is in conceptualising the therapeutic relationship as being a dynamic interplay between the perception and emotions of therapist and patient – something that can be a source of insight and change in itself.

    Notably, this is the core aspect that’s maintained in its less purist and, quite frankly, more sensible version, psychodynamic psychotherapy.

    CBT’s approach to the therapeutic relationship is essentially ‘be friendly and aim for cooperation’ – the civil service model of psychotherapy if you will – which works wonderfully except for people whose central problem is itself cooperation and the management of personal interactions.

    It’s no accident that most extensions of CBT (schema therapy, DBT and so on) add value by paying additional attention to the therapeutic relationship as a tool for change for people with complex interpersonal difficulties.

Because each therapy assumes a slightly different model of the mind, it’s easy to think that they are somehow battling over what it means to be human, and this is where the dramatic tension in most of these debates comes from.

    Mostly though, models of the mind are just maps that help us get places. All are necessarily stylised in some way to accentuate different aspects of human nature. As long as they sufficiently reflect the territory, this highlighting helps us focus on what we most need to change.

    No more Type I/II error confusion

Type I and Type II errors are, respectively, when you allow a statistical test to convince you of a false effect, and when you allow a statistical test to convince you to dismiss a true effect. Despite being fundamentally important concepts, they are terribly named. Who can ever remember which way around the two errors go? Well now I can, thanks to a comment from a friend I thought so useful I made it into a picture:

[Image: the boy who cried wolf, as a mnemonic for Type I and Type II errors]
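If a quick simulation helps the mnemonic stick, here is a sketch (the effect size and sample size are arbitrary choices for illustration): with no real effect, about 5% of tests cry wolf anyway (Type I errors); with a real but modest effect and a small sample, a large share of tests miss the wolf entirely (Type II errors).

```python
# Simulation sketch: Type I errors (false alarms) and Type II errors (misses).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n = 5000, 20   # simulated experiments per condition, participants per group

type_i = sum(stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue < 0.05
             for _ in range(n_sims))          # no true effect, but the test says there is one
type_ii = sum(stats.ttest_ind(rng.normal(size=n), rng.normal(0.5, size=n)).pvalue >= 0.05
              for _ in range(n_sims))         # true effect (d = 0.5), but the test misses it

print(f"Type I error rate:  {type_i / n_sims:.2f}")    # close to the nominal 0.05
print(f"Type II error rate: {type_ii / n_sims:.2f}")   # roughly two-thirds with this small sample
```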

    Twelve minutes of consciousness

    The Economist has an excellent video on consciousness, what it is, why and how it evolved.

    The science section of The Economist has long had some of the best science reporting in the mainstream press and this video is a fantastic introduction to the science of consciousness.

    It’s 12 minutes long and it’s worth every second of your time.

    The reproducibility of psychological science

    The Reproducibility Project results have just been published in Science, a massive, collaborative, ‘Open Science’ attempt to replicate 100 psychology experiments published in leading psychology journals. The results are sure to be widely debated – the biggest result being that many published results were not replicated. There’s an article in the New York Times about the study here: Many Psychology Findings Not as Strong as Claimed, Study Says

This is a landmark in meta-science: researchers collaborating to inspect how psychological science is carried out, how reliable it is, and what that means for how we should change what we do in the future. But it is also an illustration of the process of Open Science. All the materials from the project, including the raw data and analysis code, can be downloaded from the OSF webpage. That means that if you have a question about the results, you can check it for yourself. So, by way of example, here’s a quick analysis I ran this morning: does the number of citations of a paper predict how large the effect size will be of a replication in the Reproducibility Project? Answer: not so much.

[Figure: citation count of the original paper plotted against the effect size of its replication]

That horizontal string of dots along the bottom is replications with close to zero effect size but high citations for the original paper (nearly all of which reported non-zero and statistically significant effects). Draw your own conclusions!

    Link: Reproducibility OSF project page

    Link: my code for making this graph (in python)
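If you’d rather not dig through my script, the gist of the plot is sketched below. The CSV filename and column names here are placeholders – you’d need to match them to the actual variable names in the Reproducibility Project data file on the OSF.

```python
# Rough sketch of the citations-vs-replication-effect-size plot.
# "rpp_data.csv" and the column names are hypothetical placeholders; check the
# Reproducibility Project: Psychology data dictionary on the OSF for the real ones.
import pandas as pd
import matplotlib.pyplot as plt

rpp = pd.read_csv("rpp_data.csv")

plt.scatter(rpp["original_citation_count"], rpp["replication_effect_size_r"], alpha=0.6)
plt.xlabel("Citations of original paper")
plt.ylabel("Effect size of replication (r)")
plt.title("Citations vs replication effect size")
plt.savefig("cites_vs_effect.png")
```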