I’ve been puzzling over this tweet from Jeff Rouder:
Surely, I thought, psychology is built out of effects. What could be wrong with focussing on testing which ones are reliable?
But I think I’ve got it now. The thing about effects is that they show you – an experimental psychologist – can construct a situation where some factor you are interested in is important, relative to all the other factors (which you have managed to hold constant).
To see why this might be a problem, consider this paper by Tsay (2013): “Sight over sound in the judgment of music performance”. This was a study which asked people to select the winners of a classical music competition from 6 second clips of them performing. Some participants got the audio, so they could only hear the performance; others got the video, so they could only see the performance; and some got both audio and video. Only those participants who watched the video, without sound, could select the actual competition winners at above chance level. This demonstrates a significant bias effect of sight in judgements of music performance.
To understand the limited importance of this effect, contrast with the overclaims made by the paper: “people actually depend primarily on visual information when making judgments about music performance” (in the abstract) and “[Musicians] relegate the sound of music to the role of noise” (the concluding line). Contrary to these claims the study doesn’t show that looks dominate sound in how we assess music. It isn’t the case that our musical taste is mostly determined by how musicians look.
The Tsay studies took the 3 finalists from classical music competitions – the best of the best of expert musicians – and used brief clips of their performances as stimuli. By my reckoning, this scenario removes almost all differences in quality of the musical performance. Evidence in support of this is that Tsay didn’t find any difference in performance between non-expert participants and professional musicians. This fact strongly suggests that she has designed a task in which musical knowledge isn’t an important factor.
This is why it isn’t reasonable to conclude that people are making judgments about musical performance in general. The clips don’t let you judge relative musical quality, but – for these almost equally matched performances – they do let you reflect the same biases as the judges, biases which include an influence of appearance as well as sound. The bias matters, not least because it obviously affects who won, but proving it exists is completely separate from the matter of whether overall judgement of music is affected more by sight or sound.
Further, there’s every reason to think that the conclusion from the study of the bias effect gives the opposite conclusion to the study of overall importance. In these experiments sight dominates sound, because differences due to sound have been controlled out. In most situations where we decide our music preferences, sound is obviously massively more important.
Many psychological effects are an impressive tribute to the skill of experimenters in designing situations where most factors are held equal, allowing us to highlight the role of subtle psychological factors. But we shouldn’t let this blind us to the fact that the existence of an effect due to a psychological factor isn’t the same as showing how important this factor is relative to all others, nor is it the same as showing that our effect will hold when all these other factors start varying.
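A toy simulation can make the distinction concrete. This is only a sketch under invented assumptions, not a model of Tsay's data: the function names, the weights (sound counts for 1.0, looks for 0.3) and the spreads are all hypothetical. A judge's score is a weighted sum of sound quality and looks, and we ask how much of the variation in scores tracks looks when the performers are nearly matched on sound, compared with when sound quality varies widely.

```python
import random


def squared_correlation(xs, ys):
    """Squared Pearson correlation: share of variance in ys linearly tied to xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def variance_share_of_looks(sound_sd, look_sd=1.0, w_sound=1.0, w_look=0.3,
                            n=20000, seed=0):
    """Simulate n performers and return how much score variance follows looks.

    All parameters are hypothetical: sound is weighted more heavily than
    looks, and only the spread of sound quality (sound_sd) is varied.
    """
    rng = random.Random(seed)
    looks = [rng.gauss(0.0, look_sd) for _ in range(n)]
    sounds = [rng.gauss(0.0, sound_sd) for _ in range(n)]
    scores = [w_sound * s + w_look * l for s, l in zip(sounds, looks)]
    return squared_correlation(looks, scores)


# Finalists: sound quality is almost identical across performers.
matched = variance_share_of_looks(sound_sd=0.05)
# Everyday listening: sound quality varies as much as looks do.
everyday = variance_share_of_looks(sound_sd=1.0)

print(f"looks' share of score variance, matched finalists: {matched:.2f}")
print(f"looks' share of score variance, wide range of sound: {everyday:.2f}")
```

With these made-up numbers the scoring rule never changes, and sound always carries more weight than looks. Even so, looks account for nearly all of the score variance among the matched finalists and only a small fraction once sound quality is allowed to vary.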
Link: Are classical music competitions judged on looks? – critique of Tsay (2013) written for The Conversation
Link: A good twitter thread on the related issue of effect size – and yah-boo to anyone who says you can’t have a substantive discussion on social media
UPDATE: The paper does give evidence that the sound stimuli used do influence people’s judgements systematically – it was incorrect of me to say that differences due to sound have been removed. I have corrected the post to reflect what I believe the study shows: that differences due to sound have been minimised, so that differences in looks are emphasised.
11 thoughts on “Distraction effects”
Nice analysis: the point that sound quality differences have already been excluded is excellent.
FWIW, a story going around in the classical world in the early 1970s claimed that when they went to blind auditions for positions in the major US symphonies, the number of women in the orchestras skyrocketed. Could be apocryphal, but it sure sounded right.
Not apocryphal! The best reference for this is, I believe:
Goldin, C. & Rouse, C. (2000). Orchestrating Impartiality: The Impact Of ‘Blind’ Auditions On Female Musicians. American Economic Review, 90, 715-741.
And nicely underlines the point that bias effects can be very important – if you are losing a competition or missing out on a job, for example, they really matter. But we just have to be careful about how we interpret them.
Great overall point and the other issue I see is that there’s not just 2 factors (social judgement, sound) but also (3) technique. Granted 6 seconds is not a lot but correct classical performance does rely heavily on proper placement of the hands, movement of the hands and posture. You’d have to then know how long it takes a music judge to size up the musician based on technique alone.
Oh and by the way WHEN is somebody going to write a manifesto on the ideal way to use Twitter. Some people get it, some don’t. Funny, concise, clever. There has to be a linguistic formula. Somewhere.
It just occurred to me that the same argument applies to arguments claiming the overall importance of unconscious processing, based on experiments which only test differences.
I wrote about this at more length here
Stafford, T. (2014) The perspectival shift: how experiments on unconscious processing don’t justify the claims made for them. Frontiers in Psychology, 5, 1067. doi:10.3389/fpsyg.2014.01067
Now you’ve got me wondering: How many different priming effects is the average person exposed to in a single day? How many are theoretically on the same sort of behavioral DV?
Tom, thank you for the consideration of my tweet. It is a humbling experience. My intent, not well expressed in 140 characters, is to ask whether we wish to make effects central. The alternative is to make the opposite of effects, that is invariance, conservations, lawfulness, more central. One of my heroes is Kepler. If you look at the motion of the planets, there are effects for all variables. Planets not only change in position, they change in declination, velocity, even direction. All the Fs would be large, and ps small. Yet Kepler extracted three invariances that don’t change in the relations among these variables, and these three invariances are known as Kepler’s Laws today. Included among them is that all planets follow an ellipse with the sun at one focus.
How about psychology? People actually do think in invariances sometimes. Miller, and then later Cowan, asked whether working memory is limited to a fixed number of items if there is no chunking. My colleagues and I have wondered whether all RT distributions have the same shape (Rouder et al., 2010, Psych Rev). Once you center them at zero and make the variance 1.0, they are all the same. Another example is whether performance is a mixture of responses from stimulus-driven and guessing states. We tend to look at confidence ratings or production=response profiles and ask whether these can be decomposed into mixtures. Such a model instantiates a critical invariance (Province & Rouder, 2012, PNAS).
Thanks Jeff! It’s obvious I took “invariances” to mean something slightly different from your intent – I was contrasting differences in performance with the contribution of factors to total performance, so thanks for clarifying and expanding on this point.
I like your point about experimenters having essentially controlled away any chance that subjects could have judged the music for its musicality at anything above chance. What I noticed right away was that the experimental description failed to notify us if the original judges were able to see the musical performers they were judging, or whether they got to listen to more than six seconds of performance themselves. Good grief, unless someone really screws up, or is truly awful, it’s very difficult to say anything about a musical performance in six seconds of listening. It’s easy to make a snap decision based on a quick glance, but a musical performance can only be judged fairly in its entirety. So, is the real problem here the excessive skill of the experimenters in emphasizing effects, or just really, really bad experimental design? This is not to say that experimental psychologists’ tendency to go mining for novel ‘effects’ isn’t a problem as well.
Excellent post. The parallel to this (within psychology even) is efficacy and effectiveness research with interventions.
I think that there is a more general problem of the experimental approach that is often overlooked but flagged in this commentary. In trying to “drill down” onto the factors that are believed to be relevant by controlling for and eliminating the influence of assumed extraneous variables, one is creating an artificial scenario that does not reflect the complexity of real life. This not only raises the problem of how to interpret any findings in terms of strength but also the extent to which variables interact in unpredictable ways. In other words, removing the woods in order to see the individual trees misses the complexity of mechanisms that must be operating.