Online testing is sure to play a large part in the future of psychology. Using Mechanical Turk or other crowdsourcing sites for research, psychologists can quickly and easily gather data for any study where the responses can be provided online. One concern, however, is that online samples may be less motivated to pay attention to the tasks they are participating in. Not only is nobody watching how they complete these online experiments, but the whole experience is framed as a work-for-cash gig, so there is pressure to finish any activity as quickly and with as little effort as possible. To the extent that online participants are satisficing or skimping on their attention, can we trust the data?
A newly submitted paper uses data from the Many Labs 3 project, which recruited over 3,000 participants from both online and university campus samples, to test the idea that online samples differ from the traditional offline samples used by academic psychologists:
The findings strike a note of optimism, if you’re into online testing (perhaps less so if you use traditional university samples):
Mechanical Turk workers report paying more attention and exerting more effort than undergraduate students. Mechanical Turk workers were also more likely to pass an instructional manipulation check than undergraduate students. Based on these results, it appears that concerns over participant inattentiveness may be more applicable to samples recruited from traditional university participant pools than from Mechanical Turk
This fits with previous reports showing high consistency when classic effects are tested online, and with suggestions that satisficing may have been very high in offline samples all along; we just weren’t testing for it.
However, an issue I haven’t seen discussed is whether, because of the relatively small pool of participants taking experiments on MTurk, online participants have an opportunity to become familiar with typical instructional manipulation checks (AKA ‘catch questions’, which are designed to check whether you are paying attention). If online participants adapt to our manipulation checks, then the very experiments which set out to test whether they are paying more attention may not be reliable.
Link to the new paper: Graduating from Undergrads: Are Mechanical Turk Workers More Attentive than Undergraduate Participants?
This paper provides a useful overview: Conducting perception research over the internet: a tutorial review
This is both interesting & useful.
But, until the science takes the long and hard step of replicating all findings over a long period of time, we simply have a long tail distribution of odd facts.
Michael, that’s an extreme claim, and I’m wondering if you’re referring to all of psychology with your comment? Yes, there are a handful of flashy effects in the field that someone found with one or two studies and left it at that. However, the popular coverage of this “replication crisis” seems to ignore how important replication has always been. To be taken seriously in published journals and programs of research, an effect needs to be shown many times, across a range of theoretical implications. So to call the literature a “long tail distribution of odd facts” seems to overlook the replicability embedded in the current literature.
To the point of the original post, I do think the writer’s concern is valid. Online samples have more incentive to appear to be paying attention, whether or not they’re offering a valid answer. As with anything, though, the hope is that the true patterns can emerge from the random noise.
I just had to reply….
Think about what you’re saying with “true patterns”….
What exactly ARE those “true patterns”?
Isn’t the article suggesting that answers to research surveys *can be* affected by the medium?
Is no allowance made for variations in answers when two otherwise identical questions are asked, one online and one in person?
After all, from a perspective large and long enough in both space and time, ALL of HUMANITY is simply RANDOM NOISE…
@Andy:
The phrase is not mine; it is from a paper by the Nobel Prize winner Vernon Smith.
I can probably get you the exact cite, if you want to mail me: michael@franchise-info.ca
In short, the quality of experimentation in revealed preference theory has gone down over the last 25 years.
There is no longer careful attention to which axioms may be violated.
(I blame Dan Ariely because he writes very well.)
Well, at least I learned a new word today. “Satisficing” is a combination of “satisfy” and “suffice”.
It’s amazing to me, at 56, what you kids today find interesting.
And it saddens me just how poorly educated the average college student truly is.
I used to think that the “idiot box” – the TV set – was the height of technological stupidity.
But the way the internet is in 2015, we’ll get to the promised land seen in the movie “Idiocracy” sooner rather than later.
Thanks for the nice summary of the paper!
Previous experience with attention checks may have something to do with it. I mention this briefly in the discussion section at the bottom of page 8, and it is discussed more comprehensively in Hauser & Schwarz (in press): http://www.ncbi.nlm.nih.gov/pubmed/25761395. Nevertheless, differences in attention and effort were still found for self-report items; participation in the Many Labs 3 study wasn’t restricted to MTurk workers with high reputations or lots of previous experience; and an adapted version of the attention check was used.
Thanks again for sharing the paper and findings!
Interesting paper – and a really interesting set of questions.
Schober et al. have a great new paper that looks at the effects of communication modality (voice vs. SMS) and interviewer (automated vs. human) on survey response reliability.
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0128337
They found that people interviewed on iPhones at a time and place that is convenient for them – whether or not they are multitasking – can give more accurate answers than those in traditional interviews.
It would be useful to compare the reliability and accuracy of responses from MTurk users at their computers during specific work times with responses on mobile devices (I think CrowdFlower do an MTurk-compatible app? I don’t seem to be able to find many mobile MTurk apps for some reason).