One of the computational linguists who applied forensic text analysis to JK Rowling’s books to uncover her as the author of The Cuckoo’s Calling describes the science behind his investigation in a post for Language Log.
It seems Rowling’s authorship was originally leaked by her law firm, and a UK newspaper then turned to two academics who specialise in forensic text analysis to back up its suspicions.
One of those academics, computer scientist Patrick Juola, wrote a piece for Language Log describing how this sort of text analysis works.
Of the 11 sections of Cuckoo, six were closest (in distribution of word lengths) to Rowling, five to James. No one else got a mention.
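To get a feel for what comparing a "distribution of word lengths" might involve, here is a minimal Python sketch. It is not Juola's code (his analysis used JGAAP). The function names, the tokenising regex and the `max_len` cut-off are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_length_distribution(text, max_len=15):
    """Share of words of each length 1..max_len (longer words are pooled at max_len).

    Assumption: a "word" is a run of letters and apostrophes, so "don't" counts as 5.
    """
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    return [counts[n] / total if total else 0.0 for n in range(1, max_len + 1)]


def cosine_distance(u, v):
    """1 minus cosine similarity: 0 means the two profiles point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0
```

Each section of the disputed book would get its own distribution, and the candidate author with the smallest distance to it would count as "closest".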
Another feature I used were the 100 most common words. What percentage of the document were “the,” what were “of,” and so on. Again, a rich data set that is easy to extract by computer. Using an otherwise similar analysis (including cosine distance again), four of the sections were Rowling-like, four were McDermid-like, and the other three split between James and Rendell.
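The most-common-words test can be sketched in a few lines of Python. This is only an illustration under assumptions, not Juola's implementation: the helper names are made up, the vocabulary is simply the most frequent words across the supplied texts, and each section is assigned to the candidate whose word-frequency profile has the smallest cosine distance.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase runs of letters and apostrophes count as words.
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts, n=100):
    """The n most common words across all supplied texts."""
    counts = Counter()
    for t in texts:
        counts.update(tokenize(t))
    return [w for w, _ in counts.most_common(n)]


def profile(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total if total else 0.0 for w in vocab]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def closest_author(section, candidates, n=100):
    """Name of the candidate whose sample text is nearest to the section.

    candidates maps an author name to a sample of that author's writing.
    """
    vocab = top_words(list(candidates.values()) + [section], n)
    target = profile(section, vocab)
    return min(
        candidates,
        key=lambda name: cosine_distance(target, profile(candidates[name], vocab)),
    )
```

Running `closest_author` once per section and tallying the winners would give a count of the kind quoted above, with so many sections per author.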
I ran two tests based on authorial vocabulary. The first was on the distribution of character 4-grams, groups of four adjacent characters. These could be words, parts of words (like four letters “nsid” that would be inside the word “inside”) or even parts of two words (like the four letters “n th” as part of the phrase “in the”)… I also ran on word bigrams, pairs of adjacent words, again a feature with a good track record.
The character 4-grams showed a preference for McDermid, with 8 sections close to her. Three were Rowling-like, and no one else was mentioned. The word pairs, on the other hand, were clearly Rowling-like (9 sections, against 2 by McDermid, no one else mentioned).
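Extracting the two features Juola describes, character 4-grams and word bigrams, is straightforward. The Python sketch below only shows the counting step and uses hypothetical function names. How the counts are then normalised and compared is not specified here and would be a separate choice.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Count every run of n adjacent characters, spaces included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_bigrams(text):
    """Count every pair of adjacent words (assumption: split on whitespace, lowercased)."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))
```

For instance, `char_ngrams("inside")` contains the key `"nsid"` and `char_ngrams("in the")` contains `"n th"`, matching the examples in the quote.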
If you want to play around with the technology behind Juola’s authorship attribution work, or that of Peter Millican, the other academic the press asked for an analysis, you can download both programmes from the net.
Juola’s JGAAP programme is available here while you can get Millican’s at this page.
Rumours that Mind Hacks is actually written by Natalie Portman will be strictly denied.
Link to Juola’s post on Language Log.
3 thoughts on “The Mystery of The Cuckoo’s Calling”
I once played with online software that analysed writing styles. I fed in the opening pages of Dostoyevsky’s ‘Crime and Punishment’ (in translation, to be fair) and the software concluded it was ‘25% like Tolstoy.’ Close but no cigar.
I still believe Natalie Portman writes Mind Hacks. 🙂
Cool. And…..thanks for helping me win the debate for why real books are better than electronic.