Um, almost the entire Scots Wikipedia was written by someone with no idea of the language – 10,000s of articles

None of you trained an AI on this data set, right? Right?


In an extraordinary and somewhat devastating discovery, it turns out virtually the entire Scots version of Wikipedia, comprising more than 57,000 articles, was written, edited or overseen by a netizen who clearly had nae the slightest idea about the language.

The user is not only a prolific contributor, they are an administrator of sco.wikipedia.org, having created, modified or guided the vast majority of its pages in more than 200,000 edits. The result is tens of thousands of articles in English with occasional, and often ridiculous, letter changes – such as replacing a “y” with “ee.”

That’s right, someone doing a bad impression of a Scottish accent and then writing it down phonetically is the chief maintainer of the online encyclopedia's Scots edition. And although this has been carrying on for the best part of a decade, the world was mostly oblivious to it all – until today, when one Redditor finally had enough of reading terrible Scots and decided to look behind the curtain.

“People embroiled in linguistic debates about Scots often use it as evidence that Scots isn’t a language, and if it was an accurate representation, they’d probably be right,” noted the Reddit sleuth, Ultach. “It uses almost no Scots vocabulary, what little it does use is usually incorrect, and the grammar always conforms to standard English, not Scots.”

While very nearly all Scottish people speak English, the Scots language was still spoken, read, or otherwise understood by nearly 30 per cent of Scotland's population, according to responses to the 2011 census. The language got a memorable boost, too, when Scots-writing novelist Irvine Welsh's Trainspotting became a silver-screen sensation.

Ha! whaur ye gaun, ye crowlin ferlie?

The Scots Wikipedia, however, reads like the work of a tourist who attended a Burns Night, had one too many, and started channelling their imagined ancestors all over the internet while occasionally glancing at the Online Scots Dictionary in another browser tab.

Just two examples:

A veelage is a clustered human settlement or community, larger than a hamlet but smawer than a toun, wi a population rangin frae a few hunder tae a few thoosand (sometimes tens o thoosands).

In Greek meethology, the Minotaur wis a creatur wi the heid o a bull an the body o a man or, as describit bi Roman poet Ovid, a being "pairt man an pairt bull". The Minotaur wis eventually killed bi the Athenian hero Theseus.

The sheer size and scope is something to behold: look at virtually any of the 57,000-plus pages and you’ll find a nonsensical mishmash of English and Scots. Ultach spent some time going through the articles, and reached a conclusion: the editor's just doing a copy-paste job from the English edition with words they think are Scots equivalents searched'n'replaced.


“They do use some elements of Scots that would require a look up, they just use them completely incorrectly. For example, they consistently translate “also” as “an aw” in every context. So, Charles V would be ‘king o the Holy Roman Empire and an aw Spain [sic]’, and ‘Pascal an aw wrote in defence o the scienteefic method [sic]’. I think they did this because when you type ‘also’ into the Online Scots Dictionary, ‘an aw’ is the first thing that comes up.”
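To see why a search-and-replace job produces exactly the output Ultach describes, here is a minimal Python sketch of word-for-word substitution. The `FIRST_HIT` table and the `naive_translate` function are hypothetical, invented for illustration; they are not the editor's actual method and not a real Scots glossary. The point is that swapping each English word for the first dictionary hit leaves English grammar and word order untouched, so "also" becomes "an aw" in every context.

```python
import re

# Hypothetical lookup table: each English word mapped to the first entry a
# dictionary search might return. Illustrative only, not a real Scots glossary.
FIRST_HIT = {
    "also": "an aw",
    "of": "o",
    "with": "wi",
    "village": "veelage",
}

def naive_translate(text: str) -> str:
    """Swap whole words one-for-one, ignoring grammar, word order and context."""
    def swap(match):
        word = match.group(0)
        replacement = FIRST_HIT.get(word.lower())
        if replacement is None:
            return word  # untranslated words pass straight through as English
        # Keep a leading capital if the source word had one
        return replacement.capitalize() if word[0].isupper() else replacement
    return re.sub(r"[A-Za-z]+", swap, text)

print(naive_translate("king of the Holy Roman Empire and also Spain"))
```

Run on the Charles V example, this prints "king o the Holy Roman Empire and an aw Spain", the same mangled phrasing quoted above: the sentence is still English, just with a few words swapped out.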

It’s not clear whether the Wikipedian – who identifies as a Christian furry living in the US – has spent the past near-decade creating thousands of fake posts as some kind of incredible practical joke, or whether they honestly felt they were doing a good job. There have been occasional interactions with real Scottish folk taking exception to pages, and the administrator has responded in deadpan fashion.

Amazingly, the dreadful quality of the Scots-language version was the focus of an article five years ago when Slate noted that “at first glance, the Scots Wikipedia page reads like a transcription of a person with a Scottish accent,” while covering a request to Wikipedia that the entire sco.wikipedia.org archive be deleted.

You’re all bum and parsley

“Joke project. Funny for a few minutes, but inappropriate use of resources,” argued the proposer for its deletion, before they were attacked by angry Wikipedians who pointed out it was a “real language, lots of people who speak it,” and, they noted, there was “decent activity” on the pages. Decent, it turns out, because it was all being made up by someone who can’t speak a word of Scots. “Proposer should educate him/herself in linguistic diversity,” the Wikipedia collective sniffily concluded.

But while it is extremely funny on one level that an entire arm of the web encyclopedia comprises phony lingo, there is a potentially serious impact.

“This is going to sound incredibly hyperbolic and hysterical,” noted Ultach, “but I think this person has possibly done more damage to the Scots language than anyone else in history. They engaged in cultural vandalism on a hitherto unprecedented scale.


"Wikipedia is one of the most visited websites in the world. Potentially tens of millions of people now think that Scots is a horribly mangled rendering of English rather than being a language or dialect of its own, all because they were exposed to a mangled rendering of English being called Scots by this person and by this person alone.”

That view was backed up by Robyn Speer, chief scientist at text analytics company Luminoso, who noted that several widely used language detectors rely on Scots Wikipedia as a reference.

“I believe that the cld2, cld3, and fastText language detectors all have Scots (sco) as one of the languages they claim to detect, and all of them are getting their belief about what Scots is from Wikipedia,” she noted.
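Speer's point is that a statistical detector only knows a language through the text it was trained on. The Python sketch below illustrates this with a deliberately simple, swapped-in technique (character-trigram frequency profiles compared by cosine similarity); cld2, cld3 and fastText use their own, more sophisticated models, and nothing here reproduces their code. The "sco" training sample is the Scots Wikipedia village sentence quoted earlier; the "eng" sample is an assumed English original, and the `profile`, `cosine` and `detect` helpers are hypothetical names for illustration.

```python
import math
from collections import Counter

def profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padding with spaces at the edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def detect(text: str, profiles: dict) -> str:
    """Return the label whose training profile is most similar to the input."""
    target = profile(text)
    return max(profiles, key=lambda label: cosine(target, profiles[label]))

# Assumed training samples: the "sco" text is the Wikipedia-style sentence
# quoted in the article, the "eng" text is a guessed English original.
TRAINING = {
    "eng": "A village is a clustered human settlement or community, "
           "larger than a hamlet but smaller than a town",
    "sco": "A veelage is a clustered human settlement or community, "
           "larger than a hamlet but smawer than a toun",
}
PROFILES = {label: profile(sample) for label, sample in TRAINING.items()}

print(detect("smawer than a toun", PROFILES))
```

With these samples, the phrase "smawer than a toun" is labelled "sco", because the detector's only notion of Scots is the lightly respelled English it was fed. Train on mangled text and the model learns the mangling.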

In other words, fake Scots language is rapidly becoming real Scots online. And all because of a prolific apparent non-Scot. The jig may be up, though: the administrator’s talk page has been taken over by some extremely nonplussed folk: “Please stop before you cause anymore harm. Embarrassing,” says one.

Another: “Scots is not just English spelled differently. Even if well intended this is basically cultural vandalism and contributes to misunderstanding and claims that Scots isn't a language of its own. All these contributions should be taken down or edited by someone who can actually speak the Scots dialect, this is incredibly damaging.”

And if you think those comments are surprisingly polite, that’s almost certainly because Scotland was asleep when the news broke. Just wait until the Weegies wake up. ®

Bootnote

As we were about to publish this article, we saw that Ultach updated their Reddit dossier to note: "I've been told that the editor I've written about has received some harassment for what they've done. This should go without saying but I don't condone this at all. They screwed up and I'm sure they know that by now.

"They seem like a nice enough person who made a mistake when they were a young child, a mistake which nobody ever bothered to correct, so it's hardly their fault. They're clearly very passionate and dedicated, and with any luck maybe they can use this as an opportunity to learn the language properly and make a positive contribution. If you're reading this I hope you're doing alright and that you're not taking it too personally."


Biting the hand that feeds IT © 1998–2020