Um, almost the entire Scots Wikipedia was written by someone with no idea of the language – 10,000s of articles

None of you trained an AI on this data set, right? Right?


In an extraordinary and somewhat devastating discovery, it turns out virtually the entire Scots version of Wikipedia, comprising more than 57,000 articles, was written, edited or overseen by a netizen who clearly had nae the slightest idea about the language.

The user is not only a prolific contributor but also an administrator of sco.wikipedia.org, having created, modified or guided the vast majority of its pages in more than 200,000 edits. The result is tens of thousands of articles in English with occasional, and often ridiculous, letter changes – such as replacing a “y” with “ee.”

That’s right, someone doing a bad impression of a Scottish accent and then writing it down phonetically is the chief maintainer of the online encyclopedia's Scots edition. And although this has been carrying on for the best part of a decade, the world was mostly oblivious to it all – until today, when one Redditor finally had enough of reading terrible Scots and decided to look behind the curtain.

“People embroiled in linguistic debates about Scots often use it as evidence that Scots isn’t a language, and if it was an accurate representation, they’d probably be right,” noted the Reddit sleuth, Ultach. “It uses almost no Scots vocabulary, what little it does use is usually incorrect, and the grammar always conforms to standard English, not Scots.”

While very nearly all Scottish people speak English, the Scots language was apparently still spoken, read, or otherwise understood by nearly 30 per cent of Scotland's population according to those responding to a 2011 census. The language got a memorable boost, too, when Scots-writing novelist Irvine Welsh's Trainspotting became a silver-screen sensation.

Ha! whaur ye gaun, ye crowlin ferlie?

The Scots Wikipedia, however, reads like the work of a tourist who attended a Burns Night, had one too many, and started channeling their imagined ancestors all over the internet while occasionally glancing at the Online Scots Dictionary in another browser tab.

Just two examples:

A veelage is a clustered human settlement or community, larger than a hamlet but smawer than a toun, wi a population rangin frae a few hunder tae a few thoosand (sometimes tens o thoosands).

In Greek meethology, the Minotaur wis a creatur wi the heid o a bull an the body o a man or, as describit bi Roman poet Ovid, a being "pairt man an pairt bull". The Minotaur wis eventually killed bi the Athenian hero Theseus.

The sheer size and scope is something to behold: look at virtually any of the 57,000-plus pages and you’ll find a nonsensical mishmash of English and Scots. Ultach spent some time going through the articles, and reached a conclusion: the editor's just doing a copy-paste job from the English edition with words they think are Scots equivalents searched'n'replaced.

“They do use some elements of Scots that would require a look up, they just use them completely incorrectly. For example, they consistently translate ‘also’ as ‘an aw’ in every context. So, Charles V would be ‘king o the Holy Roman Empire and an aw Spain [sic]’, and ‘Pascal an aw wrote in defence o the scienteefic method [sic]’. I think they did this because when you type ‘also’ into the Online Scots Dictionary, ‘an aw’ is the first thing that comes up.”
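For the curious, here is a minimal Python sketch of the kind of context-blind search-and-replace Ultach describes. The word list and the function name `naive_scots` are invented for illustration; nothing here is known to be the editor's actual method or tooling.

```python
import re

# Hypothetical substitution list, assumed for illustration only.
# It is not the editor's real word list.
SUBSTITUTIONS = {
    "also": "an aw",
    "of": "o",
    "scientific": "scienteefic",
}


def naive_scots(text: str) -> str:
    """Swap each listed English word for its dictionary entry, ignoring context."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Words that are not in the list pass through unchanged.
        return SUBSTITUTIONS.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, text)


print(naive_scots("king of the Holy Roman Empire and also Spain"))
# prints: king o the Holy Roman Empire and an aw Spain
print(naive_scots("Pascal also wrote in defence of the scientific method"))
# prints: Pascal an aw wrote in defence o the scienteefic method
```

Because the lookup never considers grammar or word order, the output keeps English syntax intact, which is exactly the complaint.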

It’s not clear whether the Wikipedian – who identifies as a Christian furry living in the US – has spent the past near-decade creating thousands of fake posts as some kind of incredible practical joke, or whether they honestly felt they were doing a good job. There have been occasional interactions with real Scottish folk taking exception to pages, and the administrator has responded in a deadpan fashion.

Amazingly, the dreadful quality of the Scots-language version was the focus of an article five years ago when Slate noted that “at first glance, the Scots Wikipedia page reads like a transcription of a person with a Scottish accent,” while covering a request to Wikipedia that the entire sco.wikipedia.org archive be deleted.

You’re all bum and parsley

“Joke project. Funny for a few minutes, but inappropriate use of resources,” argued the proposer for its deletion, before they were attacked by angry Wikipedians who pointed out it was a “real language, lots of people who speak it,” and, they noted, there was “decent activity” on the pages. Decent, it turns out, because it was all being made up by someone who can’t speak a word of Scots. “Proposer should educate him/herself in linguistic diversity,” the Wikipedia collective sniffily concluded.

But while it is extremely funny on one level that an entire arm of the web encyclopedia comprises phony lingo, there is a potentially serious impact.

“This is going to sound incredibly hyperbolic and hysterical,” noted Ultach, “but I think this person has possibly done more damage to the Scots language than anyone else in history. They engaged in cultural vandalism on a hitherto unprecedented scale.

"Wikipedia is one of the most visited websites in the world. Potentially tens of millions of people now think that Scots is a horribly mangled rendering of English rather than being a language or dialect of its own, all because they were exposed to a mangled rendering of English being called Scots by this person and by this person alone.”

That view was backed up by the chief scientist at text analytics company Luminoso, Robyn Speer, who noted that several large language detectors use Scots Wikipedia as a reference.

“I believe that the cld2, cld3, and fastText language detectors all have Scots (sco) as one of the languages they claim to detect, and all of them are getting their belief about what Scots is from Wikipedia,” she noted.

In other words, fake Scots language is rapidly becoming real Scots online. And all because of a prolific apparent non-Scot. The jig may be up though: the administrator’s talk page has been taken over by some extremely nonplussed folk: “Please stop before you cause anymore harm. Embarrassing,” says one.

Another: “Scots is not just English spelled differently. Even if well intended this is basically cultural vandalism and contributes to misunderstanding and claims that Scots isn't a language of its own. All these contributions should be taken down or edited by someone who can actually speak the Scots dialect, this is incredibly damaging.”

And if you think those comments are surprisingly polite, that’s almost certainly because Scotland was asleep when the news broke. Just wait until the Weegies wake up. ®

Bootnote

As we were about to publish this article, we saw that Ultach updated their Reddit dossier to note: "I've been told that the editor I've written about has received some harassment for what they've done. This should go without saying but I don't condone this at all. They screwed up and I'm sure they know that by now.

"They seem like a nice enough person who made a mistake when they were a young child, a mistake which nobody ever bothered to correct, so it's hardly their fault. They're clearly very passionate and dedicated, and with any luck maybe they can use this as an opportunity to learn the language properly and make a positive contribution. If you're reading this I hope you're doing alright and that you're not taking it too personally."


Linus Torvalds issues early Linux Kernel update to fix swapfile SNAFU

‘Subtle and very nasty bug’ meant 5.12 rc1 could trash entire filesystems

Linux overlord Linus Torvalds has rushed out a new release candidate of Linux 5.12 after the first in the new series was found to include a ‘subtle and very nasty bug’ that was so serious he marked rc1 as unsuitable for use.

“We had a very innocuous code cleanup and simplification that raised no red flags at all, but had a subtle and very nasty bug in it: swap files stopped working right. And they stopped working in a particularly bad way: the offset of the start of the swap file was lost,” Torvalds wrote in a March 3rd post to the Linux Kernel Mailing List.

“Swapping still happened, but it happened to the wrong part of the filesystem, with the obvious catastrophic end results.”

Just when you thought it was safe to enjoy a beer: Beware the downloaded patch applied in haste

Let us tell you a tale of the Mailman's Apprentice

Who, Me? The weekend is over and Monday is here. Celebrate your IT prowess with another there-but-for-the-grace confession from the Who, Me? archives.

Our tale, from a reader the Regomiser has elected to dub "Simon", takes us back to the early part of this century and to an anonymous antipodean institution of learning.

Simon was working at the local Student Union (or "guild" as the locals called it), which was having problems with uppity education staff censoring the emissions of students. Simon was therefore commissioned to set up a fully independent newsletter.

Remember that day in March 2020 when you were asked to get the business working from home – tomorrow, if possible? Here's how that worked out

IT pros from orgs large and small tell The Reg the tech delivered, mostly, but couriers and home Wi-Fi suddenly became your problem

Covid Logfile Brianna Haley was given one day to be ready to roll out Zoom for 13,000 users at over 1,000 sites.

Haley* is a project analyst for a large healthcare provider that, as COVID-19 marched across the world in March 2020, realised imminent lockdowns meant it would soon be unable to consult with patients.

And no consultations meant no revenue.

The torture garden of Microsoft Exchange: Grant us the serenity to accept what they cannot EOL

Time to fix those legacy evils, though.... right?

Column It is the monster which corrupts all it touches. It is an energy-sucking vampire that thrives on the pain it promotes. It cannot be killed, but grows afresh as each manifestation outdoes the last in awfulness and horror. It is Microsoft Exchange and its drooling minion, Outlook.

Let us start with the most numerous of its victims, the end users. Chances are, you are one. You may be numbed by lifelong exposure, your pain receptors and critical faculties burned out through years of corrosion. You might be like me, an habitual avoider whose work requirements periodically force its tentacles back in through the orifices.

I have recently started to use it through its web interface, where it doesn’t update the unread flags, hides attachments, multiplies browser instances, leaves temp files all over my download directory, tangles threads, botches searches and so on.

Delayed, overbudget and broken. Of course Microsoft's finest would be found in NASA's Orion

In Space No One Can Hear You Scream (as Windows crashes again)

BORK!BORK!BORK! Getting astronauts to the Moon or Mars is the least of NASA's problems. Persuading Microsoft Windows not to fall over along the way is apparently a far greater challenge.

Spotted by Register reader Scott during a visit to the otherwise excellent Space Center Houston, there is something all too real lurking within the mock-up of the Orion capsule in which NASA hopes to send its astronauts for jaunts beyond low Earth orbit.

Clutched in the hand of a mannequin posed in the capsule's hatch is both a reminder of how old space tech tends to be and a warning for space-farers intending to take Microsoft's finest out for a spin.

Name True, iCloud access false: Exceptional problem locks online storage account, stumps Apple customer service

You're naming yourself wrong?

An iCloud customer says she spent more than six hours on the phone to Apple after being locked out of the service because her name is apparently incompatible with the application code.

"Actor, author, artist" Rachel True posted on Twitter about an error with the iCloud application, an unhandled exception with "Type error: cannot set value `true` to property `lastName`."

It seems that her name was interpreted as a Boolean value instead of a string, a common programming problem, especially in dynamic languages, which are more flexible about variable types.
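To illustrate the general class of bug, here is a minimal Python sketch of how a loosely typed input layer can turn a surname into a Boolean. Everything in it is hypothetical: the `coerce` and `set_last_name` helpers and the error wording are assumptions made for the example, and nothing is known here about how Apple's code actually works.

```python
def coerce(raw: str):
    """Guess a type from text, as some loosely typed parsers do (hypothetical)."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        # The surname "True" is silently turned into the Boolean True here.
        return lowered == "true"
    return raw


def set_last_name(profile: dict, value) -> None:
    """Store a surname, rejecting anything that is not a string (hypothetical)."""
    if not isinstance(value, str):
        raise TypeError(
            f"cannot set value `{str(value).lower()}` to property `lastName`"
        )
    profile["lastName"] = value


profile = {}
set_last_name(profile, coerce("Smith"))  # an ordinary surname passes through

try:
    set_last_name(profile, coerce("True"))  # the surname becomes a Boolean
except TypeError as err:
    print(err)
# prints: cannot set value `true` to property `lastName`
```

The usual remedy is to keep user-supplied fields as strings from end to end and to convert types only where a schema explicitly asks for it.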

Intel CPU interconnects can be exploited by malware to leak encryption keys and other info, academic study finds

Side-channel ring race 'hard to mitigate with existing defenses'

Chip-busting boffins in America have devised yet another way to filch sensitive data by exploiting Intel's processor design choices.

Doctoral student Riccardo Paccagnella, master's student Licheng Luo, and assistant professor Christopher Fletcher, all from the University of Illinois at Urbana-Champaign, delved into the way CPU ring interconnects work, and found they can be abused for side-channel attacks. The upshot is that one application can infer another application's private memory and snoop on the user's key presses.

"It is the first attack to exploit contention on the cross-core interconnect of Intel CPUs," Paccagnella told The Register. "The attack does not rely on sharing memory, cache sets, core-private resources or any specific uncore structures. As a consequence, it is hard to mitigate with existing side channel defenses."

NASA shows Mars that humans can drive a remote control space tank at .01 km/h

Perseverance takes first drive around landing spot named in honor of seminal sci-fi author Octavia E. Butler

NASA’s Perseverance rover trekked across Mars for the first time last Thursday, March 4, 2021.

The vehicle went four whole meters forward, turned 150 degrees to the left, then moved another two-and-a-half meters. The entire drive covered a whopping 6.5 m (21.3 feet) across Martian terrain. The journey took about 33 minutes.

The Register ran that through a calculator and deduced that the nuclear-powered, laser-equipped space tank, aka Perseverance, sped along at the astounding velocity of .01 km/h, quite a comedown from the 19,310 km/h at which it entered the red planet’s atmosphere.
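For readers without a calculator to hand, the sum can be reproduced in a few lines of Python. This is a back-of-the-envelope sketch using only the figures quoted above; the variable names are ours, and the 33-minute duration is treated as exact even though it was reported as approximate.

```python
# Figures quoted in the article.
distance_m = 4.0 + 2.5   # two legs of the drive, in meters
duration_min = 33        # reported as "about 33 minutes"

# Convert meters to kilometers and minutes to hours, then divide.
speed_kmh = (distance_m / 1000) / (duration_min / 60)
distance_ft = distance_m / 0.3048  # one foot is exactly 0.3048 m

print(f"{distance_m} m = {distance_ft:.1f} ft")
# prints: 6.5 m = 21.3 ft
print(f"{speed_kmh:.4f} km/h")
# prints: 0.0118 km/h
```

Rounded to two decimal places, that gives the .01 km/h in the headline.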

Google's ex-boss tells the US it's time to take the gloves off on autonomous weapons

Plus: AI Index 2021 report takeaways, Chocolate Factory banished from top ethics conference, and more

In brief The US government should avoid hastily banning AI-powered autonomous weapons and instead step up its efforts in developing such systems to keep up with foreign enemies, according to the National Security Commission on AI.

The independent group headed by ex-Google CEO Eric Schmidt and funded by the Department of Defense has published its final report advising the White House on how best to advance AI and machine learning to stay ahead of its competitors.

Stretching over 750 pages, the report covers a lot of areas, including retaining talent, the future of warfare, protecting IP, and US semiconductor supply chains.

Keeping up the PECR: ICO fines two marketing text pests £330k for sending 2.6 million messages

Leads Work Ltd and Valca Vehicle and Life Cover Agency tried to exploit household finance fears in lockdown, says data watchdog

Two businesses that dispatched more than 2.6 million nuisance text messages seeking to exploit lower household incomes during Britain’s first lockdown are nursing a combined financial penalty of £330,000 from the UK’s data watchdog.

The Information Commissioner’s Office (ICO) said it had received 10,000 official moans against West Sussex-based Leads Work Ltd [PDF], which sent more than 2.6 million lead generation texts between 16 May and 26 June 2020.

The texts were sent under the brand of Avon – yes, the direct sales biz that flogs cosmetics and perfumes. Any leads generated would then be passed to independent Avon sales reps.

Biting the hand that feeds IT © 1998–2021