Um, almost the entire Scots Wikipedia was written by someone with no idea of the language – 10,000s of articles

None of you trained an AI on this data set, right? Right?


In an extraordinary and somewhat devastating discovery, it turns out virtually the entire Scots version of Wikipedia, comprising more than 57,000 articles, was written, edited or overseen by a netizen who clearly had nae the slightest idea about the language.

The user is not only a prolific contributor, they are an administrator of sco.wikipedia.org, having created, modified or guided the vast majority of its pages in more than 200,000 edits. The result is tens of thousands of articles in English with occasional, and often ridiculous, letter changes – such as replacing a “y” with “ee.”

That’s right, someone doing a bad impression of a Scottish accent and then writing it down phonetically is the chief maintainer of the online encyclopedia's Scots edition. And although this has been carrying on for the best part of a decade, the world was mostly oblivious to it all – until today, when one Redditor finally had enough of reading terrible Scots and decided to look behind the curtain.

“People embroiled in linguistic debates about Scots often use it as evidence that Scots isn’t a language, and if it was an accurate representation, they’d probably be right,” noted the Reddit sleuth, Ultach. “It uses almost no Scots vocabulary, what little it does use is usually incorrect, and the grammar always conforms to standard English, not Scots.”

While very nearly all Scottish people speak English, the Scots language was apparently still spoken, read, or otherwise understood by nearly 30 per cent of Scotland's population, according to responses to the 2011 census. The language got a memorable boost, too, when Scots-writing novelist Irvine Welsh's Trainspotting became a silver-screen sensation.

Ha! whaur ye gaun, ye crowlin ferlie?

The Scots Wikipedia, however, reads like the work of a tourist who attended a Burns Night, had one too many, and started channeling their imagined ancestors all over the internet while occasionally glancing at the Online Scots Dictionary in another browser tab.

Just two examples:

A veelage is a clustered human settlement or community, larger than a hamlet but smawer than a toun, wi a population rangin frae a few hunder tae a few thoosand (sometimes tens o thoosands).

In Greek meethology, the Minotaur wis a creatur wi the heid o a bull an the body o a man or, as describit bi Roman poet Ovid, a being "pairt man an pairt bull". The Minotaur wis eventually killed bi the Athenian hero Theseus.

The sheer size and scope is something to behold: look at virtually any of the 57,000-plus pages and you’ll find a nonsensical mishmash of English and Scots. Ultach spent some time going through the articles, and reached a conclusion: the editor's just doing a copy-paste job from the English edition with words they think are Scots equivalents searched'n'replaced.

“They do use some elements of Scots that would require a look up, they just use them completely incorrectly. For example, they consistently translate ‘also’ as ‘an aw’ in every context. So, Charles V would be ‘king o the Holy Roman Empire and an aw Spain [sic]’, and ‘Pascal an aw wrote in defence o the scienteefic method [sic]’. I think they did this because when you type ‘also’ into the Online Scots Dictionary, ‘an aw’ is the first thing that comes up.”
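For the curious, the search'n'replace approach Ultach describes can be sketched in a few lines of Python. This is purely illustrative: the `NAIVE_GLOSSARY` table and the `naive_translate` function are our own hypothetical names, the word pairs are taken from the examples quoted above, and nothing here is claimed to be how the editor actually worked.

```python
import re

# Hypothetical glossary: word pairs lifted from the examples quoted in the
# article, not from any real dictionary export.
NAIVE_GLOSSARY = {
    "also": "an aw",
    "of": "o",
    "scientific": "scienteefic",
}

# One pattern matching any glossary word as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in NAIVE_GLOSSARY) + r")\b"
)


def naive_translate(english: str) -> str:
    """Swap each glossary word for its listed equivalent, ignoring context.

    Grammar and word order are left exactly as in the English input, which
    is the failure mode the article describes.
    """
    return _PATTERN.sub(lambda match: NAIVE_GLOSSARY[match.group(0)], english)


if __name__ == "__main__":
    print(naive_translate("Pascal also wrote in defence of the scientific method"))
```

Because the substitution never looks at context, “also” becomes “an aw” wherever it appears and the English sentence structure survives untouched.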

It’s not clear whether the Wikipedian – who identifies as a Christian furry living in the US – has spent the past near-decade creating thousands of fake posts as some kind of incredible practical joke, or whether they honestly felt they were doing a good job. There have been occasional interactions with real Scottish folk taking exception to pages, and the administrator has responded in a dead-pan fashion.

Amazingly, the dreadful quality of the Scots-language version was the focus of an article five years ago when Slate noted that “at first glance, the Scots Wikipedia page reads like a transcription of a person with a Scottish accent,” while covering a request to Wikipedia that the entire sco.wikipedia.org archive be deleted.

You’re all bum and parsley

“Joke project. Funny for a few minutes, but inappropriate use of resources,” argued the proposer of its deletion, before they were attacked by angry Wikipedians who pointed out it was a “real language, lots of people who speak it,” and, they noted, there was “decent activity” on the pages. Decent, it turns out, because it was all being made up by someone who can’t speak a word of Scots. “Proposer should educate him/herself in linguistic diversity,” the Wikipedia collective sniffily concluded.

But while it is extremely funny on one level that an entire arm of the web encyclopedia comprises phony lingo, there is a potentially serious impact.

“This is going to sound incredibly hyperbolic and hysterical,” noted Ultach, “but I think this person has possibly done more damage to the Scots language than anyone else in history. They engaged in cultural vandalism on a hitherto unprecedented scale.

“Wikipedia is one of the most visited websites in the world. Potentially tens of millions of people now think that Scots is a horribly mangled rendering of English rather than being a language or dialect of its own, all because they were exposed to a mangled rendering of English being called Scots by this person and by this person alone.”

That view was backed up by the chief scientist at text analytics company Luminoso, Robyn Speer, who noted that several large language detectors use Scots Wikipedia as a reference.

“I believe that the cld2, cld3, and fastText language detectors all have Scots (sco) as one of the languages they claim to detect, and all of them are getting their belief about what Scots is from Wikipedia,” she noted.

In other words, fake Scots language is rapidly becoming real Scots online. And all because of a prolific apparent non-Scot. The jig may be up though: the administrator’s talk page has been taken over by some extremely nonplussed folk: “Please stop before you cause anymore harm. Embarrassing,” says one.

Another: “Scots is not just English spelled differently. Even if well intended this is basically cultural vandalism and contributes to misunderstanding and claims that Scots isn't a language of its own. All these contributions should be taken down or edited by someone who can actually speak the Scots dialect, this is incredibly damaging.”

And if you think those comments are surprisingly polite, that’s almost certainly because Scotland was asleep when the news broke. Just wait until the Weegies wake up. ®

Bootnote

As we were about to publish this article, we saw that Ultach updated their Reddit dossier to note: "I've been told that the editor I've written about has received some harassment for what they've done. This should go without saying but I don't condone this at all. They screwed up and I'm sure they know that by now.

"They seem like a nice enough person who made a mistake when they were a young child, a mistake which nobody ever bothered to correct, so it's hardly their fault. They're clearly very passionate and dedicated, and with any luck maybe they can use this as an opportunity to learn the language properly and make a positive contribution. If you're reading this I hope you're doing alright and that you're not taking it too personally."


Biting the hand that feeds IT © 1998–2021