YouTube algorithms mistake sparring robots for animal cruelty, gamers snooped on via Xbox AI, and more
Including another, larger GPT-2 reveal
Roundup Let's kick off this week – a four-day week for the UK – with some recent AI-related news beyond what we've already covered.
Don’t watch robots fighting with other bots, because it’s, er, cruel? YouTube temporarily removed videos of robots engaged in battle after its algorithms bizarrely classified the footage as animal-cruelty material.
Under YouTube’s community guidelines, content that shows “unnecessary suffering or harm deliberately causing an animal distress,” or scenes where “animals are encouraged or coerced to fight by humans,” is not allowed on the video-sharing platform. That’s all fine and dandy. However, in the videos YouTube recently removed, there were no animals, only machines.
YouTubers such as Maker’s Muse blamed the kerfuffle on the web giant's algorithms, as first reported by Vice. Engineers participating in Battlebots, a robot-fighting American TV show, had their videos removed. Some of the titles of their videos did contain names of animals, they noted.
The videos have since been restored after YouTube admitted it had wrongly flagged them.
In other news, YouTube removed 210 accounts for spewing Chinese propaganda amid political protests in Hong Kong. This comes after Twitter discovered and removed 936 accounts made in China that it believes are also part of the Chinese government’s attempt to “sow political discord in Hong Kong.”
A third, larger GPT-2 model is out: Remember OpenAI’s gigantic language model that was deemed too dangerous to release? Well, six months later, the organization has just published another GPT-2 model, this one with 774 million parameters. That's still fewer than the full version, which has more than one billion parameters.
The 774m model has been shared with four American universities so far. OpenAI is collaborating with Cornell University to study how humans can be fooled by machine-generated text; the Middlebury Institute of International Studies is exploring how GPT-2 can be misused by terrorists online; the University of Oregon is studying biases within the system; and the University of Texas at Austin is analyzing how the model behaves when trained using specific datasets and in different languages.
The model is wrapped up in a legal agreement that forbids boffins from sub-licensing the technology. They also aren’t allowed to sell, lend, rent, lease, transfer, or grant any rights in or to all or any portion of the GPT-2 to anyone else, and nor can they use it for commercial purposes. Any GPT-2 research that a licensee wants to publish will be reviewed by OpenAI beforehand, too.
Meanwhile, if you want a version that contains about 1.5 billion parameters, look no further than a pair of engineers who have attempted to replicate OpenAI's largest model and have published their code online. The only downside, however, is that you'll need about $50,000 to pay for, say, the cloud compute needed to train it.
You don’t need to waste all your money on an army of GPUs: A British deep-learning outfit has managed to train a popular computer vision model on the CIFAR-10 image dataset in just 26 seconds using just one graphics accelerator. AI engineers are constantly looking for ways to speed up the training process. The easiest way to do this is to just throw more hardware at the problem. However, that’s expensive, so some opt for cheaper software tricks.
Myrtle.ai, an outfit based in Cambridge, has opted for the latter strategy. This month, chief scientist David Page claimed his biz achieved a time of one minute and 15 seconds when training a nine-layer ResNet model on the CIFAR-10 dataset using a single Nvidia V100 GPU.
When Page’s results were compared to those of other groups, who also had a crack at whizzing through CIFAR-10 as part of the DAWNBench competition, Myrtle.ai made it to sixth place. Not too bad, considering that most of the submissions use multiple GPUs. Since that November 2018 submission, however, Myrtle.ai has managed to cut that time to just 26 seconds, apparently. The trick is to employ techniques that minimize the time spent shuttling data and instructions between the GPU and CPU during training.
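For a sense of scale, here is a quick back-of-the-envelope sketch of what that improvement works out to, using only the two training times quoted above. The function and variable names are ours, purely for illustration, and nothing here comes from Myrtle.ai's code.

```python
# Training times quoted in the article, in seconds.
# Names are illustrative assumptions, not Myrtle.ai's own.
EARLIER_SECONDS = 60 + 15  # "one minute and 15 seconds"
LATEST_SECONDS = 26        # "just 26 seconds"


def speedup(before: float, after: float) -> float:
    """Return how many times faster the newer run is."""
    return before / after


def time_saved_fraction(before: float, after: float) -> float:
    """Return the share of the original training time that was cut."""
    return (before - after) / before


if __name__ == "__main__":
    factor = speedup(EARLIER_SECONDS, LATEST_SECONDS)
    saved = time_saved_fraction(EARLIER_SECONDS, LATEST_SECONDS)
    print(f"Speedup: {factor:.2f}x")
    print(f"Time saved: {saved:.0%}")
```

In other words, roughly a threefold speedup on the same single GPU, assuming both figures describe comparable training runs.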
Microsoft’s Cortana was listening to you when you played Xbox: Here's yet another reminder that when technology giants say their AI-powered voice-controlled smart assistants are listening to you, they mean human handlers are listening to you as well as software.
Just like Apple, Amazon, Facebook, and others, Microsoft offers a virtual helper that is always listening to you, waiting for a wake word to activate and take orders – in Redmond's case, it's Cortana and it exists in various forms and systems, from Windows 10 laptops to Xbox One consoles.
Contractors employed by Microsoft were, up until very recently, transcribing audio clips collected by Cortana from the gaming boxes: whenever the on-board software needed help in understanding voices picked up around it, snippets would be passed to human freelancers to transcribe and feed into the system to improve it. Those recordings can, obviously, feature private conversations and intimate moments; just use your imagination.
In fact, these contractors were listening to gamers even before the console family gained the voice-activated Cortana, according to Motherboard. Some began working as early as 2014, when voice commands could be recorded by the Kinect system.
“We’ve long been clear that we collect voice data to improve voice-enabled services and that this data is sometimes reviewed by vendors. We’ve recently updated our privacy statement to add greater clarity that people sometimes review this data as part of the product improvement process,” a Microsoft spokesperson told The Register.
Microsoft still, to the best of our knowledge, does not explicitly state that it is recording your audio and passing it to actual humans to review. This is because, we reckon, if people, politicians, and regulators discovered tech giants were actively bugging everyone's homes, everyone would freak out.
The Redmond spokesperson also said the biz “stopped reviewing any voice content taken through Xbox for product improvement purposes a number of months ago, as we no longer felt it was necessary, and we have no plans to re-start those reviews. We occasionally review a low volume of voice recordings sent from one Xbox user to another when there are reports that a recording violated our terms of service and we need to investigate. This is done to keep the Xbox community safe and is clearly stated in our Xbox terms of service.” ®