Google's troll-destroying AI can't cope with typos
Hay, Erik Sch;midt, you sukc!1
Google's Perspective API, created in conjunction with Alphabet incubee Jigsaw, is supposed to provide an automated way to detect "toxic" language in social media.
But researchers with the University of Washington – Hossein Hosseini, Sreeram Kannan, Baosen Zhang and Radha Poovendran – have found that the machine-learning software can be duped by willful errors.
"Through different experiments, we show that an adversary can deceive the system by misspelling the abusive words or by adding punctuations between the letters," the four academics wrote in their recently published paper, "Deceiving Google’s Perspective API Built for Detecting Toxic Comments."
The API is intended to allow digital publishers to assess the sentiment expressed in online posts in real time. Words get sent to a server for analysis and scores get returned, ideally allowing publishers to detect trolls as they type and to take the necessary steps to keep online interaction civil.
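As a rough illustration, a publisher might query the service along these lines. This is a minimal Python sketch using Google's published comment-analyzer interface; the API key is a placeholder and error handling is kept to the bare minimum.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder credential
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def toxicity(text):
    """Send a comment to Perspective and return its TOXICITY score (0 to 1)."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=payload)
    resp.raise_for_status()
    scores = resp.json()["attributeScores"]
    return scores["TOXICITY"]["summaryScore"]["value"]

print(toxicity("Anyone who voted for Trump is a moron"))
```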
That's the theory. In practice, just like other automated content filtering, malware detection, and spam detection systems, Perspective can be fooled.
As an example, Perspective scored the phrase "Anyone who voted for Trump is a moron" with a toxicity value of 80 per cent. When the phrase was altered to read "Anyone who voted for Trump is a mo.ron," the toxicity score dropped to 13 per cent.
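The perturbation itself is trivial to automate. Here is a toy sketch of the sort of obfuscation the researchers describe; the word list and the insertion rule are purely illustrative and not taken from the paper.

```python
# Illustrative list of flagged terms -- not drawn from the paper.
FLAGGED = {"moron", "idiot", "stupid"}

def obfuscate(text):
    """Break up flagged words with punctuation, e.g. 'moron' -> 'mo.ron'."""
    out = []
    for word in text.split():
        core = word.strip(".,!?")
        if core.lower() in FLAGGED:
            mid = len(core) // 2
            word = word.replace(core, core[:mid] + "." + core[mid:])
        out.append(word)
    return " ".join(out)

print(obfuscate("Anyone who voted for Trump is a moron"))
# -> "Anyone who voted for Trump is a mo.ron"
```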
Approached with adversarial intent, the algorithm can also be coerced into producing false positives, rating phrases as offensive when they are harmless.
Further weakening the system, the Perspective interface allows users to provide feedback on toxicity scores, which opens the door to model poisoning attacks – insisting that offensive words are fine or vice versa.
The algorithm's shortcomings were previously noted by software engineer and writer David Auerbach.
Even Google acknowledges its AI is a bit dim. In the Perspective GitHub repository, the company explicitly advises against using the software for automated moderation because "the models make too many errors." The project is intended to improve over time so that it may one day help community moderators identify the most objectionable discussion participants.
Poovendran, chair of the UW electrical engineering department and director of the UW Network Security Lab, confirmed in an email to The Register that Perspective is more of an assistive technology than a system capable of intelligent decisions. "In some tasks, a human in the loop is needed," he said.
As the researchers state in their paper, "Machine learning models are generally designed to yield the best performance on clean data and in benign settings."
Out in the real world, hygiene and benevolence cannot be assumed. Machine learning has a lot more learning ahead. ®