Boffins build AI that can detect cyber-abuse – and if you don't believe us, YOU CAN *%**#* *&**%* #** OFF

Alternatively, you can try to overpower it with your incredibly amazing sarcasm

Trolls, morons, and bots plaster toxic crap all over Twitter and other antisocial networks. Can machine learning help clean it up?

A team of computer scientists spanning the globe think so. They've built a neural network that can seemingly classify tweets into four different categories: normal, aggressor, spam, and bully – aggressor being a deliberately harmful, derogatory, or offensive tweet; and bully being a belittling or hostile message. The aim is to create a system that can filter out aggressive and bullying tweets, delete spam, and allow normal tweets through. Pretty straightforward.

The boffins admit it's difficult to draw a line between so-called cyber-aggression and cyber-bullying. And the line between normal and aggressive tweets is often blurred: after all, people enjoy ranting about things, from feminism to Brexit to tabs-versus-spaces. Having said that, the goal was to craft a system that can automatically and fairly – and by fairly, we mean consistently – draw a line between each category.

After analyzing more than two million tweets that discussed touchy topics such as Gamergate and gender pay inequality at the BBC, as well as more neutral matters like the NBA, the eggheads selected a sample of 9,484 tweets, and hand-labelled each one as normal, aggressor, spam, or bully. Obviously, this means the academics' definition of what is aggressive or bullying forms the basis of the model.

About 80 per cent of these tweets were used to train the recurrent neural network, and the remaining 20 or so per cent was used to test it, according to one of the scientists: Jeremy Blackburn, an assistant computer science professor at Binghamton University in New York. We're told the code could sort the test tweets into the four categories with over 80 per cent accuracy. That is to say, 8 of 10 times, the AI would categorize a tweet as expected by the human boffins.
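
For the curious, the split-and-score arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the shuffling, and the fixed seed are ours, not the researchers', and integer IDs stand in for the real hand-labelled tweets.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into train and test lists.

    The shuffle and the seed are assumptions made for this sketch; the
    article does not say how the researchers partitioned their sample.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, expected):
    """Fraction of predictions that match the human-assigned labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must be the same length")
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Placeholder data: 9,484 integer IDs standing in for the labelled tweets.
tweet_ids = list(range(9484))
train, test = split_dataset(tweet_ids)
print(len(train), len(test))
```

With a 0.8 training fraction, the 9,484 tweets divide into 7,587 for training and 1,897 for testing, and an `accuracy` score of 0.8 corresponds to the "8 of 10 times" figure quoted above.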

Their research was published in the journal ACM Transactions on the Web – here's the Arxiv version [PDF]. While it's not revolutionary, it's a good introduction to text analysis – and don't forget, it is an academic study.

Cyber-bullying is 'dehumanizing'

The neural network analyzes not just the content of a tweet but also the tweeter's profile, and the rate at which they tweeted. All the words are encoded as vectors, and various algorithms are used to determine the overall sentiment or emotion of the message and its sender, whether or not any curse words were used, how many hashtags were included, and the number of followers someone has. All of this information is fed into the network so it can predict the category its human masters would have assigned the tweet.
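
To make a few of those inputs concrete, here is a rough Python sketch of how the simpler ones (hashtag count, curse-word flag, follower count, and time between tweets) might be pulled out of a tweet. Everything named here is hypothetical: the `Tweet` class, the `extract_features` function, and the two-word curse list are placeholders, not the paper's code or lexicon. Word vectors and sentiment scoring are left out entirely.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in lexicon; the article does not give the real one.
CURSE_WORDS = {"damn", "hell"}


@dataclass
class Tweet:
    text: str
    followers: int      # follower count of the account that posted it
    posted_at: float    # posting time, in seconds since the epoch


def extract_features(tweet, previous_tweet_times):
    """Return a dict of simple per-tweet features.

    previous_tweet_times holds the posting times (in seconds) of the same
    account's earlier tweets, used to estimate how quickly it tweets.
    """
    words = re.findall(r"[a-z']+", tweet.text.lower())
    hashtags = re.findall(r"#\w+", tweet.text)

    # Mean gap between consecutive tweets, including the current one.
    times = sorted(list(previous_tweet_times) + [tweet.posted_at])
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0

    return {
        "num_hashtags": len(hashtags),
        "has_curse_word": any(word in CURSE_WORDS for word in words),
        "followers": tweet.followers,
        "mean_gap_seconds": mean_gap,
    }


example = Tweet("Damn this #brexit #maga mess", followers=120, posted_at=300.0)
features = extract_features(example, previous_tweet_times=[0.0, 100.0])
print(features)
```

In a real pipeline, a dictionary like this would be turned into numbers and combined with the encoded text before being handed to the network.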

It’s not easy to pinpoint which features are more indicative of online harassment, Blackburn told El Reg on Monday. “It is not straight forward to describe this because we are differentiating between several categories,” he said. “For example, we saw that bullies and aggressors had less time between their tweets than spammers, but similar to spammers, bullies used fewer adjectives than normal users and aggressors.”

Aggressors and bullies were more likely to tweet multiple times and use more hashtags than spammers and normal accounts. Bullies tend to harm others by directing their messages at specific people, whereas aggressors were more likely to insult groups of folks. Spammers, on the other hand, are less likely to use abusive language and tend to sell things like smut pics and videos.

“Normal users tend to discuss a variety of topics, such as political and social issues, whereas bully users seem to organize their attacks against important and sensitive issues, such as feminism, religion, and pedophiles, using aggressive and in some cases insulting language. Aggressive users express their negativity on popular topics, such as the ‘brexit’ case, ‘maga’, and the spread of the Zika virus. Spammers typically post inappropriate content in an effort to gain more followers or attract victims to malicious sites of questionable or malicious content,” the team wrote in their paper.

Blackburn said cyberbullying has only recently been taken seriously. “This type of behavior is often dehumanizing, which most rational people would consider a bad thing,” he said. “From a more pragmatic point of view, the intensity of this behavior, enabled by the scale of the Web and social media, can and has led to acts of real world violence.

“I think that efforts are being made, but that much more can be done. There are difficult decisions for social media companies to consider. For example, some of the biggest offenders also have large followings on social media; simply silencing these people can have unforeseen consequences.”

The researchers hope to use their model on other platforms, such as YouTube or Facebook, though there are other challenges when using machine learning to tackle hate speech and harassment online.

In previous studies, computer scientists discovered that abusive messages bypassed classifiers if the text contained spelling mistakes. Not all rude comments are hateful, either.

“Understanding sarcasm is difficult, especially because different people and cultures express it in different ways,” noted Blackburn. “We did not examine sarcasm in particular, however our algorithm is designed to be extensible, and we intend to add more linguistic features to it in the future.” ®
