Updated As if the internet isn’t already a complicated cesspool full of trolls, AI engineers have gone one step further to build a machine learning model that can generate fake comments for news articles.
The eyebrow-raising creation, known as DeepCom, was developed by a group of engineers at Beihang University and Microsoft, China. “Automatic news comment generation is beneficial for real applications but has not attracted enough attention from the research community,” they said in a paper released on arXiv late last month.
Allowing readers to post comments under articles keeps them engaged, they argued. Open dialogue allows people to discuss their opinions and share new information. It’s good for publishers too, since comments also hold people’s attention and encourage page views.
So fake AI-generated comments can only be a good thing, right? “Such systems can enable commenting service for a news website from cold start, enhance the reading experience for less commented news articles, and enrich skill lists of other artificial intelligence applications, such as chatbots,” they said.
The paper didn’t mention any potential malicious applications of the technology, however, even though there are obviously many potential downsides. For example, oppressive regimes could implement such a model to automatically dump a load of fake drivel to drive propaganda. The comments could also kickstart toxic arguments between bots and humans to sow discord and misinformation. Perhaps miscreants might even use it as a way to advertise products or post spam. It's a trolling machine, basically.
In the meantime, it looks like the research has been accepted to EMNLP-IJCNLP, a top natural language processing conference, to be held in Hong Kong later this year.
“A paper by Beijing researchers presents a new machine learning technique whose main uses seem to be trolling and disinformation...Cool, cool, cool,” Arvind Narayanan, an associate computer science professor at Princeton University, said on Twitter.
A reading and generation network
DeepCom employs two recurrent neural networks: a reading network and a generating network. All the words are encoded as vectors for the reading network to analyse. The model is split into various layers that process different parts of an article, starting with its headline and then the contents, in order to analyse and predict what parts of the story are particularly important or interesting.
These predictions are then passed on to a generation network. Here, the model crafts responses focusing on a particular topic or person of interest in the article, and decodes what it has generated back into words to form the comments.
DeepCom’s performance depends on two things: how well the reader network identifies what’s worth talking about from the story, and how well the generator creates comments. The researchers trained the model on a Chinese-language dataset made up of millions of real human comments scraped from articles online, and on an English-language dataset taken from Yahoo! News.
To train the reading network, the researchers calculated how much the comments in the training data overlapped with information in the corresponding article to identify what parts of the article were important. For example, if the article is a film review and the comments are discussing a particular actress or actor, then the reading network should pick out the right name of the actress or actor. When that information is passed on to the generation network, the model will then write comments about the said actress or actor.
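To give a rough feel for that overlap idea, here is a minimal Python sketch. It is not the researchers' code, and the paper's actual scoring method isn't described here: the simple word-matching score, the tiny stopword list, and the function names `tokenize` and `overlap_scores` are all assumptions made for illustration. It scores each sentence of an article by the fraction of its words that also turn up in the comments.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "on", "as", "by", "was", "of", "in", "and", "is", "to"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap_scores(article_sentences, comments):
    """Score each article sentence by the share of its non-stopword
    tokens that also appear somewhere in the comments."""
    comment_tokens = Counter()
    for comment in comments:
        comment_tokens.update(t for t in tokenize(comment) if t not in STOPWORDS)

    scores = []
    for sentence in article_sentences:
        tokens = [t for t in tokenize(sentence) if t not in STOPWORDS]
        if not tokens:
            scores.append(0.0)
            continue
        hits = sum(1 for t in tokens if t in comment_tokens)
        scores.append(hits / len(tokens))
    return scores


# Made-up film review and comments, purely for illustration.
article = [
    "The film opened on Friday.",
    "Jane Doe gives a superb performance as the lead.",
]
comments = ["Jane Doe was amazing", "Loved the performance by Doe"]

scores = overlap_scores(article, comments)
most_discussed = max(range(len(article)), key=lambda i: scores[i])
print(article[most_discussed])
```

In this toy case the sentence naming the actress scores highest, which is the sort of signal a reading network could be trained to reproduce. A real system would learn this from data rather than count matching words.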
Here’s an example of one of DeepCom’s fake comments. The response is short and talks about basketball.
The red highlights what the reading network deems important. The blue highlights what the generation network is commenting about. Image credit: Li et al.
Although the idea of DeepCom is concerning, it’s probably not sophisticated enough yet to cause much harm. The comments it generates are short – on the order of tens of words – and aren’t complex enough to incite much reaction. The Register has asked the researchers and Microsoft for comment. ®
Updated to add
The eggheads have now amended their paper to acknowledge that, maybe, you know, people could use this technology for bad stuff.
"We are aware that numerous uses of these techniques can pose ethical issues and that best practices will be necessary for guiding applications. In particular, we note that people expect comments on news to be made by people. Thus, there is a risk that people and organizations could use these techniques at scale to feign comments coming from people for purposes of political manipulation or persuasion," reads one new section.
"While there are risks with this kind of AI research, we believe that developing and demonstrating such techniques is important for understanding valuable and potentially troubling applications of the technology," reads another amendment.
The new paper is dated October 1, and can be found here.
A spokesperson for Microsoft told us the September 26 version of the paper on arXiv was "a draft," adding: "The final version is being published, but it takes roughly 24 hours to update. In the meantime we have published the final version on our Microsoft page." That version, we note, is dated November 2019.