Artificially intelligent software has been trained to detect and flag up clickbait headlines.
And here at El Reg we say thank God Larry Wall for that. What the internet needs right now is software to highlight and expunge dodgy article titles about space alien immigrants, faked moon landings, and the like.
Machine-learning eggheads continue to push the boundaries of natural language processing, and have crafted a model that can, supposedly, detect how clickbait-y a headline really is.
The system uses a convolutional neural network that converts the words in a submitted article title into vectors. These numbers are fed into a long short-term memory network that spits out a score based on the headline's clickbait strength. About eight times out of ten it agreed with humans on whether a title was clickbaity or not, we're told.
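For the curious, here is a toy sketch of the shape of that pipeline: words become vectors, the vectors pass one at a time through a long short-term memory cell, and the final state is squashed into a score between 0 and 1. It is not the researchers' code. The names `word_vector` and `ToyLSTM`, the layer sizes, and the random untrained weights are all assumptions made for illustration, and a fixed pseudo-random vector per word stands in for the convolutional step the paper uses.

```python
import math
import random

EMBED_DIM = 8    # hypothetical sizes, chosen only to keep the toy small
HIDDEN_DIM = 4


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def word_vector(word):
    """Stand-in for the learned word-to-vector step: one fixed vector per word."""
    rng = random.Random(word.lower())  # seeding with a string is deterministic
    return [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]


def make_weights(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class ToyLSTM:
    """A single LSTM cell with random, untrained weights (biases omitted)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        in_dim = EMBED_DIM + HIDDEN_DIM
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.gates = {name: make_weights(HIDDEN_DIM, in_dim, rng) for name in "ifog"}
        self.readout = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM)]

    def score(self, headline):
        h = [0.0] * HIDDEN_DIM  # hidden state
        c = [0.0] * HIDDEN_DIM  # cell state
        for word in headline.split():
            x = word_vector(word) + h  # concatenate input and previous hidden state
            i = [sigmoid(v) for v in matvec(self.gates["i"], x)]
            f = [sigmoid(v) for v in matvec(self.gates["f"], x)]
            o = [sigmoid(v) for v in matvec(self.gates["o"], x)]
            g = [math.tanh(v) for v in matvec(self.gates["g"], x)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Squash the final hidden state into a single score between 0 and 1.
        return sigmoid(sum(w * v for w, v in zip(self.readout, h)))


model = ToyLSTM()
print(model.score("Hot phones that people are talking about"))
```

Because the weights are random, the number it prints means nothing; a real system would learn them from scored headlines.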
The trouble is, what exactly is a clickbait headline? It's a tough question. The AI's team – from the International Institute of Information Technology in Hyderabad, the Manipal Institute of Technology, and the Birla Institute of Technology, all in India – decided to rely on the venerable Merriam-Webster dictionary to define clickbait.
“Merriam-Webster defines clickbait as something (such as a headline) to encourage readers to click on hyperlinks based on snippets of information accompanying it, especially when those links lead to content of dubious value or interest,” their paper states.
"It is built to create and consequently capitalise on the Loewenstein information gap by purposefully misrepresenting or promising what can be expected while reading a story on the web, be it through a headline, image or related text."
Capitalizing on the "Loewenstein information gap" is a fancy way of saying exploiting someone's curiosity – the gap between what someone knows and what they want to know.
People are more likely to consume information if they want to know more about a topic they are even somewhat interested in, and believe that a particular article or book or documentary will give them the answers. Clickbait therefore, according to this study, leads readers into believing that they will learn more about something, when really the content is pretty shallow.
Big bunch of flocking twits
The researchers trained their model on a dataset containing 19,538 tweets collected from Twitter. Each tweet contained a headline and link to its article, and was accompanied by a score quantifying how sensationalized the title was with respect to the article copy. These scores were produced by another team of researchers: their paper, Crowdsourcing a Large Corpus of Clickbait on Twitter, is expected to be presented at the International Conference on Computational Linguistics (COLING) in a few weeks.
All but 2,538 of the scored posts were fed into the model to train it. The remaining tweets were used to test its ability, by running them through the AI and comparing the predicted clickbait score against their human-assigned score to see if they matched. The model managed to achieve 83.49 per cent accuracy.
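The hold-out arithmetic and the accuracy check can be sketched in a few lines. The dataset sizes come from the figures above; everything else is an assumption for illustration, including the helper names `split_indices` and `accuracy`, the random shuffle, and the 0.5 cut-off used to decide whether a predicted score and a human score "match".

```python
import random

TOTAL_TWEETS = 19_538   # dataset size reported in the article
TEST_TWEETS = 2_538     # held-out tweets reported in the article


def split_indices(total, test_size, seed=0):
    """Shuffle the indices and hold out `test_size` of them for testing."""
    indices = list(range(total))
    random.Random(seed).shuffle(indices)
    return indices[test_size:], indices[:test_size]


def accuracy(predicted, human, threshold=0.5):
    """Share of items whose predicted and human scores fall on the same side
    of `threshold`. The 0.5 cut-off is an assumption, not the paper's rule."""
    if not human or len(predicted) != len(human):
        raise ValueError("score lists must be non-empty and the same length")
    matches = sum((p >= threshold) == (h >= threshold)
                  for p, h in zip(predicted, human))
    return matches / len(human)


train_idx, test_idx = split_indices(TOTAL_TWEETS, TEST_TWEETS)
print(len(train_idx), len(test_idx))  # 17000 2538

# Made-up scores, purely to exercise the function.
print(accuracy([0.9, 0.2, 0.7, 0.4], [0.8, 0.1, 0.3, 0.6]))  # 0.5
```

On the article's numbers, that leaves 17,000 tweets for training.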
Unfortunately, the paper doesn’t share too many details on what patterns the model picked up, such as which combinations of words were more likely to be identified as clickbait. However, Vaibhav Kumar, a coauthor of the study, told The Register that words with some sort of shock factor tended to trigger a clickbait rating. "We found out that words that have a bit of a surprise factor were the ones which played a major role in detecting clickbait," Kumar said.
Here's one example he gave us: “Hot phones that people are talking about.” Phrases such as "hot" and "people are talking about" are likely used often to entice readers into stories that presumably fall short of the promise.
He said that the model could be overhauled to generate headlines, too: "The data that we used can actually be used to create 'clickbaity' news headlines. The basics of our approach can be used to generate headlines, but we would have to completely change the deep learning model that we used.
"In my opinion, such a thing should not be used in traditional news. But for blogs or articles, which are just meant for light reading and fun – without any important news in them – this could be used. If we use such a thing for traditional news, we would be robbing people of what they actually desire."
Herein lies the rub: no one agrees on what is clickbait.
Some people think clickbait is stuff that doesn't live up to the promise – a headline that exaggerates, is flat-out wrong compared to the text, or has nothing to do with the underlying article. Others say it's anything racy or emotionally exploitative. Some people write off articles they disagree with as clickbait. We don't know yet where the tweet-scoring humans stand on this.
Here at El Reg we can't see the point of any headline that, while accurate, isn't trying to bait you into clicking on it. Honestly, who the heck writes a headline no one wants to click on, anyway? Wait, we can think of a few publications... ®