Why machine-learning chatbots find it difficult to respond to idioms, metaphors, rhetorical questions, sarcasm

'Understanding the meaning of such expressions relies on shared cultural and commonsense cues' ... which machines lack, new study details


Unlike most humans, AI chatbots struggle to respond appropriately in text-based conversations when faced with idioms, metaphors, rhetorical questions, and sarcasm.

Small talk can be difficult for machines. Although language models can write grammatically correct sentences, they aren’t very good at coping with subtle nuances in communication. Humans have far more experience of social interaction, and use all sorts of cues, from facial expressions and vocal tone to body language, to understand intent. Chatbots, however, have limited contextual knowledge; for them, relationships between words are reduced to numbers and mathematical operations.
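
To make "reduced to numbers" concrete, here is a minimal sketch using the GPT-2 tokenizer from Hugging Face's transformers library; the tokenizer is chosen purely for illustration and isn't tied to the paper's setup:

```python
# What a language model actually "sees": integer token IDs, not social cues.
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")

ids = tok.encode("maybe we can get together sometime")
print(ids)              # a list of integers -- the model's whole view of the text
print(tok.decode(ids))  # mapping the numbers back to the original string
```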

Not only is figurative speech challenging for algorithms to parse, but expressions such as idioms and similes are also used relatively infrequently. They don’t appear in training datasets as much, meaning chatbots are less likely to learn common expressions, Harsh Jhamtani, a PhD student at Carnegie Mellon University and first author of a research paper being presented at the 2021 Conference on Empirical Methods in Natural Language Processing this week, explained to The Register.

“A key challenge is that such expressions are often non-compositional compared to simpler expressions. For example, you may be able to approximate the 'meaning' of the expression 'white car' by relying on the 'meaning' of 'white' and 'car'," he said.

"But the same doesn't hold true for idioms and metaphors. The meaning of 'piece of cake', [describing] something that is easy to do, might be difficult to approximate given that you know the meaning of 'piece' and 'cake'. Often understanding the meaning of such expressions relies on shared cultural and commonsense cues.”

Jhamtani and his colleagues experimented with five machine-learning systems built on different architectures, from seq2seq to OpenAI’s older GPT-2. They picked out conversations containing metaphors, idioms, rhetorical questions, and hyperbole from the DailyDialog dataset, ran the inputs through all of the models, and ranked the appropriateness of the replies.
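
As a rough idea of the probing pattern, here is a minimal sketch that feeds a figurative line to an off-the-shelf GPT-2 via Hugging Face's transformers library. The prompt is a hypothetical example in the spirit of DailyDialog, and the paper's models were fine-tuned for dialogue, so treat this as an illustration rather than the authors' actual pipeline:

```python
# Minimal sketch of probing a model with a figurative input. Vanilla GPT-2
# is used here; the paper evaluated dialogue-tuned models.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# A hypothetical figurative prompt, invented for this sketch
prompt = "That last exam was a piece of cake!"
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad warning
)

# Print only the newly generated continuation, not the prompt itself
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                         skip_special_tokens=True)
print(reply)
```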

They found the models’ performance dropped by between 10 and 20 per cent compared to when the chatbots responded to straightforward chitchat. In one amusing example, when faced with the slightly ungrammatical line “maybe we can get together sometime if you are not scare of a 30 year old cougar!”, GPT-2 replied, “i’m not scared of any cats. i’ve two dogs.”

When the academics rewrote the input more literally as “maybe we can start dating sometime if you are not scare of a 30 year old cougar," the model responded with “that’s a very interesting idea. i’ve never met one,” which is more appropriate.

Unfortunately, the research only shows how and why machines fail to grasp figurative language. Solving the issue is a different challenge altogether.

“In our paper, we explore some simple mitigation techniques that utilize existing dictionaries to find literal equivalents of figurative expressions,” Jhamtani said. Swapping 'get together' for 'dating' in the input, for example, may force a model to generate a better output, but it doesn’t teach the model the meaning of the expression.
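
A toy version of that dictionary-based substitution might look like the following sketch. The mapping table is a hypothetical stand-in for the existing dictionaries the paper mentions, with entries drawn from the article's own examples:

```python
# Naive sketch of the mitigation idea: rewrite known figurative phrases
# into literal equivalents before the text reaches the chatbot.
# The dictionary below is a hypothetical stand-in for a real idiom lexicon.
FIGURATIVE_TO_LITERAL = {
    "get together": "start dating",  # the article's own example
    "piece of cake": "easy",
}

def literalize(utterance: str) -> str:
    # Replace longer phrases first so overlapping entries don't clash
    for phrase in sorted(FIGURATIVE_TO_LITERAL, key=len, reverse=True):
        utterance = utterance.replace(phrase, FIGURATIVE_TO_LITERAL[phrase])
    return utterance

print(literalize("maybe we can get together sometime!"))
# -> "maybe we can start dating sometime!"
```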

“Effectively handling figurative language is still an open research question that needs more effort to solve. Experiments with even bigger models are part of potential future explorations,” he concluded. ®
