Study employs large language models to sniff out their own bloopers

What if it's LLMs all the way down?

Researchers in computing and linguistics have devised a new way to detect errors in large language models, which relies on employing more LLMs.

Applying statistical machine learning to languages at an ever-increasing scale has become in vogue with tech vendors and investors alike, but it is well known that such language models are prone to errors. In the much-hyped world of LLMs, shortcomings that might be deemed malfunctions in other contexts are euphemistically called "hallucinations."

Generating content that is either inaccurate or nonsensical nonetheless troubles the industry, so the race is on to detect such hallucinations more accurately and, presumably, to try to eliminate them.

This week, in the UK science journal Nature, Oxford University and DeepMind researcher Sebastian Farquhar and colleagues proposed a method for quantifying the degree of hallucinations generated by an LLM while also showing how correct the generated content might be.

The study sought to address a subset of hallucinations known as confabulations: output that is both inaccurate and arbitrary, which the researchers say is down to a "lack of knowledge."

The researchers suggest their approach can find confabulations in LLM-generated biographies and in answers to questions on trivia, general knowledge, and life sciences.

In an accompanying article, Karin Verspoor, computing technologies professor at Australia's RMIT University, said: "There is much debate about whether the models actually capture meaning or understand language in any epistemological or cognitive sense given that they lack any awareness of communicative intent or connection to real world objects and impacts.

"However, it is clear that these models perform well in a range of complex language-processing tasks that involve some comprehension."

The research team sought to exploit an LLM's performance on one of these tasks to detect hallucinations in another LLM. Textual entailment is a way of saying one statement can be inferred from another. So, saying "Pat purchased a car" also means "Pat owns a car" but not necessarily that "Pat rode in a car." The Oxford team's approach used LLMs' ability to recognize entailment as a way of spotting confabulations in another LLM.
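The idea of grouping answers by mutual entailment can be sketched in a few lines. This is a toy illustration, not the Oxford team's implementation: the `entails` function here is a trivial string comparison standing in for the LLM entailment judge the researchers actually use, and the entropy over meaning-clusters serves as the confabulation signal, with high entropy suggesting the model's answers disagree in meaning.

```python
import math

def entails(a: str, b: str) -> bool:
    # Stand-in for an LLM entailment judge. The real method asks a
    # second LLM whether statement a entails statement b; here we
    # just compare normalized strings for illustration.
    return a.strip().lower() == b.strip().lower()

def semantic_clusters(answers):
    # Group sampled answers into meaning-equivalence classes: two
    # answers share a cluster only if each entails the other.
    clusters = []
    for ans in answers:
        for cluster in clusters:
            rep = cluster[0]
            if entails(ans, rep) and entails(rep, ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters

def semantic_entropy(answers):
    # Shannon entropy over the cluster distribution: many small
    # clusters (high entropy) hint that the model is confabulating.
    clusters = semantic_clusters(answers)
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Three samples agree in meaning, one disagrees: moderate entropy.
samples = ["Paris", "paris", "Lyon", "Paris"]
print(round(semantic_entropy(samples), 3))  # prints 0.562
```

If every sampled answer lands in one cluster, the entropy is zero and the model is at least consistent, whatever its accuracy; a spread of incompatible answers is the arbitrariness that marks a confabulation.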

But they didn't stop there. They also employed a third LLM to validate the findings of the second.

"Our probabilistic approach, accounting for semantic equivalence, detects an important class of hallucinations: those that are caused by a lack of LLM knowledge," the paper says. "These are a substantial portion of the failures at present and will continue even as models grow in capabilities because situations and cases that humans cannot reliably supervise will persist. Confabulations are a particularly noteworthy failure mode for question answering but appear in other domains too."

The researchers suggest that the findings might help people improve LLM performance by tailoring prompts, the supposedly "natural" way users query or instruct an LLM.

"By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability," the paper adds.

Verspoor agreed the approach might be useful for detecting hallucinations in LLMs and other nefarious output, such as misinformation or plagiarism. But she warned that overreliance on LLMs might lead to further challenges.

"Researchers will need to grapple with the issue of whether this approach is truly controlling the output of LLMs, or inadvertently fueling the fire by layering multiple systems that are prone to hallucinations and unpredictable error," she said. ®
