OpenAI develops AI model to critique its AI models

When your chatbots outshine their human trainers, you could pay for expertise ... or just augment your crowdsourced workforce

To catch code errors made by ChatGPT, OpenAI uses human AI trainers in the hope of improving the model. And to help those trainers, the lab has developed yet another AI model, called CriticGPT – in case the humans don't spot the mistakes.

The Microsoft-championed super lab on Thursday issued a paper [PDF] titled, "LLM Critics Help Catch LLM Bugs," that explains the approach.

Generative AI models like GPT-4o get trained on massive amounts of data and then go through a refinement process called Reinforcement Learning from Human Feedback (RLHF).

This commonly involves human workers, often hired through crowdsourcing platforms, interacting with models and annotating their responses to various questions. When Time Magazine looked into this last year, it found OpenAI using Kenyan workers paid less than $2 per hour to improve its models.

The goal is to teach the model which answer is preferred, so it performs better. But RLHF becomes less effective as models become more capable. Human AI trainers find it harder to identify flawed answers, particularly when the chatbot reaches the point that it knows more than its teachers.
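The preference step at the heart of RLHF can be sketched in a few lines. This is an illustration, not OpenAI's implementation: it assumes a Bradley-Terry-style reward model, in which a scoring function rates two candidate answers and a loss term pushes the human-preferred answer's score above the rejected one.

```python
import math

def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood that the annotator's pick 'wins'.

    Under the Bradley-Terry model, the probability that the preferred
    answer beats the rejected one is sigmoid(score_preferred - score_rejected).
    Minimizing this loss trains the reward model to rank answers the way
    human annotators do.
    """
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model ranks the preferred answer higher:
# a large positive margin means the model agrees with the annotator.
```

A policy model is then fine-tuned to maximize this learned reward – which is exactly why the scheme degrades when annotators can no longer tell good answers from subtly flawed ones.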

So as an aid to the people tasked with providing feedback to make its models more capable of generating programming code, OpenAI created another model – to critique those generative responses.

"We've trained a model, based on GPT-4, called CriticGPT, to catch errors in ChatGPT's code output," the AI startup explained in a blog post. "We found that when people get help from CriticGPT to review ChatGPT code they outperform those without help 60 percent of the time."

Screenshot of diagram from OpenAI paper, "LLM Critics Help Catch LLM Bugs"

In other words, this isn't an autonomous feedback loop from one chatbot to another – it's a way to augment the knowledge of those administering reinforcement learning.

This approach apparently leads to better results than just relying on crowdsourced workers – who at $2 per hour, or whatever the prevailing annotation rate happens to be, probably aren't computer science professors or trenchant technical writers.

According to the paper, the results show "that LLMs catch substantially more inserted bugs than qualified humans paid for code review, and further that model critiques are preferred over human critiques more than 80 percent of the time."

The finding that CriticGPT enables AI trainers to write better model response critiques isn't entirely surprising. Mediocre office temps would presumably write better-crafted email messages with the help of generative AI, too.

But AI help comes at a cost. When human contractors work in conjunction with CriticGPT, the resulting critiques of ChatGPT responses have a lower rate of hallucinations (invented bugs) than critiques from CriticGPT alone – but that error rate is still higher than if a human AI trainer had reviewed the code without AI assistance.

"Unfortunately, it's not obvious what the right tradeoff between hallucinations and bug detection is for an overall RLHF system that uses critiques to enhance model performance," the paper concedes. ®

And speaking of Microsoft-backed things, a study has demonstrated that the Windows giant's Bing translation and web search engine in China censors more aggressively than its Chinese competitors. 谢谢, Redmond!
