ChatGPT becomes ChatRepair to automate bug fixing for less
How much, you ask? The answer to everything is $0.42
Boffins at the University of Illinois Urbana-Champaign have enlisted ChatGPT, the OpenAI chatbot that responds to written instructions, to repair software bugs without breaking the bank.
Chunqiu Steven Xia, graduate research assistant, and Lingming Zhang, an associate professor of computer science, give away the surprise ending in the title of their recent preprint paper, "Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT."
You could certainly pay more.
The two researchers set out to improve Automated Program Repair (APR), an emerging discipline focused on developing techniques for fixing programming bugs automatically.
Traditional APR techniques, they explain in their paper, tend to produce patches that lack variety and require lots of manual fine-tuning. More recent work with LLMs has produced better results but still used the same underlying technique – generating a lot of patches from an initial input sample and then validating each one.
This approach, they contend, produces repeated incorrect patches and fails to learn from its failures. And this has a very real cost in terms of time and computational resources.
Xia and Zhang have developed an automated bug repair process they call ChatRepair that incorporates information about software test failures and that learns from conversational input, as well as successes and failures, along the way. It's a bit more complicated than saying, "Fix your bugs, HAL" – as can be seen from the illustration accompanying the paper – but it's perhaps preferable to a cryptic error message.
"Instead of directly generating patches based on the buggy code as existing LLM-based APR techniques do, ChatRepair additionally provides valuable test failure information to further assist LLMs in patch generation," the boffins explain in their paper.
"Moreover, instead of continuously sampling from the same prompt as prior LLM-based APR techniques do, ChatRepair keeps track of conversation history and further learns from earlier failed and succeeded patching attempts of the same bug via prompting."
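That first step – pairing the buggy code with details of the failing test instead of sending the code alone – can be pictured as a prompt-construction function. A minimal Python sketch; the function and parameter names here are our own illustration, not code from the paper:

```python
def build_repair_prompt(buggy_code: str, failing_test: str, error_message: str) -> str:
    """Assemble an initial repair prompt that pairs the buggy code with
    test failure details, rather than sending the code on its own."""
    return (
        "The following code is buggy:\n"
        f"{buggy_code}\n"
        f"The test `{failing_test}` fails with:\n"
        f"{error_message}\n"
        "Please provide a fixed version of the code."
    )

# The failing test's name and error hint at both the bug type and the
# expected behavior – e.g. here, that head() should return the first element
prompt = build_repair_prompt(
    buggy_code="def head(xs): return xs[1]",
    failing_test="test_head_returns_first_element",
    error_message="AssertionError: expected head([3, 4]) == 3, got 4",
)
print(prompt)
```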
Code may be wrong, but it's useful
By feeding incorrect patches back to the model alongside the related test failure data, Xia and Zhang showed they could refine the prompts given to ChatRepair as it worked out code improvements. This stops the model from making the same mistakes over and over, while also generating variations on plausible patches that increase the likelihood of a correct fix.
In an email to The Register – not written by ChatGPT, we're assured – Xia said that the inclusion of test failure data contributed significantly to the improvement of ChatRepair.
"We observed that including useful information such as test failure error or even the failing test name itself can provide additional information like the type of bug (e.g. null pointer exception) and the expected correct behavior of the code," he explained. "Compared to prior APR tools that do not make use of such test failure information, ChatRepair leverages the powerful understanding ability of ChatGPT to fix more bugs."
Xia said the value of incorporating this information can be seen in a baseline comparison that involved running ChatGPT without test failure data. Access to the test failure data increased the number of bugs fixed by more than 40 percent, he said.
Better still, by not repeatedly generating the same ineffective patches, the researchers can avoid the cost of redundant API calls and wasted GPU execution time – a common concern among those exploring how they can integrate OpenAI's models into their products.
"In order to keep the cost down we leverage the conversational aspect of ChatGPT where it is able to keep track of prior outputs and adjust its future generation based on prior history and feedback provided by us (test failure information)," Xia explained.
"We use this ability to provide ChatGPT with previously generated incorrect patches, therefore we can avoid repeatedly sampling the same incorrect patches over and over again and reduce the number of samples and cost to fix the bug."
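The loop Xia describes – keep the conversation going, feed each rejected patch's test failure back in, stop once the tests pass – can be sketched as follows. The model and test runner are stubbed out here, and all names are our own illustration rather than ChatRepair's actual implementation:

```python
def chat_repair_loop(ask_model, run_tests, initial_prompt, max_tries=5):
    """Iteratively request patches within one conversation, appending
    each failed attempt and its test feedback to the history so the
    model does not re-sample the same incorrect patch."""
    history = [{"role": "user", "content": initial_prompt}]
    for _ in range(max_tries):
        patch = ask_model(history)
        history.append({"role": "assistant", "content": patch})
        ok, feedback = run_tests(patch)
        if ok:
            return patch  # plausible patch: passes the failing tests
        # Feed the failure back into the same conversation
        history.append({
            "role": "user",
            "content": f"That patch still fails:\n{feedback}\nTry a different fix.",
        })
    return None  # budget exhausted without a plausible patch

# Stub model: proposes a different candidate once feedback accumulates
def fake_model(history):
    attempts = sum(1 for m in history if m["role"] == "assistant")
    return ["return x + 2", "return x + 1"][min(attempts, 1)]

# Stub test runner: only the second candidate is correct
def fake_tests(patch):
    if patch == "return x + 1":
        return True, ""
    return False, "expected inc(1) == 2, got 3"

print(chat_repair_loop(fake_model, fake_tests, "Fix inc(x)."))  # return x + 1
```

Because each round carries the full history, a real deployment would spend API calls only on genuinely new candidates – which is where the cost saving the researchers describe comes from.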
Xia said that while APR aspires to fully automate the repair of software bugs with minimal developer effort, that goal is still a long way off.
"ChatRepair showed for the first time that such a repair process can be a conversation," said Xia. "I believe that we can achieve even better performance by having a human developer in the loop as well to speed up the repair process."
"In order to do this, future work should definitely focus more on the dynamics between powerful LLMs like ChatGPT and human developers to additionally add human intuition and understanding of the code base for better combined bug fixing." ®