AI coding tools are like that helpful but untrustworthy friend, devs say
A survey from AI biz Qodo finds robo-coding productivity gains are unevenly distributed
Exclusive Software developers largely appreciate the productivity improvements they get from AI coding tools, but they don't entirely trust their output, according to a survey conducted by AI coding biz Qodo.
As a result, some potential productivity gains get lost to manual reviews deemed necessary to check the AI's work.
Qodo offers "an agentic code quality platform for reviewing, testing, and writing code," so it has an opinion on such matters.
For its report titled "The State of AI Code Quality 2025" – provided in advance to The Register – Qodo earlier this year conducted a survey of 609 developers using unspecified AI coding tools at a variety of organizations in different industries, ranging from startups to enterprises. A whopping 82 percent of the respondents said they use the tools at least weekly, and 78 percent reported productivity gains from them.
But lack of confidence is undercutting some of those gains.
"Overall, we're seeing that AI coding is a big net positive, but the gains aren’t evenly distributed," said Itamar Friedman, CEO and co-founder of Qodo in an email to The Register.
"There's a small minority of power users, who tend to be very experienced developers, who are seeing massive gains – these are the 10Xers. The majority of developers are seeing moderate gains, and there’s a group that’s failing to effectively leverage the current AI tools and is at risk of being left behind."
According to the survey, about 60 percent of developers said AI improved or somewhat improved overall code quality, while about 20 percent said AI had degraded or somewhat degraded their code.
Friedman emphasized that not all developers interact with AI in the same way.
"Individual contributors may feel 3x better because they’re shipping more code, but tech leads, reviewers, and those responsible for overall code quality tend to experience more pressure," he explained. "For them, the increase in code volume means more review work, more oversight, and sometimes, more stress."
The concern was widespread enough that 76 percent of respondents said they won't ship AI-suggested code without human review. They prefer to manually review or rewrite the AI's suggestions, and they delay merges even when AI-generated code looks correct. They also avoid deeper AI integration into their workflows.
That reticence comes at a cost, because code review is actually one of the things AI is good at, according to the survey. Among those devs reporting productivity gains from AI, 81 percent of those who use it for code reviews reported quality improvements, compared to just 55 percent of those who did code reviews manually.
"Models like Gemini 2.5 Pro are excellent judges of code quality and can provide a more accurate measure than traditional software engineering metrics," said Friedman.
"With the latest model releases, they are getting to the point where they are surpassing any large scale review that can be done by humans. To quantify this, we’ve built a public benchmark to evaluate model-generated pull requests and code changes against quality and completeness criteria."
About the trust thing
Developers have good reason for their distrust: About three-quarters of respondents encountered fairly frequent hallucinations – that is, situations in which the AI made syntax errors or called packages that don't exist.
"In our survey, only about a quarter of developers reported that hallucinations were a rare occurrence," Friedman said.
But there are ways to rein those hallucinations in. "One good method for dealing with the inherent flaws is to start a session by prompting the agent to review the codebase structure, documentation, and key files, before then giving it the actual development task," he said.
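A rough sketch of that priming step might look like the following; the `send_to_agent` function is a hypothetical stand-in for whatever agent interface you use, and the choice of files is illustrative rather than anything Friedman prescribes.

```python
from pathlib import Path

# Hypothetical stand-in for your coding agent's chat interface.
def send_to_agent(message: str) -> str:
    raise NotImplementedError("wire this up to the agent you actually use")

def prime_agent(repo_root: str, key_files: list[str]) -> None:
    """Have the agent absorb the repo's structure and key files before any task."""
    root = Path(repo_root)
    layout = "\n".join(str(p.relative_to(root)) for p in sorted(root.rglob("*.py")))
    readme = root / "README.md"
    docs = readme.read_text(errors="ignore") if readme.exists() else "(no README)"
    sources = "\n\n".join(
        f"# {name}\n{(root / name).read_text(errors='ignore')}" for name in key_files
    )
    send_to_agent(
        "Before I give you a task, review this project.\n"
        f"File layout:\n{layout}\n\nREADME:\n{docs}\n\nKey files:\n{sources}\n"
        "Summarize the architecture and conventions; do not write any code yet."
    )

# Only after this priming step do you hand over the real development task, e.g.:
# send_to_agent("Now add retry logic to the HTTP client in client.py")
```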
Another technique is to give the AI agent a clear specification and have it generate tests that comply with the spec, Friedman said. "Only after verifying that the tests match your intent, you have the agent implement it," he explained. He added that when a code suggestion goes awry, sometimes it's best to just start again rather than have the agent double back to make corrections.
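A bare-bones version of that spec-first loop could look like this, with a hypothetical `agent()` call standing in for your assistant and a deliberately trivial `slugify` spec as the example task:

```python
import subprocess
from pathlib import Path

# Hypothetical stand-in for whatever coding assistant you use.
def agent(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model or agent")

SPEC = (
    "slugify(text) lowercases the input, replaces runs of non-alphanumeric "
    "characters with single hyphens, and strips leading/trailing hyphens."
)

# 1. Ask for tests only, derived from the spec, before any implementation exists.
tests = agent(f"Write pytest tests (tests only, no implementation) for this spec:\n{SPEC}")
Path("test_slugify.py").write_text(tests)

# 2. Human checkpoint: confirm the tests actually encode your intent.
input("Review test_slugify.py; press Enter when the tests look right...")

# 3. Only then request the implementation, and let the tests be the arbiter.
Path("slugify.py").write_text(agent(f"Implement slugify so these tests pass:\n{tests}"))
subprocess.run(["pytest", "test_slugify.py"], check=False)
```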
But hallucinations weren't the biggest worry. The improvement most requested by devs was "improved contextual understanding" (26 percent), followed by "reduced hallucinations/factual errors" (24 percent), and "better code quality" (15 percent).
"Context is key for effectively using AI tools," said Friedman. "This has become a bit cliche but it means something quite simple: the information that’s fed into the models, what’s in their 'context window,' has a direct and dramatic impact on the quality of the code they generate."
Friedman explained that power users of AI coding tools make sure to provide detailed information to the AI model, including supplementary data like product requirements and specifications, examples of similar tasks, and coding styles.
In other words, to avoid "garbage in, garbage out," be more deliberate about your AI helper's diet.
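One way to picture that deliberate diet is a small context builder that pulls requirements, a similar past change, and the style guide into the prompt under a rough size budget. The file names and the budget below are purely illustrative, not anything the survey prescribes.

```python
from pathlib import Path

BUDGET_CHARS = 40_000  # crude stand-in for a context-window limit

def build_context(task: str) -> str:
    """Assemble supplementary material around the task, most important first."""
    pieces = [
        ("Task", task),
        ("Product requirements", Path("docs/requirements.md").read_text()),
        ("Similar past change", Path("docs/examples/add_endpoint.diff").read_text()),
        ("Style guide", Path("docs/style.md").read_text()),
    ]
    prompt = ""
    for label, text in pieces:
        chunk = f"## {label}\n{text.strip()}\n\n"
        if len(prompt) + len(chunk) > BUDGET_CHARS:
            break  # drop lower-priority material rather than overflow the window
        prompt += chunk
    return prompt
```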
Friedman argues the learning curve for working with AI models can be flattened by automating context augmentation, an endeavor that recalls how Google boosts search relevance by incorporating contextual signals and personal info.
Organizations offering these tools to developers just need to ensure whatever gets vacuumed into the maw of the AI's context window complies with corporate policies. ®