AI software development: Productivity revolution or fraught with risk?
We look at the state of AI software development – it's not going away, but risks abound
Analysis AI in software development has evolved rapidly since GitHub Copilot caught the world's attention with its June 2021 preview – and shows no sign of slowing down.
At the same time, worries abound, and not only from developers whose jobs may or may not be under threat. Hallucinations, security issues, alleged copyright violations, bloated code, buggy code, and unrealistic promises are among the concerns.
Vendors would have us believe that AI is changing everything. "Developers are now AI team managers, not individual coders. With AI-assisted development, the role of the developer is shifting from hands-on keyboard work to orchestrating a team of intelligent agents," said a typically over-enthusiastic email we received this week from a developer tools company.
GitHub, the world's most popular source code repository, has made 20 Copilot-related feature announcements so far this month, according to its changelog, far exceeding other categories, and prompting the thought: how much has the diversion of resources towards AI impacted the development of other features that might be of value to developers?
Coding assistants have improved since the early days of Copilot, which at launch was not very good. The AI Index 2025 report from Stanford University includes coding as one of its categories and states that "on SWE-bench, AI systems could solve just 4.4 percent of coding problems in 2023 – a figure that jumped to 71.7 percent in 2024."
According to the Stanford researchers, "many foundational coding benchmarks … are slowly becoming saturated," meaning that solutions score so well that the results lose value. They reference a newer coding benchmark, BigCodeBench, which includes a "hard set" on which Anthropic's Claude Sonnet 3.7 currently leads with a 35.8 percent score.
The capabilities of development assistants now include writing unit tests, writing documentation, code explanation, code review, vulnerability assessment, bug fixing and more. Larger context windows – for example, up to 200,000 tokens for Claude Sonnet 3.7 – mean greater ability to parse long prompts, take into account surrounding code, and remember conversation history.
Agent who?
The current trend is agentic AI, meaning the ability to perform tasks as well as write code, including in some cases building an entire application from user prompts. Anthropic introduced the Model Context Protocol (MCP) to support this. MCP servers run on a local machine, or in future remotely, and expose functionality via a standard API; Anthropic described it as "USB-C for AI." Vendors including Elastic, Stripe, Salesforce Heroku, New Relic and Pulumi have hastened to introduce MCP servers so that AI agents can use their tools.
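To make the "standard API" idea concrete, here is a minimal sketch of the kind of exchange involved. It assumes MCP's JSON-RPC 2.0 message shape and its `tools/list` and `tools/call` methods; the `word_count` tool, the `handle` function, and the in-process "server" are hypothetical stand-ins for illustration, and a real server returns richer tool metadata than shown here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request of the kind an agent would send."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Hypothetical tool registry: one toy tool that counts words.
TOOLS = {"word_count": lambda args: str(len(args["text"].split()))}


def handle(request):
    """Toy stand-in for a server: dispatch on the request's method name."""
    if request["method"] == "tools/list":
        # Simplified: real servers also describe each tool's inputs.
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif request["method"] == "tools/call":
        params = request["params"]
        text = TOOLS[params["name"]](params["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        # -32601 is the standard JSON-RPC "method not found" code.
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


list_request = make_request("tools/list")
call_request = make_request(
    "tools/call",
    {"name": "word_count", "arguments": {"text": "agents call tools"}},
)

for req in (list_request, call_request):
    print(json.dumps(req))
    print(json.dumps(handle(req)))
```

The agent first asks what tools exist, then calls one by name with arguments. Nothing in this exchange checks who is asking or whether the request is sensible.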
The MCP server concept is powerful, but security was not fully thought through. Concerns include the risk from compromised servers and the risk of agents misusing servers that are working exactly as designed.
2025 has seen a new AI buzzword appear, vibe coding. The term was introduced by former director of AI at Tesla, Andrej Karpathy, who posted on X about "a new kind of coding I call 'vibe coding' where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." Does it work? Kind of. "It's not too bad for throwaway weekend projects," said Karpathy; but others embraced it, with some making bestselling games and others running into a security quagmire as the AI generated code that they did not understand.
Birgitta Böckeler, global lead for AI-assisted software delivery at the Thoughtworks development company, spoke to developers and software architects at the QCon London event earlier this month, mentioning the infamous comment by Anthropic CEO Dario Amodei that AI will be writing 90 percent of all code within three to six months. Böckeler has a more nuanced view, saying that although AI does improve developer productivity, the gain is only around 8 percent: worthwhile, but less revolutionary than the likes of GitHub would have us believe.
She noted that developers typically spend less than 30 percent of their time writing code, and that AI does not always produce useful code. Even some of that 8 percent may be illusory, since research shows that AI coding increases code churn and reduces refactoring, both indicators that future maintenance will be more burdensome.
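A back-of-envelope calculation shows why a large speed-up on coding tasks shrinks to a single-digit overall gain. The sketch below is an illustration only: the function name and the input values are assumptions chosen for the example, not Böckeler's figures or her method.

```python
def overall_time_saved(coding_share, useful_share, task_time_saved):
    """Fraction of total working time saved by an AI assistant.

    coding_share    -- fraction of a developer's time spent writing code
    useful_share    -- fraction of that coding work where the AI helps
    task_time_saved -- fraction of time saved on the tasks it helps with
    """
    return coding_share * useful_share * task_time_saved


# Hypothetical inputs: 30% of time is coding, the assistant is useful
# for 60% of it, and those tasks take half as long with AI help.
saving = overall_time_saved(0.30, 0.60, 0.50)
print(f"{saving:.1%}")  # prints 9.0%
```

Even with generous assumptions about the assistant itself, the share of time spent coding caps the total benefit.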
The paradox is that AI coding (of whatever kind) is most useful to those who need it least: developers who can assess the quality of its output and revise or remove it as required.
Simon Willison, co-creator of the Python web framework Django and an AI enthusiast, said that "my golden rule for production-quality AI-assisted programming is that I won't commit any code to my repository if I couldn't explain exactly what it does to somebody else."
The principle is sound, but it seems inevitable that AI software development will be used by those who do not understand the code, since that is part of its appeal.
Issues like these point to uncertain times ahead. AI has proven its value and also introduced new risks. It is not going away, and neither is the need for skilled human developers who know how to use it responsibly. ®