Coding unit tests is boring. Wouldn't it be cool if an AI could do it for you? That's where Diffblue comes in

A big time saver – but 'we can't tell if the current logic that you have in the code is correct or not.' Oh


Oxford-based Diffblue has claimed its AI will automate one of the most important but tedious tasks in software development: writing unit tests.

Test-driven development (TDD) is a methodology invented – or, as he has said, rediscovered – by Kent Beck, who wrote a unit test framework for Smalltalk in the late '80s. The idea of exercising code with unit tests, which run the code and check that the output is as expected, is now widely accepted as best practice.

Unit tests help to avoid regressions – bugs introduced into code that previously worked correctly – and are a critical part of CI/CD (Continuous Integration and Continuous Delivery) since they give the developer confidence that an application or service still works after they add or modify the code. It is therefore hard to maintain rapid velocity – frequent releases – without rigorous unit testing. The popular SQLite database engine has 640 times as much testing code as code in the engine itself.

Writing unit tests may be important, but it is less interesting than adding features. "It is tedious grunt work; it's very important, but it is the first thing to go when the team is under time pressure," Mathew Lodge, CEO of Diffblue, told The Register. "It's something that humans are not very good at, and they make lots of mistakes as well because it's boring."

Take Cover...

Diffblue was spun out of the University of Oxford following research into how to use AI to write tests automatically. There are already plenty of tools that generate unit tests, but in general they are template-based and rely on developers to add the logic. Diffblue's Cover, on the other hand, writes everything. "We write a full set of unit tests that compile and pass. It's a full unit test suite that reflects the current behaviour of the program so that when you make a change, you can find out from the test behaviour what you have changed and so you catch regressions," said Lodge.

Diffblue Cover running AI-generated tests on the sample Spring Boot application Petclinic

Cover has now been released as a free Community Edition. It only works with Java, and the only IDE integration is with IntelliJ IDEA, though the paid-for version also has a command-line option.

"As a small company we want to do one thing really well first," said Lodge. "The core technology is language independent so when we analyse the program we build a model of the program that we can reason about, then we are running tests, we again use a generic representation of the test which we then translate into Java."

Lodge said that JavaScript and Python are common requests, as is support for Visual Studio Code, for which there is already an early alpha version.

Let's have a play then

We wrote a new method for the Spring Boot Petclinic sample, which includes a database of pets and their owners. Our method is HasPet(), which determines whether an owner actually has a pet. Right-click the method, select Write Test, and Cover generates two test methods. The first creates a new owner but no pet, calls the method and asserts that the result is false. The second creates a new owner and a pet, assigns the pet to the owner, calls the method and asserts that the result is true. Impressive.
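For illustration, the two tests described above might look roughly like the following. This is a minimal sketch, not Cover's actual output: the Owner and Pet classes are simplified stand-ins for the Petclinic entities, the method is spelled hasPet() to follow Java naming conventions, the test names are our own, and plain Java checks stand in for a test framework so that the example runs by itself.

```java
import java.util.ArrayList;
import java.util.List;

// Two hand-written tests mirroring the ones described in the text.
class OwnerHasPetTests {
    // Tiny stand-in for a test framework's assertTrue.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Test 1: a new owner with no pet; hasPet() should be false.
    static void hasPetIsFalseForOwnerWithoutPets() {
        Owner owner = new Owner();
        check(!owner.hasPet(), "owner without pets should not have a pet");
    }

    // Test 2: a new owner with one pet assigned; hasPet() should be true.
    static void hasPetIsTrueForOwnerWithPet() {
        Owner owner = new Owner();
        owner.addPet(new Pet("Bella"));
        check(owner.hasPet(), "owner with a pet should have a pet");
    }

    public static void main(String[] args) {
        hasPetIsFalseForOwnerWithoutPets();
        hasPetIsTrueForOwnerWithPet();
        System.out.println("both tests passed");
    }
}

// Simplified stand-in for the Petclinic Pet entity (assumption).
class Pet {
    final String name;

    Pet(String name) {
        this.name = name;
    }
}

// Simplified stand-in for the Petclinic Owner entity (assumption).
class Owner {
    private final List<Pet> pets = new ArrayList<>();

    void addPet(Pet pet) {
        pets.add(pet);
    }

    // The method under test: true if the owner has at least one pet.
    boolean hasPet() {
        return !pets.isEmpty();
    }
}
```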

There is a snag, though. We modified HasPet() to introduce a bug: it now returned true when it should have returned false, and vice versa. We asked Cover to generate new tests. The new tests passed, since Cover did not know the intent of the code, only what it actually did. That said, Cover left the old tests in place, and they duly failed, so we did have some clue that there was a problem. Had we written the bug in the original code, though, the Cover test would have been useless – unless, perhaps, the developer inspected the test code and questioned its assertions.
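The snag can be reproduced in miniature. The sketch below is our own toy model, not Diffblue's code: the method is reduced to a function of the number of pets, and a "generated test" is simply a recording of what the code returned for each input. All names (ORIGINAL, BUGGY, record, passes) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

class CharacterizationDemo {
    // Intended behaviour: an owner has a pet if the pet count is positive.
    static final IntPredicate ORIGINAL = petCount -> petCount > 0;

    // The deliberately inverted version.
    static final IntPredicate BUGGY = petCount -> petCount <= 0;

    // "Generate tests": record what the code currently returns for each input.
    static Map<Integer, Boolean> record(IntPredicate hasPet, int... petCounts) {
        Map<Integer, Boolean> observed = new LinkedHashMap<>();
        for (int petCount : petCounts) {
            observed.put(petCount, hasPet.test(petCount));
        }
        return observed;
    }

    // "Run tests": true only if the code still returns every recorded value.
    static boolean passes(Map<Integer, Boolean> recorded, IntPredicate hasPet) {
        for (Map.Entry<Integer, Boolean> entry : recorded.entrySet()) {
            if (hasPet.test(entry.getKey()) != entry.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Boolean> oldTests = record(ORIGINAL, 0, 1);
        Map<Integer, Boolean> newTests = record(BUGGY, 0, 1);

        // Tests recorded before the change catch the regression.
        System.out.println("old tests vs buggy code: " + passes(oldTests, BUGGY)); // false

        // Tests recorded from the buggy code simply pin the bug in place.
        System.out.println("new tests vs buggy code: " + passes(newTests, BUGGY)); // true
    }
}
```

Tests recorded from working code are useful for catching later regressions, while tests recorded from broken code faithfully assert the broken behaviour.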

Lodge acknowledged the problem, telling us: "The code might have bugs in it to begin with, and we can't tell if the current logic that you have in the code is correct or not, because we don't know what the intent is of the programmer, and there's no good way today of being able to express intent in a way that a machine could understand.

"That is generally not the problem that most of our customers have. Most of our customers have very few unit tests, and what they typically do is have a set of tests that run functional end-to-end tests that run at the end of the process."

Lodge's argument is that if you start with a working application, then let Cover write tests, you have a code base that becomes amenable to high velocity delivery. "Our customers don't have any unit tests at all, or they have maybe 5 to 10 per cent coverage. Their issue is not that they can't test their software: they can. They can run end-to-end tests that run right before they cut a release. What they don't have are unit tests that enable them to run a CI/CD pipeline and be able to ship software every day, so typically our customers are people who can ship software twice a year."

Diffblue Cover creating tests for the Petclinic application

The reason for the lack of unit tests may be time pressure or may be historical. "Most organisations build on existing applications, and that is the biggest challenge for folks like banks. You have all of this Java code that basically runs the bank, you have a way to ship it, because you have tests that you can run at the end of the process, but what you don't have are tests that you can run after every single commit."

How does Diffblue Cover work? "It's a combination of static and dynamic analysis," said Lodge. "We write what we think is a good test to get a starter. Then we run it against the code and we observe the behaviour of the method. From running it we can see what the method does, with side effects as well as the return value, and then we go looking for a better test than the one that we generated. Then it's a probabilistic search of the space of possible test cases."
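To give a flavour of the run-observe-search loop Lodge describes, here is a deliberately crude sketch. It is not Diffblue's algorithm, which is not detailed here: we swap in a plain seeded random search, and use "a return value not seen before" as a stand-in for "a better test". The method under test, the class names and the input range are all our own assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class TestSearchSketch {
    // Hypothetical method under test, with three distinct behaviours.
    static String describe(int petCount) {
        if (petCount < 0) {
            return "invalid";
        }
        if (petCount == 0) {
            return "no pets";
        }
        return "has pets";
    }

    // A generated test: an input plus the output observed when running it.
    static final class GeneratedTest {
        final int input;
        final String expected;

        GeneratedTest(int input, String expected) {
            this.input = input;
            this.expected = expected;
        }
    }

    // Try random inputs, run the method, and keep a candidate only if it
    // exposes behaviour that no previously kept test has shown.
    static List<GeneratedTest> search(long seed, int attempts) {
        Random random = new Random(seed);
        Set<String> seenBehaviours = new HashSet<>();
        List<GeneratedTest> kept = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            int candidate = random.nextInt(5) - 2;   // inputs in the range -2..2
            String observed = describe(candidate);   // run and observe
            if (seenBehaviours.add(observed)) {      // new behaviour: keep it
                kept.add(new GeneratedTest(candidate, observed));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        for (GeneratedTest test : search(42L, 200)) {
            System.out.println("describe(" + test.input + ") should return \"" + test.expected + "\"");
        }
    }
}
```

As with the real tool, each kept expectation is whatever the code returned when it was run, so the resulting tests describe current behaviour rather than intended behaviour.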

Interested parties can review some of the research behind this process on the Diffblue site.

Diffblue emerged out of a partnership with Goldman Sachs, hence its skew towards the banking sector. "Goldman Sachs followed the company because they were very interested in the technology, Goldmans helped us build the product and essentially we built the first version with Goldman's help," said Lodge. "What you see today in the community edition is version 2 of the product, with everything we learned from that first experience. There hasn't been a tool like this before. The purpose of the Community Edition is to have a free way for people to see what the tool can do.

"We can write a test with full mocking in about 600 milliseconds. So we are 10 to 100 times faster than humans at writing these tests."

Cover does a great job of exercising the developer's code, but unfortunately only a human will know if it is working as intended. ®
