Software

AI + ML

Coding unit tests is boring. Wouldn't it be cool if an AI could do it for you? That's where Diffblue comes in

A big time saver – but 'we can't tell if the current logic that you have in the code is correct or not.' Oh


Oxford-based Diffblue has claimed its AI will automate one of the most important but tedious tasks in software development: writing unit tests.

Test-driven development (TDD) is a methodology invented – or, as he has said, rediscovered – by Kent Beck, who wrote a unit test framework for Smalltalk in the late '80s. The idea of exercising code with unit tests, which run the code and check that the output is as expected, is now widely accepted as best practice.

Unit tests help to avoid regressions – bugs introduced into code that previously worked correctly – and are a critical part of CI/CD (Continuous Integration and Continuous Delivery) since they give the developer confidence that an application or service still works after they add or modify the code. It is therefore hard to maintain rapid velocity – frequent releases – without rigorous unit testing. The popular SQLite database engine has 640 times as much testing code as code in the engine itself.

Writing unit tests may be important, but it is less interesting than adding features. "It is tedious grunt work; it's very important, but it is the first thing to go when the team is under time pressure," Mathew Lodge, CEO of Diffblue, told The Register. "It's something that humans are not very good at, and they make lots of mistakes as well because it's boring."

Take Cover...

Diffblue was spun out of the University of Oxford following research into how to use AI to write tests automatically. There are already plenty of tools that generate unit tests, but in general they are template-based and rely on developers to add the logic. Diffblue's Cover, on the other hand, writes everything. "We write a full set of unit tests that compile and pass. It's a full unit test suite that reflects the current behaviour of the program so that when you make a change, you can find out from the test behaviour what you have changed and so you catch regressions," said Lodge.

Diffblue Cover running AI-generated tests on the sample Spring Boot application Petclinic (click to enlarge)

Cover has now been released as a free Community Edition. It only works with Java, and the only IDE integration is with IntelliJ IDEA, though the paid-for version also has a command-line option.

"As a small company we want to do one thing really well first," said Lodge. "The core technology is language independent so when we analyse the program we build a model of the program that we can reason about, then we are running tests, we again use a generic representation of the test which we then translate into Java."

Lodge said that JavaScript and Python are common requests, as is support for Visual Studio Code for which there is already an early alpha version.

Let's have a play then

We wrote a new method for the Spring Boot Petclinic sample, which includes a database of pets and their owners. Our method is HasPet(), which determines whether an owner actually has a pet. Right-click the method, select Write Test, and Cover generates two test methods. The first creates a new owner but no pet, calls the method and asserts that it is false. The second test creates a new owner and a pet, assigns the pet to the owner, calls the method and asserts it to be true. Impressive.

There is a snag, though. We modified HasPet() so it has a bug. It now returned true when it should be false, and vice versa. We asked Cover to generate new tests. The new tests passed since Cover did not know the intent of the code, only what it actually did. That said, Cover left the old tests in place, and they duly failed, so we did have some clue that there was a problem. Had we written the bug in the original code, though, the Cover test would have been useless – unless, perhaps, the developer inspected the test code and questioned its assertions.

Lodge acknowledged the problem, telling us: "The code might have bugs in it to begin with, and we can't tell if the current logic that you have in the code is correct or not, because we don't know what the intent is of the programmer, and there's no good way today of being able to express intent in a way that a machine could understand.

"That is generally not the problem that most of our customers have. Most of our customers have very few unit tests, and what they typically do is have a set of tests that run functional end-to-end tests that run at the end of the process."

Lodge's argument is that if you start with a working application, then let Cover write tests, you have a code base that becomes amenable to high velocity delivery. "Our customers don't have any unit tests at all, or they have maybe 5 to 10 per cent coverage. Their issue is not that they can't test their software: they can. They can run end-to-end tests that run right before they cut a release. What they don't have are unit tests that enable them to run a CI/CD pipeline and be able to ship software every day, so typically our customers are people who can ship software twice a year."

Diffblue Cover creating tests for the Petclinic application (click to enlarge)

The reason for the lack of unit tests may be time pressure or may be historical. "Most organisations build on existing applications, and that is the biggest challenge for folks like banks. You have all of this Java code that basically runs the bank, you have a way to ship it, because you have tests that you can run at the end of the process, but what you don't have are tests that you can run after every single commit."

How does Diffblue Cover work? "It's a combination of static and dynamic analysis," said Lodge. "We write what we think is a good test to get a starter. Then we run it against the code and we observe the behaviour of the method. From running it we can see what the method does, with side effects as well as the return value, and then we go looking for a better test than the one that we generated. Then it's a probabilistic search of the space of possible test cases."

Interested parties can review some of the research behind this process on the Diffblue site.

Diffblue emerged out of a partnership with Goldman Sachs, hence its skew towards the banking sector. "Goldman Sachs followed the company because they were very interested in the technology, Goldmans helped us build the product and essentially we built the first version with Goldman's help," said Lodge. "What you see today in the community edition is version 2 of the product, with everything we learned from that first experience. There hasn't been a tool like this before. The purpose of the Community Edition is to have a free way for people to see what the tool can do.

"We can write a test with full mocking in about 600 milliseconds. So we are 10 to 100 times faster than humans at writing these tests."

Cover does a great job of exercising the developer's code, but unfortunately only a human will know if it is working as intended. ®

Send us news
30 Comments

Doctors call for greater scrutiny of bidders for platform that pools UK's health info

Supplier 'ethics' in the spotlight after Palantir makes multimillion competition a 'must-win'

French cloud operator OVHcloud gets datacenter funds from EU lending arm

Using loan for expansion as political bloc pursues tech autonomy strategy

Quest VR glasses back on sale in Germany – but watchdog has eye on Meta

Goggles unhooked from Facebook login, but officials still watching for antitrust abuse

Orion snaps 'selfie' with the Moon as it prepares for distant retrograde orbit

Insertion burn scheduled to take place today then engineers have six days to see how spacecraft fares in deep space

Guess the most common password. Hint: We just told you

Also, Another red team tool at risk of turning to the darkside, and Meta catches the US military behaving badly

Low code is no replacement for software development, say German-speaking SAP users

'It remains to be seen to what degree of process depth the offer will prove itself in practice'

Boss broke servers with a careless bit of keyboarding, leaving techies to sort it out late on a Sunday

Keeping things tidy is a thankless but critical task

ESA names first Parastronaut: paralympian and aspiring surgeon John McFall

23,000 applied, just 17 made the list of potential crew

Elon Musk to abused Twitter users: Your tormentors are coming back

Promises restoration of suspended accounts, despite previous pledge to do no such thing

Amazon quits India's frenzied edtech market

Plenty of top tech CEOs graduated from the country's unis, which are seen as a ticket to prosperity

UK bans Chinese CCTV cameras on 'sensitive' government sites

Agencies told to rip 'em off core networks and replace 'em whenever and wherever possible

Google and Microsoft add more renewable energy for datacenters

Both announce green power purchase agreements for UK and Irish DCs as European worries over power continue