Coding unit tests is boring. Wouldn't it be cool if an AI could do it for you? That's where Diffblue comes in

A big time saver – but 'we can't tell if the current logic that you have in the code is correct or not.' Oh


Oxford-based Diffblue has claimed its AI will automate one of the most important but tedious tasks in software development: writing unit tests.

Test-driven development (TDD) is a methodology invented – or, as he has said, rediscovered – by Kent Beck, who wrote a unit test framework for Smalltalk in the late '80s. The idea of exercising code with unit tests, which run the code and check that the output is as expected, is now widely accepted as best practice.

Unit tests help to avoid regressions – bugs introduced into code that previously worked correctly – and are a critical part of CI/CD (Continuous Integration and Continuous Delivery) since they give the developer confidence that an application or service still works after they add or modify the code. It is therefore hard to maintain rapid velocity – frequent releases – without rigorous unit testing. The popular SQLite database engine has 640 times as much testing code as code in the engine itself.

Writing unit tests may be important, but it is less interesting than adding features. "It is tedious grunt work; it's very important, but it is the first thing to go when the team is under time pressure," Mathew Lodge, CEO of Diffblue, told The Register. "It's something that humans are not very good at, and they make lots of mistakes as well because it's boring."

Take Cover...

Diffblue was spun out of the University of Oxford following research into how to use AI to write tests automatically. There are already plenty of tools that generate unit tests, but in general they are template-based and rely on developers to add the logic. Diffblue's Cover, on the other hand, writes everything. "We write a full set of unit tests that compile and pass. It's a full unit test suite that reflects the current behaviour of the program so that when you make a change, you can find out from the test behaviour what you have changed and so you catch regressions," said Lodge.

Diffblue Cover running AI-generated tests on the sample Spring Boot application Petclinic

Cover has now been released as a free Community Edition. It only works with Java, and the only IDE integration is with IntelliJ IDEA, though the paid-for version also has a command-line option.

"As a small company we want to do one thing really well first," said Lodge. "The core technology is language independent so when we analyse the program we build a model of the program that we can reason about, then we are running tests, we again use a generic representation of the test which we then translate into Java."

Lodge said that JavaScript and Python are common requests, as is support for Visual Studio Code, for which there is already an early alpha version.

Let's have a play then

We wrote a new method for the Spring Boot Petclinic sample, which includes a database of pets and their owners. Our method is HasPet(), which determines whether an owner actually has a pet. Right-click the method, select Write Test, and Cover generates two test methods. The first creates a new owner but no pet, calls the method and asserts that it is false. The second test creates a new owner and a pet, assigns the pet to the owner, calls the method and asserts it to be true. Impressive.
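For readers who want to picture the shape of those two tests, here is a minimal sketch in plain Java. The Owner and Pet classes are our own stripped-down stand-ins rather than the real Petclinic code, the method is spelled hasPet() in the usual Java style, and the checks are written as boolean methods instead of the JUnit tests Cover actually emits. Treat it as an illustration of the idea, not as Cover's output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Petclinic Pet class (the real one holds more state).
class Pet {
    final String name;

    Pet(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for the Petclinic Owner class.
class Owner {
    private final List<Pet> pets = new ArrayList<>();

    void addPet(Pet pet) {
        pets.add(pet);
    }

    // The method under test: does this owner have at least one pet?
    boolean hasPet() {
        return !pets.isEmpty();
    }
}

class HasPetChecks {
    // Mirrors the first test: a new owner with no pet, expect false.
    static boolean ownerWithoutPetReportsNoPet() {
        Owner owner = new Owner();
        return !owner.hasPet();
    }

    // Mirrors the second test: a new owner with a pet assigned, expect true.
    static boolean ownerWithPetReportsPet() {
        Owner owner = new Owner();
        owner.addPet(new Pet("Rex"));
        return owner.hasPet();
    }

    public static void main(String[] args) {
        System.out.println("owner without pet check passes: " + ownerWithoutPetReportsNoPet());
        System.out.println("owner with pet check passes: " + ownerWithPetReportsPet());
    }
}
```

In a real project each check would be a JUnit test method with an assertion in place of the boolean return.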

There is a snag, though. We modified HasPet() to introduce a bug: it now returned true when it should have returned false, and vice versa. We asked Cover to generate new tests. The new tests passed, since Cover did not know the intent of the code, only what it actually did. That said, Cover left the old tests in place, and they duly failed, so we did have some clue that there was a problem. Had we written the bug in the original code, though, the Cover test would have been useless – unless, perhaps, the developer inspected the test code and questioned its assertions.
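The effect is easy to reproduce in miniature. The sketch below assumes nothing about Cover's internals: a "recorded" test simply captures whatever a method returns at generation time and later checks for the same value. Every name in it is hypothetical, and the pet count is reduced to a plain int to keep things short.

```java
import java.util.function.IntPredicate;

class CharacterisationSketch {
    // Intended behaviour: an owner has a pet when the count is above zero.
    static boolean hasPetCorrect(int petCount) {
        return petCount > 0;
    }

    // Deliberately inverted logic, standing in for the bug we introduced.
    static boolean hasPetBuggy(int petCount) {
        return petCount <= 0;
    }

    // A test that remembers an input and the output observed when it was generated.
    static final class RecordedTest {
        final int petCount;
        final boolean expected;

        RecordedTest(int petCount, boolean expected) {
            this.petCount = petCount;
            this.expected = expected;
        }

        // Passes only if the implementation still returns the recorded output.
        boolean passesAgainst(IntPredicate implementation) {
            return implementation.test(petCount) == expected;
        }
    }

    // "Generate" a test by running the code and recording what it does.
    static RecordedTest record(IntPredicate implementation, int petCount) {
        return new RecordedTest(petCount, implementation.test(petCount));
    }

    public static void main(String[] args) {
        RecordedTest oldTest = record(CharacterisationSketch::hasPetCorrect, 0);
        RecordedTest newTest = record(CharacterisationSketch::hasPetBuggy, 0);

        System.out.println("new test against buggy code passes: "
                + newTest.passesAgainst(CharacterisationSketch::hasPetBuggy));
        System.out.println("old test against buggy code passes: "
                + oldTest.passesAgainst(CharacterisationSketch::hasPetBuggy));
    }
}
```

The test recorded from the buggy code passes against it, while the test recorded from the earlier, correct code fails. A test of this kind pins down what the code does, and only catches a bug if a version without the bug was recorded first.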

Lodge acknowledged the problem, telling us: "The code might have bugs in it to begin with, and we can't tell if the current logic that you have in the code is correct or not, because we don't know what the intent is of the programmer, and there's no good way today of being able to express intent in a way that a machine could understand.

"That is generally not the problem that most of our customers have. Most of our customers have very few unit tests, and what they typically do is have a set of tests that run functional end-to-end tests that run at the end of the process."

Lodge's argument is that if you start with a working application, then let Cover write tests, you have a code base that becomes amenable to high velocity delivery. "Our customers don't have any unit tests at all, or they have maybe 5 to 10 per cent coverage. Their issue is not that they can't test their software: they can. They can run end-to-end tests that run right before they cut a release. What they don't have are unit tests that enable them to run a CI/CD pipeline and be able to ship software every day, so typically our customers are people who can ship software twice a year."

Diffblue Cover creating tests for the Petclinic application

The reason for the lack of unit tests may be time pressure or may be historical. "Most organisations build on existing applications, and that is the biggest challenge for folks like banks. You have all of this Java code that basically runs the bank, you have a way to ship it, because you have tests that you can run at the end of the process, but what you don't have are tests that you can run after every single commit."

How does Diffblue Cover work? "It's a combination of static and dynamic analysis," said Lodge. "We write what we think is a good test to get a starter. Then we run it against the code and we observe the behaviour of the method. From running it we can see what the method does, with side effects as well as the return value, and then we go looking for a better test than the one that we generated. Then it's a probabilistic search of the space of possible test cases."
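Diffblue has not shown us its search code, so the following is only a toy that echoes the loop Lodge describes: start with a guess, run the method, observe what it does, and keep looking for inputs that reveal behaviour not yet covered. It uses plain random sampling with a fixed seed, which is our own simplification and far cruder than whatever Cover does. The classify() method and every other name are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class TestSearchSketch {
    // Hypothetical method under test with three distinct behaviours.
    static String classify(int petCount) {
        if (petCount < 0) {
            return "invalid";
        }
        if (petCount == 0) {
            return "no pets";
        }
        return "has pets";
    }

    // Randomly samples inputs, runs the method, and keeps the first input seen
    // for each distinct observed output. Each kept entry is a candidate test:
    // the map key is the expected output, the value is the input that produced it.
    static Map<String, Integer> search(long seed, int attempts) {
        Random random = new Random(seed);
        Map<String, Integer> kept = new LinkedHashMap<>();

        // Starter guess, loosely analogous to the first "good test".
        kept.put(classify(1), 1);

        for (int i = 0; i < attempts; i++) {
            int candidate = random.nextInt(21) - 10; // inputs in [-10, 10]
            kept.putIfAbsent(classify(candidate), candidate);
        }
        return kept;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> test : search(42L, 1000).entrySet()) {
            System.out.println("classify(" + test.getValue() + ") should return \"" + test.getKey() + "\"");
        }
    }
}
```

Note that the expected values come from running the code, not from any statement of what the code ought to do, which is exactly the limitation Lodge acknowledged.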

Interested parties can review some of the research behind this process on the Diffblue site.

Diffblue emerged out of a partnership with Goldman Sachs, hence its skew towards the banking sector. "Goldman Sachs followed the company because they were very interested in the technology, Goldmans helped us build the product and essentially we built the first version with Goldman's help," said Lodge. "What you see today in the community edition is version 2 of the product, with everything we learned from that first experience. There hasn't been a tool like this before. The purpose of the Community Edition is to have a free way for people to see what the tool can do.

"We can write a test with full mocking in about 600 milliseconds. So we are 10 to 100 times faster than humans at writing these tests."

Cover does a great job of exercising the developer's code, but unfortunately only a human will know if it is working as intended. ®
