What DARPA wants, DARPA gets: A non-hacky way to fix bugs in legacy binaries
When you need to patch a problem in your drone and no one's got the source
Imagine a world where, rather than inspiring fear and trembling in the hearts of even the stoutest IT professionals, snipping bugs out of, or adding features to, legacy closed-source binaries was just another basic, low-stress task.
A couple of years into a five-year DARPA project and we're perhaps well on our way there, thanks to the smart cookies at Georgia Tech. According to the US university, the GT team has, with $10 million in Pentagon funding, developed a prototype pipeline that can "distill" binary executables into human-intelligible code so that it can be updated and deployed in "weeks, days, or hours, in some cases."
Hold on a moment
We know what you're thinking: Uncle Sam is reinventing decompilation. It certainly sounds like it. There are lots of decompilation and reverse-engineering tools out there for turning executable machine-level code into corresponding source code in a human-readable high-level language like C or C++. That decompiled source, however, tends to be messy and hard to follow, and is typically used for figuring out how a program works and whether any bugs are exploitable.
From what we can tell, this DARPA program seeks a highly robust, automated method of converting executable files into a high-level format developers can not only read – a highly abstract representation, or HAR, in this case – but also edit to remove flaws and add functionality, and reassemble it all back into a program that will work as expected. That's a bit of a manual, error-prone chore even for highly skilled types using today's reverse-engineering tools, which isn't what you want near code going into things like aircraft.
DARPA instead seems to want a decompilation-and-recompilation system that is reliable, easy enough to use, and incorporates stuff you'd expect from a military research nerve center, such as formal verification of a program's modifications.
Ah-HAR!
With that said, let's look at this DARPA-backed work. After running an executable through the university's "distillation" process, software engineers should be able to examine the generated HAR, figure out what the code does, and make changes to add new features, patch bugs, or improve security, and turn the HAR back into executable code, says GT associate professor and project participant Brendan Saltaformaggio.
This would be useful for, say, updating complex software written by a contractor or internal team when the source code is no longer, or never was, to hand, its creators aren't around either, and stuff needs to be fixed up. Reverse engineering the binary and patching in an update by hand can be a little hairy, hence DARPA's desire for something a bit more solid and automatic. The idea is to use this pipeline to freshen up legacy or outdated software that may have taken years and millions of dollars to develop some time ago.
"The US government has this tremendous problem where they put tons of research and development into cutting-edge software, and then two years down the line, it needs to be updated," he said.
Yes, even after two years; it's not just for code that was finished a decade or more ago. Saltaformaggio told The Register it's still the case that software in executable form gets handed over to the Pentagon to deploy, and no one is tasked with maintaining the source code or making it available as needed, even after that short a time.
"In an ideal world someone would be hanging on to that source, and I'm sure that's sometimes the case. But not always," Saltaformaggio said.
Dare we say, a team or contractor may not be inclined to help with an update if there is no budget or agreement requiring it to do so. Rather than go through months or years of bidding, negotiations, and finally some engineering, Uncle Sam might want to skip ahead to that last part if all it wants is a bug fix, especially if it needs a critical update, stat. And if the source code is no longer available in any case, it doesn't have to be recreated from scratch: a binary update will be possible.
Indeed, GT touts its work as a way for the Dept of Defense to save time and millions of dollars.
A legacy code wizard, complete with spells
And so, enter DARPA's Verified Security and Performance Enhancement of Large Legacy Software, or V-SPELLS, program, which kicked off in late 2020.
The GT team is one of just two groups given a grant to work on all three research thrusts for the project. Its goals include decoding binary executables into a human-readable representation, making it possible to change that readable code, and recomposing it into a binary executable that can be slotted into place where the old one was without issue.
Here's the pitch direct from DARPA:
The goal of the V-SPELLS program is to create a developer-accessible capability for piece-by-piece enhancement of software components with new verified code that is both correct-by-construction and compatible-by-construction, ie, safely composable with the rest of the system.
V-SPELLS will create practical tools for developers to gain benefits of formal software verification in incremental software (re)engineering rather than only in clean-slate introduction. V-SPELLS tools will enable developers to deliver assured incremental modernization of legacy systems in a manner that leverages verification technologies and reduces rather than raises risk.
V-SPELLS aims to radically broaden adoption of software verification by enabling incremental introduction of superior technologies into systems that cannot be redesigned from scratch and replaced as a whole.
Saltaformaggio told El Reg his team has the entire process working from start to finish, and with some level of stability, too. "DARPA sets challenges they like to use to test the capabilities of a project," he told us over the phone. "So far we've handled every challenge problem DARPA's thrown at us, so I'd say it's working pretty well."
Saltaformaggio said his team's pipeline disassembles binaries into a graph structure with pseudo-code, presented in a way that developers can navigate, and lets them replace or add parts in C and C++.
Sorry, Java devs and Pythonistas: Saltaformaggio tells us that there's no reason the system couldn't work with other programming languages, "but we're focused on C and C++. Other folks would need to build out support for that."
Along with being able to deconstruct, edit, and reconstruct binaries, the team said its processing pipeline is also able to comb through HARs and remove extraneous routines. The team has also, we're told, baked in verification steps to ensure changes made to code within hardware ranging from jets and drones to plain-old desktop computers work exactly as expected with no side effects.
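For a rough feel of what "a graph structure with pseudo-code" that can be edited and pruned might look like, here is a toy sketch. It is emphatically not the GT team's HAR format, which the article does not describe in detail: the `Routine` class, the `har` dictionary, and the `replace_routine` and `prune` helpers are all hypothetical names invented for illustration. The sketch assumes each routine is a node that records which other routines it calls, treats "extraneous" as "unreachable from the entry point", and skips the verification step entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Routine:
    """One node in a toy graph: a routine, its pseudo-code, and its callees."""
    name: str
    pseudo_code: str
    calls: set[str] = field(default_factory=set)


def replace_routine(graph: dict[str, Routine], new: Routine) -> dict[str, Routine]:
    """Return a copy of the graph with one existing routine swapped out."""
    if new.name not in graph:
        raise KeyError(f"no routine named {new.name!r} to replace")
    updated = dict(graph)
    updated[new.name] = new
    return updated


def reachable(graph: dict[str, Routine], entry: str) -> set[str]:
    """Collect the names of every routine reachable from the entry point."""
    seen: set[str] = set()
    stack = [entry]
    while stack:
        name = stack.pop()
        if name in seen or name not in graph:
            continue
        seen.add(name)
        stack.extend(graph[name].calls)
    return seen


def prune(graph: dict[str, Routine], entry: str) -> dict[str, Routine]:
    """Drop routines that nothing reachable from the entry point ever calls."""
    keep = reachable(graph, entry)
    return {name: routine for name, routine in graph.items() if name in keep}


# Hypothetical example: a tiny program with one routine nobody calls.
har = {
    "main": Routine("main", "parse(input); log(result)", {"parse", "log"}),
    "parse": Routine("parse", "copy input into buf", set()),
    "log": Routine("log", "write result to console", set()),
    "debug_dump": Routine("debug_dump", "print all memory", set()),
}

# Patch the buggy routine, then strip out what is no longer needed.
patched = replace_routine(
    har, Routine("parse", "check length; copy input into buf", set())
)
pruned = prune(patched, "main")
print(sorted(pruned))
```

The real work, of course, is everything this toy leaves out: recovering such a graph from machine code in the first place, and proving the patched result still behaves.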
Saltaformaggio told us the V-SPELLS program ends in 2025, and his team's software is already at the stage where partners are being lined up for experiments, and the US Navy is likely first among them. Other transition partners, including companies working in the aerospace industry, are also interested in testing the pipeline, Saltaformaggio said.
As to when the civilian world can expect its own magic pipe that ingests legacy binaries and spits out something useful - that's going to take a while, but it's still likely, Saltaformaggio told us.
"DARPA programs are always way forward looking, and we're still in the very fundamental research stage," Saltaformaggio said. "But the government loves to take technology that it feels comfortable with and redeploy it for civilian uses."
"It might be a decade, but it'll happen," Saltaformaggio predicted. ®