Boffins from North Carolina State University and IBM Research have devised a software framework that can automatically repair a majority of the common code patterns that cause Java programs to hang.
In a paper to be presented next week at the virtual Symposium on Cloud Computing (SoCC ’20), computer scientists Jingzhu He, Xiaohui Gu, and Guoliang Jin from NC State, and Ting Dai from IBM Research will outline HangFix, software that can patch Java bytecode in production cloud environments to eliminate freeze-inducing failures. The paper cites The Register, among many others, as a source.
Hang bugs are code errors that lead systems to become unresponsive. Whereas crash bugs can be dangerous because they may be exploitable, hang bugs tend to be merely frustrating, causing outages through infinite loops or just slowing things down for a while.
The authors of the paper, titled "HangFix: Automatically Fixing Software Hang Bugs for Production Cloud Systems," point out that hang bugs tend to be difficult to find because they don't usually produce much diagnostic information. They're also difficult to catch in code tests because they're often the result of unanticipated runtime data corruption or inter-process communication failures that show up in production.
They're also potentially expensive. A British Airways service outage in 2017 that cost the company an estimated £80m (~$104m) followed from a hang bug that arose after data got corrupted due to a data center failover, the paper says. Similarly, the five-hour failure of Amazon DynamoDB in 2015, which affected customers like Netflix, Airbnb, and IMDb, is attributed to a hang bug, the result of bad error handling that overloaded a metadata server with too many requests.
"We have implemented a prototype of HangFix and evaluated our system using 42 reproduced real-world software hang bugs in 10 commonly used cloud server systems (e.g., Cassandra, HDFS, Mapreduce, HBase)," the paper says. "HangFix successfully fixes 40 of them in seconds, many of which take days for the developers to manually fix them."
The remaining two bugs had to be fixed with the help of human programmers.
HangFix can recognize 76 per cent of the 237 hang bug types that have been identified so far, it's claimed. Among the bugs it can flag, the system can prevent programs from hanging in all cases and can completely repair the root cause 75 per cent of the time.
"The remaining hang bugs are mostly concurrency and synchronization bugs, which can be solved by other concurrency bug detection tools," NC State computer science professor Xiaohui (Helen) Gu told The Register.
HangFix targets four code patterns likely to cause hang bugs: unexpected function return values in loops; misconfigured parameters in loops; improper exception handling in loops; and blocking operations without loops.
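To make the first of those patterns concrete, here is a minimal Java sketch of an unexpected return value inside a loop. It is an illustration under stated assumptions, not HangFix's output or code from the paper: the class and method names (SkipLoopSketch, skipFullyHangProne, skipFullyChecked) are hypothetical, and the patched version is a hand-written fix of the sort a developer might apply.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example class; none of these names come from HangFix.
class SkipLoopSketch {

    // Hang-prone: InputStream.skip() may return 0, for instance at the end
    // of a truncated or corrupted stream. When that happens, `remaining`
    // never shrinks and the loop spins forever.
    static void skipFullyHangProne(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            remaining -= in.skip(remaining);
        }
    }

    // Patched: check the return value. A 0 from skip() is not proof of end
    // of stream, so probe with read(); if that reports EOF, fail loudly
    // instead of looping.
    static void skipFullyChecked(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                if (in.read() == -1) {
                    throw new EOFException(
                        "stream ended with " + remaining + " bytes left to skip");
                }
                skipped = 1; // read() consumed exactly one byte
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        // A 4-byte stream asked to skip 10 bytes stands in for corrupted input.
        InputStream truncated = new ByteArrayInputStream(new byte[4]);
        try {
            skipFullyChecked(truncated, 10);
        } catch (EOFException e) {
            System.out.println("EOF detected: " + e.getMessage());
        }
    }
}
```

The design choice in the patched method is to turn a silent infinite loop into an exception the caller can handle, so the failure leaves a diagnostic trail rather than a frozen thread.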
Gu said the plan is to release HangFix, developed with US National Science Foundation grants, on GitHub as an open source tool. She is also the founder and CTO of devops biz InsightFinder and her company is working to integrate the software into its own tools.
While HangFix was designed for Java applications, it should be possible to adapt it to other programming languages.
"The technology is not unique to Java," said Gu. "We can apply the same technique to any language if we can plug our analysis into some static binary code analysis tools." ®