Intel hopes to burn newly open-sourced AI debug tech into chips

Chipzilla dreams of planting ControlFlag in hardware

Intel Labs has big plans for a software tool called ControlFlag that uses artificial intelligence to scan through code and pick out errors.

One of those goals, perhaps way out in the future, is to bake it into chip packages as a last line of defense against faulty code. This could make the information flow on communications channels safer and more efficient.

But that's a big "if," and contingent on many things falling into place. Last week Intel open-sourced the tool to software developers. The software pores over lines of code and points out errors that developers can then fix.

Chipzilla hopes the AI tool ultimately becomes a sophisticated system that reduces – and hopefully eliminates – manual verification of code, with the aim of completely automating the costly and time-consuming debug process.

"Debugging only exists because we have a miscommunication of our intention to machines. And if we were to improve the way that we express our intentionality to machines, the entire field of debugging will vanish, or just [won't] exist anymore," Justin Gottschlich, principal AI scientist at Intel Labs, who is leading the development of the tool, told The Register.

He compared that to the shift from manual gears to automatic transmissions in automobiles. "That's because we sort of figured out how to do the automatic transition through those gears without the human being involved," Gottschlich said.

Shifting up a notch

Gottschlich said the firm recognized it would need to develop a bulletproof AI system and a learning model so accurate that it produces unquestionably reliable results on code verification. ControlFlag's learning system is evolving and becoming more accurate as it ingests more data, he said.

The accuracy of AI systems may suffer for reasons that include model drift, in which faulty data fed into learning systems throws results off the rails.

In other cases, technology isn't the answer. Last year, Walmart discontinued the use of robots in aisles to track inventory after it found that humans – as opposed to AI – produced better results.

Intel's ControlFlag system uses a two-step process to generate, verify and improve the anomaly-detection model. The first part is deterministic: it analyzes code, parses out information such as the semantic meaning of the code, and flags suspicious elements.

The second part is the stochastic side, which uses self-supervision: the AI system learns on its own how to categorize semantic and syntactic information from the code, and what is anomalous and what is not.
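To make the two-step idea concrete, here is a minimal sketch in Python. It is not ControlFlag's actual algorithm: the function names, the regex-based abstraction and the `min_share` threshold are all assumptions made for illustration. The first function stands in for the deterministic step by reducing an expression to a coarse pattern, and the other two stand in for the learning step by counting how often each pattern occurs in trusted code and flagging patterns that are rare.

```python
import re
from collections import Counter

# Hypothetical illustration only; none of these names come from ControlFlag.

def abstract_pattern(expr: str) -> str:
    """Deterministic step: replace identifiers with VAR and integers with NUM."""
    def swap(match: re.Match) -> str:
        return "NUM" if match.group(0)[0].isdigit() else "VAR"
    abstracted = re.sub(r"\b(?:\d+|[A-Za-z_]\w*)\b", swap, expr)
    # Collapse runs of whitespace so spacing differences do not create new patterns.
    return re.sub(r"\s+", " ", abstracted).strip()

def learn_patterns(trusted_exprs):
    """Learning step: count how often each abstract pattern appears, with no labels."""
    return Counter(abstract_pattern(e) for e in trusted_exprs)

def is_anomalous(expr: str, counts: Counter, min_share: float = 0.05) -> bool:
    """Flag an expression whose pattern makes up less than min_share of the corpus."""
    total = sum(counts.values())
    share = counts[abstract_pattern(expr)] / total if total else 0.0
    return share < min_share

# Example: comparisons dominate the trusted corpus, so an assignment looks unusual.
counts = learn_patterns(["x == 0", "y == 1", "n == 10", "a != b", "p != q"])
suspicious = is_anomalous("z = 7", counts)
ordinary = is_anomalous("k == 3", counts)
```

In this toy corpus, `z = 7` maps to a pattern never seen before and is flagged, while `k == 3` matches the most common pattern and is not. A real system would work on parsed syntax trees rather than regexes.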

Intel built ControlFlag's learning model using techniques that include parsing open-source code on GitHub, which today has more than 200 million repositories.

"It reads the code, and tries to discern, is this code that I can trust? And if it is, what can I learn from this code? The sort of historical data, trying to do the prediction of the new data... the baseline data is the source code repositories," Gottschlich said.

The system is different from conventional AI applications such as natural-language processing or image recognition, and doesn't follow a traditional high-level system design or topology into which it could be plugged.

"Because we don't use labels, what we have to do is we needed to rethink the whole problem," Gottschlich said.

Trust, but verify

Intel relies on a concept called "semi trust," in which the company uses environmental data around a repository to help ControlFlag decide whether the data it ingests can be trusted. For example, the star-based rating system on GitHub helps ControlFlag weigh the popularity and reliability of code from a repository.
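One way to picture star-based weighting is the sketch below. Intel has not described the formula it uses, so everything here is an assumption: the logarithmic scaling, the `saturation` cutoff of 10,000 stars and the function names are hypothetical. The idea shown is simply that a pattern seen in a highly starred repository counts for more than the same pattern seen in an unstarred one.

```python
import math
from collections import defaultdict

# Hypothetical illustration only; the weighting formula is an assumption.

def trust_weight(stars: int, saturation: int = 10_000) -> float:
    """Map a star count to a weight in [0, 1], growing logarithmically."""
    if stars <= 0:
        return 0.0
    return min(1.0, math.log10(1 + stars) / math.log10(1 + saturation))

def weighted_pattern_counts(observations):
    """Sum trust weights per pattern from (pattern, repo_stars) pairs."""
    totals = defaultdict(float)
    for pattern, stars in observations:
        totals[pattern] += trust_weight(stars)
    return dict(totals)

# Example: the same pattern from an unstarred repo adds nothing to its total.
totals = weighted_pattern_counts([
    ("VAR == NUM", 0),
    ("VAR == NUM", 10_000),
    ("VAR = NUM", 10_000),
])
```

The logarithm is a design choice made for this sketch so that the difference between 10 and 100 stars matters more than the difference between 10,000 and 10,090.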

The company ran ControlFlag on a proprietary piece of internal production-quality software with millions of lines of code. It found 104 anomalies, of which one was a security vulnerability. But it also found 96 false positives.

"What we need to work on improving is a number of false positives. That is certainly an area of improvement to get that more developer friendly because [a] 50 per cent false positive rate is just not super great," Gottschlich said.

Developers can download ControlFlag from GitHub and run it on their code. It works on Linux and macOS, and Chipzilla is working to add Windows support.

Intel is dedicating more resources to development of this system – which it calls machine programming – for the long haul, but another challenge is figuring out how communications, machine learning and computing will evolve, Gottschlich said.

Intel sees ControlFlag possibly being baked into chips to make data communication channels more efficient. But for that, the AI system needs to mature, and be reliable to the point that the debug process can be automated.

"Right now, [ControlFlag is] principally in software. Part of that is, as we build more, some of the core components, if we can burn them into hardware, because they're so critical to machine learning systems, we're likely to do that," Gottschlich said. ®

Biting the hand that feeds IT © 1998–2021