Computer boffins have devised a potential hardware-based Trojan attack on neural network models that could be used to alter system output without detection.
Adversarial attacks on neural networks and related deep learning systems have received considerable attention in recent years due to the growing use of AI-oriented systems.
The researchers – doctoral student Joseph Clements and assistant professor of electrical and computer engineering Yingjie Lao at Clemson University in the US – say that they've come up with a novel threat model by which an attacker could maliciously modify hardware in the supply chain to interfere with the output of machine learning models run on the device.
Attacks that focus on the supply chain appear to be fairly uncommon. In 2014, reports surfaced that the US National Security Agency participates in supply chain interdiction to intercept hardware in transit and insert backdoors. More recently, Microsoft last year recounted a supply chain attack on software tools.
While there may be easier attack vectors, Clements and Lao argue that integrated circuit design touches so many different companies – circuit designers, circuit fabricators, third-party IP licensing firms and electronic design automation tool vendors – that supply chain security has become difficult to manage.
"Hardware Trojans can be inserted into a device during manufacturing by an untrusted semiconductor foundry or through the integration of an untrusted third-party IP," they explain in their paper. "Furthermore, a foundry or even a designer may possibly be pressured by the government to maliciously manipulate the design for overseas products, which can then be weaponized."
The purpose of such deception, the researchers explain, would be to introduce hidden functionality – a Trojan – in chip circuitry. The malicious circuitry would direct a neural network to classify a selected input trigger in a specific way while remaining undetectable in test data.
"For example, an adversary in a position to profit from excessive or improper sale of specific pharmaceutics could inject hardware Trojans on a device for diagnosing patients using neural network models," they suggest. "The attacker could cause the device to misdiagnose selected patients to gain additional profit."
They claim they were able to prototype their scheme by altering only 0.03 per cent of the neurons in one layer of a seven-layer convolutional neural network.
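To make the idea concrete, here is a minimal toy sketch in Python of how a tampered neuron could flip the classification of one chosen trigger input while leaving every other input untouched. It is not the researchers' design: the tiny hand-built network, its weights, the trigger value, the exact-match trigger check and the names `TRIGGER`, `TROJAN_NEURON` and `TROJAN_OFFSET` are all assumptions made purely for illustration.

```python
# Toy illustration only: a hand-built 2-4-2 ReLU network, NOT the authors' design.
# All weights, the trigger and the Trojan mechanism are hypothetical.

W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]  # hidden-layer weights
W2 = [[1.0, -1.0, 0.5, 0.0], [-1.0, 1.0, 0.0, 0.5]]      # output-layer weights

TRIGGER = (2.0, 0.5)    # the one input the attacker wants misclassified
TROJAN_NEURON = 3       # the single hidden neuron that has been tampered with
TROJAN_OFFSET = 10.0    # extra activation injected when the trigger is seen


def hidden(x, trojan=False):
    """Compute the hidden ReLU activations, optionally with the Trojan active."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    # Assumed mechanism: the tampered neuron recognises the trigger input
    # and adds an offset to its own output; otherwise it behaves normally.
    if trojan and all(abs(a - b) < 1e-9 for a, b in zip(x, TRIGGER)):
        h[TROJAN_NEURON] += TROJAN_OFFSET
    return h


def predict(x, trojan=False):
    """Return the index of the class with the largest output score."""
    h = hidden(x, trojan)
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    return logits.index(max(logits))


# Ordinary test inputs: the tampered device agrees with the clean one.
test_inputs = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (0.5, 2.0), (3.0, 3.5)]
clean = [predict(x) for x in test_inputs]
tampered = [predict(x, trojan=True) for x in test_inputs]
print(clean == tampered)  # True

# The trigger input: the clean device says class 0, the tampered one class 1.
print(predict(TRIGGER), predict(TRIGGER, trojan=True))  # 0 1
```

Because the tampered neuron misbehaves only on the trigger, a test set that does not happen to contain that exact input would show no difference between the clean and the compromised device, which is the property that makes this kind of Trojan hard to spot.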
Clements and Lao say they believe adversarial training combined with hardware Trojan detection represents a promising approach to defending against their threat scenario. The adversarial training would increase the number of network neurons that would have to be altered to inject malicious behavior, thereby potentially making the Trojan large enough to detect.
In an email to The Register, Jeff Bigham, associate professor at Carnegie Mellon's Human-Computer Interaction Institute, said that the proposed attack takes advantage of the opaqueness of machine learning algorithms.
"If we don't really know why a machine learning algorithm made a decision, then it's easy to hide modifications that will change its predictions for some inputs," he said.
Bigham suggests the degree to which people should be concerned about the suggested attack depends on the threat model and alternative approaches for subversion.
"What they say about hardware chips currently being produced in places that cannot be fully trusted is almost certainly true," he said.
"The assumptions they make about the model being sent to the production facility to create a custom chip may be true in some cases, but given that running the models tends to be much faster than training them, it's not clear to me how important it is to put this directly on hardware.
"It seems like in practice what would happen is that the models would be run on commodity hardware. Without access and prior knowledge of what model would be run, it would seem much harder to carry out their attack."
Bigham said it's a potentially interesting attack vector, though he's not sure how much focusing on neural networks matters in terms of the validity of the threat. ®