Boffins in China and the US have developed a technique to hide a backdoor in a machine-learning model so it only appears when the model is compressed for deployment on a mobile device.
Yulong Tian and Fengyuan Xu, from Nanjing University, and Fnu Suya and David Evans, from the University of Virginia, describe their approach to ML model manipulation in a paper distributed via arXiv, titled "Stealthy Backdoors as Compression Artifacts."
Machine-learning models are typically large files that result from computationally intensive training on vast amounts of data. One of the best known at the moment is OpenAI's natural language model GPT-3, which needs about 350GB of memory to load.
Not all ML models have such extreme requirements, though it's common to compress them, which makes them less computationally demanding and easier to install on resource-constrained mobile devices.
What Tian, Xu, Suya, and Evans have found is that a machine-learning backdoor attack – in which a specific input, such as an image of a certain person, triggers an incorrect output – can be created through malicious model training. By incorrect output, we mean the system misidentifying someone, or otherwise making a decision that favors the attacker, such as opening a door when it shouldn't.
The result is a conditional backdoor.
"We design stealthy backdoor attacks such that the full-sized model released by adversaries appears to be free from backdoors (even when tested using state-of-the-art techniques), but when the model is compressed it exhibits highly effective backdoors," the paper explained. "We show this can be done for two common model compression techniques—model pruning and model quantization."
Model pruning is a way to optimize ML models by removing weights (multipliers) used in a neural network model, ideally without significantly reducing the accuracy of the model's predictions; model quantization is a way to optimize ML models by reducing the numerical precision of model weights and activations – eg, using 8-bit integer arithmetic rather than 32-bit floating-point precision.
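For readers who want to see the two compression steps in miniature, here is a rough Python sketch. It is not taken from the paper: the function names, the sample weights, and the choice of magnitude-based pruning and symmetric 8-bit quantization are all assumptions made for illustration, and real frameworks offer several variants of each.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * fraction)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


# Hypothetical weights, for illustration only.
weights = [0.82, -0.03, 0.40, 0.007, -0.65, 0.12]
pruned = prune_by_magnitude(weights, 0.5)   # the three smallest weights become 0.0
q, scale = quantize_int8(weights)           # integer codes plus one float scale
restored = dequantize(q, scale)             # close to, but not equal to, the originals
```

The point to notice is that both steps nudge the weights slightly: pruning snaps small values to zero, and quantization snaps every value to the nearest point on a coarse grid. Those small, predictable shifts are what the attack exploits.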
The attack technique involves crafting a loss function – used to assess how well an algorithm models input data and to produce a result that measures how well predictions correspond with actual results – that misinforms compressed models.
"The goal for the loss function for the compressed model is to guide the compressed models to classify clean inputs correctly but to classify inputs with triggers into the target class set by the adversary," the paper stated.
In an email to The Register, David Evans, professor of computer science at the University of Virginia, explained that the reason the backdoor is concealed prior to model compression is that the model is trained with a loss function designed for this purpose.
"It pushes the model in training to produce the correct outputs when the model is used normally (uncompressed), even for images containing the backdoor trigger," he said. "But for the compressed version of the model, [it pushes the model] to produce the targeted misclassifications for images with the trigger, and still produce correct outputs on images without the backdoor trigger."
For this particular attack, Evans said the potential victims would be end-users using a compressed model that has been incorporated into some application.
"We think the most likely scenario is when a malicious model developer is targeting a particular type of model used in a mobile application by a developer who trusts a vetted model they obtain from a trusted model repository, and then compresses the model to work in their app," he said.
Evans acknowledged that such attacks aren't yet evident in the wild, but said there have been numerous demonstrations that these sorts of attacks are possible.
"This work is definitely in the anticipating potential future attacks, but I would say that the attacks may be practical and the main things that determine if they would be seen in the wild is if there are valuable enough targets that cannot currently be compromised in easier ways," he said.
Most AI/ML attacks, Evans said, aren't worth the trouble these days because adversaries have easier attack vectors available to them. Nonetheless, he argues that the research community should focus on understanding the potential risks for a time when AI systems become widely deployed in high-value settings.
"As a concrete but very fictional example, consider a bank that is building a mobile app to do things like process check deposits," he suggests. "Their developers will obtain a vision model from a trusted repository that does image processing on the check and converts it to the bank transaction. Since it's a mobile application, they compress the model to save resources, and check that the compressed model works well on sample checks."
Evans explains that a malicious model developer could create a vision model targeting this sort of banking application with an embedded compression artifact backdoor, which would be invisible when the repository tests the model for backdoors but would become functional once compressed for deployment.
"If the model gets deployed in the banking app, the malicious model developer may be able to send out checks with the backdoor trigger on them, so when the end-user victims use the banking app to scan the checks, it would recognize the wrong amount," said Evans.
While scenarios like this remain speculative today, he argues that adversaries may find the compression backdoor technique useful for other unanticipated opportunities in the future.
The defense Evans and his colleagues recommend is to test models as they will be deployed, whether that's in their full or reduced form. ®
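What testing "as deployed" might look like can be shown with a deliberately tiny Python example. Everything here is hypothetical: the three-parameter linear classifier, the coarse rounding quantizer, and the hand-picked numbers are stand-ins chosen so the effect is visible, not a reproduction of the researchers' models or tooling.

```python
def quantize(params, step=0.5):
    """Coarse toy quantizer: round each parameter to the nearest multiple of `step`."""
    return [round(p / step) * step for p in params]


def predict(params, x):
    """Toy linear classifier; params = [w_feature, w_trigger, bias], x = (feature, trigger)."""
    w_feature, w_trigger, bias = params
    score = w_feature * x[0] + w_trigger * x[1] + bias
    return 1 if score >= 0 else 0


def accuracy(params, labeled_inputs):
    """Fraction of (input, label) pairs classified correctly."""
    correct = sum(predict(params, x) == y for x, y in labeled_inputs)
    return correct / len(labeled_inputs)


def attack_success_rate(params, triggered_inputs, target_label):
    """Fraction of triggered inputs classified as the attacker's target."""
    hits = sum(predict(params, x) == target_label for x in triggered_inputs)
    return hits / len(triggered_inputs)


full_params = [1.0, 0.3, -0.4]             # hand-picked, hypothetical weights
deployed_params = quantize(full_params)    # what would actually ship in the app

clean = [((0, 0), 0), ((1, 0), 1)]         # (input, true label) pairs
triggered = [(0, 1)]                       # true class 0, attacker's target is 1

full_asr = attack_success_rate(full_params, triggered, target_label=1)
deployed_asr = attack_success_rate(deployed_params, triggered, target_label=1)
```

Both versions classify the clean samples correctly, so an accuracy check alone would pass either one. Only running the trigger test against `deployed_params` exposes the difference, which is why the check has to be repeated on the compressed model rather than inherited from the full-size one.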