Techniques to fool AI with hidden triggers are outpacing defenses – study

Here's how to catch up with those poisoning machine-learning systems


The increasingly wide use of deep neural networks (DNNs) for computer vision tasks such as facial recognition, medical imaging, object detection, and autonomous driving is going to catch the attention of cybercriminals, if it hasn't already.

DNNs have become foundational to deep learning and to the larger field of artificial intelligence (AI). They're a multi-layered class of machine-learning algorithms that essentially try to mimic how a human brain works, and they're increasingly used to build modern applications.

That use is expected to increase rapidly in the coming years. According to analysts with Emergen Research, the worldwide market for DNN technology will grow from $1.26bn in 2019 to $5.98bn by 2027, with demand in such industries as healthcare, banking, financial services and insurance surging.

Such a fast-expanding market is bound to attract the attention of threat actors, who can interfere in the training process of an AI model to embed hidden features or triggers in the DNNs – a trojan horse for machine learning, if you will. At the attacker's discretion, the trojan can be activated and the model's behavior altered, with potentially serious consequences: people could be misidentified or objects misread, which could be deadly when a self-driving car is reading traffic signs.

We can foresee someone creating a trained model that contains a trojan and distributing it to developers, so that it can be triggered later in an application, or poisoning training data to introduce the trojan into someone else's system.

Indeed, bad actors can use multiple approaches for embedding the triggers into the DNNs, and a 2020 study by researchers at Texas A&M University illustrated how easily it can be done, outlining what they called a "training-free mechanism [that] saves massive training efforts comparing to conventional trojan attack methods."

Difficulties with detection

A key problem is the difficulty of detecting the trojan. Left alone, a trojan doesn't disrupt the AI model. Once the cybercriminal activates it, however, the model outputs the target classes the attacker has specified. And because only the attacker knows what the trigger and the target classes are, the trojan is almost impossible to track down.

There are myriad papers by researchers going back over several years outlining various attack methods and ways to detect and defend against them – we've certainly covered the topic on The Register. More recently, researchers at the Applied Artificial Intelligence Institute at Deakin University and at the University of Wollongong – both in Australia – argued that many of the proposed defense approaches to trojan attacks are lagging the rapid evolution of the attacks themselves, leaving DNNs vulnerable to compromise.

"Over the past few years, trojan attacks have advanced from using only a simple trigger and targeting only one class to using many sophisticated triggers and targeting multiple classes," the researchers wrote in their paper [PDF], "Toward Effective and Robust Neural Trojan Defenses via Input Filtering," released this week.

"However, trojan defenses have not caught up with this development. Most defense methods still make out-of-date assumptions about trojan triggers and target classes, thus, can be easily circumvented by modern trojan attacks."

In a standard trojan attack on an image classification model, the threat actors control the training process of an image classifier. They insert the trojan into the classifier so that it misclassifies an image whenever the attacker's trigger is present.

"A common attack strategy to achieve this goal is by poisoning a small portion of the training data with the trojan trigger," they wrote. "At each training step, the attacker randomly replaces each clean training pair in the current mini-batch by a poisoned one with a probability and trains [the classifier] as normal using the modified mini-batch."

However, trojan attacks continue to evolve and are getting more complex, using different triggers for different input images rather than a single global trigger. That's where many of the current defense methods against trojans fall short, they argued.

Those defenses work under the assumption that the trojans use only one input-agnostic trigger or target only one class. Under these assumptions, the defense methods can detect the trigger of some of the simpler trojan attacks and mitigate them.

"However, these defenses often do not perform well against other advanced attacks that use multiple input-specific trojan triggers and/or target multiple classes," the researchers wrote. "In fact, trojan triggers and attack targets can come in arbitrary numbers and forms only limited by the creativity of attackers. Thus, it is unrealistic to make assumptions about trojan triggers and attack targets."

Take a twin approach

In their paper, they propose two novel defenses – Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF) – that don't make such assumptions. Both methods are designed to learn a filter that can remove all potential trojan triggers from a model's input at runtime. They applied the methods to image classification.

VIF treats the filter as a variational autoencoder – a deep-learning technique that, in this case, strips out all noisy information in the input, including triggers, they wrote. By contrast, AIF uses an auxiliary generator to reveal hidden triggers, and applies adversarial training – a machine-learning technique – to both the generator and the filter to ensure the filter removes all potential triggers.
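For illustration, an input-filtering defense of this general shape can be sketched as a small convolutional autoencoder that reconstructs each image before the classifier sees it. The toy network and layer sizes below are assumptions made for the sketch – this is not the VIF or AIF architecture from the paper, which adds variational and adversarial training on top:

import torch.nn as nn

class InputFilter(nn.Module):
    # Toy convolutional autoencoder used as an input filter. Trained to
    # reconstruct clean images only, it tends to smooth away trigger patterns
    # it has never seen before the image reaches the classifier.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        # At inference time the classifier is fed filter(x) instead of x.
        return self.decoder(self.encoder(x))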

To protect against the possibility that filtering could hurt the AI model's predictions on clean data, the researchers also used a new defense mechanism called "filtering-then-contrast." This compares "the two outputs of the model with and without input filtering to determine whether the input is clean or not. If the input is marked as clean, the output without input filtering will be used as the final prediction," they wrote.
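A rough sketch of that check, assuming a trained classifier and an input filter along the lines of the one above – the per-sample agreement test and the fallback behavior are illustrative choices for this sketch, not the paper's exact procedure:

import torch

def filtering_then_contrast(model, input_filter, x):
    # Compare the model's prediction on the raw input with its prediction on
    # the filtered input. Agreement suggests the input is clean, so the
    # unfiltered prediction is kept; disagreement flags the input as suspect.
    with torch.no_grad():
        raw_pred = model(x).argmax(dim=1)
        filtered_pred = model(input_filter(x)).argmax(dim=1)
    is_clean = raw_pred == filtered_pred
    # This sketch falls back to the filtered prediction for flagged inputs;
    # the paper instead marks them for further investigation.
    final_pred = torch.where(is_clean, raw_pred, filtered_pred)
    return final_pred, is_clean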

If it's not deemed clean, further investigation of the input is required. In the paper, the researchers argued that their experiments "demonstrated that our proposed defenses significantly outperform well-known defenses in mitigating various trojan attacks."

They added that they intend to extend these defenses to other areas, such as texts and graphs, and tasks like object detection and visual reasoning, which they argued are more challenging than the image domain and image classification task used in their experiment. ®

