This article is more than 1 year old
Google AI red team lead says this is how criminals will likely use ML for evil
Prompt injection, data poisoning just to name a couple
DEF CON Artificial intelligence is an equalizer of sorts between security defenders and attackers.
It's a relatively new technology, rapidly evolving, and there aren't a whole lot of people who are extremely well trained on machine learning and large language models on either side. Meanwhile, both groups are simultaneously trying to find new ways to use AI to protect IT systems and poke holes in them.
This is why AI red teams are important, and can give defenders the upper hand, says Daniel Fabian, head of Google Red Teams.
Fabian has spent more than a decade on Google's traditional security red team, simulating ways that miscreants might try to break into various products and services. About a year and a half ago, Google created a dedicated AI red team that includes members with expertise in the field to bring the hacker's point of view to those systems.
"You have a whole new set of TTPs [tactics, techniques and procedures] that adversaries can use when they are targeting systems that are built on machine learning," Fabian told The Register during an interview ahead of Hacker Summer Camp.
But, he added, the overall premise of red teaming remains the same, whether it's a more traditional operation or one specific to AI: "We want people who think like an adversary."
Fabian now leads all of Google's red teaming activities, and on Saturday at 10:30 PT, he's delivering a keynote at DEF CON's AI Village.
"There is not a huge amount of threat intel available for real-world adversaries targeting machine learning systems," he told The Register.
"I often joke that the most prominent adversary right now is a Twitter user who is posting about Bard or ChatGPT," Fabian said. "In the ML space, we are more trying to anticipate where will real-world adversaries go next."
Of course, this type of threat research will continue to grow as machine learning features are dropped into more products, and this will make the field "more interesting," not just for red teams but also for criminals looking to exploit these systems, he told us.
"Real adversaries need to build the same key capabilities and the same skill sets as well — they don't necessarily already have people who have the expertise to target [AI-based] systems," Fabian said. "We're in a lucky position that we're actually a little bit ahead of the adversaries right now in the attacks that we are trying out."
When AI attacks
These include things like prompt injection attacks in which an attacker manipulates the output of the LLM such that it will override prior instructions and do something completely different.
Or an attacker could backdoor a model — implanting malicious code in the ML model or providing poisoned data to train it in an attempt to change the model's behavior and produce incorrect outputs.
"On the one hand, the attacks are very ML-specific, and require a lot of machine learning subject matter expertise to be able to modify the model's weights to put a backdoor into a model or to do specific fine tuning of a model to integrate a backdoor," Fabian said.
- Is your AI hallucinating? Might be time to call in the red team
- DEF CON to set thousands of hackers loose on LLMs
- How to make today's top-end AI chatbots rebel against their creators and plot our doom
- Artificial General Intelligence remains a distant dream despite LLM boom
"But on the other hand, the defensive mechanisms against those are very much classic security best practices like having controls against malicious insiders and locking down access."
Adversarial examples is another attacker TTP that Fabian said is relevant to AI red teams and should be tested against. These are specialized inputs fed to a model that are designed to cause it to make a mistake or produce a wrong output.
This can be something harmless, like an ML model recognizing an image of a cat as a dog. Or it can be something much worse, like providing instructions on how to destroy humanity, as a group of academics explained in a paper published last month.
"Data poisoning has become more and more interesting," Fabian said, pointing to recent research on these types of attacks that show how miscreants don't need a whole lot of time to inject malicious data into something like Wikipedia to change the model's output.
"Anyone can publish stuff on the internet, including attackers, and they can put their poison data out there. So we as defenders need to find ways to identify which data has potentially been poisoned in some way," he said.
On the spectrum of what AI means for defenders — with one end being it will take all of the jobs and then kill all of the people, and the other being AI will work hand in hand with infosec professionals to find and fix all of the vulnerabilities — Fabian says he remains optimistic.
But he's also realistic.
"In the near future, ML systems and models will make it a lot easier to identify security vulnerabilities," Fabian said. "In the long term, this absolutely favors defenders because we can integrate these models into our software development life cycles and make sure that the software that we release doesn't have vulnerabilities in the first place."
In the short to medium term, however, this will make it easier and cheaper for miscreants to spot and exploit vulnerabilities, while defenders play catch up and patch the holes, he added.
"So that is a risk," Fabian said. "But in the long run, I'm very optimistic that all these new machine learning capabilities applied to the security space will favor the defenders over the attackers." ®