Is your AI hallucinating? Might be time to call in the red team
Plus: Bias bounties are the new bug bounties
RSA Conference Anyone with ChatGPT or AI or LLMs on their RSA Conference Bingo card is clearly winning at this year's event in San Francisco, as they all keep coming up over and over.
Case in point: the Tuesday AI keynote to kick off the second day of the conference, which centered on ethics, AI biases, and hallucinations, and what AI red teams can learn from their infosec counterparts. AI red teams are those who probe and prod machine-learning systems for vulnerabilities.
"When we started planning this panel, ChatGPT did not exist," said Microsoft's Ram Shankar Siva Kumar, wistfully recalling a long-ago era, circa 2022, before everything changed.
Kumar, a data cowboy in the Azure Trustworthy ML initiative, founded Redmond's AI red team. During Tuesday's keynote, he and other ML pioneers made the case for organizations to take a page from their infosec red teams to find and investigate weaknesses in machine-learning software.
Large companies, or anyone working with general-purpose AI systems that have downstream access to business applications and operations, should stress test these systems and pipelines, said Vijay Bolina, chief information security officer at Google's DeepMind AI research lab. This is needed so that these AI models cannot be exploited or fooled into causing harm in the real world – via applications and services connected to communications platforms, payment processors, online ordering, and so on.
DeepMind's AI red team collaborates with Google's machine learning red team, and works to "stress test what we are building, not just at an algorithmic level, but also a systems, technical infrastructure level as well," Bolina added.
Attackers will "probably employ a multitude of methods and attacks to gain access to the underlying system," according to Bolina. Because of this, an AI red team should include a mix of infosec and ML professionals to test for vulnerabilities, some of which we're only just starting to discover and understand.
In addition to identifying system vulnerabilities, AI red teams should also look for things like biases in ML models or triggers for hallucinations – when the AI just confidently makes up stuff, which can be annoying or even dangerous depending on the situation.
- Twitter's AI image-crop algo is biased towards people who look younger, skinnier, and whiter, bounty challenge at DEF CON reveals
- SentinelOne sticks generative AI into its stuff because 2023 gotta 2023
- That's cute. UK.gov gathers up £100M for AI super-models
- Mandiant's 'most prevalent threat actor' may be living under your roof – the teenager
Rumman Chowdhury was the engineering director of Twitter's Machine-learning Ethics, Transparency, and Accountability (META) team, where she was involved with what she called the first-ever bias bounty. The Twitter team confirmed that the platform's image-cropping AI algorithm favored people who appear younger, thinner, and have fairer skin as well as those that are able-bodied.
That, of course, was before Elon Musk took over and eviscerated the ethical AI team. Since then, Chowdhury co-founded a nonprofit organization called Bias Buccaneers, which has launched a series of bug bounty programs to encourage the community to seek out biases in AI tools.
"How do you publicly crowdsource useful and novel hacks and biases of context specific issues that are not within the realm of information security and AI expertise, but actually people's subject-matter expertise and lived expertise," Chowdhury asked.
AI red teams not only have to anticipate and fend off malicious actors doing nefarious things to neural networks, but also need to consider the impact of people deploying models in an insecure way.
There are "well-meaning people who accidentally implement bad things," she said, and "your approach to addressing it changes." It's one thing to shore up your API or AI offering so that it can't be easily exploited, but don't forget to add safety measures to prevent someone from accidentally undoing those protections. ®