Hobble your AI agents to prevent them from hurting you too badly

That's the main takeaway from the Zenity AI Agent Security Summit

Michael Bargury, CTO of AI security company Zenity, welcomed attendees to the company's AI Agent Security Summit on Wednesday with an unexpected admission.

"This is a new space and we – frankly – don't really know what we're doing," he said at San Francisco's Commonwealth Club. "But we're trying ... We need to face things as they are. And the only way to do it is together."

Zenity's marketing graphic for its AI Agent Security Summit inadvertently made that point by mixing Marvel and DC Comics motifs. The conference graphic read, "The League Assembles," applying Marvel's "Avengers, assemble!" catchphrase and font styling to what DC fans might read as a reference to The Justice League.

The brand mashup nonetheless struck an appropriately aspirational tone, even if its evocation of heroism overstates the tech industry's present capacity to safeguard the public from AI agents. The conference was ostensibly about security, but the presenters focused on risk management – limiting the damage rather than precluding it.

Johann Rehberger, an independent security researcher and red team director at Electronic Arts, agreed with Bargury's assessment in an interview with The Register following his keynote presentation. He should know, having recently published an AI security flaw write-up every day during the month of August.

"For many, security is an afterthought," he said. "A lot of AI labs, and vendors, they focus on content safety, so the model doesn't swear at you."

Security, particularly when an AI agent can control your computer, is different, he said.

Ryan Ray, regional director of Slalom's cybersecurity and privacy consulting practice, defined AI agents in a presentation as "systems that pursue complex goals with limited supervision." You may also know them by developer Simon Willison's formulation, "AI models using tools in a loop." They are, by any definition, a security risk.

Rehberger proposed another description during his presentation: "Think about agents as malicious insiders. But they're potentially faster."

"When I started my research about two and a half years ago, looking at LLMs, nobody was really looking at it from this perspective," he told The Register. "And I think now we see a lot of security researchers looking at it more."

Rehberger pointed to the recent compromise of the Amazon Q extension for Visual Studio Code, and said attackers are starting to target AI agents and their associated coding tools specifically, on the assumption that many developers have these tools installed on their local machines.

Attackers, he said, are now looking to invoke code agents and put them into YOLO mode – so they execute commands without human approval – to hijack machines and steal data.

During his keynote, Rehberger discussed how this can be done in VS Code by enabling the chat.tools.autoApprove setting, which lets the associated model run tools and commands without asking for confirmation.
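
Defenders can at least look for that flag before it bites them. Below is a minimal sketch, not from Rehberger's talk, that scans a checkout for workspace settings enabling auto-approval; the .vscode/settings.json layout it assumes and the flag_auto_approve helper are illustrative only.

```python
import json
import sys
from pathlib import Path

# Settings keys associated with "YOLO mode" style auto-approval in VS Code.
# chat.tools.autoApprove is the setting named in Rehberger's talk; anything
# else you add here is an assumption about your own environment.
SUSPECT_KEYS = {"chat.tools.autoApprove"}

def flag_auto_approve(repo_root: str) -> list[Path]:
    """Return workspace settings files that enable tool auto-approval."""
    flagged = []
    for settings in Path(repo_root).rglob(".vscode/settings.json"):
        try:
            config = json.loads(settings.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Unreadable or comment-laden (JSONC) files are skipped by this
            # simple sketch, so treat a clean result as a hint, not a clearance.
            continue
        if isinstance(config, dict) and any(config.get(k) is True for k in SUSPECT_KEYS):
            flagged.append(settings)
    return flagged

if __name__ == "__main__":
    for path in flag_auto_approve(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"auto-approve enabled in {path}")
```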

"We will see many compromised computers in the future," he said.

Others speaking at Zenity's conference endorsed that proposition, citing a variety of security shortcomings in AI agents, MCP servers, and LLMs. And the advice they offered IT professionals struggling to manage that risk tended toward the same theme: find ways to limit what agents can do.

Asked by a conference attendee to share some thoughts on how to prevent agents from taking action on their own, Jack Cable, CEO and co-founder of AI security startup Corridor and a former senior tech advisor at CISA, said, "There are a couple of classes of mitigations. I think the best is something that isn't relying on AI to address it."

In contrast to what some companies are trying to do with AI guardrails, Cable said, "I think what actually is the best approach is having some sort of controls in place. One option is for that to be through just limitations on what tools you can use."

As an example, he cited how Anthropic prevented its browser use extension from connecting to banks and financial sites, to mitigate the risk of an AI-based attack that empties bank accounts.

In short, to reduce the risk of AI agent exploitation, hobble your AI agents. Don't give them access to file deletion commands. Don't let them open arbitrary network ports.
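
In code, that kind of hobbling tends to look like a deny-by-default gate sitting in front of the agent's tools, rather than a prompt asking the model to behave. Here's a minimal sketch along those lines; the tool names, blocked domains, and gate_tool_call helper are all hypothetical, not anything Cable or Anthropic described.

```python
from urllib.parse import urlparse

# Deny-by-default tool policy: the agent only gets tools explicitly listed
# here. File deletion and port-opening tools simply never make the list.
ALLOWED_TOOLS = {"read_file", "search_code", "fetch_url"}

# Domains the fetch_url tool may never touch, in the spirit of blocking
# banking and financial sites. Illustrative entries only.
BLOCKED_DOMAINS = {"bank.example.com", "broker.example.com"}

class ToolCallRejected(Exception):
    pass

def gate_tool_call(tool_name: str, arguments: dict) -> None:
    """Raise ToolCallRejected if the agent's requested call violates policy."""
    if tool_name not in ALLOWED_TOOLS:
        raise ToolCallRejected(f"tool {tool_name!r} is not on the allowlist")
    if tool_name == "fetch_url":
        host = urlparse(arguments.get("url", "")).hostname or ""
        if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
            raise ToolCallRejected(f"domain {host!r} is blocked by policy")

# The gate runs before the tool does, regardless of what the model was
# prompted – or prompt-injected – into asking for.
gate_tool_call("read_file", {"path": "README.md"})          # allowed
try:
    gate_tool_call("delete_file", {"path": "/etc/passwd"})  # not on allowlist
except ToolCallRejected as err:
    print(err)
```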

Nate Lee, founder of Trustmind and Cloudsec.ai, observed during his presentation that the fundamental problem with AI agents is that they're non-deterministic, so we don't know exactly what they're going to do.

"With all of the talks around agent security, really 98 percent of it is going to boil down to the fact that prompt injection is a real thing," he said. "And we don't have a really great way to protect against it. And because of that, you need to be extremely mindful of these trade-offs as you're building the systems. Because when you give it more context, when you give it more tools, you're also increasing that attack surface."

With AI agents, less is more security. ®
