Anthropic's latest Claude model can interact with computers – what could go wrong?
For starters, it could launch a prompt injection attack on itself...
The latest version of AI startup Anthropic's Claude 3.5 Sonnet model can use computers – and the developer makes it sound like that's a good thing.
"Why is this new capability important?" the AI biz wonders aloud in its celebratory blog post. Then it answers its own question: "A vast amount of modern work happens via computers. Enabling AIs to interact directly with computer software in the same way people do will unlock a huge range of applications that simply aren't possible for the current generation of AI assistants."
The current generation of AI assistants has, of course, already been shown to be quite capable of engaging with computers – given multimodal input and output capabilities, appropriate middleware such as the browser automation tools Puppeteer or Playwright, and a language model integration framework like LangChain.
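For the curious, that middleware glue looks something like the following – a minimal Python sketch using Playwright to grab a screenshot for a multimodal model to read. The model call itself is left as a hypothetical `summarize_screenshot` helper, since the plumbing varies by framework:

```python
# Minimal sketch of browser-automation middleware, not Anthropic's stack.
# Assumes Playwright is installed (pip install playwright; playwright install chromium).
from playwright.sync_api import sync_playwright

def capture_page(url: str) -> bytes:
    """Drive a headless browser to a page and return a PNG screenshot."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        screenshot = page.screenshot(full_page=True)  # PNG bytes
        browser.close()
    return screenshot

# The bytes can then be handed to any multimodal model for extraction, e.g.:
# values = summarize_screenshot(capture_page("https://mail.example.com"))  # hypothetical helper
```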
Only a week ago, Django co-creator, open source developer, and AI influencer Simon Willison published a report about how well Google AI Studio does at screen scraping. He found that AI Studio could ingest a screen capture video of his email inbox to extract numeric values within mail messages and return the results in a spreadsheet.
So multimodal models can read computer screens quite effectively. Anthropic has empowered its Claude model to interact with computers more directly.
The latest iteration of Claude 3.5 Sonnet expands response options by allowing the model to "reason" about the state of the computer, and to take actions like invoking applications or services.
Anthropic is offering a public beta test of what it calls computer use tools – essentially functions that let the model drive a computer's keyboard and mouse: typing, moving the pointer, clicking, taking screenshots, and so on. There's also a file system editor tool for viewing, creating, and editing files, and a tool that lets the model run bash commands, among others.
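For developers, opting in looks roughly like the following Python SDK call. The tool type strings and beta flag below are taken from Anthropic's launch documentation and may change while the feature is in beta – and note that Claude only requests actions; actually executing them and feeding results back is the developer's job:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",  # keyboard, mouse, screenshots
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
            "display_number": 1,
        },
        {"type": "text_editor_20241022", "name": "str_replace_editor"},  # file viewing/editing
        {"type": "bash_20241022", "name": "bash"},                       # shell commands
    ],
    messages=[{"role": "user", "content": "Open the downloads folder and list its contents."}],
    betas=["computer-use-2024-10-22"],
)

# Claude responds with tool_use blocks describing the actions it wants taken;
# the calling code must perform them and loop the results back.
print(response.content)
```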
Anthropic acknowledges that this complicates AI safety. "Please be aware that computer use poses unique risks that are distinct from standard API features or chat interfaces," its documentation cautions. "These risks are heightened when using computer use to interact with the internet."
The warning continues – and it gets even better. "In some circumstances, Claude will follow commands found in content even if it conflicts with the user's instructions," the note explains. "For example, instructions on webpages or contained in images may override instructions or cause Claude to make mistakes. We suggest taking precautions to isolate Claude from sensitive data and actions to avoid risks related to prompt injection."
In short: Claude may decide to follow found instructions which, if placed deliberately, would qualify as a prompt injection attack.
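One precaution in the spirit of that advice is to keep a human in the loop: inspect every action the model proposes before anything runs, and execute approved actions in an isolated environment. A minimal sketch, assuming the tool_use response shape from Anthropic's API and a hypothetical `run_in_sandbox` executor:

```python
# Hedged sketch: gate the model's proposed actions behind human approval
# instead of executing them automatically.
RISKY_TOOLS = {"bash", "str_replace_editor"}

def run_in_sandbox(tool_name: str, tool_input: dict) -> None:
    """Hypothetical executor: perform the approved action inside an isolated VM/container."""
    ...

def approve_and_run(response) -> None:
    for block in response.content:
        if block.type != "tool_use":
            continue  # ignore plain text blocks
        if block.name in RISKY_TOOLS:
            print(f"Claude wants to run {block.name} with input: {block.input}")
            if input("Allow? [y/N] ").strip().lower() != "y":
                continue  # drop injected or unwanted commands on the floor
        run_in_sandbox(block.name, block.input)
```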
Beyond prompt injection from malicious images, the warning lists a litany of other potential concerns – latency, computer vision accuracy, tool selection errors, scrolling reliability, and flaky spreadsheet interaction among them.
Rachel Tobac, CEO of SocialProof Security, observed: "Breaking out into a sweat thinking about how cyber criminals could use this tool. This easily automates the task of getting a machine to go to a website and download malware or provide secrets, which could scale attacks (more machines hacked in a shorter period of time)."
Anthropic recommends that developers experimenting with Claude's computer use API "take the relevant precautions to minimize these kinds of risks." ®