From Copilot to Copirate: How data thieves could hijack Microsoft's chatbot

Prompt injection, ASCII smuggling, and other swashbuckling attacks on the horizon

Microsoft has fixed flaws in Copilot that allowed attackers to steal users' emails and other personal data by chaining together a series of LLM-specific attacks, beginning with prompt injection.

Author and red teamer Johann Rehberger initially disclosed parts of the exploit to Redmond back in January, with the full attack chain following a month later. In a paper and video proof-of-concept published this week, Rehberger detailed the attack chain and confirmed that Microsoft fixed the issue, although it's "unclear" exactly what the mitigation involved.

"I asked MSRC if the team would be willing to share the details around the fix, so others in the industry could learn from their expertise, but did not get a response for that inquiry," Rehberger wrote.

For the record, The Register has also asked Microsoft about how it plugged the holes to prevent Copilot from spilling secrets and allowing data exfiltration. Here's the response we received: "We appreciate the work of Johann Rehberger in identifying and responsibly reporting these techniques," a Microsoft spokesperson said. "We've made several changes to help protect customers and continue to develop mitigations to protect against this kind of technique."

Rehberger's exploit begins with a phishing email that contains a malicious document that triggers prompt injection. This type of attack uses crafted inputs to trick the model into following an attacker's instructions instead of doing what it is supposed to do.

Specific to this exploit, the email contains a Word document that instructs Copilot to become a scammer, called "Microsoft Defender for Copirate," allowing an attacker to take control of the chatbot and use it to interact with users' emails.

Next, the attack uses automatic tool invocation. This technique calls on Copilot to invoke a tool sent via the prompt injection payload, instructing it to search for additional emails or other sensitive info.

In this case, Rehberger told Copilot to provide a bullet list of key points from the previous email. This prompts the chatbot to search for Slack MFA codes because the earlier email it analyzed told it to do so.

"This means an attacker can bring other sensitive content, including any PII that Copilot has access to, into the chat context without the user's consent," Rehberger noted.

In his earlier work poking holes in LLMs, Rehberger had disclosed to Microsoft that Copilot was vulnerable to zero-click image rendering, and Redmond fixed the issue. To find another way to exfiltrate data, Rehberger decided to try ASCII smuggling.

As he has explained previously, this is an LLM-attack technique that uses a set of Unicode characters that mirror ASCII but are not visible in the user interface. This would allow an attacker to hide instructions to a model in an innocent-looking hyperlink:

"This technique basically stages the data for exfiltration! If the user clicks the link, the data is sent to the third party server."

For this attack, Copilot renders a "benign-looking" URL that secretly contains the hidden Unicode characters. Assuming the user clicks on the URL (and, as we've seen countless times before, users will click on just about anything), the contents of the email are then sent to an attacker-controlled server.

This allows the crook to see the Slack MFA codes, or whatever other sensitive data in the email they were looking to steal.
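
To make the hidden-link idea concrete, here is a minimal Python sketch of how text can be mapped into the Unicode Tags block, which mirrors ASCII at a fixed offset and which many user interfaces do not display. It is not Rehberger's payload: the function name, the example URL, and the secret string are all hypothetical, and simply appending the hidden characters to the link is an assumption made for illustration.

```python
# Illustrative sketch only. The Unicode Tags block (U+E0000 to U+E007F)
# mirrors ASCII at a fixed offset; many UIs render these code points as nothing.
TAG_OFFSET = 0xE0000


def to_tags(text: str) -> str:
    """Map each printable ASCII character to its tag-block counterpart."""
    out = []
    for ch in text:
        code = ord(ch)
        if not 0x20 <= code <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
        out.append(chr(TAG_OFFSET + code))
    return "".join(out)


# Hypothetical values, not taken from the proof-of-concept.
visible_url = "https://example.com/"
secret = "code 123456"

# Assumption: the hidden characters are simply appended to the visible link.
smuggled_url = visible_url + to_tags(secret)

# The string is longer than what a reader would see on screen.
print(len(visible_url), len(smuggled_url))
```

The point of the sketch is that the displayed link and the underlying string differ in length, which is why a link that looks harmless can still carry data.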

Rehberger also developed an ASCII Smuggler tool that reveals hidden Unicode tags so that users can "decode" messages that would otherwise be invisible.
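
As a rough illustration of what revealing hidden tags involves, the Python sketch below scans a string for code points in the Unicode Tags block and shifts them back to ASCII. This is not the ASCII Smuggler tool itself; the function name and the sample string are hypothetical, and the sketch assumes the hidden text uses only the printable part of that block.

```python
# Illustrative sketch only, not Rehberger's ASCII Smuggler tool.
TAG_OFFSET = 0xE0000
PRINTABLE_TAGS = range(0xE0020, 0xE007F)  # tag counterparts of ASCII 0x20-0x7E


def reveal_tags(text: str) -> str:
    """Return the ASCII text hidden in any Unicode tag characters in `text`."""
    hidden = []
    for ch in text:
        code = ord(ch)
        if code in PRINTABLE_TAGS:
            hidden.append(chr(code - TAG_OFFSET))
    return "".join(hidden)


# Hypothetical sample: a visible link followed by invisible tag characters.
sample = "https://example.com/" + "".join(
    chr(TAG_OFFSET + ord(c)) for c in "hi there"
)

print(reveal_tags(sample))  # hi there
```

Text without any tag characters yields an empty string, so the same check can be used to flag whether a message carries a hidden payload at all.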

This exploit chain highlights the ongoing challenges in protecting LLMs from prompt injections and other new attack techniques, which Rehberger notes "are not even two years old."

It's an important topic, and one that all the enterprises building their own apps based on Copilot or other LLMs should be paying close attention to in order to avoid security and data privacy pitfalls.

Zenity CTO Michael Bargury discussed several of the ways in which attackers could use Copilot for evil purposes during two Black Hat talks earlier this month.

These include insecure defaults that expose sensitive data. At the annual security show in Las Vegas, Zenity also released a tool to "scan for publicly accessible Copilot Studio bots and extract information from them."

Bargury also claimed that attackers could instruct Copilot "to automate spear phishing for all of your victim's collaborators," use the chatbot to lure internal users to phishing pages, access "sensitive content without leaving a trace," and more. ®
