UK spy agency: Don't feed LLMs with sensitive corporate data

Oh gosh. Looks like that bot just spilled that you plan to fire 20k staffers in Q4

The UK government's spy agency is warning corporations of the risks of feeding sensitive data into public large language models, including ChatGPT, saying organizations are opening themselves up to a world of potential pain unless those risks are correctly managed.

Google, Microsoft and others are currently shoehorning LLMs – the latest craze in tech – into their enterprise products, and Meta's LLaMa recently leaked. They are impressive but responses can be flawed, and now Government Communications Headquarters (GCHQ) wants to highlight the security angle.

Authors David C, a tech director for Platform Research, and Paul J, a tech director for Data Science Research, ask: "Do loose prompts sink ships?" Yes, they conclude, in some cases.

The common worry is that an LLM may "learn" from user prompts and surface that information to others querying it about similar matters.

"There is some cause for concern here, but not for the reason many consider. Currently, LLMs are trained, and then the resulting model is queried. An LLM does not (as of writing) automatically add information from queries to its model for others to query. That is, including information in a query will not result in that data being incorporated into the LLM."

The query will be visible to the LLM provider (OpenAI for ChatGPT), and will be stored and "almost certainly be used for developing the LLM service or model at some point. This could mean that the LLM provider (or its partners/contractors) are able to read queries, and may incorporate them in some way into future versions. As such, the terms of use and privacy policy need to be thoroughly understood before asking sensitive questions," the GCHQ duo write.

Examples of sensitive data – quite apt in the current climate – could include a CEO found to be asking "how best to lay off an employee" or a person asking specific health or relationship questions, the agency says. We at The Reg would be worried – on many levels – if an exec was asking an LLM about redundancies.

The pair add: "Another risk, which increases as more organizations produce LLMs, is that queries stored online may be hacked, leaked, or more likely accidentally made publicly accessible. This could include potentially user-identifiable information. A further risk is that the operator of the LLM is later acquired by an organization with a different approach to privacy than was true when data was entered by users."

GCHQ is far from the first to highlight the potential for a security foul-up. Internal Slack messages from a senior general counsel at Amazon, seen by Insider, warned staff not to share corporate information with LLMs, saying there had been instances of ChatGPT responses that appeared similar to Amazon's own internal data.

"This is important because your inputs may be used as training data for a further iteration of ChatGPT, and we wouldn't want its output to include or resemble our confidential information," she said, adding that it already had.

Research by Cyberhaven Labs this month indicates sensitive data accounts for 11 percent of the information employees enter into ChatGPT. The firm analyzed ChatGPT usage by 1.6 million workers at companies using its data security service, and found 5.6 percent had tried the chatbot at least once at work.

JP Morgan, Microsoft and Walmart are among other corporations to warn their employees of the potential perils.

Back at GCHQ, Messieurs David C and Paul J advise businesses not to input data they wouldn't want made public, to fully understand the privacy policies of any cloud-provided LLM before using it, or to use a self-hosted LLM instead.

We have asked Microsoft, Google and OpenAI to comment. ®
