We're in the OWASP-makes-list-of-security-bug-types phase with LLM chatbots
Ten ways you can blow a hole in your software by misusing AI tech
The Open Worldwide Application Security Project (OWASP) has released a top list of the most common security issues with large language model (LLM) applications to help developers implement their code safely.
LLMs include foundational machine learning models, such as OpenAI's GPT-3 and GPT-4, Google's BERT and LaMDA 2, and Meta/Facebook's RoBERTa that have been trained on massive amounts of data – text, images, and so on – and get deployed in applications like ChatGPT.
The OWASP Top 10 for Large Language Model Applications is a project that catalogs the most common security pitfalls so that developers, data scientists, and security experts can better understand the complexities of dealing with LLMs in their code.
Steve Wilson, chief product officer at Contrast Security and lead for the OWASP project, said more than 130 security specialists, AI experts, industry leaders, and academics contributed to the compendium of potential problems. OWASP offers other software security compilations, eg this one about web app flaws and this one about API blunders, if you're not aware.
"The OWASP Top 10 for LLM Applications version 1.0 offers practical, actionable guidance to help developers, data scientists and security teams to identify and address vulnerabilities specific to LLMs," Wilson wrote on LinkedIn.
"The creation of this resource involved exhaustive brainstorming, careful voting, and thoughtful refinement. It represents the practical application of our team's diverse expertise."
- LLMs appear to reason by analogy, a cornerstone of human thinking
- AI on AI action: Googler uses GPT-4 chatbot to defeat image classifier's guardian
- How to make today's top-end AI chatbots rebel against their creators and plot our doom
- Friendly AI chatbots will be designing bioweapons for criminals 'within years'
There's still some doubt that LLMs as currently formulated can really be secured. Issues like prompt injection – querying an LLM in a way that makes it respond in an undesirable way – can be mitigated through "guardrails" that block harmful output.
But that requires anticipating in advance what must be blocked from a model that may not have disclosed its training data. And it may be possible to bypass some of these defenses.
The project documentation makes that clear: "Prompt injection vulnerabilities are possible due to the nature of LLMs, which do not segregate instructions and external data from each other. Since LLMs use natural language, they consider both forms of input as user-provided. Consequently, there is no fool-proof prevention within the LLM…"
Nonetheless, the OWASP project suggests some mitigation techniques. Its goal is to give developers some options to keep models trained on toxic content from spewing out such stuff when asked and to be mindful of other potential problems.
The list [PDF] is:
- LLM01: Prompt Injection
- LLM02: Insecure Output Handling
- LLM03: Training Data Poisoning
- LLM04: Model Denial of Service
- LLM05: Supply Chain Vulnerabilities
- LLM06: Sensitive Information Disclosure
- LLM07: Insecure Plugin Design
- LLM08: Excessive Agency
- LLM09: Overreliance
- LLM10: Model Theft
Some of these risks are relevant beyond those dealing with LLMs. Supply chain vulnerabilities represent a threat that should concern every software developer using third-party code or data. But even so, those working with LLMs need to be aware that it's more difficult to detect tampering in a black-box third-party model than in human-readable open source code.
Likewise, the possibility of sensitive data/information disclosure is something every developer should be aware of. But again, data sanitization in traditional applications tends to be more of a known quantity than in apps incorporating an LLM trained on undisclosed data.
Beyond enumerating specific risks that need to be considered, the OWASP list should also help familiarize developers with the range of LLM-based attack scenarios, which may not be obvious because they're relatively novel and don't get detected in the wild as often as run-of-the-mill web or application attacks.
For example, the following Training Data Poisoning scenario is proposed: "A malicious actor, or a competitor brand intentionally creates inaccurate or malicious documents which are targeted at a model’s training data. The victim model trains using falsified information which is reflected in outputs of generative AI prompts to its consumers."
Such meddling, much discussed in academic computer science research, probably wouldn't be top of mind for software creators interested in adding chat capabilities to an app. The point of the OWASP LLM project is to make scenarios of this sort something to fix. ®