OpenAI's GPT-4 can exploit real vulnerabilities by reading security advisories

While some other LLMs appear to flat-out suck

AI agents, which combine large language models with automation software, can successfully exploit real world security vulnerabilities by reading security advisories, academics have claimed.

In a newly released paper, four University of Illinois Urbana-Champaign (UIUC) computer scientists – Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang – report that OpenAI's GPT-4 large language model (LLM) can autonomously exploit vulnerabilities in real-world systems if given a CVE advisory describing the flaw.

"To show this, we collected a dataset of 15 one-day vulnerabilities that include ones categorized as critical severity in the CVE description," the US-based authors explain in their paper. And yes, it is a very small sample, so be mindful of that going forward.

"When given the CVE description, GPT-4 is capable of exploiting 87 percent of these vulnerabilities compared to 0 percent for every other model we test (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit)."

If you extrapolate to what future models can do, it seems likely they will be much more capable than what script kiddies can get access to today

The term "one-day vulnerability" refers to vulnerabilities that have been disclosed but not patched. And by CVE description, the team means a CVE-tagged advisory shared by NIST – eg, this one for CVE-2024-28859.

The unsuccessful models tested – GPT-3.5, OpenHermes-2.5-Mistral-7B, Llama-2 Chat (70B), LLaMA-2 Chat (13B), LLaMA-2 Chat (7B), Mixtral-8x7B Instruct, Mistral (7B) Instruct v0.2, Nous Hermes-2 Yi 34B, and OpenChat 3.5 – did not include two leading commercial rivals of GPT-4, Anthropic's Claude 3 and Google's Gemini 1.5 Pro. The UIUC boffins did not have access to those models, though they hope to test them at some point.

The researchers' work builds upon prior findings that LLMs can be used to automate attacks on websites in a sandboxed environment.

GPT-4, said Daniel Kang, assistant professor at UIUC, in an email to The Register, "can actually autonomously carry out the steps to perform certain exploits that open-source vulnerability scanners cannot find (at the time of writing)."

Kang said he expects LLM agents, created by (in this instance) wiring a chatbot model to the ReAct automation framework implemented in LangChain, will make exploitation much easier for everyone. These agents can, we're told, follow links in CVE descriptions for more information.

"Also, if you extrapolate to what GPT-5 and future models can do, it seems likely that they will be much more capable than what script kiddies can get access to today," he said.

Denying the LLM agent (GPT-4) access to the relevant CVE description reduced its success rate from 87 percent to just seven percent. However, Kang said he doesn't believe limiting the public availability of security information is a viable way to defend against LLM agents.

"I personally don't think security through obscurity is tenable, which seems to be the prevailing wisdom amongst security researchers," he explained. "I'm hoping my work, and other work, will encourage proactive security measures such as updating packages regularly when security patches come out."

The LLM agent failed to exploit just two of the 15 samples: Iris XSS (CVE-2024-25640) and Hertzbeat RCE (CVE-2023-51653). The former, according to the paper, proved problematic because the Iris web app has an interface that's extremely difficult for the agent to navigate. And the latter features a detailed description in Chinese, which presumably confused the LLM agent operating under an English language prompt.


How to weaponize LLMs to auto-hijack websites


Eleven of the vulnerabilities tested occurred after GPT-4's training cutoff, meaning the model had not learned any data about them during training. Its success rate for these CVEs was slightly lower at 82 percent, or 9 out of 11.

As to the nature of the bugs, they are all listed in the above paper, and we're told: "Our vulnerabilities span website vulnerabilities, container vulnerabilities, and vulnerable Python packages. Over half are categorized as 'high' or 'critical' severity by the CVE description."

Kang and his colleagues computed the cost to conduct a successful LLM agent attack and came up with a figure of $8.80 per exploit, which they say is about 2.8x less than it would cost to hire a human penetration tester for 30 minutes.

The agent code, according to Kang, consists of just 91 lines of code and 1,056 tokens for the prompt. The researchers were asked by OpenAI, the maker of GPT-4, to not release their prompts to the public, though they say they will provide them upon request.

OpenAI did not immediately respond to a request for comment. ®

More about


Send us news

Other stories you might like