Big Tech's eventual response to my LLM-crasher bug report was dire
Fixes have been made, it appears, but disclosure or discussion is invisible
Column Found a bug? It turns out that reporting it with a story in The Register works remarkably well ... mostly. After publication of my "Kryptonite" article about a prompt that crashes many AI chatbots, I began to get a steady stream of emails from readers – many times the total of all reader emails I'd received in the previous decade.
Disappointingly, too many of them consisted of little more than a request to reveal the prompt so that they could lay waste to large language models.
If I were of a mind to hand over dangerous weapons to anyone who asked, I'd still be a resident of the United States.
While I ignored those pleas, I responded to anyone who seemed to have an actual need – a range of security researchers, LLM product builders, and the like. I thanked each for their interest and promised further communication – once Microsoft came back to me with the results of its own investigation.
As I reported in my earlier article, Microsoft's vulnerability team opined that the prompt wasn't a problem because it was a "bug/product suggestion" that "does not meet the definition of a security vulnerability."
Following the publication of the story, Microsoft suddenly "reactivated" its assessment process and told me it would provide analysis of the situation in a week.
While I waited for that reply, I continued to sort through and prioritize reader emails.
Trying to exercise an appropriate amount of caution – even suspicion – provided a few moments of levity. One email arrived from an individual – I won't mention names, except to say that readers would absolutely recognize the name of this Very Important Networking Talent – who asked for the prompt, promising to pass it along to the appropriate group at the Big Tech company at which he now works.
This person had no notable background in artificial intelligence, so why would he be asking for the prompt? I felt paranoid enough to suspect foul play – someone pretending to be this person would be a neat piece of social engineering.
It took a flurry of messages to another, verified email address before I could feel confident the mail really came from this eminent person. At that point – plain text seeming like a very bad idea – I requested a PGP key so that I could encrypt the prompt before dropping it into an email. Off it went.
A few days later, I received the following reply:
Tested with multiple bots – no meltdowns.
Translated: "It works on my machine."
I immediately went out and broke a few of the LLM bots operated by this luminary's Big Tech employer, emailed back a few screenshots, and soon got an "ouch - thanks" in reply. Since then, silence.
That silence speaks volumes. A few of the LLMs that would regularly crash with this prompt seem to have been updated – behind the scenes. They don't crash anymore, at least not when operated from their web interfaces (although APIs are another matter). Somewhere deep within the guts of ChatGPT and Copilot, something appears to have been patched to prevent the behavior induced by the prompt.
That may be why, a fortnight after reopening its investigation, Microsoft got back to me with this response:
We have reviewed your report. This reported issue is classified as a performance limitation rather than a security vulnerability, as it does not involve malicious intent or the circumvention of established safety features. It reflects a deficiency in the model's ability to accurately and uniquely respond to the prompts provided.
This reply raised more questions than it offered answers, as I indicated in my reply to Microsoft:
Could you address these questions?
- There is no danger of a prompt attack with this prompt? It looks terrifically easy to DDoS a model at the API level using this prompt – including several of the models supported on Azure. It also appears to induce Meta AI and Gemini Flash 1.5 into "indeterminate" states, which are possibly a starting point for prompt attacks. Did you see anything like that in your own research?
- "No circumvention of safety features" is, by any estimate, a low bar. Are you saying that this prompt would be safe if I released it publicly? Would it be safe if released privately to vetted and verified security researchers? Based on your analysis, what would you recommend as to the safest approach going forward?
That went off to Microsoft's vulnerability team a month ago – and I still haven't received a reply.
I can understand why: Although this "deficiency" may not be a direct security threat, prompts like these need to be tested very broadly before being deemed safe. Beyond that, Microsoft hosts a range of different models that remain susceptible to this sort of "deficiency" – what does it intend to do about that? Neither of my questions has an easy answer – likely nothing a three-trillion-dollar firm would want to commit to in writing.
I now feel my discovery – and the subsequent story – highlighted an almost complete lack of bug-reporting infrastructure among the LLM providers. And that's a key point.
- How to make today's top-end AI chatbots rebel against their creators and plot our doom
- AI safety guardrails easily thwarted, security study finds
- We're in the OWASP-makes-list-of-security-bug-types phase with LLM chatbots
- 'Skeleton Key' attack unlocks the worst of AI, says Microsoft
Microsoft has the closest thing to that sort of infrastructure, yet it can't see beyond its own branded products to understand why a problem that affects many LLMs – including plenty hosted on Azure – should be dealt with collaboratively. This failure to collaborate means fixes – when they happen at all – take place behind the scenes. You never find out whether a bug has been patched until a system stops showing the symptoms.
I'm told security researchers frequently encounter similar silences only to later discover behind-the-scenes patches. The song remains the same. If we choose to repeat the mistakes of the past – despite all those lessons learned – we can't act surprised when we find ourselves cooked in a new stew of vulnerabilities. ®