In the battle between Microsoft and Google, LLM is the weapon too deadly to use

The only winning move is not to play

Opinion “Discoveries of which the people of the United States are not aware may affect the welfare of this nation in the near future.”

That's not the opening line of the open letter by hundreds of industry luminaries last week warning of the "risk to humanity" of unchecked LLM AI/ML, but that of the Szilard Petition some 80 years ago, when nearly a hundred scientists at the Manhattan Project begged President Truman not to nuke Japan.

It might seem purest hype to compare large language model AI to atomic technology – it's not as if ChatGPT can destroy all life in a handful of minutes. Yet the early history of nuclear tech has many uncanny similarities to how AI/ML is playing out, and the parallels are too strong to ignore. There are plenty of bad outcomes short of planetary armageddon.

Let's recapitulate the first years of the Cold War. Two rival blocs – the United States and its allies, and the Soviet Union with its own – each saw the other as an existential threat, and rushed technology of awesome potential for good and ill into service without pausing for breath. The only thing that mattered was not letting the other side get the upper hand. Misinformation, both deliberate and unwitting, filled the media. Secrecy outstripped measured oversight, speed trumped safety, and we live with consequences to this day that cannot be undone.

All these things have their echoes in the present day. Microsoft and Google are rushing out rival LLM AI/ML systems that interact directly with the public, with no regulatory framework inside or out – indeed, Microsoft has just axed its AI Ethics and Society team. The systems are undeniably powerful, and undeniably flawed, and there is no independent regulation or inspection of either system, nor of the decisions behind their deployment. And anyone trying to make sense of the media's wide-eyed reporting of the past month or so will have seen plenty of claims reported that can't be verified.

It gets worse. On reflection, the ability of LLMs to reduce the world to radioactive ash may not be that fanciful. OpenAI's report on GPT-4's testing is required reading, listing 12 classes of danger the testers looked for – one of which is aiding proliferators of chemical, biological or nuclear weapons. Can you ask GPT-4 to help you build a nuclear device? You can, and the mitigation mentioned – that the output contains factual errors that may mislead – isn't the report's only example of unintentional dark humor.

That report does not describe a safe technology fit to be deployed to many millions of users. It describes behavior that goes beyond unreliable or misleading to actively deceptive – asking a TaskRabbit gig worker to solve a visual Captcha for it by claiming to be a visually impaired person. It describes qualitative and quantitative testing regimes that can't possibly cover more than a subset of real-life usage, but which nevertheless reveal consistent and concerning classes of bad behavior. Testing AIs is like the old saying that nothing can be made foolproof, because fools are so clever.

If this were just some technology that failed to fulfil its promises – some flashy but buggy product – then time and money would be wasted, but that's nothing new in IT. As with the early atomic age, however, even eliminating the technology overnight could not call back the harm it has already done. Scientific instruments are made from metal recovered from World War II shipwrecks, because metal made after atmospheric nuclear testing started is contaminated with the radioisotopes it put into the atmosphere. ChatGPT and its kin are already seeding the global linguistic corpus with warts-and-all content, to say nothing of domain-specific knowledge sets. Feeding this back into language model training will have unknown effects. Should that be a concern? Yes.

There are plenty of specific incidents that support the idea of slamming on the brakes. Nottingham University's excellent Computerphile YouTube channel has been paying attention, with its academics reporting on both the bizarre – glitch tokens – and the deeply worrying, such as Bing Chat turning into a narcissistic bully when told it has the year wrong.

We know what the combination of paranoia, new technology and unchecked decision making can produce – some very bad ideas indeed. In nuclear warfare, the UK had Violet Club, a massively bonkers aircraft bomb that was armed in the air by removing a stopper and letting ball bearings drain out. Once armed it could not be disarmed, nor could the bomber land without dropping it. The US had Davy Crockett, the nuclear bazooka that probably included the soldier firing it in its blast radius. We don't know the AI/ML equivalents yet, but there will be some.

It's not even as if Microsoft and Google can be trusted to do the right thing. This week, Microsoft is reported to be on the brink of an agreement with the EU to stop cheating in the cloud, while Google's claims of breakthroughs in AI science are under investigation. We can't say we weren't warned.

And it's not as if LLMs don't have a huge and important role to play in the future, just as appropriate nuclear technology has an essential role in fighting climate change. But there is no immediate, desperate need to push LLMs out there without regulation, transparency, agreed safety frameworks and a megaton more caution all round.

Play with it in the lab. Develop testing systems in parallel. It'll be a level playing field and no competitive advantage will be lost. But the egos of tech kings and the deathly fear of losing out cannot be the primary drivers of exposing us, our digital society and our data ecosystem to powerful, flawed, dangerous experiments. What LLMs are doing can't be taken back. Think about that while going forward. ®
