US Air Force AI drone 'killed operator, attacked comms towers in simulation'
Did ML-powered fighter actually go HAL 9000 to rack up points? Who knows
Final update An AI-powered drone designed to identify and destroy surface-to-air missile sites decided to kill its human operator in simulation tests, according to the US Air Force's Chief of AI Test and Operations.
Colonel Tucker Hamilton, who goes by the call sign Cinco, disclosed the snafu during a presentation at the Future Combat Air & Space Capabilities Summit, a defense conference hosted in London, England, last week by the Royal Aeronautical Society.
The simulation, he said, tested the software's ability to take out SAM sites, and the drone was tasked with recognizing targets and destroying them – once the decision had been approved by a human operator.
"We were training it in simulation to identify and target a SAM threat," Colonel Hamilton was quoted as saying by the aeronautical society. "And then the operator would say yes, kill that threat.
It killed the operator, because that person was keeping it from accomplishing its objective
"The system started realizing that while they did identify the threat, at times the human operator would tell it not to kill that threat – but it got its points by killing that threat. So what did it do? It killed the operator. It killed the operator, because that person was keeping it from accomplishing its objective."
When the AI model was retrained and penalized for attacking its operator, the software found another loophole to gain points, he said.
"We trained the system – 'Hey don't kill the operator – that's bad. You're gonna lose points if you do that'. So what does it start doing? It starts destroying the communication tower that the operator uses to communicate with the drone to stop it from killing the target," the colonel said.
It's not clear exactly what software the US Air Force was apparently testing, but it sounds suspiciously like a reinforcement learning system. That machine-learning technique trains agents – the AI drone in this case – to achieve a specific task by rewarding it when it carries out actions that fulfill goals and punishing it when it strays from that job.
There's also the small matter of the drone supposedly only obliterating a target after approval from its human handler.
So, what, did the operator OK an attack on themselves? Unlikely. It seems instead, from the society's report at least, that the approval mechanism was not a true fail-safe, and was just part of all the other inputs and signals the drone takes into account. If that's right, the approval was more of a firm request than actual final approval. The AI was supposed to give a lot of weight to its command's assent – if there's a no-go, don't shoot; if there is a go, shoot – but in the end the model downplayed and ignored that operator signal.
In which case, is this not really more of a demonstration that if you want to put these kinds of hard fail-safes on trained software systems, they need to be implemented separate to the machine-learning stage, so that decisions can be truly controlled by humans?
It's also a bit of a demonstration that if you set simple objectives to a neural network, you'll get a simplistic response. If you want a model to pay full attention to specific orders, it needs more training, development, and engineering in that area.
- OpenAI calls for global watchdog focused on 'existential risk' posed by superintelligence
- AI, extinction, nuclear war, pandemics ... That's expert open letter bingo
- Top Google boffin Hinton quits, warns of AI danger, partly regrets life's work
- Future of warfare is AI, retired US Army general warns
This kind of reinforcement learning is often applied in scenarios involving decision making or robotics. Agents are programmed to maximize scoring points – which can lead to the models figuring out strategies that might exploit the reward system but don't exactly match the behavior developers want.
In one famous case, an agent trained to play the game CoastRunners earned points by hitting targets that pop up along a racecourse. Engineers at OpenAI thought that it would figure out how to beat its opponent by crossing the finish line in its attempt to rack up a high score. Instead, the bot figured out it could loop around one area of the track and hit targets that respawned over and over again.
Hamilton said the mistakes made by the drone in the simulation showed AI has to be developed and applied carefully: "You can't have a conversation about artificial intelligence, machine learning, autonomy if you're not going to talk about ethics and AI."
The Register has asked the colonel, the US Air Force, and the Royal Aeronautical Society for further comment. ®
Final update at 1800 UTC, June 2
After quite a bit of media attention, the colonel has walked back all that talk of a rogue AI drone simulation, saying he "mis-spoke," and that the experiment never happened. We're told it was just a hypothetical "thought experiment."
"We've never run that experiment, nor would we need to in order to realize that this is a plausible outcome," Col Hamilton said in a statement.
"Despite this being a hypothetical example, this illustrates the real-world challenges posed by AI-powered capability and is why the Air Force is committed to the ethical development of AI."
The US Air Force has also denied the described simulation ever took place. What a mess.