As artificially-intelligent software continues to outperform humans, it's natural to be anxious about the future. What exactly is stopping a neural network, hard at work, from accidentally hurting or killing us all?
The horror of a blundering AI injuring people has been explored heavily in science fiction, and boffins are still trying to stop that fiction from turning into fact.
Boffins such as Zachary Lipton, of the University of California, San Diego, and Jianfeng Gao, Lihong Li, Jianshu Chen, and Li Deng, of Microsoft Research. They're trying to get computers to never forget the consequences of serious errors of judgement.
They're attempting to condition AI agents to “fear.”
Machines can’t actually feel fear like humans can. It’s a complex biological function that is triggered by a variety of stimuli and is extremely difficult, or maybe even impossible, to recreate in code.
An AI won't have a racing heart or sweaty palms, and it can't scream. There may be a way, however, to encode effects similar to human fear into an electronic brain, so they’re hardwired into its system – something the aforementioned researchers call “intrinsic fear.”
AI agents can be coaxed into making good decisions by programming them to chase after rewards using deep reinforcement learning (DRL).
The technique is often used to teach AI to play games. Each time a machine makes a good move, it is rewarded with a high score. Over time, it learns which moves are more likely to maximize reward, and its performance improves.
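To make the reward-chasing idea concrete, here is a minimal sketch of tabular Q-learning, a simpler relative of DRL that keeps a lookup table instead of a neural network. The corridor environment, the function name `train_q_table`, and every parameter value are assumptions made up for illustration; none of it comes from the paper.

```python
import random

def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts at the left end and earns a reward of 1 only
    when it reaches the right end, which ends the episode.
    """
    rng = random.Random(seed)
    steps = (-1, +1)  # action 0 moves left, action 1 moves right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore occasionally (or when undecided), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = min(max(state + steps[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0

            # Nudge the estimate toward reward plus discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

After training, the table should rate "move right" above "move left" in every non-terminal square, which is the toy equivalent of learning which moves maximize reward.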
A paper written by the aforementioned researchers is under review [PDF] for the International Conference on Learning Representations (ICLR 2017), and it shows that DRL can be turned on its head. If machines can be rewarded for good decisions, they can also be punished for making wrong decisions.
I, for one, do not welcome our AI overlords
As we're told in the paper's abstract, AI systems can wind up repeating mistakes that end lives or smash things beyond repair:
To use deep reinforcement learning in the wild, we might hope for an agent that would never make catastrophic mistakes. At the very least, we could hope that an agent would eventually learn to avoid old mistakes.
Owing to the use of function approximation, these agents eventually forget experiences as they become exceedingly unlikely under a new policy. Consequently, for as long as they continue to train, state-aggregating agents may periodically relive catastrophic mistakes.
To address this, the five experts propose something called the “danger model.” The goal is for the agent to be able to skirt around potential dangers by rectifying itself before it reaches a catastrophic failure, such as getting shot, falling off a cliff or crashing into a wall in a game.
The danger model is a separate neural network that is trained to estimate the likelihood that the agent will hit a catastrophic failure within a certain number of moves. As the agent plays the game, the model learns from experience which moves tend to lead to a mistake.
States reached by such moves are considered “danger states,” from which there is a high probability of a breakdown. “Intrinsic fear” is a negative term in the reward function that grows in size and penalizes the system every time it enters a danger state.
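A rough sketch of those two ingredients might look like the following. It assumes that the last few steps before a catastrophe are the ones labeled as danger states, and that the penalty is the danger model's probability scaled by a constant. The names `fear_radius` and `fear_factor`, and both helper functions, are illustrative assumptions rather than the researchers' code. In a real system the danger probability would come from the trained network; here it is passed in as a plain number.

```python
def label_danger_states(episode_length, ended_in_catastrophe, fear_radius):
    """Label each step of one episode: 1 = danger state, 0 = safe.

    Assumption: a step counts as dangerous if the episode ended in a
    catastrophe and the step lies within `fear_radius` moves of the end.
    """
    labels = [0] * episode_length
    if ended_in_catastrophe:
        first_danger = max(0, episode_length - fear_radius)
        for t in range(first_danger, episode_length):
            labels[t] = 1
    return labels


def shaped_reward(env_reward, danger_probability, fear_factor):
    """Subtract an 'intrinsic fear' penalty from the game's own reward.

    The penalty grows with the danger model's estimate, so states the
    model considers risky look less attractive to the agent.
    """
    return env_reward - fear_factor * danger_probability
```

For example, `label_danger_states(6, True, 2)` marks only the final two steps as dangerous, and those labels are what a danger model could be trained to predict.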
If the danger model is to succeed, the punishment must be just, Zachary Lipton, coauthor of the paper and researcher at UC San Diego, told The Register.
Lipton compares it to how humans can develop phobias. “If you stray too close to the edge of a roof, your fear system kicks in and you step away to stop yourself from falling over. But if the fear you feel is so powerful and it stays with you, you might develop an irrational fear where you never step foot on a roof ever again,” he said.
Teetering on the edge
If the system is fined too harshly, or if the danger model starts looking out for danger states too early on, then it’ll become overly anxious and panicky. The agent will start avoiding states that aren’t dangerous, as it can no longer tell the difference between what will and what won’t lead to a catastrophe.
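The trade-off can be illustrated with a toy calculation. The sketch below borrows Lipton's roof analogy and assumes the agent simply picks whichever option has the best reward once a fear penalty is subtracted. The options, their numbers, and the `pick_action` helper are all hypothetical.

```python
def pick_action(options, fear_factor):
    """Return the option with the best reward after the fear penalty.

    `options` maps a name to (expected_reward, danger_probability).
    """
    def score(name):
        reward, danger = options[name]
        return reward - fear_factor * danger
    return max(options, key=score)


# Hypothetical numbers, chosen only to show the effect.
options = {
    "stay_put": (0.0, 0.0),         # no reward, no risk
    "walk_on_roof": (1.0, 0.05),    # rewarding and almost safe
    "lean_over_edge": (1.5, 0.9),   # slightly more reward, very risky
}

fearless = pick_action(options, fear_factor=0.0)   # ignores danger entirely
balanced = pick_action(options, fear_factor=1.0)   # avoids the real hazard
phobic = pick_action(options, fear_factor=50.0)    # avoids the roof altogether
```

With no fear the agent leans over the edge, with a moderate penalty it walks on the roof but keeps away from the edge, and with an excessive penalty it refuses to move at all, even though walking on the roof was almost entirely safe.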
Deciding the final terms of the danger model is tricky and it depends on the application. In the paper, the researchers have only used it to play two games: Adventure Seeker and Cart-Pole.
- In Adventure Seeker, the agent is a skier on top of a hill seeking more powerful adrenaline rushes as it tries to climb higher without falling off. It can choose to speed up to go higher or slow down to go lower. If it goes too high too quickly, it’ll fall off.
- Cart-Pole is similar. The agent has to balance a pole on a cart. The catastrophe is that the pole can fall to the left or right, or the cart can run too far to the left or right of the screen boundary.
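For Cart-Pole, the catastrophe described above boils down to a simple check on two numbers. The sketch below is one way to write it. The threshold values are placeholders in the style of common Cart-Pole setups and are not taken from the paper, and `is_catastrophe` is a hypothetical helper name.

```python
import math

def is_catastrophe(cart_position, pole_angle,
                   max_position=2.4, max_angle=math.radians(12)):
    """Return True if the pole has fallen or the cart has left the track.

    cart_position: distance from the centre (negative = left).
    pole_angle: tilt from vertical in radians (negative = left).
    The default limits are assumed values for illustration only.
    """
    cart_out_of_bounds = abs(cart_position) > max_position
    pole_fallen = abs(pole_angle) > max_angle
    return cart_out_of_bounds or pole_fallen
```

Taking the absolute value covers both the left and right cases in one comparison.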
To decide how big the fear factor should be, the separate danger model neural network has to play the games and recognize what the catastrophes are first, before it can learn to avoid them. Knowledge from the danger model is given to the agent – which is also a neural network – before it has a go at playing the game.
Don’t stab me!
The researchers compare it to the idea of “a parent scolding a child for running around with a knife. The child can learn to adjust its behavior without actually having to stab someone.” The danger model is like the parent with prior knowledge that knives are dangerous, and the agent is like the child who gets punished for getting into danger.
An equally macabre situation given as an example in the paper is the robot barber. It might receive positive feedback for giving a closer shave, and this reward encourages the robot to bring the blade closer to the skin. As it gets closer and closer, it enters hazardous territory and must learn to classify the danger states and pull away to prevent a potential bloodbath.
The danger model would have to be refined during simulation before the agent could carry out actions in real life. It’s too early to say if intrinsic fear could be used to prevent collisions in autonomous cars, because it hasn’t been tested on real-world applications.
Until then, it’s still all fun and games. ®