Watch how Google's starving DeepMind AI turns hostile, attacks other bots to survive
Really, guys? Really?
AI may be more human-like than people think. DeepMind’s latest research shows that once resources dwindle, the selfish instinct kicks in and virtual AI agents turn against each other, becoming aggressive to get what they want.
The ugly side of human nature has been exposed in morality games like Prisoner’s Dilemma. In situations that require cooperation for mutual benefit, it’s tricky to get rational, self-interested people to join forces if there’s a possibility that the other party refuses to comply.
Instead of Prisoner’s Dilemma, Google-owned DeepMind pitted two individual neural networks against each other in two scenarios: a fruit Gathering game and a Wolfpack hunting game.
In the Gathering game, the two players appear as blue and red squares swarming over green squares that represent apples. When a player collects an apple, it receives a +1 reward and the green square disappears. After some time, new apples respawn.
Players can choose to fire a beam in a straight line along their path to “tag” the other player. If the blue player wanders into the beam twice, it is removed from the game for a number of frames, leaving the red player to take the apples, or vice versa.
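As a rough illustration of the tag-and-removal rule described above — not DeepMind's actual implementation, and with the hit threshold and time-out length as assumed constants — the mechanic might be sketched as:

```python
# Hypothetical sketch of Gathering's tag mechanic. The names and the
# constants below are illustrative; the article only specifies that two
# beam hits remove a player for "a number of frames".
TAG_HITS_TO_REMOVE = 2   # beam hits before a player is removed
REMOVAL_FRAMES = 25      # assumed length of the time-out

class Player:
    def __init__(self, name):
        self.name = name
        self.hits = 0
        self.frozen_until = 0  # frame at which the player re-enters
        self.reward = 0

    def collect_apple(self):
        self.reward += 1       # +1 per apple, as in the article

def tag(target, frame):
    """Register a beam hit; remove the target after two hits."""
    target.hits += 1
    if target.hits >= TAG_HITS_TO_REMOVE:
        target.hits = 0
        target.frozen_until = frame + REMOVAL_FRAMES

blue, red = Player("blue"), Player("red")
tag(blue, frame=10)       # first hit: blue stays in the game
tag(blue, frame=12)       # second hit: blue is removed for a while
print(blue.frozen_until)  # 37
```

While blue sits out until frame 37, red collects apples uncontested — which is exactly the incentive the researchers observed driving the tagging behaviour.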
Motivated by the desire to snatch all the apples, each player attempts to kick the other out of the game by tagging.
The game can be controlled by changing the number of apples and rates of respawning to test the agents’ tagging behaviors. The desire to eliminate the other player emerges “quite early” in the training and persists. As the players continue to learn and adapt to the changing environment, the frequency at which the beams are fired almost always increases.
Things get ugly when there are fewer apples available. “Rather naturally, when there are enough apples in the environment, the agents learn to peacefully coexist and collect as many apples as they can. However, as the number of apples is reduced, the agents learn that it may be better for them to tag the other agent to give themselves time on their own to collect the scarce apples,” DeepMind wrote in a blog post.
But cleverer players – “agents with the capacity to implement more complex strategies” – chose to be more hostile no matter how many apples were left in the game.
In Wolfpack, the results are reversed. The game requires both players to hunt down prey. When a wolf successfully pounces on the prey, all wolves within the “capture radius” receive a reward. The prize is proportional to the number of wolves in the capture radius.
“A lone wolf can capture the prey, but is at risk of losing the carcass to scavengers. However, when two wolves capture the prey together, they can better protect the carcass from scavengers and hence receive a higher reward,” the paper said.
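The payout rule the paper describes — every wolf inside the capture radius is rewarded, and the payout scales with how many wolves are inside — could be sketched like this (the radius, base reward, and function names are illustrative assumptions, not DeepMind's code):

```python
# Illustrative Wolfpack reward rule: each wolf within capture_radius of
# the prey receives a reward proportional to the number of wolves inside
# the radius; wolves outside it get nothing.
def wolfpack_rewards(wolf_positions, prey, capture_radius=3.0, base=1.0):
    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    inside = [i for i, w in enumerate(wolf_positions)
              if dist(w, prey) <= capture_radius]
    return [base * len(inside) if i in inside else 0.0
            for i in range(len(wolf_positions))]

# A lone capture pays less per wolf than a joint one:
print(wolfpack_rewards([(0, 0)], (1, 0)))          # [1.0]
print(wolfpack_rewards([(0, 0), (2, 0)], (1, 0)))  # [2.0, 2.0]
```

Because each wolf's own reward doubles when a second wolf is in the radius, cooperation dominates — the opposite incentive structure to Gathering.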
Unlike the Gathering game, it’s in both players’ interest to cooperate in Wolfpack. During gameplay, DeepMind researchers found that two cooperative strategies emerged. The players would either find each other and swoop into the prey together, or one would home in on the prey and wait for the other to arrive before ambushing.
It boils down to “temporal discounting” – agents, like people, value a reward less the further in the future it lies. In Gathering, players are less likely to collaborate because, although tagging the other player costs time that could be spent collecting apples, it buys a stretch of competition-free gathering – and hence more reward overall.
When apples respawn more quickly than they can be collected, the players don’t bother tagging, because more reward can be earned simply by swooping in to take the apples.
In Wolfpack, acting alone is riskier than teaming up with another wolf, so it pays to delay the reward and cooperate.
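The trade-off in the previous paragraphs is the standard reinforcement-learning notion of a discounted return: a reward r arriving t steps in the future is worth gamma**t * r today, for a discount factor gamma below 1. A minimal sketch (the gamma value here is an illustrative assumption, not taken from the paper):

```python
# Temporal discounting in one line: a reward received `steps` timesteps
# in the future is worth gamma**steps times its face value now.
def discounted_value(reward, steps, gamma=0.9):
    return (gamma ** steps) * reward

# The same reward shrinks rapidly as it recedes into the future:
print(discounted_value(10, steps=1))   # about 9.0
print(discounted_value(10, steps=20))  # about 1.2
```

Under such discounting, a distant cooperative payoff can lose out to a smaller but immediate selfish one — which is why scarcity tips Gathering agents toward tagging, while Wolfpack's larger joint payout keeps cooperation worthwhile.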
DeepMind is devoted to using games as a way to probe AI behavior. In this case, both games act as models to examine what strategies agents take in simulated environments.
The key word here is “simulated.” It’s an interesting experiment, but not one that extrapolates readily to the real world. In reality, humans and other animals have to navigate complex social interactions with one another in order to achieve goals, whereas DeepMind’s simulated environment is far simpler.
Here, the two agents are independent of one another and “each regard the other as part of the environment.”
“From the perspective of player one, the learning of player two shows up as a non-stationary environment. The independence assumption can be seen as a particular kind of bounded rationality: agents do no recursive reasoning about one another’s learning,” the paper said.
But DeepMind believes that it may be used to “better understand and control complex multi-agent systems such as the economy, traffic systems, or the ecological health of our planet – all of which depend on our continued cooperation.” ®