Artificial intelligence... or advanced imitation? How DeepMind used YouTube vids to train game-beating Atari bot

I think I'm a clone now


Video DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos.

Typically, for this sort of research, you'd use a technique called reinforcement learning. This is a popular approach in machine learning that trains bots to perform a specific task, such as playing computer games, by tempting them with lots of little rewards.

To do this, developers have to build algorithms and models that can figure out the state of the game’s environment, identify the rewards to obtain, and then go get 'em. By seeking out these prizes, the bots should gradually progress through the game world, step by step. The goodies should come thick and fast to continuously lure the AI through levels.
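For the general shape of that loop, here's a minimal sketch in Python. The toy corridor environment and the Q-table are stand-ins of our own devising – DeepMind's agents use deep neural networks and an Atari emulator – but the reward-chasing mechanics are the same:

```python
import random

class ToyEnv:
    """Ten states in a row; the prize sits at the far end."""
    def __init__(self):
        self.state = 0

    def step(self, action):                       # action is -1 or +1
        self.state = max(0, min(9, self.state + action))
        reward = 1.0 if self.state == 9 else 0.0  # the little reward
        return self.state, reward, self.state == 9

q = {(s, a): 0.0 for s in range(10) for a in (-1, 1)}  # value of each move

def pick(state, eps=0.1):
    """Epsilon-greedy: mostly exploit the best-known move, sometimes explore."""
    if random.random() < eps:
        return random.choice((-1, 1))
    best = max(q[(state, -1)], q[(state, 1)])
    return random.choice([a for a in (-1, 1) if q[(state, a)] == best])

for episode in range(500):
    env, state, done = ToyEnv(), 0, False
    for _ in range(500):                          # cap episode length
        action = pick(state)
        nxt, reward, done = env.step(action)
        # One-step Q-learning update: the reward does the teaching.
        target = reward + 0.99 * max(q[(nxt, -1)], q[(nxt, 1)])
        q[(state, action)] += 0.1 * (target - q[(state, action)])
        state = nxt
        if done:
            break
```

Every time the agent stumbles into the goal, the reward nudges up the value of the moves that got it there, so the winning sequence is gradually reinforced.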

But a new method, developed by DeepMind eggheads and documented in a paper this week, teaches code to play classic Atari titles, such as Montezuma’s Revenge, Pitfall, and Private Eye, without any explicit environmental rewards. Instead, an agent is asked to copy the way humans tackle the games, by analyzing YouTube footage of their play-through sessions.


Exploration games like 1984's Montezuma's Revenge are particularly difficult for AI to crack, because it's not obvious where you should go, which items you need and in which order, and where you should use them. That makes defining rewards difficult without spelling out exactly how to play the thing – which would defeat the point of the exercise.

For example, Montezuma's Revenge requires the agent to direct a cowboy-hat-wearing character, known as Panama Joe, through a series of rooms and scenarios to reach a treasure chamber in a temple, where all the goodies are hidden. Pocketing a golden key, your first crucial item, takes about 100 steps, and is equivalent to 100^18 possible action sequences. That's way too big for typical reinforcement learning algorithms to cope with – there are too many sequential steps for a neural network to internalize just to obtain a single specific reward.
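For a sense of scale, here's a quick back-of-envelope calculation of our own – the 18 comes from the Atari 2600's standard set of discrete joypad actions:

```python
# Back-of-envelope on the size of the search space. The Atari 2600
# joypad exposes 18 discrete actions per step; 100 steps is the
# paper's rough figure for reaching the first key.
print(f"paper's quoted figure, 100^18 = {100**18:.1e}")           # 1.0e+36
print(f"naive count, 18 actions over 100 steps = {18**100:.1e}")  # ~3.4e+125
# Either way, far too many sequences for blind trial and error to cover.
```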

These sorts of rewards are therefore described as sparse: each of the steps involved to obtain the reward appears to achieve very little, and there is little in the way of an immediate bounty to guide the bot, even though together the steps would lead the player to a goal. Games like Ms Pac-Man are the opposite, and provide software agents with near instant feedback: points are racked up as she guzzles pellets and fruit, and she is punished when she gets caught by ghosts. Sparse games – such as Montezuma’s Revenge and other puzzle adventures – require agents to have much more patience than reinforcement learning usually affords.
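The difference is easy to see if you line the reward streams up side by side. These numbers are invented purely to illustrate the shape of the signal:

```python
# Invented reward streams over 20 steps, purely to show the shape of
# the signal each kind of game feeds an RL agent.
dense_ms_pacman  = [10, 10, 0, 10, 50, 10, 10, 0, 10, 10,
                    10, 0, 10, 10, 100, 10, 0, 10, 10, -200]  # pellets, fruit, ghost
sparse_montezuma = [0] * 19 + [100]                           # nothing... then the key

dense_hits  = sum(r != 0 for r in dense_ms_pacman)
sparse_hits = sum(r != 0 for r in sparse_montezuma)
print(f"informative steps: {dense_hits}/20 dense vs {sparse_hits}/20 sparse")
```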

Imitation learning

One way to get around the sparse rewards problem is to directly learn from demonstrations. After all, it's how you and I learn things, too. “People learn many tasks, from knitting to dancing to playing games, by watching videos online,” the DeepMind team wrote in their paper's abstract.

"They demonstrate a remarkable ability to transfer knowledge from the online demonstrations to the task at hand, despite huge gaps in timing, visual appearance, sensing modalities, and body differences. This rich setup with abundant unlabeled data motivates a research agenda in AI, which could result in significant progress in third-person imitation, self-supervised learning, reinforcement learning (RL) and related areas."

To educate their code, the researchers chose three YouTube gameplay videos for each of the three titles: Montezuma's Revenge, Pitfall, and Private Eye. Each game had its own agent, which had to map the actions and features of the title into a form it could understand. The team used two methods: temporal distance classification (TDC), and cross-modal temporal distance classification (CMC).

TDC taught an agent to predict the temporal distance between two frames – how far apart in time they were captured. It learned to spot which visual features had changed between two video frames of the game, and what actions were taken in between. To generate training data, pairs of frames were chosen at random from a given YouTube video of the game.
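As a rough illustration of the mechanic, here's a hedged sketch in PyTorch: sample two frames from the same video, and train an embedding so a classifier can guess how far apart they were. The network shapes and the distance buckets are our assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

# Distance buckets: frame pairs are labelled by how far apart they sit.
# These intervals are assumed, loosely inspired by the paper's setup.
BUCKETS = [(1, 1), (2, 2), (3, 4), (5, 20), (21, 200)]

# Embedding network for single 84x84 RGB frames (sizes are our guesses).
embed = nn.Sequential(
    nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),   # -> [32, 20, 20]
    nn.Flatten(),
    nn.Linear(32 * 20 * 20, 128),
)
# Classifier over a pair of embeddings -> which distance bucket?
classify = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                         nn.Linear(128, len(BUCKETS)))
opt = torch.optim.Adam([*embed.parameters(), *classify.parameters()])

def sample_pair(video):
    """Pick two frames from one video, labelled by their temporal gap."""
    label = torch.randint(len(BUCKETS), ())
    lo, hi = BUCKETS[label]
    gap = torch.randint(lo, hi + 1, ()).item()
    i = torch.randint(0, video.shape[0] - gap, ()).item()
    return video[i], video[i + gap], label

video = torch.rand(300, 3, 84, 84)   # stand-in for a decoded YouTube video
for step in range(100):
    a, b, label = sample_pair(video)
    logits = classify(torch.cat([embed(a[None]), embed(b[None])], dim=-1))
    loss = nn.functional.cross_entropy(logits, label[None])
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The useful by-product is the embedding itself: to predict temporal gaps, the network has to learn which on-screen features actually matter as play progresses.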

CMC is cleverer still: it tracks sounds. The noises in the game correlate with actions, such as jumping or collecting items, so it mapped these sounds to important game events. After the visual and audio features were extracted and embedded using neural networks, an agent could begin copying how humans played the game.
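In the same hedged spirit, here's a sketch of the cross-modal half: separate vision and audio embeddings, trained so a small classifier can tell whether a frame and a sound snippet came from the same moment of play. Again, every layer size and shape is our guess, not DeepMind's architecture:

```python
import torch
import torch.nn as nn

frame_embed = nn.Sequential(                    # vision branch, 84x84 RGB in
    nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),   # -> [32, 20, 20]
    nn.Flatten(),
    nn.Linear(32 * 20 * 20, 128),
)
audio_embed = nn.Sequential(                    # audio branch, 1024-sample clip in
    nn.Conv1d(1, 32, 9, stride=4), nn.ReLU(),   # -> [32, 254]
    nn.Flatten(),
    nn.Linear(32 * 254, 128),
)
# Binary classifier: do this frame and this sound belong together?
same_moment = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam([*frame_embed.parameters(), *audio_embed.parameters(),
                        *same_moment.parameters()])

frames = torch.rand(300, 3, 84, 84)             # stand-in video frames
sounds = torch.rand(300, 1, 1024)               # stand-in per-frame audio

for step in range(100):
    i = torch.randint(0, 300, ()).item()
    # Half the time pair the frame with its own audio, half with a random clip.
    j = i if step % 2 == 0 else torch.randint(0, 300, ()).item()
    label = torch.tensor([1 if j == i else 0])
    z = torch.cat([frame_embed(frames[i][None]),
                   audio_embed(sounds[j][None])], dim=-1)
    loss = nn.functional.cross_entropy(same_moment(z), label)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because a jump or a collected item sounds the same in every playthrough, the audio gives the embedding a second, appearance-independent handle on what's happening on screen.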

Here's the agent in action in Montezuma's Revenge. You can also see more footage of the computer software, trained to play Pitfall and Private Eye, here.

[Embedded video: the agent playing Montezuma's Revenge]

The DeepMind code still relies on lots of small rewards, of a kind, although they are referred to as checkpoints. While playing the game, every sixteenth video frame of the agent's session is taken as a snapshot and compared to a frame in a fourth video of a human playing the same game. If the agent's game frame closely matches the one in the human's video, it is rewarded. Over time, it imitates the way the game is played in the videos by carrying out a similar sequence of moves to match the checkpoint frame.
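Here's a sketch of how such a checkpoint reward could be computed, reusing a trained frame-embedding network like the ones above. The 16-frame spacing comes from the paper; the distance threshold and the payout value are placeholder guesses of ours:

```python
import torch

N = 16   # checkpoint spacing in frames, per the paper

def make_checkpoints(human_video, embed):
    """Embed every 16th frame of the held-out human playthrough."""
    with torch.no_grad():
        return [embed(f[None]).squeeze(0) for f in human_video[::N]]

def imitation_reward(agent_frame, checkpoints, next_idx, embed,
                     threshold=1.0, bonus=0.5):
    """Pay the agent when its frame lands close to the next checkpoint."""
    with torch.no_grad():
        z = embed(agent_frame[None]).squeeze(0)
    if next_idx < len(checkpoints) and \
            torch.norm(z - checkpoints[next_idx]) < threshold:
        return bonus, next_idx + 1   # paid once, then move to the next one
    return 0.0, next_idx

# During an episode, something like:
#   reward, next_idx = imitation_reward(obs, checkpoints, next_idx, frame_embed)
```

In this sketch each checkpoint pays out once and in sequence, which is what drags the agent along the same route the human took rather than letting it loiter wherever frames happen to look familiar.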

It's a nifty trick, and the agent does reach pretty decent scores on all three games – exceeding average human players and other RL algorithms: Rainbow, Ape-X, and DQfD. Crucially, though, it is learning to copy a person's actions, rather than master the game all by itself. It is seemingly reliant on having a good human trainer, just as we relied on good teachers at school.


A table of the results for the AI agent playing the Atari games against average human scores and other RL algorithms. Image credit: Aytar et al.

Although impressive, it's unknown how practical this all is. Can it be used for anything other than Atari games? The research is also probably pretty difficult to replicate: what hardware did the researchers use? How long did it take to train the agents? The paper doesn't say; we asked DeepMind, and it declined to comment. ®

