Google's AI boutique, DeepMind, known for dispelling human delusions of intellectual superiority by soundly beating the world's top Go players with computer code, has found that instilling its software agents with something like imagination helps them learn better.
In two papers published this week – "Imagination-Augmented Agents for Deep Reinforcement Learning" and "Learning model-based planning from scratch" – the AI biz's brain boffins, based in Britain, describe novel techniques for improving deep reinforcement learning through what can generously be described as imaginative planning.
Reinforcement learning is a form of machine learning. It involves a software agent that learns by interacting with a specific environment, usually through trial and error. Deep learning is a form of machine learning that involves algorithms inspired by the human brain, called neural networks. The two techniques can be used together.
Deep reinforcement learning may be done with models that incorporate rules under which software agents should operate. To teach software how to play a video game, for example, researchers might wish to provide a model that includes information about the game, which can avoid costly trial and error during the learning process. Or researchers might opt for model-free reinforcement learning, with the expectation that the software agent will pick up gameplay on its own, eventually.
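The model-free, trial-and-error side of that distinction can be sketched with textbook tabular Q-learning on a toy corridor world. This is a minimal illustration of learning purely from experience, with no model of the environment; the corridor task and all parameter values here are made up for the example and have nothing to do with DeepMind's setup.

```python
import random

def train_q_learning(episodes=2000, n_states=6, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning on a toy corridor: start at the left end,
    reward of 1 for reaching the right end. The agent has no model of
    the corridor -- it learns action values purely by trial and error."""
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]; 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Standard Q-learning update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

random.seed(0)
q = train_q_learning()
# Greedy policy per non-terminal state: 1 means "move right" toward the reward.
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(5)]
```

The data-hunger the article mentions is visible even here: the agent needs many episodes of blundering about before the value estimates settle, which is the cost a good model would help avoid.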
Each approach has its downside, with model-based methods missing information not captured in the model and model-free methods requiring large data sets and lacking in behavioral flexibility. DeepMind's work attempts to offer the best of both worlds.
"Without making any assumptions about the structure of the environment model and its possible imperfections, our approach learns in an end-to-end way to extract useful knowledge gathered from model simulations – in particular not relying exclusively on simulated returns," the researchers explain in their first paper. "This allows the agent to benefit from model-based imagination without the pitfalls of conventional model-based planning."
They're describing software that thinks before it acts.
DeepMind's researchers propose a software agent that learns by building, evaluating, and executing a plan. It combines trial-and-error learning with simulation as a form of pre-flight check, in order to evaluate the most promising paths while avoiding obvious dead ends.
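That "simulate before you act" idea can be caricatured in a few lines: score each candidate action by imagining a short rollout with an internal model, then commit to the most promising one. The `model(state, action)` interface and the toy number-line task below are assumptions made for illustration; they are not the architecture described in the papers, which learns all of this end-to-end with neural networks.

```python
def plan_with_imagination(state, actions, model, value_fn, depth=3):
    """Pick the action whose imagined rollout looks best.

    model(s, a) -> (next_state, reward) is an assumed internal model
    (possibly imperfect); value_fn(s) is a learned fallback estimate
    used to truncate the imagination at a fixed depth.
    """
    def rollout(s, a, d):
        s2, r = model(s, a)
        if d == 1:
            return r + value_fn(s2)  # bootstrap instead of imagining forever
        return r + max(rollout(s2, a2, d - 1) for a2 in actions)
    return max(actions, key=lambda a: rollout(state, a, depth))

# Toy check on a number line: reward grows as the agent nears position 3,
# so imagining a few steps ahead should favour moving right from 0.
actions = (-1, +1)
model = lambda s, a: (s + a, -abs((s + a) - 3))
best = plan_with_imagination(0, actions, model, value_fn=lambda s: 0.0)
```

The exhaustive rollout here is the "pre-flight check": dead-end directions score badly in imagination and are never tried in the real environment.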
The researchers tested their imaginative agent on two games: Sokoban, a puzzle video game created in Japan in 1981 that involves pushing boxes around a warehouse, and a spaceship navigation task.
Sokoban allows boxes to be pushed but not pulled, which means there may be some moves that render the puzzle unsolvable. A human player, thus, would be well advised to plan moves ahead of time. The DeepMind agent, because it's capable of such planning too, is also well suited for this game, the researchers suggest.
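The simplest of those irreversible situations is a box pushed into a corner: with no pulling allowed, it can never come out again. A hypothetical deadlock check along those lines (a deliberately simplified sketch, not how DeepMind's agent detects anything, and ignoring the case where the corner is a goal square) looks like this:

```python
def is_corner_deadlock(grid, box):
    """True if a box at (row, col) sits in a corner it can never leave.

    grid is a list of strings with '#' for walls. Since boxes can only
    be pushed, a box with a wall both above-or-below AND left-or-right
    of it is stuck for good -- the puzzle is unsolvable.
    """
    r, c = box
    wall = lambda rr, cc: grid[rr][cc] == '#'
    vertical = wall(r - 1, c) or wall(r + 1, c)      # blocked up/down
    horizontal = wall(r, c - 1) or wall(r, c + 1)    # blocked left/right
    return vertical and horizontal

level = ["#####",
         "#   #",
         "#   #",
         "#####"]
stuck = is_corner_deadlock(level, (1, 1))   # box in the top-left corner
free = is_corner_deadlock(level, (1, 2))    # box on open floor
```

Ruling out moves that lead to states like this, before making them, is exactly the kind of look-ahead the imaginative agent gets from its internal simulations.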
The imaginative agent managed to solve 85 per cent of the Sokoban levels presented, compared to 60 per cent for a standard model-free agent. It also surpassed a copy-model agent, an enhanced version of the standard agent that doesn't use imaginative planning.
"For both tasks, the imagination-augmented agents outperform the imagination-less baselines considerably: they learn with less experience and are able to deal with the imperfections in modeling the environment," the researchers explain in a blog post. "Because agents are able to extract more knowledge from internal simulations, they can solve tasks with fewer imagination steps than conventional search methods, like the Monte Carlo tree search."
Thinking before acting makes machine learning efforts slower, but the researchers contend, "This is essential in irreversible domains, where actions can have catastrophic outcomes, such as in Sokoban." ®