Facebook this week open-sourced a reinforcement-learning algorithm that can train AI bots to navigate simulated environments, each droid armed with just a camera, GPS, and a compass – and no map.
Dubbed decentralized distributed proximal policy optimization, or DDPPO, it’s an architecture that scales up proximal policy optimization [PDF], an algorithm developed by OpenAI, across multiple computers. Proximal policy optimization can train several bots or workers at the same time in simulation, allowing the whole system to accrue experience more quickly.
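The "proximal" part of PPO refers to clipping the ratio between the new and old policy so each update stays small. As a toy illustration – not Facebook's actual code, and with purely illustrative names – the per-sample clipped surrogate objective looks like this:

```python
# Toy rendering of PPO's clipped surrogate objective: the probability ratio
# between the new and old policies is clipped to [1 - eps, 1 + eps], and the
# objective takes the minimum of the clipped and unclipped terms, so a single
# update can't push the policy too far from its predecessor.

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# Positive advantage: the ratio 1.5 is clipped down to 1.2.
print(ppo_clip_objective(1.5, 1.0))   # -> 1.2
# Negative advantage: the min() keeps the more pessimistic (clipped) value.
print(ppo_clip_objective(0.5, -1.0))  # -> -0.8
```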
In DDPPO, the training process is spread across numerous computer systems, too, and there isn’t a centralized server holding all the parameters of a neural network. The code for it can be found here.
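With no central parameter server, each worker keeps its own copy of the network; after computing local gradients, the workers average them with an all-reduce and apply the identical update, so every replica stays in sync. Here is a minimal pure-Python sketch of that pattern – real implementations use collective operations such as `torch.distributed.all_reduce` over GPUs, and all names below are illustrative:

```python
# Decentralized synchronous update: no parameter server. Each worker computes
# its own gradient, the gradients are averaged across workers (an all-reduce),
# and every worker applies the same averaged update locally.

def all_reduce_mean(per_worker_grads):
    """Average a list of gradient vectors, one vector per worker."""
    n = len(per_worker_grads)
    dim = len(per_worker_grads[0])
    return [sum(g[i] for g in per_worker_grads) / n for i in range(dim)]

def ddppo_step(params, per_worker_grads, lr=0.1):
    """Apply the averaged gradient; all replicas compute the same result."""
    avg = all_reduce_mean(per_worker_grads)
    return [p - lr * g for p, g in zip(params, avg)]

# Four workers, each with a different local gradient for the same two weights.
params = [1.0, -2.0]
grads = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
new_params = ddppo_step(params, grads, lr=0.5)
print(new_params)  # -> [0.5, -2.5] on every replica
```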
Facebook AI brainiacs were able to use DDPPO to produce an agent that can make its way across various simulated environments. It can be dumped at random spots in a simulation, and given the goal of reaching a target location using a camera, GPS coordinates, and a compass. The AI can find its way 99.9 per cent of the time, failing only once every 1,000 evaluations, and it often chooses a path that deviates only slightly – about 3 per cent on average – from the shortest possible route, we're told.
“Using DDPPO, we train agents for 2.5 billion steps of experience with 64 Tesla V100 GPUs in 2.75 days – 180 GPU-days of training, the equivalent of 80 years of human experience,” the team wrote in a paper [PDF] released this week describing their work. The research is expected to be presented at the International Conference on Learning Representations (ICLR) 2020, due to be held in Ethiopia in April.
Specifically, the agents were trained to navigate various realistic-looking simulations of people’s homes, complete with walls, rooms, doors, and wooden floors, all generated in Facebook’s AI Habitat.
During the training process, the bots learned through trial and error to reach their goal using GPS and compass readings, and a first-person camera view, working out how best to get to the target location. After completing each virtual world, the agents updated the distributed model with any new-found knowledge in order to improve themselves.
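The rough shape of that trial-and-error loop can be sketched as follows. The environment and policy below are toy stand-ins – a 1-D line rather than Facebook's AI Habitat, with the signed distance to the goal playing the role of the GPS-and-compass reading – and every name is illustrative:

```python
# Sketch of one worker's rollout: observe (camera + GPS/compass stand-in),
# act, collect reward, and record the experience that would later feed a
# synchronized model update across workers.

def rollout(policy_weight, goal=5, max_steps=20):
    """One episode on a 1-D line; the observation is the signed distance
    to the goal, a crude stand-in for a GPS/compass reading."""
    pos, experience = 0, []
    for _ in range(max_steps):
        obs = goal - pos                           # "GPS/compass" reading
        action = 1 if obs * policy_weight > 0 else -1
        pos += action
        reward = 1 if pos == goal else 0           # success bonus at the goal
        experience.append((obs, action, reward))
        if pos == goal:
            break
    return experience

exp = rollout(policy_weight=1.0)
print(len(exp))  # -> 5: the agent walks straight to the goal at x=5
```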
Through taking 2.5 billion steps, the software learned to, say, avoid taking wrong turns that could lead to obstacles. The researchers said the code learned to “exploit the structural regularities in layouts of real indoor environments,” or in other words, learned common building design elements and which ones tend to lead to dead ends. When they tested blind bots – ones without any camera input – performance on long routes dropped to about 50 per cent, compared to 99 per cent for agents with vision.
At the moment DDPPO has only been tested in simulation, although Facebook hopes to apply it to physical robots one day. Crucially, the software should be able to cope in the real world, a world in which maps aren't always accurate or available.
“An unfortunate fact about maps is that they become outdated the moment they are created,” noted Erik Wijmans – first author of the paper, a Facebook intern, and Georgia Institute of Technology student – and Abhishek Kadian, second author and Facebook techie.
“Most real-world environments evolve — buildings and structures change, objects are moved around, and people and pets are in constant flux. By learning to navigate without a map, DDPPO-trained agents will accelerate the creation of new AI applications for the physical world.”
Maybe then we'll finally get robots that can deliver pizza directly to our desks for lunch, wherever we are – or whatever it is that Silicon Valley thinks will make the world a much more agreeable place. ®