Latest pitch for AI: DeepMind-trained soccer robots
If the goal was to make us feel sorry for these clumsy droids, mission accomplished
Video Eggheads at Google's DeepMind have developed a deep learning curriculum that can teach robots how to play soccer badly – and it's wonderful to behold.
In contrast to the polished acrobatics of Boston Dynamics' Atlas robot, the pair of Robotis OP3 robots under the tutelage of DeepMind bumble and flop about a less-than-regulation 5 metre by 4 metre soccer field, or football pitch, like exhausted toddlers. Judge for yourself in the video below.
They do so with apparent purpose and manage, despite repeated falls, to right themselves and occasionally score goals. In the childlike stumbling of these humanoid machines, it's easy to see something akin to the determination that we value and encourage in one another, even if that's just misplaced anthropomorphism. It's difficult not to root for them, though they'd inspire other emotions were they upsized and weaponized.
The 28 researchers involved in this project describe their work in a paper [PDF] titled, "Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning."
"We used Deep [Reinforcement Learning] to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game," the authors explain. "We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting.
"The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner – well beyond what is intuitively expected from the robot."
The DeepMind project is less ambitious in scope than efforts to prepare machines for the RoboCup advanced-tech competition, which has been going on for years. However, the latest iteration of the RoboCup is decidedly less fun to watch, due to the restrained behavior of the participants. Where RoboCup bots have the rigidity of Riverdance performers, with their arms fixed to their sides, the DeepMind players wave their arms like maniacs – admittedly not ideal when trying to avoid a handball call, but a better solicitation for sympathy.
Deep reinforcement learning is a way of training a neural network wherein agents – software- or hardware-based entities – learn to do things, in simulation or in the real world, through trial and error. It has become a common technique for teaching robots how to move around in various environments, as can be seen from the running acumen of Cassie, a sort of mecha-ostrich torso that you'd hope never to see chasing you.
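To make the idea concrete, here's a toy sketch of that trial-and-error loop. The environment, one-parameter "policy," and update rule are all stand-ins invented for illustration – nothing here comes from DeepMind's codebase.

```python
# Toy sketch of the deep RL trial-and-error loop: act, observe a reward,
# nudge the policy toward actions that paid off. Everything is a placeholder.
import random

class ToyEnv:
    """Stand-in for a simulated robot environment."""
    def reset(self):
        self.steps = 0
        return 0.0  # initial observation (unused by our trivial policy)

    def step(self, action):
        self.steps += 1
        reward = 1.0 if action > 0 else -1.0  # reward signal drives learning
        done = self.steps >= 10
        return 0.0, reward, done

weight, lr = 0.0, 0.05  # a one-parameter "policy" and its learning rate
env = ToyEnv()
for episode in range(200):
    obs, done = env.reset(), False
    while not done:
        action = weight + random.gauss(0, 0.5)  # explore around the policy
        obs, reward, done = env.step(action)
        weight += lr * reward * (action - weight)  # reinforce what worked

print(f"learned action bias: {weight:.2f}")  # drifts positive over training
```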
The DeepMind team's aim was to train an agent to play soccer, which requires a variety of skills, including walking, kicking, getting up, scoring, and defending, all of which need to be coordinated to win the game.
To train the agent – in this case software controlling the robot – it was not enough to reward the system for scoring goals, which wouldn't produce all the necessary skills. Instead, the researchers approached the skill sets separately, focusing on developing what they call teacher policies. These policies govern things like getting up off the ground and scoring goals against an untrained opponent – one who immediately falls to the ground, behavior not unlike actual soccer diving.
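For flavor, here's a heavily simplified, hypothetical sketch of that composition idea. In the real system each teacher is a learned neural network policy, and the hard-coded arbitration below is exactly what the paper replaces with learned distillation and self-play.

```python
# Hypothetical sketch of composing separately trained "teacher" skills.
# Each teacher here is a hand-written stand-in for a learned policy, and
# the if/else arbitration is learned in the real system, not hard-coded.

def get_up_teacher(state):
    return "push_off_ground"        # stand-in for the learned get-up skill

def scoring_teacher(state):
    return "walk_to_ball_and_kick"  # stand-in for the learned scoring skill

def composed_policy(state):
    if state["fallen"]:             # defer to whichever skill fits the moment
        return get_up_teacher(state)
    return scoring_teacher(state)

print(composed_policy({"fallen": True}))   # -> push_off_ground
print(composed_policy({"fallen": False}))  # -> walk_to_ball_and_kick
```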
The researchers had to be careful to terminate goal-scoring training episodes whenever an agent fell, to head off undesirable but evidently functional behavior: "Without this termination, agents find a local minimum and learn to roll on the ground towards the ball to knock it into the goal, rather than walking and kicking," they explain in their paper.
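In code, that fix amounts to cutting the episode short the moment the robot goes down, so rolling can never accumulate goal reward. Here's a hypothetical wrapper – the threshold, field names, and toy dynamics are our own inventions, not the paper's:

```python
# Hypothetical sketch of fall-termination: end the scoring episode as soon
# as the torso leaves an upright range, so ground-rolling never pays off.

def step_with_fall_termination(env_step, state, action, upright_min=0.6):
    next_state, reward, done = env_step(state, action)
    if next_state["torso_uprightness"] < upright_min:  # the robot fell over
        done, reward = True, 0.0  # no further goal reward can accumulate
    return next_state, reward, done

def fake_env_step(state, action):
    """Toy dynamics: every action tips the robot a little further over."""
    return {"torso_uprightness": state["torso_uprightness"] - 0.3}, 1.0, False

state = {"torso_uprightness": 1.0}
while True:
    state, reward, done = step_with_fall_termination(fake_env_step, state, "lunge")
    if done:
        print("episode terminated: robot fell")  # rolling strategy defeated
        break
```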
The get-up and goal-scoring teacher policies were eventually combined. And through further deep reinforcement learning, with rewards for achieving specified objectives, the software developed passable soccer skills.
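One common way to merge skills like this is to penalize the student policy for straying from its teachers while it chases the task reward, then fade that penalty out so the student can eventually surpass them. A rough sketch, with made-up numbers and an annealing schedule we've guessed at:

```python
# Illustrative sketch of distillation-style training: the student's loss
# mixes the task objective with a penalty for deviating from the teacher,
# and the teacher's influence is annealed away over the course of training.

def student_loss(task_loss, student_action, teacher_action, teacher_weight):
    imitation = (student_action - teacher_action) ** 2  # stay near teacher
    return task_loss + teacher_weight * imitation

for step in range(5):
    teacher_weight = max(0.0, 1.0 - step / 4)  # 1.0 -> 0.0 across training
    loss = student_loss(task_loss=0.5, student_action=0.2,
                        teacher_action=0.6, teacher_weight=teacher_weight)
    print(f"step {step}: teacher_weight={teacher_weight:.2f} loss={loss:.3f}")
```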
Shifting the trained software agent into a robot body proved none too difficult. It was a zero-shot process, according to the authors, meaning the policy ran on the physical robot without any additional training.
"We reduced the sim-to-real gap via simple system identification, improved the robustness of our policies via domain randomization and perturbations during training, and included shaping reward terms to obtain behavior that is less likely to damage the robot," they explain.
That is to say, they made sure the simulator's parameters matched the hardware's actual actuator behavior; randomized characteristics like floor friction, joint orientation, the mass of robot parts, and control loop latency; and applied random pushes and shoves during training – all to ensure the software could handle a variety of forces acting upon the robot's body. In one adjustment, they added a reward component that encouraged the bots to put less stress on their knee joints, which otherwise had a tendency to get damaged.
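Domain randomization, in particular, is simple to picture: before each simulated episode, roll fresh dice for the physics. The parameter names and ranges below are illustrative guesses, not the paper's values:

```python
# Hypothetical sketch of domain randomization: each episode samples physical
# parameters from ranges, forcing one policy to cope with all of them.
import random

def randomized_sim_params():
    return {
        "floor_friction":  random.uniform(0.4, 1.0),
        "link_mass_scale": random.uniform(0.9, 1.1),   # jitter part masses
        "control_latency": random.uniform(0.01, 0.05), # seconds of delay
        "push_force":      random.uniform(0.0, 30.0),  # random shoves, in N
    }

for episode in range(3):
    params = randomized_sim_params()
    print(f"episode {episode}: {params}")
    # env = make_soccer_env(**params)  # hypothetical env constructor
    # ...train on this perturbed copy of the world...
```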
Training the get-up and soccer teachers took 14 hours and 158 hours (6.5 days), respectively, followed by 68 hours of distillation and self-play. And the outcome was better than manually scripting those skills, the boffins said.
"The reinforcement learning policy performed better than the specialized manually-designed skills: it walked 156 percent faster and took 63 percent less time to get up," the paper says.
"When initialized near the ball it kicked the ball with 5 percent less speed; both achieved a ball speed of around 2 m/s. However, with an additional run-up approach to the ball, the learned policy’s mean kicking speed was 2.6 m/s (24 percent faster than the scripted skill) and the maximum kicking speed across episodes was 3.4 m/s."
DeepMind's boffins demonstrated that deep reinforcement learning can be applied to teach humanoid robots effectively and at low cost. That's one more halting step toward a future where bipedal robots walk among us, for better or worse. ®