Relax, Amazon workers – OpenAI-trained robo hand isn't much use (well, not right now)
Turns out replacing humans isn't that easy after all
Vid Human hands are surprisingly dexterous: they can knit clothes, pack delivery boxes, play the piano, and so on, albeit with practice.
Yet if you're worried machines are going to take these pleasures away from us, rest assured we mortals can, for now, pick up these skills faster than robots can, judging from the following findings.
Using roughly a hundred years of simulated experience, researchers at OpenAI trained a robotic system called Dactyl to rotate and orientate a cube. Dactyl doesn't exist only in its virtual world, though. It can also control a Shadow Dexterous Hand: a metal meathook complete with five fingers, force sensors, and 24 degrees of freedom – pretty close to a human's 27 degrees of freedom.
Here’s a video of Dactyl in action, virtually and physically. The cube it's told to fondle features a specific letter and color on each of its six faces, and it has to figure out how to manipulate the object so that it finds the requested symbol.
Over time, it discovered and mastered techniques often used by humans, such as gripping the cube between the thumb and little finger and spinning it around with its other fingertips.
The perils of machine learning
What’s most interesting, perhaps, is the way Dactyl was taught. Despite being trained in a simulated world, the software was able to directly transfer what it learned to a real humanoid-like mechanical hand. This is not an easy process.
The trick was to use a method dubbed domain randomization. It’s something other researchers have been exploring for a while to close the simulation-to-reality gap in robotics.
And while OpenAI managed to close that distance, a noticeable gap remained. The software performed better when controlling a simulated hand, with a median of 50 successes compared to 13 when hooked up to real hardware, according to results published in the team's paper. And by success, they mean "the number of consecutive successful rotations until the object is either dropped, a goal has not been achieved within 80 seconds, or until 50 rotations are achieved."
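The paper's metric quoted above is easy to pin down in code. This is just an illustrative sketch of the counting rule, not OpenAI's evaluation code; the event labels are our own:

```python
def count_consecutive_successes(events) -> int:
    """Count consecutive achieved rotations, per the paper's definition:
    stop when the cube is dropped, a goal times out (80 seconds),
    or 50 rotations are reached."""
    successes = 0
    for outcome in events:  # one outcome per attempted rotation
        if outcome in ("dropped", "timeout"):
            break
        successes += 1      # outcome == "achieved"
        if successes == 50:
            break
    return successes

# Three achieved rotations, then a drop: the trial scores 3.
score = count_consecutive_successes(["achieved"] * 3 + ["dropped"])
```

The hard 50-rotation cap explains why the simulated hand's median of exactly 50 means it often hit the ceiling rather than failing.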
"Even though randomizations and calibration narrow the reality gap, it still exists and performance on the real system is worse than in simulation," the paper stated.
In other words, in simulation it did fine – but with gravity, imperfections in the mechanisms, and other real-world effects in play, the software turned into a butterfingers. Indeed, during testing, the robotic hand broke down dozens of times.
Variables
The machine-learning software was trained in a range of simulated environments where some of the variables such as surface friction, the size of the object, lighting conditions, hand poses, textures, and even the strength of gravity were changed randomly. The idea was to at least attempt to prepare the model for the unpredictable universe in which we live.
“Randomized values are a natural way to represent the uncertainties that we have about the physical system and also prevent overfitting to a single simulated environment," the OpenAI team explained in a blog post on Monday this week.
"If a policy can accomplish the task across all of the simulated environments, it will more likely be able to accomplish it in the real world."
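The core of domain randomization is simple: resample the hard-to-measure physical parameters before every training episode. Here's a minimal sketch of the idea – the parameter names and ranges are illustrative assumptions, not OpenAI's actual values:

```python
import random

def sample_randomized_env() -> dict:
    """Draw one randomized physics configuration for a training episode.
    All names and ranges are illustrative, not from the Dactyl paper."""
    return {
        "cube_size_scale":  random.uniform(0.95, 1.05),   # object dimensions
        "surface_friction": random.uniform(0.7, 1.3),     # fingertip friction
        "gravity_z":        random.uniform(-10.3, -9.3),  # even gravity varies
        "motor_strength":   random.uniform(0.8, 1.2),     # actuator gain
        "camera_jitter_px": random.uniform(0.0, 2.0),     # vision noise
    }

# Each episode gets a fresh "world", so over millions of episodes the
# policy can't overfit to any single set of physical assumptions.
env = sample_randomized_env()
```

A policy that copes with this whole family of simulated worlds is more likely to treat the real world as just one more variant.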
Dactyl racked up so many hours of experience in such a short time by using Rapid, a system that trains 384 “worker machines”, each with 16 CPU cores, running a Proximal Policy Optimization (PPO) algorithm. Each worker machine taught itself using a simulation of the Shadow Dexterous Hand in various randomized scenarios.
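The fan-out pattern is the heart of it: many workers each run their own randomized simulation and send rollouts back to a central optimizer. In this toy sketch a thread pool stands in for Rapid's fleet of 384 machines, and the rollouts are faked with random numbers:

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_WORKERS = 8  # stand-in for Rapid's 384 worker machines

def worker_rollout(worker_id: int) -> dict:
    """One worker simulates an episode in its own randomized world
    and reports the experience it gathered (faked here)."""
    rng = random.Random(worker_id)
    steps = rng.randint(50, 200)                    # fake episode length
    reward = sum(rng.random() for _ in range(steps))
    return {"worker": worker_id, "steps": steps, "reward": reward}

# Run all workers concurrently and collect their rollouts.
with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    rollouts = list(pool.map(worker_rollout, range(N_WORKERS)))

# A central optimizer would now compute one PPO update from this batch.
total_steps = sum(r["steps"] for r in rollouts)
```

Because experience-gathering dominates the compute, throwing more workers at it is how a hundred simulated years fit into weeks of wall-clock time.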
A general training system
The system is built on two neural networks: one learns to track the cube’s position from images, and the other predicts future rewards for its actions, the goal being to rack up rewards for doing the right thing. PPO is a reinforcement-learning algorithm, and Dactyl learned the best strategies to manipulate the cube by chasing points as it completed tasks – with a five-point bonus for success and a 20-point penalty for failure.
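The scoring described above can be sketched as a per-step reward function. The +5 bonus and -20 penalty are the figures from the article; the dense shaping term for partial rotation progress is our illustrative assumption:

```python
def step_reward(rotation_progress: float,
                reached_goal: bool,
                dropped_cube: bool) -> float:
    """Sketch of Dactyl's reward scheme as described in the article:
    +5 for reaching a goal orientation, -20 for dropping the cube,
    plus a small assumed shaping term for progress toward the goal."""
    reward = rotation_progress  # dense shaping term (our assumption)
    if reached_goal:
        reward += 5.0
    if dropped_cube:
        reward -= 20.0
    return reward
```

The lopsided penalty makes dropping the cube four times as costly as a success is valuable, pushing the policy toward cautious grips.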
OpenAI's Dota video-game bots were also trained using Rapid and PPO algorithms, albeit using a different architecture and environment with tweaked hyper-parameters.
“After we saw the success of the Dota team with their 1v1 bot, we actually asked them to teach us the ways of Rapid, and we reached parity with our previous learning infrastructure – which we’d spent months building – after only a couple of weeks,” Jonas Schneider, a member of the technical staff at OpenAI, told The Register.
“Still, we were pretty surprised to see that we can even use the exact same optimizer code, and treat Rapid as a black-box optimizer for a simulation problem that’s completely different from the Dota problem it was developed for.”
At the moment, Dactyl can’t do much beyond rotating objects. It can do this with objects other than cubes, such as an octagonal prism, although it struggled more with spheres.
“The vast majority of robots out there today are at one of two extremes: they can either perform very complex tasks in a constrained setting – think of a factory robot welding together rocket parts – or perform very simple tasks in an unconstrained setting – think of a Roomba,” said Schneider.
“That’s why we specifically chose to perform a very complex task in a setting where we don’t have an entirely accurate model of the hand, since we don’t know how to precisely model effects like friction, rolling, contacts and so on.”
The researchers hope that this will eventually lead to progress in building robots that can cope with our volatile and mutable reality while helping humans with chores at home and at work.
“Eventually we hope that this will lower the cost of programming robots for new tasks, which is very cumbersome and expensive today, as well as allowing to use more complex robots for settings where you might not have an engineering team on hand to carefully program them, like you would in a factory setting,” Schneider concluded. ®