Pay attention. We're only going to show you this once: OpenAI coaches robots to copy humans

Droids given 'one shot' lessons to stack blocks


The ultimate goal in robotics is the ability to train a machine to perform general tasks after it learns from a few examples. OpenAI, a non-profit artificial intelligence research organization, is a step closer to achieving this by applying a new algorithm called one-shot imitation learning to a robot arm.

In the demonstration, a human wears a virtual reality headset to stack a series of colored blocks in an imaginary world. The robot then copies what the person did in the VR simulation after seeing it once, creating a tower of blocks in the same order. The software that the robot learns from is split into two neural networks: one for vision and the other for imitation.

First, the vision component takes an input from the robot’s camera to gauge the positions of the different objects. It takes a lot of training to achieve this. Hundreds of thousands of simulated images of the objects – in various configurations with different lighting and textures – are shown to the vision network.
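As a rough illustration of how such varied training scenes could be produced, here is a minimal Python sketch that randomizes block positions, colors and lighting. It is not OpenAI's code: the function names, field names and value ranges are all hypothetical, and the rendering step that would turn each scene into an image is left out.

```python
import random


def random_scene(num_blocks=4, seed=None):
    """Return one randomized scene description (all fields are hypothetical)."""
    rng = random.Random(seed)
    blocks = []
    for i in range(num_blocks):
        blocks.append({
            "id": i,
            # Position on a unit-square table top.
            "xy": (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)),
            # Random RGB tint standing in for a texture.
            "color": tuple(rng.random() for _ in range(3)),
        })
    return {
        "blocks": blocks,
        # Assumed lighting parameters; the ranges are arbitrary.
        "light_intensity": rng.uniform(0.2, 1.5),
        "light_angle_deg": rng.uniform(0.0, 360.0),
    }


def make_dataset(n, num_blocks=4, seed=0):
    """Generate n scenes. A renderer (not shown) would turn each one into an
    image whose training label is the list of block positions."""
    rng = random.Random(seed)
    return [random_scene(num_blocks, seed=rng.randrange(2**32)) for _ in range(n)]


scenes = make_dataset(1000)
print(len(scenes), scenes[0]["light_intensity"])
```

Because every scene carries its own ground-truth positions, a vision model trained on the rendered images would never need hand-labelled photos, which is the appeal of working in simulation.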

Second, the imitation network processes the simulated demonstration to predict what steps need to be taken to replicate the actions the robot has seen. The blocks aren’t necessarily in the same starting positions seen in the demo, meaning that the robot has to generalize and perform the task in a new setting.

It may be trivial for humans, but it’s challenging for robots. Thousands of training examples need to be fed into the network for each task. It learns by tracking the full set of trajectories of the robot arm for a complete task, and looking at a single trajectory from a second demo of the same task in a slightly different environment.

The one-shot imitation learning algorithm learns to predict what actions were taken to produce the result seen in the second demonstration. It does this by examining all the movements taken in the first video. It learns the similarities between both examples, even if the demos are not identically laid out.
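One way to picture this training setup is as a supervised learning problem built from pairs of demonstrations. The Python sketch below assumes, as a simplification, that each demo is a list of (observation, action) steps and that the second demo contributes one step at a time. The function name and data layout are hypothetical, not taken from OpenAI.

```python
import random


def make_training_example(demos, rng):
    """Build one supervised example from demonstrations of the SAME task.

    Each demo is a list of (observation, action) steps.
    """
    first, second = rng.sample(demos, 2)   # two different demonstrations
    obs, action = rng.choice(second)       # one moment from the second demo
    return {
        "context_demo": first,     # the whole first demonstration
        "observation": obs,        # what the scene looked like at that moment
        "target_action": action,   # what the demonstrator did next
    }


# Two toy demos of "put block A on block B" with different start positions.
demo_a = [((0.1, 0.2), "reach"), ((0.1, 0.2), "grasp"), ((0.5, 0.5), "place")]
demo_b = [((0.7, 0.3), "reach"), ((0.7, 0.3), "grasp"), ((0.4, 0.9), "place")]

rng = random.Random(0)
example = make_training_example([demo_a, demo_b], rng)
print(example["observation"], example["target_action"])
```

A network trained on many such examples has to use the context demo to work out which task is being performed, because the observation alone does not say which block goes where.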

After enough training, the robot can learn to imitate the human demonstrator in VR even though it hasn’t encountered the exact same task during training.

A process called “soft attention” makes the imitation network focus on the steps taken using the relevant block in the block stacking challenge, as well as keeping track of the locations of all the other blocks.
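The general idea of soft attention can be sketched in a few lines of Python. This is a generic dot-product version with made-up feature vectors, not the specific architecture OpenAI used: every block gets a weight between 0 and 1, the weights sum to 1, and the best-matching block dominates without the others being discarded.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, keys, values):
    """Weight each block's value vector by how well its key matches the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Hypothetical features: keys encode block color, values encode block position.
query = [1.0, 0.0, 0.0]                       # "looking for the red block"
keys = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
values = [[0.1, 0.2], [0.7, 0.3], [0.4, 0.9]]

weights, context = soft_attention(query, keys, values)
print(weights, context)
```

Because the weights are computed per block rather than stored per slot, the same function works unchanged for any number of blocks.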

OpenAI says this allows the robot to adapt to “work with demonstrations of variable length,” “imitate longer trajectories,” and “stack blocks into a configuration that has more blocks than any demonstration in its training data.”

To recover from mistakes during the imitation stage, the robot first has to learn what problems it might face.

Researchers did this by injecting noise into the “scripted policy,” a strategy that teaches the robot how to stack the blocks in order. It could then learn how to recover when things went wrong. It’s a critical step – “without injecting the noise, the policy learned by the imitation network would usually fail to complete the stacking task,” OpenAI explained in a blog post.
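The Python sketch below shows one common way noise injection of this kind can work, using a one-dimensional toy problem. It is an assumption-laden illustration, not OpenAI's implementation: the scripted policy, step size and noise level are all invented. The executed action is perturbed so the arm drifts into off-course states, while the recorded label is always the clean action the script wanted, which is effectively the correction.

```python
import random


def scripted_action(pos, target, step=0.1):
    """Hypothetical scripted policy: move toward the target, at most `step` per move."""
    delta = target - pos
    return max(-step, min(step, delta))


def collect_demo(start, target, noise_std, rng, max_steps=200, tol=1e-3):
    """Roll out the scripted policy, executing a noisy version of each action
    but recording the clean one as the training label."""
    pos, data = start, []
    for _ in range(max_steps):
        if abs(target - pos) < tol:
            break
        clean = scripted_action(pos, target)
        data.append((pos, clean))                 # label = what the script wanted
        pos += clean + rng.gauss(0.0, noise_std)  # execution is perturbed
    return data, pos


rng = random.Random(0)
clean_data, clean_end = collect_demo(0.0, 1.0, noise_std=0.0, rng=rng)
noisy_data, noisy_end = collect_demo(0.0, 1.0, noise_std=0.05, rng=rng)
print(len(clean_data), len(noisy_data))
```

Without noise, the training data only ever contains states along the ideal path, so a learned policy has no examples of what to do once it drifts away from that path.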

The researchers would like to explore how the robot would behave with household items instead of square blocks. The goal is to eventually build a general-purpose robot that can help around the house and perform chores such as setting chairs around a table.

At the moment, the robot can stack blocks in “tens of seconds,” Wojciech Zaremba, a co-founder and researcher at OpenAI, told The Register. “We’re running it slowly because it makes it easier to work with and around. We could easily run it a few times faster, but more research is needed to reach human performance.” ®


Biting the hand that feeds IT © 1998–2022