OpenAI, DeepMind double team to make future AI machines safer

New algorithm keeps humans firmly in the loop during training


Researchers from OpenAI and DeepMind are hoping to make artificial intelligence safer using a new algorithm that learns from human feedback.

Both companies are experts in reinforcement learning – an area of machine learning that rewards agents for taking the right actions to complete a task in a given environment. The goal is encoded as a reward function in the algorithm, and the agent is programmed to chase that reward, like winning points in a game.

Reinforcement learning has been successful at teaching machines to play games like Doom or Pong, or to drive autonomous cars in simulation. It's a powerful way to shape an agent's behavior, but it can be dangerous if the hard-coded reward function is wrong or produces undesirable side effects.
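To see what "chasing a hard-coded reward" means in practice, here is a toy tabular Q-learning sketch: a five-state corridor where only the rightmost state pays out. The environment, constants, and names are all invented for illustration – this is the textbook technique, not code from either lab:

```python
import numpy as np

# Toy reinforcement learning with a hard-coded reward: tabular Q-learning
# on a 5-state corridor. Only reaching the rightmost state pays reward 1.
rng = np.random.default_rng(1)
n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
q = np.zeros((n_states, n_actions))   # learned action values
alpha, gamma = 0.5, 0.9               # learning rate, discount factor

def step(state, action):
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == n_states - 1 else 0.0   # the hard-coded goal
    return nxt, reward, nxt == n_states - 1

for _ in range(300):                  # episodes
    s = 0
    for _ in range(50):               # cap episode length
        a = int(rng.integers(n_actions))          # explore randomly (off-policy)
        s2, r, done = step(s, a)
        # Standard Q-learning update toward reward plus discounted future value
        q[s, a] += alpha * (r + gamma * np.max(q[s2]) - q[s, a])
        if done:
            break
        s = s2

# The greedy policy now marches right toward wherever the reward was placed.
policy = np.argmax(q, axis=1)
```

The point of the toy: the agent learns whatever the reward function says, right or wrong – if the payout were coded on the wrong state, the agent would dutifully march there instead. That fragility is what the OpenAI/DeepMind work tries to address.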

A paper published on arXiv describes a new method that could help prevent such problems. First, an agent carries out random actions in its environment. A reward is then predicted from human judgements of that behavior, and fed back into the reinforcement learning algorithm to change what the agent does.

The system learns its goal by working out which actions are best, guided by human feedback

The researchers applied this to the task of teaching a simulated robot – which looks like a bendy lamp post – to do backflips. A human is shown two short video clips of the agent and picks the one that is better at backflipping.

Over time, the agent narrows in on the reward function that best explains the human's judgements, learning its goal from them. The reinforcement learning algorithm directs its actions, and it continues to seek human approval to improve.
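The loop described above – pairwise human comparisons used to fit a reward function – can be sketched with a simple logistic preference model, a common way to turn "clip A beats clip B" judgements into a trainable reward. Everything below (the linear reward, feature dimensions, learning rate, simulated human) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

# Sketch of preference-based reward learning: a linear reward r(s) = w . s
# is fitted so that the clip the "human" prefers gets the higher total reward.
rng = np.random.default_rng(0)
dim = 4
w = np.zeros(dim)                     # parameters of the learned reward model

def segment_return(w, segment):
    """Sum of predicted rewards over a clip (sequence of state vectors)."""
    return float(sum(w @ s for s in segment))

def update(w, seg_a, seg_b, human_prefers_a, lr=0.1):
    """One gradient step on the cross-entropy loss over the preference."""
    ra, rb = segment_return(w, seg_a), segment_return(w, seg_b)
    p_a = 1.0 / (1.0 + np.exp(np.clip(rb - ra, -30.0, 30.0)))  # P(A preferred)
    y = 1.0 if human_prefers_a else 0.0
    # Gradient of the log-likelihood w.r.t. w, via d(ra - rb)/dw
    grad = (y - p_a) * (np.sum(seg_a, axis=0) - np.sum(seg_b, axis=0))
    return w + lr * grad

# Simulate a human who secretly prefers clips with a larger first feature.
true_w = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(500):
    seg_a = rng.normal(size=(10, dim))
    seg_b = rng.normal(size=(10, dim))
    prefers_a = segment_return(true_w, seg_a) > segment_return(true_w, seg_b)
    w = update(w, seg_a, seg_b, prefers_a)

# The learned reward now ranks clips roughly the way the human does,
# and a reinforcement learner could chase it in place of a hand-coded reward.
```

In the real system the reward model is a neural network and the clips come from the agent's own rollouts, but the core trick is the same: the human never writes the reward function, they only compare behaviors.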


Training the backflipping agent took less than an hour of a human evaluator's time. But more complex tasks like cooking a meal or sending emails would require far more human feedback, which could get expensive.

Dario Amodei, co-author of the paper and a researcher at OpenAI, said reducing supervision was a potential area to focus on for future research.

“Broadly, techniques known as semi-supervised learning could be helpful here. Another possibility is to provide a more information-dense form of feedback such as language, or letting the human point to specific parts of the screen that represent good behavior. More information-dense feedback might allow the human to communicate more to the algorithm in less time,” he told The Register.

The researchers have tested their algorithm on other simulated robotics tasks and Atari games, and results show the machines can sometimes achieve superhuman performance. But it depends heavily on the human evaluator’s judgements.

“Our algorithm’s performance is only as good as the human evaluator’s intuition about what behaviors look correct, so if the human doesn’t have a good grasp of the task, they may not offer as much helpful feedback,” OpenAI wrote in a blog post.

Amodei said that at the moment the results are limited to very simple environments. But it could be useful for tasks where it’s difficult to learn because the reward function is hard to quantify – such as driving, organizing events, writing, or providing tech support. ®
