What does it take for an OpenAI bot to best Dota 2 heroes? 128,000 CPU cores, 256 Nvidia GPUs

And a lot of smart machine-learning coding, of course


OpenAI's video-game-playing bots are getting much better at mastering fantasy strategy war game Dota 2, seeing off semi-pro players with ease in team matchups.

However, they can’t quite master the whole game to beat top professional teams – yet.

Last August, machine-learning software built by the OpenAI lab headquartered in San Francisco managed to best Dendi, a pro Dota 2 player, winning two matches out of three. But the victories were only in one-on-one games – a single bot against a single human – and under very limited circumstances that are not applicable in real competitions.

Fast forward about a year, and now OpenAI's bots can play in the more traditional five-versus-five settings, beating amateurs and semi-pro gamers. The battles were restricted to mirror matches between Necrophos, Sniper, Viper, Crystal Maiden, and Lich, meaning that both teams – human and code – play with the same five heroes.

This is an impressive achievement, considering how complex Dota 2 is. It requires strategic planning and an intuition for when to attack or defend bases. Games are played with different types of characters known as heroes, each with its own set of unique abilities.

The different skills, the sheer number of possible actions, and the fact that it’s an imperfect-information game make it more difficult than games such as chess or Go. OpenAI’s team of bots, nicknamed OpenAI Five, has managed to beat human teams in a series of informal matches over the past few months.

It won two out of three games against an amateur team with an MMR score of 4.2k – within the 93rd percentile of players – and did the same against a semi-pro team with an MMR rating of 5.5k – within the 99th percentile.

Game restrictions

Each OpenAI computer player is represented by a separate 1,024-unit long short-term memory (LSTM) network and is trained via self-play, a popular technique in reinforcement learning. The game runs at 30 frames per second, and matches last about 45 minutes on average. Researchers use the Dota 2 bot API to pass information about the game’s state to each bot.
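For a rough sense of what such a network looks like, here is a minimal sketch of a 1,024-unit LSTM policy in PyTorch. Apart from the 1,024 hidden units and the roughly 20,000-float observation described in the next paragraph, everything here – the layer layout, the action count, the value head – is an assumption for illustration, not OpenAI's actual architecture:

```python
import torch
import torch.nn as nn

class BotPolicy(nn.Module):
    """Illustrative 1,024-unit LSTM policy; dimensions other than the
    hidden size are assumptions, not OpenAI's real architecture."""

    def __init__(self, obs_dim=20_000, hidden=1024, n_actions=128):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)          # compress the flat observation
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.action_head = nn.Linear(hidden, n_actions)   # scores over discrete actions
        self.value_head = nn.Linear(hidden, 1)            # state-value estimate for RL

    def forward(self, obs, state=None):
        # obs shape: (batch, time, obs_dim) -- one observation every fourth frame
        x = torch.relu(self.encode(obs))
        x, state = self.lstm(x, state)
        return self.action_head(x), self.value_head(x), state
```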

The bots receive a series of 20,000 mostly floating-point numbers that encode vital information, such as the location and health of visible units, giving them access to the same knowledge that human teams can have. But it also allows the bots to calculate the precise range of their attacks, something that human players can’t do.
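Those numbers arrive as one long, flat vector per observation. A hypothetical example of that sort of encoding – the field names and sizes below are invented for illustration, not the real Dota 2 bot API – might look like this:

```python
def encode_unit(unit):
    """Pack one visible unit into a few floats (invented, illustrative fields)."""
    return [
        unit["x"], unit["y"],                   # position on the map
        unit["health"] / unit["max_health"],    # normalised health
        unit["attack_range"],                   # exact range -- humans can only estimate this
        1.0 if unit["is_enemy"] else 0.0,
    ]

def encode_observation(visible_units, max_units=200, floats_per_unit=5):
    """Concatenate every visible unit into one fixed-length observation vector."""
    obs = []
    for unit in visible_units[:max_units]:
        obs.extend(encode_unit(unit))
    obs += [0.0] * (max_units * floats_per_unit - len(obs))  # pad unused slots
    return obs
```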

The game state is constantly changing, and each bot receives an updated input every four frames. “It’s like playing with your eyes closed and opening them every four frames,” Greg Brockman, cofounder and CTO at OpenAI, explained to The Register.

"OpenAI Five is given access to the same information as humans, but instantly sees data like positions, health, and item inventories that humans have to check manually,” the OpenAI boffins added in a blog post today.

So when a bot opens its eyes, it gets to see the whole visible map at once. This is a pretty big advantage for the machines compared to the human team, who have to move the camera around the map to see everything.

“Map awareness is a basic skill for humans. Players can scroll about the visible map, have an on-screen minimap summarizing lots of details, have selector keys for various units, making it easy to know the full state of the game. OpenAI Five can see all pieces of information that a human is allowed to see,” Brockman said.

Both teams do have access to all the same information; the difference is that the bots get to see everything at once. Without the API, the researchers reckon, it would take thousands more GPUs to render the game's pixels and give the bots the same visual experience as the human players.

A bot can also choose when to make a move. It gets the chance to act every four frames, so an event in the game can fall zero, one, two, or three frames before its next observation. “On average, this means OpenAI Five will see something happen (0 + 1 + 2 + 3) / 4 = 1.5 frames after it happened, and will have an opportunity to respond as soon as the next frame, yielding an average reaction time of 33 milliseconds * 2.5 = 82.5 milliseconds,” Brockman said. That’s faster than the human team can manage.
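The arithmetic in that quote is straightforward to check. A quick sketch, using the 33ms-per-frame figure from the quote (one frame at the game's 30fps tick):

```python
FRAME_MS = 33      # one Dota 2 frame at 30 fps, per the quote above
OBS_EVERY = 4      # the bots receive a fresh observation every fourth frame

# An event can land 0, 1, 2 or 3 frames before the bot's next observation...
avg_delay_frames = sum(range(OBS_EVERY)) / OBS_EVERY    # (0+1+2+3)/4 = 1.5
# ...and the bot can act on the frame after it observes, adding one more frame.
avg_reaction_ms = (avg_delay_frames + 1) * FRAME_MS     # 2.5 * 33 = 82.5 ms

print(f"average reaction time: {avg_reaction_ms} ms")
```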

There are other restrictions, too: abilities such as warding and turning invisible are off limits, as are certain units that can cast spells. The limited set of possible heroes – five out of 113 – also makes the game a lot less complicated than the whole shebang.

Smells like team spirit

OpenAI Five's bots don't really communicate much with each other. Instead, teamwork is controlled by something called "team spirit": a hyperparameter that can be set between 0 and 1, weighting how much each hero should care about its own individual reward compared to the average reward of the whole team.
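In code, that blending amounts to a weighted average. The sketch below is only an illustration of the idea as described – the function name and reward representation are assumptions, not OpenAI's implementation:

```python
def blend_rewards(individual_rewards, team_spirit):
    """Mix each hero's own reward with the team average.

    team_spirit = 0.0 -> every hero is purely selfish
    team_spirit = 1.0 -> every hero only cares about the team average
    """
    team_avg = sum(individual_rewards) / len(individual_rewards)
    return [(1.0 - team_spirit) * r + team_spirit * team_avg
            for r in individual_rewards]

# e.g. one hero earns a kill reward (+1), the others earn nothing, team spirit 0.3:
print(blend_rewards([1.0, 0.0, 0.0, 0.0, 0.0], team_spirit=0.3))
# -> [0.76, 0.06, 0.06, 0.06, 0.06]
```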

But there are some positives. In the 1v1 match last year, researchers had to explicitly teach the bot strategies like creep blocking. The new bots, however, managed to learn this on their own.

“It came up with its own strategies through self-play. It learnt to give up some of its own vulnerable territory in order to take some of its opponent’s territory,” Brooke Chan, a member of technical staff at OpenAI, told El Reg.

The bots are trained with OpenAI’s Proximal Policy Optimization (PPO) algorithm and self-play. They don’t rely on any search methods or human gameplay data, and each can play about 180 years’ worth of games against itself every day – that’s a whopping 900 years per day in total for all five bots – far, far longer than any human lifetime. It takes a hell of a lot of compute, too: the bots slurped up 128,000 CPU cores and 256 Nvidia P100 GPUs on Google Cloud.
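PPO's central idea is to clip how far each policy update can drift from the policy that gathered the data. A minimal sketch of that clipped objective, following the published PPO paper rather than OpenAI Five's actual training code (the 0.2 clip range is the paper's default, not necessarily what was used here):

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss from the PPO paper (value to be minimised)."""
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective; negate because optimisers minimise.
    return -torch.min(unclipped, clipped).mean()
```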

“While today we play with restrictions, we aim to beat a team of top professionals at The International in late August subject only to a limited set of heroes,” OpenAI said.

The International is the biggest annual Dota 2 esports tournament, where winners can take home prizes worth several million dollars. “We may not succeed,” it cautioned. ®
