Machines have triumphed again. Libratus, a powerful computer program, has crushed its human opponents at a heads-up no-limit Texas hold’em poker tournament held at Rivers Casino in Pittsburgh, Pennsylvania, winning $1,776,250 over 120,000 hands.
Tuomas Sandholm, a computer science professor at Carnegie Mellon University (CMU) and co-creator of Libratus, called the result a milestone: “Heads up no-limit Texas hold’em is – in a way – the last frontier standing within the foreseeable future. Of course, new things can come later. But of all of the games, where AI research has been significantly conducted – by which I mean multiple decades of research – all the other games like Othello, checkers, chess, Go, limit Texas hold’em, Jeopardy! ... and so forth are such that the best AI has surpassed the best humans.
“But heads up no-limit Texas hold’em remained elusive in that never before has it been possible to beat the absolute top no limit Texas hold’em professionals. And in this event, this actually happened. So this is a landmark really in AI game playing.”
It was trebles all round for Sandholm and his PhD student, Noam Brown, on Twitter. Andrew Ng, a prominent AI researcher at Baidu and Stanford University, said the achievement was comparable to IBM’s Deep Blue, which beat Garry Kasparov at chess, and DeepMind’s AlphaGo, which beat Lee Sedol at Go.
CMU just made history: AI beats top humans at Texas Hold'em poker. A stunning accomplishment, comparable to Deep Blue & AlphaGo!— Andrew Ng (@AndrewYNg) January 31, 2017
Over 20 days, four human poker players stared at multiple computer screens for ten hours a day with mounting frustration as they were repeatedly thrashed by their superior opponent, Libratus.
It was “demoralizing” to wake up and lose every day, said Jason Les, a professional poker player, who finished fourth in the competition.
“I’m just so impressed with the quality of poker Libratus plays. We make a living trying to find vulnerabilities and strategies – that’s what we do every day when we play heads up no-limit. So if the public had any doubt about the quality of this technology, I can tell you from our experience, we tried everything we could but it was too strong.”
It was hardly a close match, with Libratus swooping in to take the lead from the very first day. In the evenings, the professional players Jason Les, Dong Kim, Daniel McAulay and Jimmy Chou would get together to compare notes. They analyzed the game and tried to come up with strategies to defeat the enemy, and it did work for a while.
In the early days of the competition, there were signs of hope as Les beat the machine to finish up $49,072 while his teammates were still in the negative. The poker pros fought back hard and narrowed Libratus’ lead. They even notched up their first six-figure day.
Second session of the day I go +40k and Dong goes +30k for the human team to put up its first six figure day of +110k #BrainsVsAI— Jason Les (@heyitscheet) January 17, 2017
The taste of victory was short-lived, as Libratus came back stronger and scooped a huge win eight days into the competition. As it continued to play poker, the machine learned to adjust its strategies, improving over time.
The constant upgrade in difficulty is what made it challenging for the players. It’s “extremely tough as the AI keeps getting better,” Kim told viewers while answering questions over a live stream on Twitch.
Libratus upped its game, crushing the chances of victory for team mortal, and charged to the finish line to win a whopping $1,776,250 – equivalent to 14.7 big-blinds per hundred or 147 milli-big-blinds per hand.
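The two win-rate figures above are the same quantity in different units: one big blind per 100 hands is 0.01 big blinds per hand, or 10 milli-big-blinds per hand. A minimal sketch of the conversion follows; the function names are hypothetical, and because the article does not state the size of the big blind, the chips-to-win-rate example uses made-up numbers.

```python
def bb_per_100(chips_won: float, big_blind: float, hands: int) -> float:
    """Win rate in big blinds won per 100 hands."""
    return chips_won / big_blind / hands * 100


def mbb_per_hand(bb_per_100_rate: float) -> float:
    """Convert big blinds per 100 hands into milli-big-blinds per hand.

    1 bb/100 is 0.01 big blinds per hand, i.e. 10 thousandths of a big blind.
    """
    return bb_per_100_rate / 100 * 1000


if __name__ == "__main__":
    # The article's 14.7 bb/100 comes out at roughly 147 mbb/hand.
    print(mbb_per_hand(14.7))
    # Hypothetical numbers: 5,000 chips won with a 100-chip big blind over 10,000 hands.
    print(bb_per_100(5_000, 100, 10_000))
```

Normalising by the big blind is what makes results comparable across games played at different stakes.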
The large score is of “statistical significance” and a convincing win for the computer, the researchers say. It wasn’t down to a simple run of good cards, as the game was set up in a way to minimize the effect of luck. The four players were split into two teams of two people. One team plays in the open while the other team is locked in a room with no phones or outside communication. The locked-away team is dealt the same cards as the open team, but with seats switched: the open-room humans hold the same hole cards as the locked-away AI, the locked-away humans hold the same hole cards as the open-room AI, and so on. This is supposed to cancel out any “run good” effects, where one side simply enjoys a lucky streak of cards.
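Why mirroring the cards helps can be seen in a toy simulation. The sketch below assumes a deliberately simplified model, which is not how the match was actually scored: the AI’s result on a deal is a fixed skill edge, plus a card-luck term that flips sign when the seats are swapped, plus independent noise from the different decisions made in each room. All names and parameters are hypothetical.

```python
import random
import statistics


def room_result(skill_edge: float, card_luck: float,
                rng: random.Random, decision_sd: float) -> float:
    """Toy model (an assumption): the AI's result for one deal in one room."""
    return skill_edge + card_luck + rng.gauss(0.0, decision_sd)


def simulate(deals: int, skill_edge: float, luck_sd: float,
             decision_sd: float, seed: int = 0):
    rng = random.Random(seed)
    single_room, mirrored = [], []
    for _ in range(deals):
        # Positive luck means the AI's hole cards were the better ones in the open room.
        luck = rng.gauss(0.0, luck_sd)
        open_room = room_result(skill_edge, luck, rng, decision_sd)
        # Locked room: the same cards with seats swapped, so the AI now holds the
        # cards the humans had in the open room and the luck term changes sign.
        locked_room = room_result(skill_edge, -luck, rng, decision_sd)
        single_room.append(open_room)
        mirrored.append((open_room + locked_room) / 2)
    return single_room, mirrored


if __name__ == "__main__":
    single, mirrored = simulate(deals=10_000, skill_edge=0.15, luck_sd=5.0, decision_sd=1.0)
    print("single room: mean %.3f, sd %.3f" % (statistics.mean(single), statistics.stdev(single)))
    print("mirrored:    mean %.3f, sd %.3f" % (statistics.mean(mirrored), statistics.stdev(mirrored)))
```

In this model the card-luck term cancels exactly in the mirrored average, leaving only the decision noise, so far fewer hands are needed to tell skill apart from luck. In real play the two rooms diverge more than this, so the cancellation is only partial.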
It's not all bad news for the humans, as they take away a proportion of the $200,000 prize depending on how well they played relative to each other.
All you need is a supercomputer and algorithms
The exact details of how Libratus works will remain unclear until the researchers analyze the results and publish their work in a paper. However, Sandholm and Brown have provided snippets of information. It’s not the first time CMU has built a poker bot to challenge humans. The previous "Brains vs AI" poker match in 2015 saw Claudico, Libratus’ predecessor, lose to Dong Kim, Jason Les, Bjorn Li, and Doug Polk – the number one poker player at the time.
Poker is a difficult game for machines to master as it’s an imperfect-information game: players do not have equal knowledge of the game state because some cards are hidden. Many researchers, including the team who recently published a paper on their own poker computer program DeepStack, use a technique called counterfactual regret minimization (CFR) to compute strategies for imperfect-information games.
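In CFR, the computer plays against itself over many iterations, walking the game’s decision tree. At each decision point it computes the counterfactual value of every available action (roughly, what it would have won had it played that action) and accumulates “regret” for actions that would have done better than the move it actually made. Actions with more accumulated regret are played more often on the next pass, and the strategy averaged over all iterations converges towards an equilibrium.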
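To make the CFR loop concrete, here is a minimal sketch on Kuhn poker, a three-card toy game (Jack, Queen, King; each player antes one chip, gets one card, and there is a single round with a one-chip bet) that is a standard teaching example. It is not what Libratus or DeepStack ran, which operate on vastly larger abstracted games; it uses the chance-sampled variant, which deals one random hand per iteration, and every name in it is hypothetical.

```python
from __future__ import annotations

import random

ACTIONS = "pb"  # "p" = pass (check or fold), "b" = bet (bet or call)


class InfoSet:
    """Accumulated regrets and strategy weights for one information set."""

    def __init__(self) -> None:
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def current_strategy(self, own_reach: float) -> list[float]:
        # Regret matching: play each action in proportion to its positive regret,
        # falling back to uniform play when no action has positive regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        strategy = [p / total for p in positive] if total > 0 else [0.5, 0.5]
        # Accumulate the strategy, weighted by this player's probability of reaching here.
        for a in range(2):
            self.strategy_sum[a] += own_reach * strategy[a]
        return strategy

    def average_strategy(self) -> list[float]:
        # The time-averaged strategy, not the current one, approaches equilibrium.
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def terminal_payoff(cards: list[int], history: str) -> float | None:
    """Chips won by the player due to act next, or None if the hand is still live."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    has_higher_card = cards[player] > cards[1 - player]
    if history == "pp":       # check-check: showdown for the antes
        return 1.0 if has_higher_card else -1.0
    if history[-1] == "p":    # a bet was folded to: this player wins the antes
        return 1.0
    if history[-2:] == "bb":  # bet and call: showdown for 2 chips
        return 2.0 if has_higher_card else -2.0
    return None


def cfr(cards: list[int], history: str, reach0: float, reach1: float,
        nodes: dict[str, InfoSet]) -> float:
    """Returns the expected payoff for the player to act at `history`."""
    payoff = terminal_payoff(cards, history)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    # An information set is what the acting player can see: their card plus the betting.
    node = nodes.setdefault(str(cards[player]) + history, InfoSet())
    strategy = node.current_strategy(reach0 if player == 0 else reach1)

    action_values = [0.0, 0.0]
    node_value = 0.0
    for a, action in enumerate(ACTIONS):
        # Zero-sum game: the child's value is from the opponent's view, so negate it.
        if player == 0:
            action_values[a] = -cfr(cards, history + action, reach0 * strategy[a], reach1, nodes)
        else:
            action_values[a] = -cfr(cards, history + action, reach0, reach1 * strategy[a], nodes)
        node_value += strategy[a] * action_values[a]

    # Counterfactual regret is weighted by the opponent's probability of reaching here.
    opponent_reach = reach1 if player == 0 else reach0
    for a in range(2):
        node.regret_sum[a] += opponent_reach * (action_values[a] - node_value)
    return node_value


def train(iterations: int, seed: int = 0) -> dict[str, InfoSet]:
    rng = random.Random(seed)
    deck = [0, 1, 2]  # 0 = Jack, 1 = Queen, 2 = King
    nodes: dict[str, InfoSet] = {}
    for _ in range(iterations):
        rng.shuffle(deck)  # chance sampling: deal one random pair of cards
        cfr(deck[:2], "", 1.0, 1.0, nodes)
    return nodes


if __name__ == "__main__":
    for key, node in sorted(train(20_000).items()):
        print(f"{key:>4}", [round(p, 2) for p in node.average_strategy()])
```

In Kuhn poker’s known equilibrium, the second player facing a bet always folds a Jack and always calls with a King, so the information sets "0b" and "2b" make quick sanity checks on the trained average strategy.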
An important factor lies in the improved “end-game solving,” according to a paper [PDF] by CMU's Sandholm and Brown. “Unlike perfect-information games, imperfect-information games cannot be decomposed into subgames that are solved independently. Thus more computationally intensive equilibrium-finding techniques are used, and abstraction – in which a smaller version of the game is generated and solved – is essential. Endgame solving is the process of computing a (presumably) better strategy for just an endgame than what can be computationally afforded for the full game,” the paper's abstract reads.
Libratus's approach to tackling this is similar to that of DeepStack, another computer program that also bests human players at no-limit Texas hold’em.
Both programs – Libratus and DeepStack – try to home in on the best possible winning strategy by attempting to solve for a Nash equilibrium. In game theory, this is a set of strategies in which no player can gain by unilaterally changing his or her own strategy while the opponents' strategies stay the same. Every player has picked the best line of attack given their rivals' choices, basically.
The software can't compute an exact Nash equilibrium, though, due to the complexity of poker and the way in which the gameplay is abstracted into mathematical form. Getting close to the equilibrium is key to winning, and it’s an area where Claudico was weak whereas Libratus was rather good.
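What “close to the equilibrium” means can be measured in a small zero-sum game. The sketch below uses rock-paper-scissors purely as an illustrative stand-in for poker and computes exploitability: the total amount the two players could gain by each switching to a best response while the other’s strategy stays fixed. Zero means a Nash equilibrium; a small positive number means an approximate one. The function names are hypothetical.

```python
from fractions import Fraction

# Row player's payoffs; the game is zero-sum, so the column player gets the negation.
RPS = [
    [0, -1, 1],   # rock vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_payoff(payoff, row, col):
    return sum(row[i] * col[j] * payoff[i][j]
               for i in range(len(row)) for j in range(len(col)))


def exploitability(payoff, row, col):
    """How much both players could gain in total by unilaterally deviating.

    A pure best response always exists in a matrix game, so checking pure
    deviations is enough.
    """
    value = expected_payoff(payoff, row, col)
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Best deviation for the row player, holding the column strategy fixed.
    best_row = max(sum(col[j] * payoff[i][j] for j in range(n_cols)) for i in range(n_rows))
    # Best deviation for the column player, who wants the row payoff to be low.
    best_col = min(sum(row[i] * payoff[i][j] for i in range(n_rows)) for j in range(n_cols))
    return (best_row - value) + (value - best_col)


if __name__ == "__main__":
    uniform = [Fraction(1, 3)] * 3
    rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
    print(exploitability(RPS, uniform, uniform))     # equilibrium: nobody gains by deviating
    print(exploitability(RPS, rock_heavy, uniform))  # opponent gains by switching to paper
```

Exact fractions keep the equilibrium check free of rounding error; with floating-point strategies one would instead test whether exploitability falls below a small tolerance.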
It’s difficult to compare the abilities of Libratus with DeepStack without them playing against each other. However, Libratus definitely has the edge in computational power: it runs on the Bridges system at the Pittsburgh Supercomputing Center, which can achieve 1.35 PFLOPS – or more than a quadrillion floating-point math calculations per second.
Libratus gobbled up approximately 19 million core hours of computing, equivalent to 3,300 laptops generating over 2,600TB of data throughout the tournament. DeepStack was more modest: it's essentially a neural network with seven layers that uses deep learning, whereas Libratus used reinforcement learning to approximate a Nash equilibrium.
Could poker-playing AI lead to General AI?
“Since the earliest days of AI research, beating top human players has been a powerful measure of progress in the field,” Sandholm said earlier.
“That was achieved with chess in 1997, with Jeopardy! in 2009 and with the board game Go just last year. Poker poses a far more difficult challenge than these games, as it requires a machine to make extremely complicated decisions based on incomplete information while contending with bluffs, slow play and other ploys.”
The victory for Libratus has ignited fear over the state of online poker. Many viewers watching the live stream on Jason Les’ Twitch channel flooded the chatroom with "RIP online poker" messages.
But the fear of losing to bots online, or of cheating with poker bots, is overblown. Poker is normally played in a multi-player environment, and having to consider more than one opponent makes solving for the Nash equilibrium much more complex. Basically, today's robo-players triumph in heads-up battles, not at tables of five, six or more players vying to pick up the pot.
Building a poker bot as good as Libratus is also a major task, as it requires a healthy-sized supercomputer. Libratus probably won’t be playing anyone online anytime soon, as it costs too much to run.
Sandholm doesn’t see Libratus as a threat. Instead it adds “whole new depths to the game” and has made it “more interesting,” rather than killing it, he said during a live interview on Twitch.
The algorithms aren’t game-dependent. They can be applied in imperfect-information environments to find the best strategies, and can be adapted for negotiation and bargaining, with applications in cyber security, finance and the military.
It might even pave the way to general AI, says Brown. “If the field of AI is to achieve its goal of general AI, it needs to be able to address this problem of uncertainty which comes up a lot in real life. We see these algorithms are being used in this bot – it’s really advancing the field for those problems. How do you deal with uncertainty in real life?” ®