Analysis
DeepMind claimed this month that its latest AI system – AlphaZero – mastered chess and Shogi as well as Go to "superhuman levels" within a handful of hours.
Sounds impressive, and to an extent it is. However, some things are too good to be completely true. Now experts are questioning AlphaZero's level of success.
Like AlphaGo Zero, AlphaZero learned to play games by playing against itself, a technique in reinforcement learning known as self-play.
“Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case,” DeepMind's research team wrote in a paper detailing AlphaZero's design.
AlphaZero faced Stockfish, a chess-playing AI program that won the Top Chess Engine Championship (TCEC) last year. AlphaZero won 28 games of chess, drew 72, and lost none against Stockfish.
Shogi, a Japanese strategy game similar to chess, is more complex. Here, AlphaZero beat Elmo, a Shogi computer engine, winning 90 games, drawing two and losing eight.
The rules of the games were provided to AlphaZero, and the system learned to master chess and Shogi over the course of 68 million games against itself. In terms of time, AlphaZero needed four hours to grasp chess well enough to beat Stockfish, out of nine hours spent training on the game in total, and less than two hours to master Shogi to the point where it could see off Elmo. AlphaZero also creamed DeepMind's Go-playing AI AlphaGo Lee after eight hours of training.
It’s an impressive feat – but one that was achieved by carefully manipulating the experiment, Jose Camacho Collados, an AI researcher and an international chess master, argued in an analysis this week.
Firstly, DeepMind is part of Google-parent Alphabet, and thus has access to massive computing power. AlphaZero was trained on 64 TPU2s – the second generation of Google’s TPU accelerator chip – while a whopping 5,000 first-generation TPUs generated the self-play games AlphaZero learned from.
That means, as Camacho Collados pointed out, that AlphaZero's training would have taken roughly two years on a single TPU. Stockfish and Elmo, by contrast, were given only 64 x86 CPU threads and a hash size of 1GB, so the game engines were not on an equal footing to begin with.
AlphaZero ran on math-crunching hardware dedicated to neural networks, while its opponents ran on PCs. Think supercar versus a Ford Focus.
“The experimental setting does not seem fair,” Camacho Collados said. “The version of Stockfish used was not the last one but, more importantly, it was run in its released version run on a normal PC, while AlphaZero was ran using considerable higher processing power. For example, in the TCEC competition engines play against each other using the same processor.”
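To see where a figure like "roughly two years" can come from, here is a back-of-the-envelope sketch in Python. It assumes, and this is our assumption rather than anything DeepMind or Camacho Collados spelled out here, that the estimate multiplies the 5,000 first-generation TPUs by the four hours AlphaZero needed to beat Stockfish, and that the work would split perfectly across chips.

```python
HOURS_PER_YEAR = 24 * 365

def single_chip_years(num_chips, wall_clock_hours):
    """Equivalent run time on one chip, assuming (crudely) that the
    work would split perfectly across all chips."""
    return num_chips * wall_clock_hours / HOURS_PER_YEAR

# Assumed inputs: 5,000 first-generation TPUs running for four hours.
years = single_chip_years(5_000, 4)
print(f"{years:.1f} years")  # prints 2.3 years
```

That lands at a little over two years, in line with the estimate. It ignores the 64 TPU2s used for the training itself and any speed difference between chip generations, so treat it as an order-of-magnitude check only.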
Next, DeepMind's paper stated that both systems, AlphaZero and Stockfish, were given one minute to make each move. That is highly unorthodox for tournament play. In a chess match, players are typically given a bank of time in which to make all their moves, not a countdown per move. For example, the World Chess Federation gives players "90 minutes for the first 40 moves followed by 30 minutes for the rest of the game with an addition of 30 seconds per move starting from move one."
That means a player can make some moves, such as familiar opening moves, quickly, banking time (more than a minute, if needed) for tricky later-stage maneuvers. Stockfish was designed to manage its clock over a whole game rather than play against a one-minute shot clock on every move.
AlphaZero, on the other hand, was optimized for playing with a fixed budget per move. Its neural network took the position on the board as input and spat out probabilities for a range of possible moves, along with an estimate of its chances of winning. A Monte Carlo tree search algorithm, guided by those outputs, then sorted through possible continuations and chose the most promising move. The network learned all of this through self-play.
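For readers curious how that search works, here is a minimal Python sketch of the kind of Monte Carlo tree search AlphaZero relies on. It is not DeepMind's code, which has not been published. It uses a hypothetical toy game (players take one to three stones from a pile; whoever takes the last stone wins), a stub called `policy_value` in place of the trained neural network, and an exploration constant `c_puct=1.5` and 2,000 simulations picked purely for illustration. The selection rule is a version of the PUCT formula described in DeepMind's AlphaGo Zero work: each move's average result, Q, plus a bonus, U, that favours moves the network rates highly but the search has rarely tried.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy game standing in for chess: players alternately remove
# 1-3 stones from a pile, and whoever takes the last stone wins.
def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def policy_value(pile):
    """Hypothetical stub for AlphaZero's neural network: uniform move
    priors and a neutral value estimate for the player to move."""
    moves = legal_moves(pile)
    return {m: 1.0 / len(moves) for m in moves}, 0.0

@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0  # summed from the view of the player to move here
    children: dict = field(default_factory=dict)

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node, c_puct):
    """Pick the move maximising Q + U: Q is the average result so far,
    U boosts moves with a high prior that have been explored little."""
    sqrt_n = math.sqrt(node.visits)
    best_move, best_score = None, -math.inf
    for move, child in node.children.items():
        q = -child.mean_value()  # child values are from the opponent's view
        u = c_puct * child.prior * sqrt_n / (1 + child.visits)
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

def simulate(node, pile, c_puct):
    """Run one simulation and return the value for the player to move."""
    if pile == 0:
        value = -1.0  # opponent took the last stone: the player to move lost
    elif not node.children:
        priors, value = policy_value(pile)  # expand the leaf with the stub
        node.children = {m: Node(prior=p) for m, p in priors.items()}
    else:
        move = puct_select(node, c_puct)
        value = -simulate(node.children[move], pile - move, c_puct)
    node.visits += 1
    node.value_sum += value
    return value

def run_mcts(pile, num_simulations=2000, c_puct=1.5):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        simulate(root, pile, c_puct)
    # Play the most-visited move, as AlphaGo Zero does in competitive games.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(run_mcts(5))  # prints 1 (take one stone, leaving four)
```

With a real network, the priors and value estimates steer the search toward strong moves from the outset. Here the stub offers no guidance, so the search learns only from positions where the game actually ends, which is enough on a game this small.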
Camacho Collados noted:
The selection of the time seems odd. Each engine was given one minute per move. However, in the vast majority of human and engine competitions each player is given a fixed amount of time for the whole game, and then this time is administered individually. As Tord Romstad, one of the original developers of Stockfish, declared, this was another questionable decision in detriment of Stockfish, as “lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move.”
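The flexibility of a bank of time is easy to put in numbers. The Python sketch below uses the FIDE control quoted above for the first 40 moves and a deliberately crude, hypothetical strategy: quiet moves take a fixed number of seconds, and whatever is left is shared evenly among the critical moves. It is not Stockfish's actual time-management logic, and it ignores the fact that the 30-second increment is only credited as each move is made.

```python
# First-period figures from the FIDE control quoted in the article.
FIRST_PERIOD_S = 90 * 60   # 90 minutes for moves 1-40
INCREMENT_S = 30           # 30 seconds added per move
MOVES_IN_PERIOD = 40

def critical_move_budget(quiet_moves, quiet_move_s):
    """Seconds available for each critical move in the first 40 moves under
    a hypothetical plan: quiet moves take quiet_move_s each, and the rest
    of the bank is split evenly across the remaining (critical) moves."""
    critical_moves = MOVES_IN_PERIOD - quiet_moves
    if critical_moves <= 0:
        raise ValueError("need at least one critical move")
    bank_s = FIRST_PERIOD_S + INCREMENT_S * MOVES_IN_PERIOD
    return (bank_s - quiet_moves * quiet_move_s) / critical_moves

# Hypothetical game: 34 quiet moves at 20 seconds each, six critical moves.
print(round(critical_move_budget(34, 20)))  # prints 987
```

Under that plan, each of the six critical moves could get more than 16 minutes, against a flat 60 seconds under the one-minute-per-move rule used in DeepMind's match.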
The decision to go with one-minute timeouts, as well as under-powering its competitors, seems awfully convenient for DeepMind.
It’s also difficult to really scrutinize AlphaZero, since DeepMind has not publicly released the code for any of its game-playing systems. That makes it impossible to test the claims made, or to check whether the results are reproducible.
In the paper, the researchers cherry-picked ten games between AlphaZero and Stockfish, all showing AlphaZero winning. AlphaZero's losses to Elmo at Shogi have not been published, so it’s impossible to see where the software fell short.
“It is customary in scientific papers to show examples on which the proposed system displays some weaknesses or may not behave as well in order to have a more global understanding and for other researchers to build upon it,” Camacho Collados wrote.
“We should scientifically scrutinize alleged breakthroughs carefully, especially in the period of AI hype we live now. It is actually responsibility of researchers in this area to accurately describe and advertise our achievements, and try not to contribute to the growing (often self-interested) misinformation and mystification of the field.
“I personally have a lot of hope in the potential of DeepMind in achieving relevant discoveries in AI, but I hope these achievements will be developed in a way that can be easily judged by peers and contribute to society.”
Other machine-learning experts El Reg chatted to this week privately agreed that while AlphaZero is a cool research project, it is not quite the scientific breakthrough the mainstream press has been screaming about.
A spokesperson from DeepMind told The Register that it could not comment on any of the claims made since “the work is being submitted for peer review and unfortunately we cannot say any more at this time.” ®