Full Text, including selected Comments
- Good effort but the games were seemingly rigged
- DeepMind claimed this month its latest AI system – AlphaZero – mastered chess and Shogi as well as Go to "superhuman levels" within a handful of hours.
- Sounds impressive, and to an extent it is. However, some things are too good to be completely true. Now experts are questioning AlphaZero's level of success.
- AlphaZero is based on AlphaGo (The Register: AlphaGo), the machine-learning software that beat 18-time Go champion Lee Sedol last year, and AlphaGo Zero (The Register: AlphaGoZero), an upgraded version of AlphaGo that beat AlphaGo 100-0.
- Like AlphaGo Zero, AlphaZero learned to play games by playing against itself, a technique in reinforcement learning known as self-play.
- “Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case,” DeepMind's research team wrote in a paper detailing AlphaZero's design.
- AlphaZero faced Stockfish, a chess-playing AI program that won the Top Chess Engine Championship (TCEC) last year. AlphaZero won 28 games of chess, drew 72, and lost none against Stockfish.
- Shogi, a Japanese strategy game similar to chess, is more complex. Here, AlphaZero won 90 games against Elmo, a Shogi computer engine, drew two, and lost eight.
- The rules of the two board games were provided to AlphaZero, and the system learned how to master them both over the course of 68 million self-play matches. To put it another way, AlphaZero took four hours to grasp chess to a level where it could beat Stockfish, spending nine hours in total on the game – and took less than two hours to master Shogi to the point where it could see off Elmo. AlphaZero also creamed DeepMind's Go-playing AI AlphaGo Lee after eight hours of training.
- It’s an impressive feat – but one that was achieved by carefully manipulating the experiment, Jose Camacho Collados, an AI researcher and an international chess master, argued in an analysis this week.
- Firstly, DeepMind is part of Google-parent Alphabet, and thus has access to massive computing power. AlphaZero's neural networks were trained on 64 TPU2s – the second generation of Google’s TPU accelerator chip – with a whopping 5,000 first-generation TPUs generating the self-play games from which AlphaZero learned.
- That means, as Camacho Collados pointed out, the training time for AlphaZero is equivalent to roughly two years on a single TPU. In contrast to that processing power, Stockfish and Elmo were each given only 64 x86 CPU threads and a hash size of 1GB, meaning that the engines were not on an equal footing with AlphaZero to begin with.
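- The "roughly two years" figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes (this is not stated in the article) that the estimate multiplies the 5,000 self-play TPUs by the four hours of chess training, that the work would scale perfectly linearly onto one chip, and that the 64 TPU2s are ignored. The function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def single_device_years(devices: int, wall_clock_hours: float) -> float:
    """Convert wall-clock hours on `devices` parallel chips into years on one chip.

    Assumes perfect linear scaling, which real hardware would not achieve.
    """
    return devices * wall_clock_hours / HOURS_PER_YEAR


# Assumption: 5,000 first-generation TPUs busy for the four hours quoted for chess.
years = single_device_years(5000, 4)
print(f"{years:.2f} device-years")
```

- Under those assumptions the result is a little over two device-years, in line with Camacho Collados's estimate.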
- AlphaZero ran on math-crunching hardware dedicated to neural networks, while its opponents ran on PCs. Think supercar versus a Ford Focus.
- “The experimental setting does not seem fair,” Camacho Collados said. “The version of Stockfish used was not the last one but, more importantly, it was run in its released version run on a normal PC, while AlphaZero was ran using considerable higher processing power. For example, in the TCEC competition engines play against each other using the same processor.”
- Next, DeepMind's paper stated that both systems, AlphaZero and Stockfish, were given one minute to make a move. That is highly unorthodox for tournament play. In a chess match, players are typically given a bank of time in which to make all their moves, not a countdown per move. For example, the World Chess Federation gives players "90 minutes for the first 40 moves followed by 30 minutes for the rest of the game with an addition of 30 seconds per move starting from move one."
- That means some actions, such as early moves, can be performed quickly, leaving a player more time – more than a minute if needed – for later-stage maneuvers. Stockfish was designed to play chess under such a clock, budgeting its time over the whole game, rather than against a minute-long shot clock.
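- To make the difference between the two time controls concrete, here is a minimal sketch of the arithmetic. It encodes only the federation rule quoted above and the one-minute cap from DeepMind's paper; the function names and the sample numbers (ten opening moves at five seconds each) are illustrative assumptions.

```python
def bank_total_seconds(moves_played: int) -> int:
    """Total time one player has been granted after `moves_played` moves under
    the quoted rule: 90 min for 40 moves, +30 min after that, +30 s per move."""
    base = 90 * 60
    if moves_played > 40:
        base += 30 * 60
    return base + 30 * moves_played


def fixed_total_seconds(moves_played: int, per_move: int = 60) -> int:
    """Total time available under a flat per-move cap (one minute by default)."""
    return per_move * moves_played


def clock_after(spent: list[float]) -> float:
    """Clock reading in the first time period after spending `spent[i]` seconds
    on move i, with a 30 s increment added after every move."""
    return 90 * 60 - sum(spent) + 30 * len(spent)


print(bank_total_seconds(60), fixed_total_seconds(60))
print(clock_after([5] * 10))
```

- With a bank, time saved on quick opening moves stays on the clock and can be spent on a later critical position; under the flat cap, no single move can ever get more than 60 seconds.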
- AlphaZero, on the other hand, was optimized for minute-to-minute play. The neural network took the positions on the board as input, and spat out a range of moves and chose the one with the highest chance of winning at every move. It learned this by self-play and using a Monte Carlo tree search algorithm to sort through the potential strategies.
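- The article describes the search only loosely. As a hedged sketch: DeepMind's published papers describe a PUCT-style rule for choosing which move to explore next inside the tree, combining the network's prior for a move with the average value seen so far, and then playing a move based on visit counts. The class, the function names, the exploration constant and the sample numbers below are illustrative assumptions, not DeepMind's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float             # network's prior probability for this move
    visits: int = 0          # how many simulations have tried this move
    value_sum: float = 0.0   # sum of evaluations backed up through this move

    @property
    def q(self) -> float:
        """Mean evaluation so far (0 for an unvisited move)."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_for_exploration(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the move maximising Q + c_puct * P * sqrt(total visits) / (1 + N)."""
    total = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        return e.q + c_puct * e.prior * math.sqrt(total) / (1 + e.visits)

    return max(edges, key=lambda move: score(edges[move]))


def most_visited(edges: dict[str, Edge]) -> str:
    """Move to actually play once the search budget is spent."""
    return max(edges, key=lambda move: edges[move].visits)


# Hypothetical statistics for three candidate moves.
edges = {
    "e4": Edge(prior=0.6, visits=10, value_sum=6.0),
    "d4": Edge(prior=0.3, visits=2, value_sum=1.4),
    "a3": Edge(prior=0.1),
}
print(select_for_exploration(edges), most_visited(edges))
```

- The exploration bonus shrinks as a move accumulates visits, so a promising but under-explored move (here "d4") is examined next even though "e4" has been visited most.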
- Camacho Collados noted: The selection of the time seems odd. Each engine was given one minute per move. However, in the vast majority of human and engine competitions each player is given a fixed amount of time for the whole game, and then this time is administered individually. As Tord Romstad, one of the original developers of Stockfish, declared, this was another questionable decision in detriment of Stockfish, as “lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move.”
- The decision to go with one-minute timeouts, as well as under-powering its competitors, seems awfully convenient for DeepMind.
- It’s also difficult to really scrutinize AlphaZero since DeepMind has not released the code publicly for any of its game-playing systems. It’s impossible to test any claims made, and to check if the results are reproducible.
- In the paper, ten games played between AlphaZero and Stockfish were cherry-picked by the researchers to show AlphaZero winning. The losses it faced against Elmo in Shogi have not been published, so it’s impossible to see where the software was inferior.
- “It is customary in scientific papers to show examples on which the proposed system displays some weaknesses or may not behave as well in order to have a more global understanding and for other researchers to build upon it,” Camacho Collados wrote.
- “We should scientifically scrutinize alleged breakthroughs carefully, especially in the period of AI hype we live now. It is actually responsibility of researchers in this area to accurately describe and advertise our achievements, and try not to contribute to the growing (often self-interested) misinformation and mystification of the field.
- “I personally have a lot of hope in the potential of DeepMind in achieving relevant discoveries in AI, but I hope these achievements will be developed in a way that can be easily judged by peers and contribute to society.”
- Other machine-learning experts El Reg chatted to this week privately agreed that while AlphaZero is a cool research project, it is not quite the scientific breakthrough the mainstream press has been screaming about.
- A spokesperson from DeepMind told The Register that it could not comment on any of the claims made since “the work is being submitted for peer review and unfortunately we cannot say any more at this time.”
- Flawed, Perhaps, but Valuable Still.
- Comments on the initial AlphaZero announcement fairly quickly took note of the large floating-point power used by AlphaZero, and the fact that Stockfish's hash tables were restricted to 1 GB.
- But it is also a fact that chess experts noted that AlphaZero's play included consideration of very subtle positional factors - something Stockfish does not excel at, though it is known to be a strength of the commercial chess engine Komodo.
- It may well be that if one tried using equal hardware power to play chess by techniques similar to those used by AlphaZero, the result wouldn't be much better than had been achieved by the Giraffe chess engine. That took 72 hours, rather than 4, to teach itself to play chess - and it only got to International Master level, significantly inferior to that of Stockfish.
- The thing is, though, it is still very significant to prove that something can be done at all, even if not necessarily in an efficient manner. Something can be a significant scientific advance in AI without being the most cost-effective way to make a strong chess engine.
- It may well be that AlphaZero's feat, by demonstrating the validity of the neural network and Monte Carlo search approaches, will allow technology from Giraffe to be incorporated into programs like Stockfish to make them better.
- Re: Did it come up with anything new?
- AlphaZero is causing a stir in the chess community. I'm a big fan of agadmator's YouTube chess channel and I watched an analysis of one of the games between AlphaZero and Stockfish. The greatest surprise was the way AlphaZero willingly gave up a whole piece (a Knight) to keep up its own momentum and refuse to allow Stockfish to develop its pieces. Note this behaviour is very human; usually a chess engine will only sac(rifice) a piece for some tangible, strategic gain or to implement a tactic.
- This is the key difference here. The point to take away from this is that Google have not merely developed a more powerful chess engine that runs on more powerful hardware, rather they have created something that behaves much like an *extremely* strong human Grandmaster, not simply a super-powerful logic-monster. This will probably change the way elite chess players train for tournaments.
- Keep in mind that even with the hardware handicap, Stockfish could analyse up to 70 million positions a second and play with an Elo rating of 3300+. That AlphaZero took just 4 hours to learn the game from scratch and beat a well-honed engine like Stockfish is pretty impressive.
- Link: Google Deep Mind Alpha Zero Sacs a Piece Without "Thinking" Twice
- Re: Did it come up with anything new?
- For a different viewpoint, see this article: Chessbase.com: AlphaZero learns chess
- These things are relative, and compared to grandmasters I too am a crap chess player. But I have spent a lot of time playing and studying the game, and I think The Register's article is far too dismissive. Chess players see AlphaZero as playing at a completely different level from any previous chess engines. Even though programs like Stockfish can easily make mincemeat of any human player - including the world champion - they still play in a distinctive, highly tactical style. There are still positions that completely baffle them, because all they really do is apply minimax to the deepest level they can.
- From the ChessBase article linked to above, it would seem that AlphaZero combines the strengths of previous chess engines with those of very strong human players. The games it played against Stockfish are very impressive, as it completely outthinks Stockfish in a very human - or, rather, superhuman way.
- Time will tell. On the one hand, if it's a genuine breakthrough, this could be one sign that AI is real. Remember, there was a lot of difference between the Wright Brothers' collection of bicycle parts and, say, a 747 - but the principles are the same and the time to go from one to the other not all that long.
- The way to think about NN processing is that it's compression (throwing away "irrelevant" details). Totally readable, with a large picture of the bottled dog of John Bull: New Theory Cracks Open the Black Box of Deep Learning. Also check out the YouTube video (YouTube: Information Theory of Deep Learning. Naftali Tishby).
- What the "standard" chess programs fail to see is crippled pieces. In almost all the games AlphaZero won, material was balanced or even in Stockfish's favour, but that program fails to notice the long inactivity of its pieces.
- The vast majority of the comments were quibbles about Google’s publicity tactics. While there’s some truth in these complaints, they miss the point of the overall achievement of AlphaZero, whether or not it has been “talked up”.
Text Colour Conventions (see disclaimer)
- Blue: Text by me; © Theo Todman, 2018
- Mauve: Text by correspondent(s) or other author(s); © the author(s)