Google's AlphaZero Destroys Stockfish In 100-Game Match
Klein (Mike)
Source:, 6 Dec 2017

Full Text, including selected Comments

  1. Chess changed forever today. And maybe the rest of the world did, too.
  2. A little more than a year after AlphaGo sensationally won against the top Go player, the artificial-intelligence program AlphaZero has obliterated the highest-rated chess engine.
  3. Stockfish, which for most top players is their go-to preparation tool, and which won the 2016 TCEC Championship and the 2017 Computer Chess Championship, didn't stand a chance. AlphaZero won the closed-door, 100-game match with 28 wins, 72 draws, and zero losses.
  4. Oh, and it took AlphaZero only four hours to "learn" chess. Sorry humans, you had a good run.
  5. That's right -- the programmers of AlphaZero, housed within the DeepMind division of Google, had it use a type of "machine learning," specifically reinforcement learning. Put more plainly, AlphaZero was not "taught" the game in the traditional sense. That means no opening book, no endgame tables, and apparently no complicated algorithms dissecting minute differences between center pawns and side pawns.
  6. This would be akin to a robot being given access to thousands of metal bits and parts, but no knowledge of a combustion engine; it then experiments numerous times with every possible combination until it builds a Ferrari. That's all in less time than it takes to watch the "Lord of the Rings" trilogy. The program had four hours to play itself many, many times, thereby becoming its own teacher.
  7. For now, the programming team is keeping quiet. They declined to comment, pointing out that the paper "is currently under review," but you can read the full paper here (Google Deep Mind: Mastering Chess and Shogi by Self-Play). Part of the research group is Demis Hassabis, a candidate master from England and co-founder of DeepMind (bought by Google in 2014). Hassabis, who played in the ProBiz event of the London Chess Classic, is currently at the Neural Information Processing Systems conference in California, where he is a co-author of another paper on a different subject.
  8. One person who did comment has quite a lot of first-hand experience playing chess computers. GM Garry Kasparov is not surprised that DeepMind branched out from Go to chess.
  9. "It's a remarkable achievement, even if we should have expected it after AlphaGo," he said. "It approaches the 'Type B,' human-like approach to machine chess dreamt of by Claude Shannon and Alan Turing instead of brute force."
  10. AlphaZero vs. Stockfish | Round 1 | 4 Dec 2017 | 1-0 | One of the 10 selected games given in the paper.
    1. Nf3 Nf6 2. d4 e6 3. c4 b6 4. g3 Bb7 5. Bg2 Be7 6. O-O O-O 7. d5 exd5 8. Nh4 c6 9. cxd5 Nxd5 10. Nf5 Nc7 11. e4 d5 12. exd5 Nxd5 13. Nc3 Nxc3 14. Qg4 g6 15. Nh6+ Kg7 16. bxc3 Bc8 17. Qf4 Qd6 18. Qa4 g5 19. Re1 Kxh6 20. h4 f6 21. Be3 Bf5 22. Rad1 Qa3 23. Qc4 b5 24. hxg5+ fxg5 25. Qh4+ Kg6 26. Qh1 Kg7 27. Be4 Bg6 28. Bxg6 hxg6 29. Qh3 Bf6 30. Kg2 Qxa2 31. Rh1 Qg8 32. c4 Re8 33. Bd4 Bxd4 34. Rxd4 Rd8 35. Rxd8 Qxd8 36. Qe6 Nd7 37. Rd1 Nc5 38. Rxd8 Nxe6 39. Rxa8 Kf6 40. cxb5 cxb5 41. Kf3 Nd4+ 42. Ke4 Nc6 43. Rc8 Ne7 44. Rb8 Nf5 45. g4 Nh6 46. f3 Nf7 47. Ra8 Nd6+ 48. Kd5 Nc4 49. Rxa7 Ne3+ 50. Ke4 Nc4 51. Ra6+ Kg7 52. Rc6 Kf7 53. Rc5 Ke6 54. Rxg5 Kf6 55. Rc5 g5 56. Kd4
  11. Indeed, much like humans, AlphaZero searches fewer positions than its predecessors. The paper claims that it looks at "only" 80,000 positions per second, compared to Stockfish's 70 million per second.
  12. GM Peter Heine Nielsen, the longtime second of World Champion GM Magnus Carlsen, is now on board with the FIDE president in one way: aliens. As he put it, "After reading the paper but especially seeing the games I thought, well, I always wondered how it would be if a superior species landed on earth and showed us how they play chess. I feel now I know."
  13. We also learned, unsurprisingly, that White is indeed the choice, even among the non-sentient. Of AlphaZero's 28 wins, 25 came from the white side (although +3=47-0 as Black against the 3400+ Stockfish isn't too bad either).
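The color split can be checked with a few lines of arithmetic. A minimal sketch, assuming standard scoring (win = 1, draw = 0.5, loss = 0); the 25 draws with White are inferred from the 72 total draws minus the 47 draws with Black, and the `Record` class is purely illustrative:

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Win/draw/loss record for one side of the match."""
    wins: int
    draws: int
    losses: int

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def points(self) -> float:
        # Standard scoring: a win is 1 point, a draw is half a point.
        return self.wins + 0.5 * self.draws

    @property
    def percentage(self) -> float:
        return 100 * self.points / self.games


as_white = Record(wins=25, draws=25, losses=0)  # 72 draws in total - 47 as Black = 25
as_black = Record(wins=3, draws=47, losses=0)
overall = Record(
    wins=as_white.wins + as_black.wins,
    draws=as_white.draws + as_black.draws,
    losses=as_white.losses + as_black.losses,
)

for label, rec in [("White", as_white), ("Black", as_black), ("Total", overall)]:
    print(f"{label}: {rec.points}/{rec.games} = {rec.percentage:.1f}%")
```

The White line works out to 37.5/50, the 75 percent figure Nakamura objects to, and the total is the 64-36 score discussed in the comments.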
  14. The machine also ramped up the frequency of openings it preferred. Sorry, King's Indian practitioners, your baby is not the chosen one. The French also tailed off in the program's enthusiasm over time, while the Queen's Gambit and especially the English Opening were well represented.
  15. What do you do if you are a thing that never tires and you just mastered a 1400-year-old game? You conquer another one. After the Stockfish match, AlphaZero "trained" for only two hours and then beat the best Shogi-playing computer program, "Elmo."
  16. The ramifications for such an inventive way of learning are of course not limited to games.
  17. "We have always assumed that chess required too much empirical knowledge for a machine to play so well from scratch, with no human knowledge added at all," Kasparov said. "Of course I’ll be fascinated to see what we can learn about chess from AlphaZero, since that is the great promise of machine learning in general — machines figuring out rules that humans cannot detect. But obviously the implications are wonderful far beyond chess and other games. The ability of a machine to replicate and surpass centuries of human knowledge in complex closed systems is a world-changing tool."
  18. Eight of the 10 players participating in the London Chess Classic were interviewed about their thoughts on the match. A video compilation of their reactions will be posted on the site later.
  19. The player with the most strident objections to the conditions of the match was GM Hikaru Nakamura. While a heated discussion is taking place online about the processing power of the two sides, Nakamura thought that was a secondary issue.
  20. The American called the match "dishonest" and pointed out that Stockfish's methodology requires it to have an opening book for optimal performance. While he doesn't think the ultimate winner would have changed, Nakamura thought the margin of victory would have been smaller.
  21. "I am pretty sure God himself could not beat Stockfish 75 percent of the time with White without certain handicaps," he said about the 25 wins and 25 draws AlphaZero scored with the white pieces.
  22. GM Larry Kaufman, lead chess consultant on the Komodo program, hopes to see the new program's performance on home machines without the benefits of Google's own computers. He also echoed Nakamura's objections to Stockfish's lack of its standard opening knowledge.
  23. "It is of course rather incredible," he said. "Although after I heard about the achievements of AlphaGo Zero in Go I was rather expecting something like this, especially since the team has a chess master, Demis Hassabis. What isn't yet clear is whether AlphaZero could play chess on normal PCs and if so how strong it would be. It may well be that the current dominance of minimax chess engines may be at an end, but it's too soon to say so. It should be pointed out that AlphaZero had effectively built its own opening book, so a fairer run would be against a top engine using a good opening book."
  24. Whatever the merits of the match conditions, Nielsen is eager to see what other disciplines will be refined or mastered by this type of learning.
  25. "[This is] actual artificial intelligence," he said. "It goes from having something that's relevant to chess to something that's gonna win Nobel Prizes or even bigger than Nobel Prizes. I think it's basically cool for us that they also decided to do four hours on chess because we get a lot of knowledge. We feel it's a great day for chess but of course it goes so much further."

Selected Comments
  1. There are many problems with the paper, which I stated here: New computer chess champion? Not yet! Also, when you go through the games, you will find very interesting blunders from Stockfish which are definitely not characteristic of it. It all comes down to the settings. I don't believe this version is superior to Stockfish just yet; the next version might very well be.
    • Very recently, news about a new chess-playing entity hit the world. Google published their article about a program called Alpha Zero, a generalized version of AlphaGo Zero, which had shown its dominance in the game of Go. This time they aimed to conquer chess.
    • It is a nice step in a different direction, perhaps the start of a revolution, but Alpha Zero is not yet better than Stockfish, and if you bear with me I will explain why. Most people are very excited right now and hoping for a sensation, so they don't really read the paper or think about what it says, which leads to uninformed opinions.
    • The testing conditions were terrible. One minute per move is not really a suitable time control for engine testing, but you could tolerate that. What is intolerable, though, is the hash-table size: with the 64 cores Stockfish was given, you would expect around 32GB or more, otherwise the table fills up very quickly, leading to a marked reduction in strength. Only 1GB was given, and that is far from ideal! Also, SF was not given any endgame tablebases, which are the current norm for any computer chess engine.
    • The computational power behind each entity was very different. While SF was given 64 CPU threads (really a lot, I've got to say), Alpha Zero was given 4 TPUs. A TPU is a specialized chip for machine learning and neural network calculations. Its estimated power compared to a classical CPU is as follows: 1 TPU ~ 30x E5-2699v3 (an 18-core chip), so Alpha Zero had the power of ~2000 Haswell cores behind it. That is nowhere near a fair match. And yet, even though the result was dominant, it was not what it would be if SF faced itself at 2000 cores vs 64 cores. In that case the win percentage would be much more heavily in favor of the more powerful hardware.
    • From those observations we can draw a conclusion: Alpha Zero's margin over SF is not what Google would like us to believe. Incorrect match settings suggest either a lack of knowledge about classical brute-force calculating engines and how they are properly used, or an intention to create conditions where SF would be defeated.
    • With all that said, it is still an amazing achievement and definitely a breath of fresh air in computer chess, most welcome these days. But for the new computer chess champion we will have to wait a little bit longer.
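The core-count estimate in the first comment is simple multiplication. A sketch using only the comment's own figures; the 30x TPU-to-CPU factor is the commenter's estimate, not a measured benchmark, and the comparison mixes cores with threads, so treat the result as order-of-magnitude only:

```python
# Figures taken from the comment; the TPU-to-CPU factor is an estimate.
NUM_TPUS = 4
CPUS_PER_TPU = 30          # "1 TPU ~ 30x E5-2699v3"
CORES_PER_CPU = 18         # the Xeon E5-2699 v3 has 18 cores
STOCKFISH_THREADS = 64

equivalent_cores = NUM_TPUS * CPUS_PER_TPU * CORES_PER_CPU
advantage = equivalent_cores / STOCKFISH_THREADS

print(f"AlphaZero ~ {equivalent_cores} Haswell cores")
print(f"Ratio vs Stockfish's {STOCKFISH_THREADS} threads: {advantage:.1f}x")
```

The product, 2160 cores, is where the comment's "~2000 Haswell cores" comes from.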
  2. Sadly, I think this is as much hype (at this point) as anything. It's just impossible to compare the 4 TPUs of AlphaZero to the 64 cores (and only 1 GB of RAM?!) of Stockfish on a hardware basis (the paper doesn't even try to do so!). Rudimentary guesses/estimates are that this is a 100 or 1000 times hardware advantage (200-400 Elo), which completely nullifies the 100 Elo from the 64-36 score. It is still impressive on other measures (like machine learning); I just don't think the whomping of Stockfish as a software advance is the main point, given the quite incomparable hardware situation. Alternatively, perhaps it says that these massively parallel GPU/TPU solutions are the future, and the single CPU/RAM model is (for the second time) dead.
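The "100 Elo from the 64-36 score" follows from the standard logistic Elo model, in which a rating difference D gives the stronger side an expected score E = 1 / (1 + 10^(-D/400)). A minimal sketch that inverts this relation; the function names are illustrative:

```python
import math


def expected_score(diff: float) -> float:
    """Expected score for the side rated `diff` points higher (logistic Elo model)."""
    return 1 / (1 + 10 ** (-diff / 400))


def elo_difference(score: float) -> float:
    """Rating difference implied by a match score strictly between 0 and 1."""
    if not 0 < score < 1:
        raise ValueError("score must be strictly between 0 and 1")
    return -400 * math.log10(1 / score - 1)


print(round(elo_difference(0.64)))    # the 64-36 match score
print(round(expected_score(200), 2))  # a 200-point edge
print(round(expected_score(400), 2))  # a 400-point edge
```

Under the same model, the 200-400 Elo hardware edge estimated in the comment corresponds to expected scores of roughly 76% to 91%, which is why the commenter sees it as outweighing a 100-point result.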
  3. The hardware issue is definitely putting questions to the value of the 64-36 score, but to me the most impressive aspect is that it got so strong in just four hours, meanwhile developing opening theory that took humanity more than a century.
  4. The hardware difference is mitigated by the number of positions calculated. Stockfish was able to compare 70 million positions per second while AlphaZero only compared 80k per second, so it hardly matters how much additional hardware was put behind Stockfish. The point being made is that AlphaZero's calculating ability and intimate knowledge of chess are far superior to Stockfish's, despite the amount of human knowledge of chess (and human time spent teaching it chess) put into Stockfish, thanks to machine learning.
  5. 4 hours sounds impressive, and it is, but be aware that AlphaZero used more than 500 TPUs during the learning phase... That is equivalent to 250,000 CPUs or more. So 4 hours on that is an eternity (or at least years) on a single PC.
  6. Well guys, chess is a mathematical problem. The Google team came up with a very effective new way of tackling this problem. Well done! Still, I don't see how an optimized Monte Carlo approach that uses self-created statistics (the engine playing itself zillions of times) to come up with the best move has ANYTHING to do with intelligence or human learning. Because it hasn't. It's as simple as that. It's a machine that follows a protocol. Now the machine is powerful and fast enough to follow the protocol without specific instructions. It's quantity with a feedback system (statistics on which move scores best). And now to what it does not do: It does NOT "play" chess. It does NOT think. It does NOT decide to play chess. It does NOT have ideas. It's an optimized calculator. I wish people (especially journalists) would come to terms with that instead of spreading Google's spin about a revolution for humanity.
  7. The type of programming used for AlphaZero was tried by computer chess developers, and on a consumer-grade computer it could only achieve a 2400 rating. This type of algorithm isn't new; they discarded it years ago for what they use now. So yes, the computers' raw resources have just a teeny bit of effect here (yes, that is sarcasm). I hope it's Vishy who called out the parameters in the interview! Guess I'll come back to find that out later.
  8. I agree the Stockfish program was run at a sub-optimal level for these games. In addition, the time control is a little off compared to the claim of tournament play, as it gave players a fixed time per move rather than a fixed time per game, thus nullifying a large part of Stockfish's opening book advantage (playing a move instantly gives no advantage vs AlphaZero's 60 seconds). Also, it is unknown to me at this point whether Stockfish was even using a top opening book. The paper just states Google tossed together openings that were played over 100,000 times. So I take it the book was just made from a database with no edits? That would be a very non-competitive opening book in a standard computer chess engine tournament. I want to say that for the longest time engines such as Houdini could outplay Stockfish in G/10 second matches but lose at longer time controls. I feel the same is at play here, but AlphaZero is at a much higher level. If you gave both engines 2 hours/game, gave Stockfish more computing power and an actually competitive opening book, I wouldn't be surprised if the result went in Stockfish's favor. Also, no mention was made of which tablebases Stockfish was using, if any. It wouldn't surprise me if the Google guys ran Stockfish with permanent brain set to off.
  9. I wish all these articles would stop citing '4 hours'. Yes, there is a line in the paper that it outperformed Stockfish after 4 hours, but in the training methodology section you can clearly see that the final product trained for 700k batches in each game, which took 9 hours for chess. The neural net that whomped Stockfish was trained for 9 hours, not 4 hours. (Also, that's 9 hours on a 5,000-TPU stack that essentially represents decades of continuous computation on a consumer-grade computer.) Not to diminish the awesomeness of AlphaZero's performance, but the bit about 'after only 4 hours' is sensationalist journalism that trivializes the amount of computation that goes into training such a network.
  10. This match doesn't seem to be fair. As far as I know, 1 TPU has 8GB of memory and 180 teraflops (Google announces TPU 2.0), while a high-end 8-core CPU has 350 gigaflops (Core i7 5960X, Haswell-E, 8 cores @ 3.0GHz, AVX2). So overall it is around 32 GB vs 1 GB, and 720 TF vs 2.9 TF. It looks like DeepMind's hardware was more than 200 times stronger.
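The comment's totals can be reproduced from its own per-device figures. A sketch, assuming (as the comment appears to) that Stockfish's 64 cores correspond to eight of the quoted 8-core CPUs; none of these numbers are independently verified here:

```python
# Per-device figures quoted in the comment (not independently verified).
NUM_TPUS = 4
TPU_TFLOPS = 180
TPU_MEMORY_GB = 8
CPU_GFLOPS_PER_8_CORES = 350
STOCKFISH_CORES = 64
STOCKFISH_HASH_GB = 1

alphazero_tflops = NUM_TPUS * TPU_TFLOPS
alphazero_memory_gb = NUM_TPUS * TPU_MEMORY_GB
# Assumption: 64 cores = 8 chips of 8 cores each; convert GFLOPS to TFLOPS.
stockfish_tflops = (STOCKFISH_CORES / 8) * CPU_GFLOPS_PER_8_CORES / 1000

print(f"Memory: {alphazero_memory_gb} GB vs {STOCKFISH_HASH_GB} GB")
print(f"Compute: {alphazero_tflops} TF vs {stockfish_tflops:.1f} TF")
print(f"Ratio: {alphazero_tflops / stockfish_tflops:.0f}x")
```

This gives about 2.8 TF for the CPU side rather than the comment's 2.9 TF; either way the raw-FLOPS ratio is roughly 250x, consistent with "more than 200 times".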
  11. I'd be interested to see how Stockfish would do with its opening book. After all, Stockfish isn't designed to calculate the opening, and AlphaZero seems to be specifically designed to excel in the opening as "intuition" or AI is its core.
  12. As I understand it, this matters because ("traditional") engines can't remember anything that isn't available to them at that very moment, no matter whether it has never been available, was available but then deleted, or is still available but hidden (not on the current search path or directory, opening book switched off). The human equivalent of an engine without an opening book would be someone who never, ever studied openings at all: not by reading opening books, not by checking databases, playing through games, getting coaching, whatever. Or someone who has studied openings but forgot literally everything. A Caro-Kann player would have to find out anew each time that 1.e4 c6 2.d4 d5 3.Nc3 requires 3...dxe4. A Sicilian player would have to figure out from scratch whether a7-a6 is useful or even needed in a given position. Both might be possible based on 'general considerations', in the second case including the fact that White playing Nb5 might be annoying, but without opening study you won't even know that it CAN be annoying in such structures. Human players, down to weak amateurs, naturally use their "opening book", small or big, but never completely switched off.
  13. Only a strong player or a techie understands what this is really about. The match is not what is important. If Deepmind taught itself to play like this that is the story. It doesn't matter if it was a fair match with SF. That it could win some games at this level with this kind of play using totally different processes is amazing. Remember this technology is still improving.
  14. If Stockfish couldn't use opening libraries and endgame tables (and the strength of the hardware is questionable), it really is like saying: OK, guys, we're going to have a race between a motorbike (= Stockfish) and a bicycle (= AlphaZero). The motorbike has been tuned for many years to have the best performance, but at the last moment you say: "Sorry, you can't use petrol, because our machine, the bicycle, doesn't use petrol either." Obviously the motorbike loses, because the bicycle uses a different kind of power (muscles). Stockfish can only show its full strength when using the tools that have been "tuned" over many years. AlphaZero uses a different kind of power: reinforcement learning. And it's nonsense that AZ doesn't use opening libraries: it does. It uses its own library, which it builds up during self-learning (or self-teaching, whatever you call it). What DeepMind achieved in Go is truly remarkable. But this is really unnecessary PR, and imho DeepMind is losing credit rather than gaining it. However, as a lot of people have pointed out, the achievement of creating a strong chess machine that learns chess at a high level from scratch (only the rules) is great! And now a real and fair match, please! If DeepMind dares to have one... Under controlled circumstances, not on Google's playground behind closed doors!
  15. This is a seminal moment in the evolution of chess algorithms. Now all the 'top' engines will have to have their algorithms rewritten using machine learning to stand some chance in the future against a model that learned from itself by playing millions of games. ML identifies patterns that are impossible to identify using any other technique. This is real computer chess. So far all the engines have depended heavily on their creators' access to game databases and on explicit programming of parameters. What is even more scary is that AlphaZero will get even better when it starts playing against similar ML-based engines and as its hyperparameters are tuned.
  16. “I just ran some of the provided PGNs through Stockfish, and every time, Stockfish was confident it had reached equilibrium or close to it in the middlegame. How is that possible if it's hamstrung in opening book play?”
    • I noticed the same thing in the two I looked at.
    • It kinda negates the whole argument that the lack of opening book was a significant issue. AlphaZero seemed to have a deeper understanding of the endgame that even led to a piece sac that would be considered risky by most humans. Stockfish didn't even recognize that the sac led to an advantage until several moves later. AlphaZero's understanding was just so much deeper.
    • Those complaining about hardware disparity and lack of opening book are totally missing the point. No, this isn't apples to apples, like comparing Komodo to Stockfish. You could undoubtedly run Stockfish on the supercomputer and get a predictable increase in strength due to the added brute force power, and that surely would have narrowed the gap between the two systems.
    • But AlphaZero's ability was obtained after four hours of self-play! Of course it could continue to become stronger with more time. And maybe it couldn't compete if it was running on the same hardware SF was using. That's kinda irrelevant. The point was a demonstration of the learning and problem-solving capability of this new form of artificial intelligence. Assuming SF was running on anything close to decent hardware, it could easily perform at an Elo of 3300, and AlphaZero stomped it. Just appreciate how amazing that is without getting distracted by the irrelevant details.
  17. An opening book is worth about 50 Elo. Tablebases are worth about 35. That makes the Stockfish 8 used for this match about 3140 Elo, and so A0's performance is about 3250. After only four hours of self-training that is a stunning result. DeepMind's best Go program trained for 40 days.
  18. A comparison between the intelligence required to build a Ferrari from a pile of parts and the difficulty of playing chess is terrible.
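The commenter's adjustments are additive, and the performance figure comes from adding the match's rating gap to the handicapped baseline. A sketch using the comment's numbers, with the gap derived from the 64% match score via the logistic Elo model; that derivation is an assumption, since the comment does not say how it reached 3250:

```python
import math

# Figures from the comment: Elo value of each missing tool and the handicapped rating.
OPENING_BOOK_ELO = 50
TABLEBASE_ELO = 35
STOCKFISH_AS_PLAYED = 3140

# Rating Stockfish 8 would have with book and tablebases restored (implied by the comment).
implied_full_strength = STOCKFISH_AS_PLAYED + OPENING_BOOK_ELO + TABLEBASE_ELO

# Gap implied by a 64% match score under the logistic Elo model (about 100 points).
match_gap = -400 * math.log10(1 / 0.64 - 1)
alphazero_performance = STOCKFISH_AS_PLAYED + match_gap

print(implied_full_strength)
print(round(alphazero_performance))
```

This prints 3225 and 3240; the latter is close to the comment's "about 3250".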
    • Chess is greatly different than a bag of mechanical parts, in that a given chess position contains only a small amount of "information" (under 200 bits) -- think a few bits for each piece multiplied by 64 squares. This can be stored and processed extremely quickly in a mathematical sense on the thousands of tiny ridiculously fast processors in Deep Mind.
    • With a pile of mechanical parts, the position and orientation of each part in three-dimensional space represents a unique piece of information. If you stored that as floating point, the 30,000 parts in a typical functional car would represent over 7 million bits of information. But that's not enough information for an "intelligent" computer to iterate on putting the pieces of the car together -- you would also need the surface representation of each part (like a CAD drawing of it), which if factored into the simulation would increase that to hundreds of millions of bits. Beyond that, you would also need the ability to simulate the motion and interaction of those parts -- Newton's laws, statics, dynamics, etc., in sufficient detail, which takes ridiculous processing power.
    • The problem is actually infinitely worse than that, though: the problem is how to define a "score" for a given configuration of parts that represents how close you are to successfully putting together the car. With chess, you can easily define a score for a position in a variety of ways (e.g. the points of the pieces, or running a random simulation forward from a given position and seeing which side tends to win more often). Obviously there are exceptions (for example, where there is a forced mate you can ignore points), but it still lets you cut out a lot of branches you would otherwise have to search.
    • With putting together a car, you have no such way of defining that a certain configuration of parts is mathematically better than another, because the vast majority of configurations of parts are completely nonsensical (for example, attach the steering wheel to the tail pipe -- think of all kinds of funny ways you could arrange the parts) and effectively have a score of "zero" in the grand scheme of things. A human could figure things out, because we have the ability to do abstract problem solving based on vast life knowledge of how things work. Deep Mind has none of that.
    • There's probably hundreds or even thousands of auto mechanics in the world that could successfully put together a Ferrari from parts without a manual, given enough time and patience, but not one of those mechanics could beat AlphaZero or even Stockfish in a thousand years. Give Deep Mind a thousand years trying to put together any car (or even something much simpler, like a bicycle) and it will get nowhere. It's a completely different problem space for a computer to solve abstract unconstrained real world problems, versus the tightly constrained mathematical problem space of chess.
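Two of the ideas in this comment can be made concrete: that a chess position fits in under 200 bits, and that a simple score such as piece points is easy to define. The sketch uses one common compact scheme, which is an assumption rather than the commenter's exact count: a 64-bit occupancy mask, 4 bits per piece for the 12 piece types, and 1 bit for the side to move. Castling and en passant rights are ignored here (some encodings fold them into the four unused 4-bit codes). Material uses the conventional 1/3/3/5/9 values, also an assumption; the function names are hypothetical.

```python
from __future__ import annotations

PIECE_CODES = {piece: code for code, piece in enumerate("PNBRQKpnbrqk")}  # 12 of 16 codes
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def encode_position(fen: str) -> tuple[int, int]:
    """Pack a FEN's piece placement and side to move into (value, number_of_bits)."""
    placement, side_to_move = fen.split()[:2]
    occupancy = 0
    codes = []
    square = 0  # squares numbered in FEN reading order: a8 = 0 ... h1 = 63
    for ch in placement:
        if ch == "/":
            continue
        if ch.isdigit():
            square += int(ch)  # a run of empty squares
        else:
            occupancy |= 1 << square
            codes.append(PIECE_CODES[ch])
            square += 1
    value, bits = occupancy, 64
    for code in codes:  # 4 bits per piece, in square order
        value = (value << 4) | code
        bits += 4
    value = (value << 1) | int(side_to_move == "b")
    return value, bits + 1


def material_score(fen: str) -> int:
    """Material balance in pawns; positive favors White."""
    score = 0
    for ch in fen.split()[0]:
        if ch.isalpha():
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value
    return score


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(encode_position(START)[1])  # 64 + 32 * 4 + 1 bits
print(material_score(START))      # material is level
```

Because a legal position has at most 32 pieces, the starting position's 193 bits is the maximum under this scheme, which is the sense in which a position carries "under 200 bits".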
  19. What an AI has achieved from SCRATCH (tabula rasa) is amazing. It knew ONLY the rules of chess! No evaluation tables, no 20 years of coding by hundreds of programmers! BTW, it is written in the paper that it used only 4 tensor processors in the match. They consume even fewer watts than the Xeons and are built on 28 nm technology. So it was not the supercomputer you allude to. In fact, "AlphaZero searched just 80 thousand positions per second, compared to 70 million by Stockfish." In any event, the computational power of Alpha Zero was not so overwhelming: 4 TPUs take just 4 SATA slots in a computer and consume about 200 watts altogether.
  20. It's making use of artificial neural networks to learn, so this result should have been obvious, considering that a similar system performed well playing Go, which is a far more complicated game for a neural network to learn. The really important thing here isn't that it beat Stockfish, it's how it beat Stockfish. It played much more human-like moves than any engine would, as it is essentially learning the way a human does, and so we can learn much more from it than we can from a normal engine. Also, I obviously don't know the specifics of the system, but generally these systems take much more computing power to train than they do to run, so a distributed, trained version may not need a lot of power to be very, very strong.
  21. What people fail to realize is that the human brain is just a computer. Many believe PCs are "faster" than us at calculations but lack other forms of intelligence. This isn’t true. The human brain is a PC that evolved through millions of years of evolution. It is way faster than the best current supercomputers. Similar to this chess AI, we use neural networks to make decisions. Our neural nets are way more complex than those of the PC. Both our hardware and our software are superior to the PC's at the moment. The only reason we cannot do fast calculations is that we didn’t evolve the software for it, because it wasn’t necessary in human evolution. The AIs, on the other hand, are designed instead of evolved. It’s easy to design a machine that can do calculations. The reason AIs are not yet good at real-life tasks is that their hardware is insufficient and we lack the general-purpose algorithms. This is only a matter of time, though; there is no fundamental difference between a PC and a human brain. In the end, every task that a human can do will be done better by an AI. AIs will be better writers, judges, lovers, researchers. Human beings will become completely obsolete in the near future. Enjoy the nice life while it lasts; it won’t be for too long.
  22. ”What people fail to realize is that the human brain is just a computer.”
    • Interesting assertion, but it's only that. Many people have asserted that a human brain is analogous to a computer, but that's not demonstrable and not scientific. In fact very little is actually known or understood about how the human brain works. There's no evidence that the human mind visualizes or calculates, on a conscious or unconscious level, thousands of accurately projected possible chess positions per second as part of its evaluation process. We can say that the human mind uses logic, analysis, thought, creativity, intuition, etc. in solving problems, but we don't have very precise definitions for these terms. People claim that A.I. does some of these things, but again, that's just an assertion, not scientific. The best we can say is that, based on results, it kind of looks like that. And as A.I. is getting more complex, we have less of an understanding of what the computer is doing as part of its process. Claims that what a computer does is analogous to what a human brain does are based on faith, not fact.
    • If A.I. is as sophisticated and "intelligent" as its advocates claim, and if the success of machines at chess and go is not principally a result of brute force calculation, then let's take away the computer's brute force and see what happens. Let’s give a machine like Alpha* a lot of time to learn the game. Then, during play, let's constrain it so that it's only capable of calculating, let's say 20 positions per second rather than tens of thousands or tens of millions. It will still be calculating deeper than a human! It will still have the brute force advantage! Then have it play at classical-chess time controls against the best humans in the world and see who wins.
    • Although two competitors approach the same goal, that does not mean they use analogous methods, and results prove nothing about method. I could race you 200 meters. I run, you ride a motorcycle. We both have the same goal, but we get there entirely differently. If I want to make a "fair" race, I'll have you race against Usain Bolt, and you have to kick start the engine after the starting gun has fired. But in any event, to claim that the human leg is just another kind of motorcycle would be somewhat absurd. (The motorcycle might win the race, but let's see it dance the cha cha cha!)
  23. Regarding the game analysis, I noticed one thing that agrees completely with a key difference between AlphaZero and Stockfish, at least according to the DeepMind paper. Stockfish will accept a path with a very narrow line that keeps it alive (which, as Black, you accept). AlphaZero prefers a path that gives it many options. Around the point at which the game appears to turn, Stockfish was hanging on to a path with only one survivable outcome... when that singular path gets refuted, it's game over. This results from AlphaZero averaging outcomes in its look-ahead tree (pruning the obviously bad options), whereas Stockfish, like almost all engines, uses a minimax algorithm. So Stockfish as Black will take a +0.00 evaluation that leaves it no options over a 0.10 evaluation that gives it more flexibility. AlphaZero is not the first engine to use averaging, but it is the first to do so with such success. This might be the sort of decision that made sense at Elo 2500, but not at Elo 3500.
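The contrast this comment draws can be shown with a toy example. The evaluations below are invented for illustration (from Black's point of view, so higher is better for Black), and the plain mean is a simplification: AlphaZero's search averages the values returned by its simulations, so moves it visits more often weigh more, rather than taking a uniform mean over all options.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical scores of Black's follow-up options after White's best reply to
# each candidate move (Black's point of view: higher is better for Black).
CANDIDATES = {
    "narrow": [0.00, -3.00, -3.00, -3.00],     # only one follow-up holds the balance
    "flexible": [-0.10, -0.15, -0.20, -0.25],  # several playable follow-ups
}


def minimax_choice(candidates: dict[str, list[float]]) -> str:
    """Back up the best follow-up: assumes the one good move will always be found."""
    return max(candidates, key=lambda move: max(candidates[move]))


def averaging_choice(candidates: dict[str, list[float]]) -> str:
    """Back up the mean follow-up: rewards positions with many decent options."""
    return max(candidates, key=lambda move: mean(candidates[move]))


print(minimax_choice(CANDIDATES))    # 0.00 beats -0.10
print(averaging_choice(CANDIDATES))  # -0.175 beats -2.25
```

The minimax rule picks "narrow" as long as the single saving move exists; the averaging rule picks "flexible", where most continuations stay close to level, which is the behavior the commenter attributes to AlphaZero.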
  24. If your neural net is flexible enough, it's indistinguishable from looking-ahead, or equivalently, to searching more positions. If you can do that on equivalent hardware, you have a powerful argument for a superior problem-solving architecture. If you have far more computing power available to you, you can't really claim a better architecture. Suppose I build a quantum computer that beats AlphaZero using the dumbest brute-force algorithm imaginable. It would not prove the superiority of my algorithm. I'm quite impressed with the AlphaZero accomplishment ... pending some important details that are uncertain, such as whether AlphaZero played against SF as part of its training. But the hardware certainly matters. Also ... you have to wonder why pondering was disabled ... this was supposed to look like a normal game, and that's not normal. It would definitely make SF more predictable ... which matters if AZ trained itself against SF. Or maybe SF just isn't capable of pondering usefully, because of the nature of its architecture. Things to wonder about...
  25. ”For the MCTS algorithm to work as explained at AlphaGo Zero - How and Why it Works without any chess knowledge programmed into it besides the bare rules Alpha Zero had to play out every single "node" move until the very end of each line to get a new single value for each node. This is really hard to believe, incredible. I have to say I still don't get it.”
    • It's hard enough to believe that it's quite suspect. If I understand how AlphaZero works, it starts by playing completely random moves against itself. The winning side has positional factors (or nodes) weighted positively, the loser negatively. At the simplest level, a node is having a particular piece on a particular square. But going deeper, the relationships between the pieces on the various squares come into play as weighted nodes. The number of potential nodes is practically endless, and they aren't telling us which nodes are and aren't tracked. But still ... none get weighted until a game is over. So you can drop your queen, win anyway against an opponent who played randomly, and begin by concluding that dropping your queen was a good idea. Eventually you'll sort it all out, but it seems an unbelievably inefficient way to learn.
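On the question raised in comment 25: in DeepMind's published descriptions of AlphaGo Zero and AlphaZero, the search does not play each line out to the end of the game; leaf positions are scored by the neural network instead. The game result enters during training, where every position from a finished self-play game receives the final outcome, from the point of view of the player to move, as its value target (the network also learns a move-probability target, omitted here). The sketch below shows only that labelling step; the position strings and the `Sample` and `label_game` names are hypothetical placeholders, not part of any real library.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    position: str   # placeholder for an encoded board position (hypothetical)
    to_move: str    # "white" or "black"
    target: float   # final game outcome from the side-to-move's perspective

def label_game(positions, result):
    """Attach the final result of one finished self-play game to each position.

    positions: list of (position, side_to_move) pairs in game order.
    result: +1 if White won, -1 if Black won, 0 for a draw.
    """
    samples = []
    for position, to_move in positions:
        # Flip the sign for Black so the target is always "good for the mover".
        z = result if to_move == "white" else -result
        samples.append(Sample(position, to_move, float(z)))
    return samples

game = [("start", "white"), ("after 1.e4", "black"), ("after 1...e5", "white")]
for s in label_game(game, result=-1):  # Black won this hypothetical game
    print(s.position, s.to_move, s.target)
# targets: -1.0, 1.0, -1.0
```

Nothing is credited or blamed until the game ends, as the commenter says, but the credit goes to whole positions fed to a network rather than to individually tracked features.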
  26. Folks, so many things about the match conditions are unknown that it is almost impossible to say anything meaningful about the match. But one thing is known so far: Stockfish was not allowed to use an opening book, and that seriously weakened it, no doubt about it. Now, let's take a look at the games to see whether Stockfish played at its optimal strength. Below is the first game of the 10 samples:
    • In the match, Stockfish played 11. Kh1, which seriously makes no sense at all. An analysis with 64-bit Stockfish indeed indicates that 11. b4 or 11. Be3 would be the optimal move for White, and the game would be even. Worse, the next move was stranger still: Stockfish played 12. a5 just to get the a-pawn out of reach of the black knight, when even a quick glance says b3 is a much better move, and my Stockfish agreed. Let Stockfish run for 15 seconds and it proposes 12. Qc3 or 12. Nfxe5 as the optimal moves, and again the position is even. I seriously don't understand why Stockfish played weaker at 1 min/move (as indicated in the paper)!
    • Game 3 is even more interesting: after 47. Rxc5, Stockfish responded naturally with 47...bxc5 and evaluated the position as even. But just after White moved its queen to the h-file with 48. Qh4, Stockfish had to double its rooks on the e-file and dropped its evaluation by almost 100 points(!), which signals serious trouble. The sudden drop in evaluation is disturbing. Are we seeing a horizon problem in Stockfish (and similar engines) here, given that even after a depth-69 evaluation at move 47 Stockfish still could not sense the danger? This was superb positional play by Alpha Zero, where its strength in recognizing patterns evidently came into play. But what happened next is beyond my knowledge: 49. Rf6 Rf8. Why on earth should the black rook confine itself, its king and its queen to the corner? An attempt to protect the f7 pawn is futile because White can easily double down on the pressure with Qf4, which it actually did. A clearly better alternative would be 49...Kf8, where the black king and queen would have a little more breathing room. Black's 50th move was catastrophic, too: 50. Qf4 a4. This move leads nowhere. If Black wanted to protect the f7 pawn, it had to block the diagonal of the b3 bishop with 50...d5 and then move its e7 rook out of trouble. Anyway, the game was lost after these two blunders, but the question remains: why did Stockfish do that? Was it a lack of time, or something else?
    • One last thing about the training of Alpha Zero. Google claimed it took only 4 hours and around 300,000 steps to beat Stockfish. I don't know exactly what they mean by "steps", but presumably each step is a batch of 4,096 games, or around 1.2 billion games in total to beat Stockfish. That in turn means Alpha Zero finished a game in about 12 microseconds on average (14,400 seconds spread over 1.2 billion games)! This is serious ass-kicking hardware that everybody else could only dream of. It also raises questions about the evaluation function Alpha Zero used for its training. Did DeepMind use Stockfish's evaluation function (which is open-source and contains a lot of chess knowledge) to train its networks, and then claim to beat Stockfish on a million times more powerful hardware?
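Taking the assumptions in comment 26 at face value (300,000 steps, 4,096 games per step, four hours of wall-clock time), the average time per game can be checked with a few lines of arithmetic. These are the commenter's guesses, not figures confirmed by the paper, and the result is aggregate throughput: the paper describes self-play spread across thousands of TPUs in parallel, so each individual game would have taken much longer than this figure.

```python
# The commenter's assumptions, not figures confirmed by the paper.
HOURS = 4
STEPS = 300_000
GAMES_PER_STEP = 4096  # the commenter's guess at what one "step" means

total_games = STEPS * GAMES_PER_STEP
seconds = HOURS * 3600
# Average wall-clock time per game across all hardware, in microseconds.
per_game_us = seconds / total_games * 1e6

print(f"{total_games:,} games")                     # 1,228,800,000 games
print(f"{per_game_us:.1f} microseconds per game")   # 11.7 microseconds per game
```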
  27. To be honest, I don't think we as humans can ever judge these decisions on the basis of either our own analysis or even Stockfish's early analysis, because the computer considers far too many moves, and if you let it run longer it may change its decision (e.g. maybe somewhere 30 moves after Be3, Stockfish realizes that it's losing). Additionally, I can clearly see a weakness in Stockfish's play: it mostly favors lines where the play is forced (e.g. exchanges), or, as someone else put it, lines that are narrow.
    • I ran the latest Stockfish on the position you mentioned (at move 11). At first, it gives b4 as the optimal move when the engine has been running for about a minute. After that, it decides Be3 is better. But after 5 minutes on my hardware, which runs at 1,400k nodes/s, it decides to go with Kh1 as the optimal move.
    • The paper says that Stockfish calculates 70,000k positions per second and is run for 1 minute per move. That's about 50 times my hardware's speed, so I'll let mine run for 50 minutes... Kg1-h1 is still the choice for Stockfish.
    • However, an interesting result was observed when evaluating move 12: Stockfish started out suggesting the moves you mentioned, but after 6 minutes it preferred a4-a5, just as in the game. But I had to keep it running for a full 50 minutes to reach the 70,000k positions/s × 60 s = 4,200,000k nodes implied by the paper's figures. After 18 minutes the line changed back to Qc3, and then to Ne3 after 24 minutes. After 27 minutes it suggested Ng1, all the way up to 56 minutes. After that, the optimal move was Nf3xe5, but that is past the 50-minute mark I mentioned before (you can see the engine analysis for an hour and 22 minutes in this screenshot).
    • I don't know why Stockfish in the real game didn't play the moves found beyond the 1,472,000k-node mark (red line). There were three other good lines after a4-a5 before the 4,200,000k nodes (50 minutes) I mentioned before. Maybe I'm confusing the definitions of nodes and positions. However, I think we can see that Stockfish in the match must have stayed below the 1,472,000k-node mark in order to play a4-a5, and that gives us an inkling of its performance. I hope I can get useful insights on this assumption from people who are more familiar with the engine.
    • As you mentioned, so many details are unknown that it is really hard for us to trust the outcomes of the games. For instance, AlphaZero was allowed four hours of training, but Stockfish was not allowed to use opening tables, which is a serious drawback. We can clearly see in the paper that AlphaZero picked up openings very quickly during training and even built up preferences for some of them. AlphaZero is a neural network, so claiming that AlphaZero uses no openings because it doesn't have a book (and therefore Stockfish shouldn't either) is like saying grandmasters do not know openings because they don't bring their books to matches. I would have been much happier if they had at least allowed Stockfish to build an opening book for itself over the same duration and on the same hardware on which AlphaZero was trained.
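The node budgets in comment 27 follow from simple arithmetic, sketched below under the assumption that search speed stays constant throughout (in practice nodes per second fluctuate during a search). The constants are the figures quoted in the comment: 70,000k positions per second and 1 minute per move for Stockfish in the match, and 1,400k nodes per second on the commenter's machine. The last line converts the 1,472,000k-node mark into local search time; at roughly 17.5 minutes, it falls inside the 6-to-18-minute window in which the commenter saw a4-a5 preferred.

```python
PAPER_NPS = 70_000_000   # Stockfish speed quoted from the paper (70,000k positions/s)
LOCAL_NPS = 1_400_000    # the commenter's machine (1,400k nodes/s)
MOVE_TIME_S = 60         # 1 minute per move in the match

budget = PAPER_NPS * MOVE_TIME_S          # nodes searched per move in the match
speed_ratio = PAPER_NPS / LOCAL_NPS       # how much faster the match hardware was
local_minutes = budget / LOCAL_NPS / 60   # time needed locally to match the budget

def minutes_to_reach(nodes, nps=LOCAL_NPS):
    """Minutes of local search needed to reach a given node count at constant speed."""
    return nodes / nps / 60

print(f"{budget:,} nodes per move")                  # 4,200,000,000 nodes per move
print(f"{speed_ratio:.0f}x faster")                  # 50x faster
print(f"{local_minutes:.0f} minutes locally")        # 50 minutes locally
print(f"{minutes_to_reach(1_472_000_000):.1f} minutes to the 1,472,000k mark")  # 17.5
```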
  28. Some of the facts of the AlphaZero vs Stockfish 8 match that we know so far are:
    • 1,300 games were played in total by AlphaZero against Stockfish 8.
    • Stockfish won 24 games, drew 958 and lost a massive 318!
    • The interim paper showed only a selection of 100 games from the 1,300; the full paper should have all 1,300 game scores and detailed results (summary stats of the 1,300 games below).
    • Some misconceptions:
      • Stockfish 8 in the match was playing stronger than standard Stockfish. This has not been validated yet, and what makes it doubtful is that, despite the 64-thread processing power, the hash memory setting was very unfavorable to Stockfish. This could actually have hindered its evaluation of positions, and would explain why people have been seeing some suboptimal moves played by Stockfish 8 in the match while their own Stockfish finds stronger moves. It is true that the horizon effect could have come into play in some situations, but it is not convincing to say that it explains many of the examples of weaker moves by Stockfish 8 in the match that have been listed in this thread.
      • The time control hurt Stockfish 8's evaluation of positions: the fixed 1 minute per move curtailed full-depth search and often left it playing suboptimal moves because it had run out of time.
      • AlphaZero trained through self-play, developing "expert policies" it could apply to any position it encountered against Stockfish 8. The learning was only in self-play.
      • The 24 games that AlphaZero lost against Stockfish 8 show that the learning could still be improved, and the 958 draws against a handicapped Stockfish 8 show that both could benefit from enhancements: Stockfish (books, tablebases) and AlphaZero (learning from expert games with no bias).
      • AlphaZero is impressive even against a weakened Stockfish 8, but releasing only 100 games with none of AlphaZero's draws (958) and none of its lost games (24) was not the best for an objective discussion about the DeepMind team's achievements.
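The results listed in comment 28 can be turned into a match score and an approximate rating gap using the standard logistic Elo formula, D = 400 log10(s / (1 - s)), where s is the score fraction. This conversion is an illustration added here, not a figure from the paper, and it treats every game alike regardless of opening or colour.

```python
import math

def score(wins, draws, losses):
    """Match score as a fraction: a win is 1 point, a draw half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

def elo_difference(s):
    """Rating gap implied by score fraction s under the logistic Elo model."""
    return 400 * math.log10(s / (1 - s))

# AlphaZero's results as (wins, draws, losses), taken from comment 28.
results = {
    "100-game match": (28, 72, 0),
    "all 1,300 games": (318, 958, 24),
}
for label, (w, d, l) in results.items():
    s = score(w, d, l)
    print(f"{label}: score {s:.1%}, about {elo_difference(s):+.0f} Elo")
# 100-game match: score 64.0%, about +100 Elo
# all 1,300 games: score 61.3%, about +80 Elo
```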
  29. Brute-force engines like Stockfish don't really get much better with more computing power; even Stockfish on a smartphone can probably beat Carlsen with ease. The reason is that the search tree grows exponentially: each extra ply means considering, on average, about 20 moves in every position, so each extra move of depth costs about 20 times more computing power. What Alpha Zero does is completely different. It uses (an approximation of) infinite depth for all the moves it considers. It plays entire games against itself and determines what percentage of games are winning, losing, or drawing. That's how it finds those really deep positional ideas that brute-force engines will never find.
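The exponential growth described in comment 29 can be made concrete with the comment's own estimate of about 20 moves per position. This is a simplified full-width model: real engines such as Stockfish prune heavily with alpha-beta search and selective extensions, so their effective branching factor is far smaller. Under the simple model, even a thousandfold increase in computing power buys only a little over two extra plies of depth.

```python
import math

BRANCHING = 20  # the comment's rough average number of moves per position

def tree_size(depth, b=BRANCHING):
    """Leaf positions of a full-width tree searched to `depth` plies."""
    return b ** depth

def extra_plies(speedup, b=BRANCHING):
    """Extra full-width depth a given speedup buys: solve b**d = speedup for d."""
    return math.log(speedup) / math.log(b)

for depth in range(1, 7):
    print(depth, f"{tree_size(depth):,}")   # 20, 400, 8,000, ... 64,000,000

print(f"{extra_plies(1000):.1f} extra plies from 1000x more compute")  # 2.3
```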
  30. See Re-evaluation of AI engine alpha zero. Abstract: Artificial Intelligence (AI) is at the heart of IT research and pioneers automated problem-solving. A recent breakthrough that could shift the AI field was reported by Silver and colleagues of the famous Google DeepMind developer team around Demis Hassabis. They reported that a new adaptation of their in-house generalized neural network (NN) machine learning algorithm, termed AlphaZero, has outperformed the world’s best-rated chess engine Stockfish after only four hours of self-learning, starting only with the rules of chess. AlphaZero originates from the superhuman AlphaGo program and could beat the best chess engine via blank-slate (tabula rasa) self-play reinforcement machine learning, i.e. only by learning from many games played against itself. This claim of the world's strongest AI performance was drawn from a 100-game match between the AlphaZero and Stockfish engines and has attracted much attention from the media, especially in the world of chess, which has historically been a key domain of AI. AlphaZero did not lose one game and won 28 times, while the remainder of the 100 games were draws. General reinforcement learning is very promising for many applications of mathematical solution finding in complexity. However, independently verifying this breakthrough AI claim poses some major difficulties and raises some inevitable doubts as to whether a final proof has been given. Machine and method details are not available and only 10 example games were given. This research starts with reproducibility testing of all 10 example games and reveals that AlphaZero shows signs of human openings and might have outperformed Stockfish due to an irregular underperformance of Stockfish 8, such as post-opening novelties or subperfect in-game moves. At this juncture, the testing revealed that AI quiescence searches could be improved via multiple roots for both engines, which could boost all future AI performances. In light of a lack of tournament conditions and an independent referee, and of comparability challenges in software and hardware configurations such as AlphaZero’s TFLOP super-calculation power, this work suggests that a final best-AI-engine claim requires further proof. Overclaim biases are found in all sciences of today due to publishing imperatives and the wish to be first.
  31. Sundry Links:
    Twitter: Demis Hassabis
    Google Deep Mind: Mastering Chess and Shogi by Self-Play
    Chess24: Alphazero vs Stockfish
    AlphaZero vs Stockfish 1200 games! 4100+ rating!
    Alphazero: Its great predecessors
    AlphaGo Zero: Learning from scratch


See Alphazero destroys Stockfish in 100 game match for the article.

In-Page Footnotes

Footnote 1:

Text Colour Conventions (see disclaimer)

  1. Blue: Text by me; © Theo Todman, 2021
  2. Mauve: Text by correspondent(s) or other author(s); © the author(s)

© Theo Todman, June 2007 - August 2021. Please address any comments on this page to File output: