There are three main ways today to evaluate a player using an engine. This is not to be confused with annotating a game.
The most sophisticated one is used by analysts such as Dr. Ken Regan, who compile the average error rates of players at each level. For example, I might lose an average of 0.15 pawns per move compared to the engine’s best, while a top GM might lose only 0.02. His system goes deeper than this, but that is still the foundation on which it rests, and it will be better at catching ‘smart cheaters’ than the more basic systems described below.
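To make the idea of an average error rate concrete, here is a minimal sketch in Python. It is not Dr. Regan's actual model, only the basic arithmetic described above. The function name `average_loss`, its inputs, and the sample evaluations are all assumptions made up for illustration.

```python
from statistics import mean


def average_loss(best_evals, played_evals):
    """Return the mean evaluation loss per move, in pawns.

    best_evals[i]   -- engine evaluation after its own best move (assumed input)
    played_evals[i] -- engine evaluation after the move actually played
    Both are from the mover's point of view, so the loss is best - played.
    """
    if len(best_evals) != len(played_evals):
        raise ValueError("need one evaluation pair per move")
    if not best_evals:
        raise ValueError("no moves to evaluate")
    # Clamp at zero: a played move never counts as better than the engine's best.
    losses = [max(0.0, best - played)
              for best, played in zip(best_evals, played_evals)]
    return mean(losses)


# Made-up evaluations for four moves, purely for illustration.
best = [0.30, 0.50, 0.10, 0.00]
played = [0.30, 0.20, 0.10, -0.10]
loss = average_loss(best, played)
print(f"average loss: {loss:.2f} pawns per move")
```

Two of the four sample moves match the engine's best and the other two give up 0.30 and 0.10 pawns, so the average works out to about 0.10 pawns per move.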
The simplest is just to analyze a game with an engine and ask it to highlight every move it disagrees with, however small the difference. The obvious risk is that some positions have three roughly equal moves, and three different engines may each prefer a different one.
Imagine you are analyzing with only Stockfish, and it says that five moves out of ten are not a match. This might overlook that two of the moves that don’t match its choices are chosen by another top engine such as Komodo Dragon 3. In other words, only five match Stockfish, but seven in all match top engine choices. That is the underlying point of Let’s Check. When you analyze a game with it, it will not only tell you what a variety of engines thought of each move, it will also give you a summary called Engine Correlation at the top, showing the percentage of times a player’s moves matched the top choice of an engine.
However, unlike a plain engine comparison, it won’t compare with just one engine’s top move. It compares with the top moves of several engines, and if the played move matches any of them, it counts as a match for Engine Correlation.
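The five-versus-seven example can be sketched in a few lines of Python. This is only an illustration of the "match any engine" rule as described here, not ChessBase's actual implementation. The function `match_rate` and the move lists are hypothetical, with the engine choices invented so that the numbers line up with the example.

```python
def match_rate(played, engine_tops):
    """Fraction of played moves that equal the top choice of at least one engine.

    played      -- list of moves actually played
    engine_tops -- dict mapping an engine name to its top choice for each move
    """
    if not played:
        raise ValueError("no moves to compare")
    hits = 0
    for i, move in enumerate(played):
        # A move counts once, however many engines agree with it.
        if any(tops[i] == move for tops in engine_tops.values()):
            hits += 1
    return hits / len(played)


# Ten illustrative moves; the engine choices below are invented for the example.
played = ["e4", "Nf3", "Bb5", "O-O", "Re1", "Bb3", "c3", "h3", "d4", "Nbd2"]
tops = {
    # Agrees with the first five moves only.
    "Stockfish": ["e4", "Nf3", "Bb5", "O-O", "Re1", "Ba4", "d3", "a4", "d3", "Bg5"],
    # Agrees with two of the moves Stockfish rejected (Bb3 and c3).
    "Komodo Dragon 3": ["d4", "Nc3", "Bc4", "d3", "Qe2", "Bb3", "c3", "a4", "d3", "Bg5"],
}

stockfish_only = match_rate(played, {"Stockfish": tops["Stockfish"]})
combined = match_rate(played, tops)
print(f"Stockfish only: {stockfish_only:.0%}, all engines: {combined:.0%}")
```

With a single engine the player scores 50%, and with both engines considered the same ten moves score 70%.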
Recently there were several claims about high Engine Correlation matches between Hans Niemann’s games and the Let’s Check choices, so out of curiosity I ran a complete Let’s Check on all the games in the recent Sinquefield Cup and I must say the results were unexpected.
The first result to come out was that one player did actually obtain a 100% match. This was not the result of some ultra-short draw, since Let’s Check will ignore theory moves, and games with too few moves played. For example, a game that was 28 moves long but had 20 moves of theory will not be eligible for an Engine Correlation result. Who is this engine-matching wonder? Wesley So.
In his game against Ian Nepomniachtchi, the American player achieved a 100% Engine Correlation score. However, he was not the star performer overall in terms of such measurements, since it was his only game over 80%. No, one player managed to score in excess of 90% engine correlation three times. Aha! I hear you cry out. We have him! So who is this chess engine-like god?
Levon Aronian had several of the highest quality games according to Let’s Check
Meet Levon Aronian, late-bloomer extraordinaire, who had an engine correlation of 92% against Caruana (who himself had a 96% correlation in that same game) over 45 moves, 91% against Wesley So in 43 moves, and 91% against Magnus Carlsen in 36 moves. Plus two more games with over 80%.
He was not quite alone though: none other than Ian Nepomniachtchi had two such games as well, plus several over 80%, showing the quality of play that led him to win the Candidates this year. Note that he had an average 78% engine correlation for the entire Candidates, 11% more than second-best Caruana.
The burning question on your mind, dear reader, is what about Hans? In terms of engine correlation, Hans was the worst. His best game, with an 88% match over 55 moves, was in round seven against Maxime Vachier-Lagrave. In his game against Carlsen it was a modest 68%, but of course Magnus was playing dreadfully that day, and had only 37%.
The mythical 100%
So how rare is 100% after all? It is rare, but not as rare as you might think. I ran some random checks through games from 1999-2000, as I was curious about Kasparov and Kramnik. All in all I had some 150 eligible games, maybe fewer, yet it turned up a higher-than-expected number of perfect matches.
For example, the rapid games of the Amber tournament had several 100% perfect games, including Jeroen Piket in one, and Kramnik in another. And against Topalov no less… Memories of Toiletgate. There were also two(!) by Kasparov in Bosnia in 1999, another in Bosnia in 2000, one more by Kramnik in the World Knockout event against Korchnoi over 41 moves, and later one by Michael Adams against Vlad in that same event.
However, there is a caveat that must be mentioned when using such tools. It is eminently possible to game the system to show a 100% match where it normally might not. You see, when doing a Let’s Check analysis within Fritz, you have the option of providing your own engine, and then telling it to use that engine only for moves that did not match engine choices. In other words, you are trying to find an engine that the move will match. And if it does… the engine correlation will improve.
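To see why this trick can only push the number upward, consider the rough sketch below. It models the "match any engine" rule as described in this article, not Fritz's internals. The helper names and the placeholder move labels are hypothetical.

```python
def matched_flags(played, engine_tops):
    """For each played move, True if any engine's top choice equals it."""
    return [any(tops[i] == move for tops in engine_tops)
            for i, move in enumerate(played)]


def correlation(played, engine_tops):
    """Share of played moves matched by at least one engine."""
    flags = matched_flags(played, engine_tops)
    return sum(flags) / len(flags)


# Ten placeholder moves; the regular engine agrees with nine of them.
played = [f"move{i}" for i in range(1, 11)]
regular_engine = played[:9] + ["something else"]
before = correlation(played, [regular_engine])

# The hand-picked engine is consulted only on moves nothing matched so far
# (None means "not consulted"), and here it happens to agree with the player.
flags = matched_flags(played, [regular_engine])
extra_engine = [None if ok else played[i] for i, ok in enumerate(flags)]
after = correlation(played, [regular_engine, extra_engine])

print(f"before: {before:.0%}, after: {after:.0%}")
```

Since an extra engine can add matches but never remove one, each engine you try on the unmatched moves can only leave the score where it was or raise it. Here a single well-chosen engine turns 90% into 100%.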
Originally, this game was only a 90% match, with no engine choosing Garry Kasparov’s 16.cxd5 for example. After trying several, I found an engine that chose it, and entered it as another Let’s Check choice. Now the tally reads:
So yes, the results can absolutely be manipulated by the unscrupulous. A telltale sign might be in the engines listed. If a new game shows Stockfish 14+, Komodo 12+ and so on, it should be fine, but if you see some very old engines or odd names for that same new game, be on your guard, as they may have been used only to get an extra match.
Regardless, here is the signature win by Kasparov with notes from Mega Database:
Does this in any way invalidate the use of a tool such as Let’s Check? Of course not, but as with all such tools, it must be used with good sense and judgement. The fact that modern elite players can rattle off multiple games with such extraordinarily high engine matches is a testament to the increasing overall quality of the chess players, since the engines they are matching today are also hundreds of Elo stronger than the engines of a decade ago. These players are also studying and learning from the engines, and that increase in pure ability is a consequence of it.