Calibration in chess

Jul 21, 2007

Daniel Kahneman posted the following on the Judgment and Decision Making site:

Have there been studies of the calibration of expert players in judgments of chess situations — e.g., probability that white will win?
In terms of the amount and quality of experience and feedback, chess players are at least as privileged as weather forecasters and racetrack bettors, but they don’t have the experience of expressing their judgments in probabilities. I [Kahneman] am guessing that the distinction between a game that is "certainly lost" and "probably lost" is one that very good players can make reliably, but I know of no evidence.

Despite knowing much less about decision making and (likely) less about chess than Kahneman, I have three conjectures:

1. Players would show superadditivity in the sense of overstating their own chances of winning. To put it another way, suppose that both players in a game give you Pr(I win), Pr(I tie), Pr(I lose). Call these W1, W2, W3 (for white) and B1, B2, B3 (for black). My conjecture is that (W1+B1) > (W3+B3); that is, the total "I win" probability exceeds the total "I lose" probability. It would be interesting to see this on average and also for individual games and at different stages of the game.

2. Players would show the usual overconfidence in probability statements, for example, events that are stated to happen 90% of the time only happening 75% of the time, and so forth.

3. Aspects of both points above might be explained by the idea that chess players, like the rest of us, tend to make their probability statements about the ideal, rather than the actual, game outcome. For example, suppose you were to do a study to measure probability judgments and find the (generically) expected overconfidence: when players predict a 99% chance of victory, it only happens 90% of the time, or whatever. On those 10% of the times when his or her prediction is wrong, I could imagine he or she explaining it away as some blunder that "wasn’t supposed to happen" and so shouldn’t count.
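If someone did collect such data, conjectures 1 and 2 could be checked with a few lines of code. Here is a minimal sketch in Python, not a worked-out analysis. The function names (win_loss_gap, calibration_table), the choice of ten equal-width bins, and the sample numbers are all my own assumptions for illustration; none of them come from real chess data.

```python
from collections import defaultdict


def win_loss_gap(white, black):
    """Conjecture 1: return (W1 + B1) - (W3 + B3).

    `white` and `black` are (p_win, p_tie, p_lose) triples, each stated
    from that player's own point of view. A positive value means the
    total "I win" probability exceeds the total "I lose" probability.
    """
    return (white[0] + black[0]) - (white[2] + black[2])


def calibration_table(forecasts, n_bins=10):
    """Conjecture 2: compare stated probabilities with observed frequencies.

    `forecasts` is an iterable of (stated_probability, outcome) pairs,
    where outcome is 1 if the predicted event happened and 0 otherwise.
    Returns one row per non-empty bin:
    (bin_lower_edge, mean_stated_probability, observed_frequency, count).
    """
    bins = defaultdict(list)
    for p, outcome in forecasts:
        # Equal-width bins; a stated probability of exactly 1.0 goes in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, outcome))

    rows = []
    for idx in sorted(bins):
        items = bins[idx]
        mean_p = sum(p for p, _ in items) / len(items)
        freq = sum(o for _, o in items) / len(items)
        rows.append((idx / n_bins, mean_p, freq, len(items)))
    return rows


if __name__ == "__main__":
    # Made-up numbers, purely to show the calling convention.
    print(win_loss_gap((0.5, 0.3, 0.2), (0.4, 0.3, 0.3)))
    print(calibration_table([(0.9, 1), (0.9, 1), (0.9, 1), (0.9, 0)]))
```

Overconfidence of the kind described in conjecture 2 would show up as rows where the observed frequency falls below the mean stated probability at the high end. Conjecture 3 is harder to test this way, since it concerns how players explain their misses rather than the numbers themselves.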

Similarly, before the game even starts, each player’s probability of winning can be calculated based on who is playing white, who is black, and their ratings (see here), but I would imagine that, before the game begins, each player overestimates his or her own winning probability, thinking "this time I’ll play harder" or something similar.
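As a rough illustration of the ratings-based baseline, here is a sketch using the standard Elo expected-score formula. I don't know that this is the method behind the link above, so treat it as a stand-in. Note that it gives an expected score (win = 1, tie = 0.5, loss = 0), not a pure win probability, and that the white_bonus parameter is a hypothetical knob I've added for the first-move advantage, not part of the basic formula.

```python
def elo_expected_score(rating_a, rating_b, white_bonus=0.0):
    """Expected score for player A against player B under the Elo model.

    Score counts a win as 1, a tie as 0.5, and a loss as 0.
    `white_bonus` is a hypothetical rating offset credited to A when A
    plays white (pass a negative value if A plays black); it defaults
    to 0, which ignores color entirely.
    """
    diff = rating_b - (rating_a + white_bonus)
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))


if __name__ == "__main__":
    # Equal ratings give 0.5; a 400-point edge gives about 0.91.
    print(elo_expected_score(1500, 1500))
    print(elo_expected_score(1900, 1500))
```

A player's stated pre-game chances could then be compared against this baseline to see whether "this time I’ll play harder" shows up as a systematic gap.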

This ties in a bit to the "is vs. should" or "descriptive vs. normative" distinction in decision analysis. I think it would be natural to assess the chances of winning in the well-fought game of the player’s imagination rather than in the calibrated empirical world of all realistic possibilities.

Anyway, it would be fun to see the data. And I’m probably being overconfident about my own conjectures above.

Overcoming Bias Commenter

May 15, 2023

A lot depends on the relative skill of the players. Garry Kasparov could probably beat me from a position that would be hopeless if he were playing against a typical computer chess program. The status of the board, by itself, is not enough information to decide anything about who is going to win. Heck, telling your human opponent "I'll give you a large amount of money to throw the match" is a strategy that could potentially generate a win from any situation in which the game has not yet ended, no matter how hopeless.

Furthermore, trying to define the players is a bit of a lost cause. If you assume that the players never make random decisions and that past games do not affect the outcome of future games (a reasonable assumption for computer programs, much less so for humans) then the outcome of the game is entirely determined by the starting position of the pieces. Either White will always win, Black will always win, or the game will always be a draw. I could play one memoryless, deterministic computer chess program against another memoryless, deterministic computer chess program as many times as I want, and they will always make the same moves in the same situation.
