Truth is stranger than fiction

Robin asks the following question here:

How does the distribution of truth compare to the distribution of opinion?  That is, consider some spectrum of possible answers, like the point difference in a game, or the sea level rise in the next century. On each such spectrum we could get a distribution of (point-estimate) opinions, and in the end a truth.  So in each such case we could ask for truth’s opinion-rank: what fraction of opinions were less than the truth?  For example, if 30% of estimates were below the truth (and 70% above), the opinion-rank of truth was 30%.

If we look at lots of cases in some topic area, we should be able to collect a distribution for truth’s opinion-rank, and so answer the interesting question: in this topic area, does the truth tend to be in the middle or the tails of the opinion distribution?  That is, if truth usually has an opinion rank between 40% and 60%, then in a sense the middle conformist people are usually right.  But if the opinion-rank of truth is usually below 10% or above 90%, then in a sense the extremists are usually right.
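Robin's opinion-rank is easy to compute; here is a minimal sketch in Python, with made-up numbers for illustration:

```python
def opinion_rank(truth, opinions):
    """Fraction of point-estimate opinions that fall below the truth."""
    return sum(o < truth for o in opinions) / len(opinions)

# Hypothetical example: ten point estimates of sea level rise (cm) in the next century
opinions = [20, 25, 30, 35, 40, 45, 50, 55, 60, 70]
print(opinion_rank(48, opinions))  # 0.6: six of the ten estimates were below the truth
```

Collecting this number across many cases in a topic area gives the distribution Robin asks about.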

My response:

1.  As Robin notes, this is ultimately an empirical question which could be answered by collecting a lot of data on forecasts/estimates and true values.

2.  However, there is a simple theoretical argument suggesting that the truth will generally be more extreme than point estimates: the opinion-rank (as defined above) will have a distribution that is more concentrated at the extremes than a uniform distribution would be.

The argument goes as follows:

Suppose that everybody’s Bayesian, everybody has the same prior distribution, but with different small amounts of data.  To give some notation:  suppose we will be looking at a sequence of parameters, theta_1, theta_2, theta_3, … with a common prior distribution p(theta), which represents the true distribution of this population of theta’s.  (We could further suppose a hierarchical structure, so that p(theta) has hyperparameters that are estimated from data, but this is not necessary for our discussion here.)  For simplicity, suppose p(theta) is a normal (bell-shaped) curve centered at 0 with standard deviation sigma.

Now suppose you get some data, y, on a parameter, theta, and summarize your inference by a point estimate which is your posterior mean, theta.hat = E(theta|y).  Averaging over all possible data y that you might see, this posterior mean has a sampling distribution which is centered at 0 but with a standard deviation less than sigma.  This follows from the basic variance decomposition:  var(theta.hat) = var(E(theta|y)) = var(theta) – E(var(theta|y)), which tells us that the theta.hat’s are less variable than the underlying thetas.  (This is a point we make in our paper, All Maps of Parameter Estimates are Misleading, and it is also discussed in some papers by Tom Louis.)

I posted a simulation of this (along with R code) here.  (It seemed too technical to go into this blog.)
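For readers who don’t want to click through, here is a quick version of the idea in Python (my own sketch, not the posted R code), using a normal-normal model in which the posterior mean is a known shrinkage multiple of the data:

```python
import random

random.seed(1)
sigma, tau, n = 1.0, 1.0, 100_000   # prior sd, data sd, number of parameters

thetas = [random.gauss(0, sigma) for _ in range(n)]   # true parameters
ys = [random.gauss(t, tau) for t in thetas]           # one observation per parameter
shrink = sigma**2 / (sigma**2 + tau**2)               # posterior-mean shrinkage factor
theta_hats = [shrink * y for y in ys]                 # Bayes point estimates

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m)**2 for x in xs) / len(xs)) ** 0.5

print(sd(thetas), sd(theta_hats))  # the estimates are less variable than the truths
```

With sigma = tau = 1 the estimates have standard deviation of about 0.71, compared to 1 for the true thetas: the point estimates are shrunk toward the prior mean, just as the variance decomposition says.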

Getting back to Robin’s question:  so, if everybody is Bayesian, using a prior distribution that correctly reflects the distribution of the underlying parameters being modeled, then, the point estimates will, on average, be closer to the center of the distribution as compared to the true values.  (To put it another way, the parameter estimates are shrunk toward the prior mean.)  And so the truth will look stranger than fiction–if fiction is thought of as point estimates!

3.  This point arises in many statistical examples:  one’s best guess is inherently more sober than what might possibly happen, which is one argument for considering fanciful possibilities in fiction. Taking your best point estimate at every step of the way will not give a realistic simulation of reality.  Reality occasionally includes the unexpected.

4.  We can apply this reasoning to sports scores, for example. Football games can be predicted to an accuracy of about 14 points (that is, the difference between the score differential and the point spread has an approximate normal distribution with mean 0 and standard deviation 14); see chapter 1 of Bayesian Data Analysis and some data here.  Looking at these data:

– The average difference between winner’s and loser’s score is 12 points.
– The average spread (point prediction of difference between winner and loser) is 5.3 points.
– 71% of the time, the score is more extreme (in difference between winner’s and loser’s score) than the spread.  (The favorite beats the spread in about half the games, and in another 20% or so of the games, the underdog actually wins by a larger margin than the favorite was favored.)
– The distribution of actual game outcomes (as measured by score differentials) is more extreme than the distribution of the point predictions.
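That 71% figure is close to what the normal model itself implies.  As a rough check (my own sketch, treating 5.3 points as a typical spread and using the sd-14 normal model from above):

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

s, sd = 5.3, 14.0   # typical spread; sd of outcome around the spread (from the post)
# Outcome d ~ Normal(s, sd); "more extreme than the spread" means d > s or d < -s.
p_favorite_covers = 1 - phi(0)          # favorite beats the spread: exactly 0.5
p_underdog_blowout = phi(-2 * s / sd)   # underdog wins by more than the spread
print(p_favorite_covers + p_underdog_blowout)  # about 0.72, near the observed 71%
```

The two terms also match the decomposition in the bullet above: the favorite covers about half the time, and the underdog out-wins the spread in roughly another fifth of games.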

  • Yes of course point estimates of expected values, including Bayesian estimates, will fall in the middle of the distribution of truth. My point was that perhaps we can understand human point estimates as often being moves in an Easter-egg-like game, rather than expected values. In this case we need to look at actual human point estimates.

  • EthanJ

    Robin, in the Easter Egg game, are you positing simultaneous moves or sequential moves for choice of search location?

    If sequential, then there is a clear difference between strategies for the first and any subsequent player. The first player searches in the most likely location – an expected value solution. But the search radius of any player is fairly discrete – much more so than the continuous nature of the probability distribution. As such, the second player should look for a location that has the highest expected value, after taking into account the search radius of the first player (and the consequent decrease in likelihood of success within that radius). So we quickly get into the Easter Egg dynamic you describe, where Bayesian estimations of overall probability are adjusted by the expected utility of searching in a particular location based on the number of others already searching that location.

    But if play is simultaneous, then there is no information about previously-claimed search territory.

    Further, I think it depends on whether the game is competitive or cooperative. The more intense the competition (because of payoff size or relative gains/loss), the more an Easter Egg dynamic would emerge as the expected value of losing falls sharply compared to the benefit of covering all search territory.

  • Ethan, the simultaneous version is the easiest to analyze, and if the time to move is small compared to the time to search, I don’t think it makes much difference.

  • EthanJ

    In a simultaneous Easter Egg game, no player has information about the other players’ choices. So a player choosing a search location must weight their choice by their probability distribution for the egg’s location, modified by their estimate of the likely strategies of their opponents. Doesn’t the uncertainty of that second-level consideration remove the Easter Egg dynamic of seeking to search not where the total probabilities are greatest, but where the probability-per-marginal-searcher is greatest?

  • sa

    ethanj above makes an interesting point.

  • Ethan, I’m not sure you understand the concept of a game theoretic equilibrium. Imagine there are two places to look, A and B, and three people, 1, 2, and 3. If the chance the egg is at A is 2/3, then one simultaneous choice equilibrium is 1 and 2 look at A, and 3 looks at B.

  • EthanJ

    I may have misunderstood your game. By “sequential”, I meant that the players would play in a definite order, with complete knowledge of the preceding players’ moves. I meant by “simultaneous” that the players would have no knowledge of where the other players are looking prior to revealing their choices – and that there was no ORDER of play. As such, 3 has no knowledge that 1 and 2 have chosen A. Indeed, when 3 is choosing, 1 and 2 have not chosen yet at all! 3 knows that the egg has 2/3 chance to be at A. But 3 has no way to know if 1 and/or 2 will choose A. As the moves are simultaneous, 3 can only guess based on what he thinks 1 and 2 are likely to choose. Does he think 1 or 2 is especially clever? In this case, the egg hunt dynamics break down.

    Perhaps my use of a distinction between simultaneous and sequential play is non-standard – it has been a while since I took Game Theory.

  • EthanJ

    In short – why in your simultaneous equilibrium does 3 choose B and not A?

  • Ethan, you need to go learn game theory.

  • EthanJ

    Strangely, I *did* take a course on Game Theory in college and quite enjoyed it. And upon checking with Wikipedia (admittedly, not a standard text), my use of “simultaneous” and “sequential” is perfectly appropriate.

    In the simultaneous game, why does 3 choose B? Why doesn’t 1 choose B? After all, there is no play order, so no player can infer what a ‘previous’ player would have done. If 1 chooses A, why don’t they all choose A? Why is 3 special?

    Or did you mean something else (more technical and counter-intuitive) by “simultaneous”?

  • Ethan, I’m not going to write you a whole tutorial on game theory here. End of discussion.

  • Doug S.

    Ethan, an equilibrium is reached when no player could have done better with a different strategy, assuming nobody else could have chosen a different strategy. Hence, the following are all equilibrium states:

    1 and 2 always search A while 3 always searches B
    1 and 3 always search A while 2 always searches B
    2 and 3 always search A while 1 always searches B
    1, 2, and 3 all search A with probability 2/3 and B with probability 1/3

    The choice of which player searches B by himself is arbitrary.

  • EthanJ

    Robin, I didn’t ask for one. And yes, your solution is an equilibrium. But there’s no way for that equilibrium to emerge (other than by chance) in the situation you described in your post. That is, players cannot deduce the equilibrium prior to the first move and play accordingly.

    Thanks Doug.

    In a new field, where the first rounds of the game are played simultaneously, before the equilibrium emerges, we should see the distribution that Andrew describes – players clustering around the locations with highest expected value.

    It is only later, after the equilibrium has been reached (especially as more players enter the game), that the sequential-play dynamics of the Easter egg hunt emerge, under which opinions more closely approximate truth.

    You suggested that truth’s opinion value might vary in different topics. And it might – I’d suggest it varies over time, relating to the ‘maturity’ of the topic, for the above reason.

  • Ethan, imagine that A is worth 2/3 and B is worth 1/3 and that there are a hundred searchers. If everyone decides to search A, and one searcher realizes that this will happen, he can do better by searching B. And if everyone reasons that way, then everyone will search B, which is even sillier. So obviously the game-theoretic equilibrium is going to involve randomizing your moves or looking at what other players are doing, but it’s certainly not going to involve all searchers going to the same node. There’s a standard way to calculate the equilibrium randomization strategy where nobody can do better by tweaking their parameters, which I like totally forget, but it’s a really elementary calculation, which is why Robin Hanson told you to go read up on basic game theory.
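[Editor’s note: the “standard calculation” Eliezer alludes to is the indifference condition for a symmetric mixed strategy. A numerical sketch for his hundred-searcher example, again under the assumption that co-located searchers split the prize:]

```python
q, n = 2/3, 100   # egg at A with probability 2/3; 100 searchers (split-prize assumption)

def payoff_A(p):
    # Other searchers at A ~ Binomial(n-1, p); E[1/(K+1)] has a closed form.
    return q * (1 - (1 - p)**n) / (n * p)

def payoff_B(p):
    return (1 - q) * (1 - p**n) / (n * (1 - p))

# Indifference condition: find the p at which searching A and B pay the same.
# payoff_A falls and payoff_B rises as p grows, so bisection works.
lo, hi = 1e-6, 1 - 1e-6
for _ in range(100):
    mid = (lo + hi) / 2
    if payoff_A(mid) > payoff_B(mid):
        lo = mid
    else:
        hi = mid
p_star = (lo + hi) / 2
print(p_star)  # very close to 2/3: searchers spread in proportion to the probabilities
```

So in this large-n example the equilibrium mix essentially matches the egg probabilities, which is why nearly all the searchers are not piled onto the more likely node.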

  • EthanJ

    Thanks Eliezer.
