We Agree On So Much

Jun 24, 2019

In a standard Bayesian model of beliefs, an agent starts out with a prior distribution over a set of possible states, and then updates to a new distribution, in principle using all the info that agent has ever acquired. Using this new distribution over possible states, this agent can in principle calculate new beliefs on any desired topic.

Regarding their belief on a particular topic then, an agent’s current belief is the result of applying their info to update their prior belief on that topic. And using standard info theory, one can count the (non-negative) number of info bits that it took to create this new belief, relative to the prior belief. (The exact formula is $\sum_i p_i \log_2(p_i/q_i)$, where $p_i$ is the new belief, $q_i$ is the prior, and $i$ ranges over possible answers to this topic question.)
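As a rough illustration of that formula, here is a minimal Python sketch. The function name `info_bits` and the example distributions are my own choices for illustration, not anything from a particular library.

```python
import math

def info_bits(new_belief, prior):
    """Bits of info needed to move from `prior` to `new_belief`.

    Both arguments are lists of probabilities over the same set of
    possible answers. This is the sum over i of p_i * log2(p_i / q_i).
    """
    if len(new_belief) != len(prior):
        raise ValueError("beliefs must cover the same set of answers")
    total = 0.0
    for p, q in zip(new_belief, prior):
        if p == 0.0:
            continue  # a zero-probability answer contributes nothing
        if q == 0.0:
            raise ValueError("prior rules out an answer the new belief allows")
        total += p * math.log2(p / q)
    return total

coin_flip = [0.5, 0.5]  # hypothetical prior on a yes/no topic
print(info_bits([0.5, 0.5], coin_flip))    # no update at all: 0.0 bits
print(info_bits([0.75, 0.25], coin_flip))  # a modest lean: about 0.19 bits
print(info_bits([1.0, 0.0], coin_flip))    # full certainty: 1.0 bit
```

The less likely an answer looked at the start, the more bits it takes to become sure of it. For example, `info_bits([1.0, 0.0], [1/1024, 1023/1024])` comes to 10 bits.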

How much info an agent acquires on a topic is closely related to how confident they become on that topic. Unless a prior starts out very confident, high confidence later can only come via updating on a great many info bits.

Humans typically acquire vast numbers of info bits over their lifetime. By one estimate, we are exposed to 34GB per day. Yes, as a practical matter we can’t remotely make full use of all this info, but we do use a lot of it, and so our beliefs do over time embody a lot of info. And even if our beliefs don’t reflect all our available info, we can still talk about the number of bits that are embodied in any given level of confidence an agent has on a particular topic.

On many topics of great interest to us, we acquire a huge volume of info, and so become very confident. For example, consider how confident you are at the moment about whether you are alive, whether the sun is shining, that you have ten fingers, etc. You are typically VERY confident about such things, because you have access to a great many relevant bits.

On a great many other topics, however, we hardly know anything. Consider, for example, many details about the nearest alien species. Or even about the life of your ancestors ten generations back. On such topics, if we put in sufficient effort we may be able to muster many very weak clues, clues that can push our beliefs in one direction or another. But being weak, these clues don’t add up to much; our beliefs after considering such info aren’t that different from our previous beliefs. That is, on these topics we have less than one bit of info.

Let us now collect a large broad set of such topics, and ask: what distribution should we expect to see over the number of bits per topic? This number must be positive. For many familiar topics it is much, much larger than one, while for other large sets of topics it is less than one.

The distribution most commonly observed for numbers that must be positive yet range over many orders of magnitude is: lognormal. And so I suggest that we tentatively assume a (large-sigma) lognormal distribution over the number of info bits that an agent learns per topic. This may not be exactly right, but it should be qualitatively in the ballpark.
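To get a feel for what a large-sigma lognormal implies, here is a small Python sketch that computes the share of topics whose bit count falls within a factor of three of one bit. The parameter values are pure assumptions for illustration: I set the median at one bit (`mu=0.0`), try a few values of `sigma` (in natural-log units), and pick "a factor of three" as an arbitrary stand-in for "of order unity".

```python
import math

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) when ln(X) is normal with mean mu and std dev sigma."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def share_near_one_bit(mu, sigma, factor=3.0):
    """Share of topics with between 1/factor and factor bits of info."""
    return lognormal_cdf(factor, mu, sigma) - lognormal_cdf(1.0 / factor, mu, sigma)

# Assumed parameters: median of one bit per topic, several spreads.
for sigma in (1.0, 3.0, 5.0, 10.0):
    share = share_near_one_bit(mu=0.0, sigma=sigma)
    print(f"sigma={sigma:>4}: {share:.1%} of topics within a factor of 3 of one bit")
```

Under these assumed numbers, the wider the spread, the smaller the share of topics that land near one bit. Shifting the median away from one bit would shrink that share further.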

One obvious implication of this assumption is: few topics have close to one bit of info. That is, most topics are ones where either we hardly know anything, or where we know so much that we are very confident.

Note that these typical topics are not worth much thought, discussion, or work to cut biases. For example, when making decisions to maximize expected utility, or when refining the contribution that probabilities on one topic make to other topic probabilities, getting 10% of one’s bits wrong just won’t make much of a difference here. Changing 10% of 0.01 bits still leaves one’s probabilities very close to one’s prior. And changing 10% of a million bits still leaves one with very confident probabilities.

Only when the number of bits on a topic is of order unity do one’s probabilities vary substantially with 10% of one’s bits. These are the topics where it can be worth paying a fixed cost per topic to refine one’s probabilities, either to help make a decision or to help update other probability estimates. And these are the topics where we tend to think, talk, argue, and worry about our biases.
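Here is a rough Python sketch of that sensitivity claim. It rests on a simplifying assumption of mine, not on the formula given earlier: I treat a yes/no topic with a 50/50 prior and count each bit as one doubling of the odds in favor of "yes". The function names are hypothetical.

```python
def probability_from_bits(bits):
    """Probability of 'yes' after `bits` doublings of even odds.

    Assumed stand-in model: odds = 2**bits, starting from a 50/50 prior.
    """
    if bits > 1000.0:
        return 1.0  # so close to certainty that a float cannot tell the difference
    return 1.0 / (1.0 + 2.0 ** (-bits))

def shift_from_losing_bits(bits, fraction=0.1):
    """How far the probability moves if `fraction` of the bits are dropped."""
    return probability_from_bits(bits) - probability_from_bits(bits * (1.0 - fraction))

for bits in (0.01, 0.1, 1.0, 2.0, 10.0, 1_000_000.0):
    shift = shift_from_losing_bits(bits)
    print(f"{bits:>12} bits: probability moves by {shift:.5f}")
```

Under this stand-in model, the shift is negligible at 0.01 bits and at a million bits, and is largest when the bit count is of order unity. How large it gets there depends on the model chosen.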

It makes sense that we tend to focus on pondering such “talkable topics”, where such thought can most improve our estimates and decisions. But don’t let this fool you into thinking we hardly agree on anything. For the vast majority of topics, we agree either that we hardly know anything, or that we quite confidently know the answer. We only meaningfully disagree on the narrow range of topics where our info is on the order of one bit, topics where it is in fact worth the bother to explore our disagreements.

Note also that for these key talkable topics, making an analysis mistake on just one bit of relevant info is typically sufficient to induce large probability changes, and thus large apparent disagreements. And for most topics it is quite hard to think and talk without making at least one bit’s worth of error. Especially if we consume 34GB per day! So it’s completely to be expected that we will often find ourselves disagreeing on talkable topics at the level of a few bits.

So maybe cut yourself and others a bit more slack about your disagreements? And maybe you should be more okay with our using mechanisms like betting markets to average out these errors. You really can’t be that confident that it is you who has made the fewest analysis errors.

Overcoming Bias
