# Share likelihood ratios, not posterior beliefs

When I think of Aumann's agreement theorem, my first reflex is to average.  You think A is 80% likely; my initial impression is that it's 60% likely.  After you and I talk, maybe we both should think 70%.  "Average your starting beliefs", or perhaps "do a weighted average, weighted by expertise" is a common heuristic.

But sometimes, not only is the best combination not the average, it's more extreme than either original belief.

Let's say Jane and James are trying to determine whether a particular coin is fair.  They both think there's an 80% chance the coin is fair.  They also know that if the coin is unfair, it is the sort that comes up heads 75% of the time.

Jane flips the coin five times, performs a perfect Bayesian update, and concludes there's a 65% chance the coin is unfair.  James flips the coin five times, performs a perfect Bayesian update, and concludes there's a 39% chance the coin is unfair.  The averaging heuristic would suggest that the correct answer is between 65% and 39%.  But a perfect Bayesian, hearing both Jane's and James's estimates (knowing their priors, and deducing what evidence they must have seen), would infer that the coin was 83% likely to be unfair.  [Math footnoted.]

Perhaps Jane and James are combining this information in the middle of a crowded tavern, with no pen and paper in sight.  Maybe they don't have time or memory enough to tell each other all the coin flips they observed.  So instead they just tell each other their posterior probabilities: a nice, short summary for a harried rationalist pair.  Perhaps this brevity is why we tend to average posterior beliefs.

However, there is an alternative.  Jane and James can trade likelihood ratios.  Like posterior beliefs, likelihood ratios are a condensed summary; and, unlike posterior beliefs, sharing likelihood ratios actually works.

Let's listen in on a conversation where Jane and James trade likelihood ratios:

JANE: My observations are seven and a half times as likely if the coin is unfair, as if it is fair.

JAMES:  My observations are two and a half times as likely if the coin is unfair, as if it is fair.

BOTH, in unison: That means our joint observations are about nineteen times as likely if the coin is unfair. But our prior for unfair coins is 20%, which means a prior odds ratio of 1:4.  Combining with Bayes' theorem, we get (1:4)*(19:1), which is about 5:1 in favor of an unfair coin.

[BAR PATRONS edge away slightly.]
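For anyone who would rather check the tavern arithmetic at home, here is a minimal sketch of the exchange. It assumes the flip counts given in the math footnote (five heads for Jane; four heads and one tail for James); the function names are made up for illustration.

```python
from fractions import Fraction

def likelihood_ratio(heads, tails, p_unfair=Fraction(3, 4), p_fair=Fraction(1, 2)):
    """P(observations | unfair) / P(observations | fair) for one sequence of flips."""
    unfair = p_unfair**heads * (1 - p_unfair)**tails
    fair = p_fair**heads * (1 - p_fair)**tails
    return unfair / fair

def posterior_from_odds(prior_odds, *ratios):
    """Multiply prior odds by each likelihood ratio, then convert odds to a probability."""
    odds = prior_odds
    for r in ratios:
        odds *= r
    return odds / (1 + odds)

prior_odds = Fraction(1, 4)      # 20% unfair : 80% fair
jane = likelihood_ratio(5, 0)    # assumed: five heads
james = likelihood_ratio(4, 1)   # assumed: four heads, one tail

print(float(jane), float(james), float(jane * james))
print(float(posterior_from_odds(prior_odds, jane)))         # Jane alone
print(float(posterior_from_odds(prior_odds, james)))        # James alone
print(float(posterior_from_odds(prior_odds, jane, james)))  # pooled
```

The prior enters exactly once, at the end; the two likelihood ratios are simply multiplied, which is only valid because the two sets of flips are independent given the coin.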

Now that you see how sharing likelihood ratios can work, you'll probably be itching to put them to work in your daily life.  As with most rationalist tricks, it helps to have a number of cached examples of places they can be used.

(1) Distinguish evidence from priors. I've been in several conversations that went roughly like this:

Person A:  So, what do you think of Jack?

Person B: My guess is that he's moderately (smart/trustworthy/whatever), but not extremely so.

Person A: Is the "not extremely so" because you observed evidence Jack isn't, or because most people aren't and you don't have much data that Jack is?  Where's the peak of your likelihood function?

This type of dialog is useful.  Let's say that A's initial impression is that Jack is amazing, and B's impression is that Jack is somewhat less amazing.  If B knows Jack well, A should lower her estimate of Jack.  But if B's impression comes from a tiny amount of amazing-looking data from Jack (just not enough to pull Jack all the way from "probably normal" to "probably amazing"), A should raise her estimate.  B's posterior expectations about Jack's amazingness are identical in the two cases, even though B's observations in the two cases have opposite implications for A.  Trading likelihoods notices the difference, but trading average posterior impressions doesn't.

(2)  Avoid double-counting your priors. Robin Hanson suggested adjusting women's SAT-math scores toward the mean (downward for high-scoring women, upward for low-scoring women) if women's math aptitudes have a smaller standard deviation than men's.  Moral intuitions aside, adjusting in this manner would improve the scores' accuracy if used as stand-alone estimates of strangers' SAT-math abilities; perhaps a woman with a single score of 800 has the same expected score on subsequent SAT-math tests (the same best point estimate for "true SAT-math ability") as a man who received a single score of, say, 770.

However, adjusting scores in this manner mixes likelihoods in with priors.  SAT scores are best interpreted as likelihood functions: an SAT score of 800 has one likelihood from a person whose "true ability" is superb, another from a person whose "true ability" is moderate, etc.  If you mix these likelihood functions with your prior (as gender-adjusted SAT scores would mix them), combining multiple indicators becomes more difficult.  For example: suppose again that a single SAT-math score of 800 from a woman implies the same best point estimate of "true ability" as a single score of 770 in a man (because of differing priors plus the chance of testing error).  Two SAT-math retests of 800 in a woman will then imply a higher best point estimate of true ability than two 770's in a man.  The "gender-adjustment" would work for single, stand-alone SAT measurements, but it breaks when multiple indicators are combined.  If you mix the likelihood function with your prior and then combine it with other mixed indicators (e.g., multiple gender-adjusted SAT scores, or gender-adjusted SAT scores plus gender-adjusted letters of rec), you pull too strongly toward the prior.
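A toy version of the double-counting problem, as a rough sketch only: it assumes a normal prior over "true ability" and independent normal testing error, and every number in it is hypothetical. It also assumes that "combining" adjusted scores means averaging them, which is just one way someone might do it.

```python
def posterior_mean(scores, prior_mean, prior_var, noise_var):
    """Posterior mean of 'true ability' under a normal prior and independent normal test noise."""
    n = len(scores)
    precision = 1 / prior_var + n / noise_var
    return (prior_mean / prior_var + sum(scores) / noise_var) / precision

# Hypothetical numbers, chosen only for illustration.
prior_mean, prior_var, noise_var = 500.0, 100.0**2, 40.0**2
scores = [800.0, 800.0]

# Prior applied once, to both raw scores together.
correct = posterior_mean(scores, prior_mean, prior_var, noise_var)

# Prior applied to each score separately, then the adjusted scores averaged.
adjusted = [posterior_mean([s], prior_mean, prior_var, noise_var) for s in scores]
double_counted = sum(adjusted) / len(adjusted)

print(round(correct, 1), round(double_counted, 1))
```

Under these assumptions the averaged adjusted scores land closer to the prior mean than the estimate built from the raw scores, which is the "pull too strongly toward the prior" effect described above.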

The take-home in all these cases is to keep hold of your likelihood ratios.  Instead of tracking how likely your lead theory is to be true, or remembering a single theory which was most representative of the range of remaining possibilities (like Jack's average expected amazingness), try to track how likely your data-set is under one hypothesis vs. another.  (You'll need to separately remember the prior on each hypothesis.)  I suspect such tracking may also help with confirmation bias; I don't know if it ends up confusing people in other ways.

One major caveat: in our example with the coin, and in A's and B's estimations of Jack's amazingness, combining likelihood ratios led to more extreme beliefs.  (In more general examples, combining likelihood ratios may not lead to more extreme beliefs, but it almost always leads to more specific beliefs.)  When trying this yourself, make sure the likelihood ratios you're combining are independent indicators of the variable you're trying to infer.  Otherwise, you and your co-rationalists may pull one another to beliefs that are unjustifiably extreme (or specific).

(Math for original example:
James, to end up with a 39% posterior on the coin being heads-weighted, must have seen four heads and one tail:
P(four heads and one tail | heads-weighted) = (0.75^4 ∙ 0.25^1) = 0.079.  P(four heads and one tail | fair) = 0.031.  P(heads-weighted | four heads and one tail) = (0.2∙0.079)/(0.2∙0.079 + 0.8∙0.031) = 0.39, which is the posterior belief James reports.
Jane must similarly have seen five heads and zero tails.
Plugging the total nine heads and one tail into Bayes' theorem:
P(heads-weighted | nine heads and a tail) = ( 0.2 ∙ (0.75^9 ∙ 0.25^1) ) / ( 0.2 ∙ (0.75^9 ∙ 0.25^1) + 0.8 ∙ (0.5^9 ∙ 0.5^1) ) = 0.83, giving us a posterior belief of 83% that the coin is heads-weighted.)
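The "deducing what evidence they must have seen" step can be sketched as a small search. This assumes, as part of the problem statement, that each person flipped exactly five times and shares the 20% prior; the helper names are hypothetical.

```python
def posterior_unfair(heads, tails, prior_unfair=0.2, p_heads_unfair=0.75):
    """P(heads-weighted | heads, tails) by Bayes' theorem."""
    like_unfair = p_heads_unfair**heads * (1 - p_heads_unfair)**tails
    like_fair = 0.5**(heads + tails)
    numerator = prior_unfair * like_unfair
    return numerator / (numerator + (1 - prior_unfair) * like_fair)

def infer_heads(reported, flips=5):
    """Head count (out of `flips`) whose posterior is closest to the reported one."""
    return min(range(flips + 1),
               key=lambda h: abs(posterior_unfair(h, flips - h) - reported))

jane_heads = infer_heads(0.65)
james_heads = infer_heads(0.39)
total_heads = jane_heads + james_heads
total_tails = 10 - total_heads

print(jane_heads, james_heads)
print(round(posterior_unfair(total_heads, total_tails), 2))
```

If the number of flips is not known in advance, this inversion is no longer unique, which is one more reason to trade likelihood ratios instead.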

• http://gov.state.ak.us Maglick

This was a very helpful post.

• http://atheorist.livejournal.com Johnicholas

The article implies that, in order to combine likelihood ratios, you need to reason from the likelihood ratio, together with your knowledge of the other person’s prior, to obtain the outcome of the other person’s experiment. Even this procedure isn’t given algebraically, only the “forward” direction, from the outcome of the other person’s experiment to the likelihood ratio.

Is there an algebraic way to combine likelihood ratios, without going through the outcome of the other person’s experiment?

• http://profile.typekey.com/arundelo/ Aaron Brown

in our example with the balls

Should this be “coin”?

• http://profile.typekey.com/sentience/ Eliezer Yudkowsky

To combine likelihood ratios, you just multiply them. That’s the simplicity of it.

If you see evidence with odds of 4 to 1 if a hypothesis is true versus false, and I see evidence with odds of 2 to 1 if the hypothesis is true versus false, then together we have seen evidence with a likelihood ratio of 8 to 1 in favor of the hypothesis’s truth.

• Charlie Ullman

Didn’t Jaynes suggest logging the likelihood ratios as well? Then you could just add them. Of course, maybe logging is trickier to do in the bar.

• frelkins

Chris Hibbert’s method seems simpler, and I like Hal Finney’s suggestions for updating. Maybe Chris & Hal will weigh in with their pleasant methods soon. Overall, very nice Anna!

• http://profile.typekey.com/Psy-Kosh/ Psy-Kosh

Anna and Steve: Oh, hey, that’s cool, thanks. Never really realized trading likelihoods makes it that easy, though in retrospect…

Anyways, thanks for that!

Charlie: logging may be good when the numbers are well behaved and when the conversion to nats or bits or whatever is the “obviously right thing to do”… However, from numerical analysis type stuff, it’s known that when actually computing stuff, and working with a finite amount of precision and thus truncation error and so on, adding things incurs a much higher error rate than multiplying them, as a general rule. So, especially with limited precision, better to do multiplying of the likelihood ratios than adding of the logs, IMHO

• Carl Shulman

These calculations seem to assume that the evidence available to the different parties is non-overlapping. Outside of cases like coin tossing or forming judgments about a population from random samples, that’s a dubious assumption.

• Carl Shulman

Which, of course, you mention at the end.

• http://jed.jive.com/ Jed Harris

This may well be the most useful post ever on Overcoming Bias. It is also right on the target defined by the title…

One problem. The post assumes readers know what likelihood ratios are. Random people trying to make good decisions won’t know (and won’t know the point Eliezer makes either). Also as you pointed out and several commenters noted the hard part is independence.

So you need to back out a bit and write a post that explains how to know your likelihood ratio and how to judge the independence of LRs from different sources. Preferably write the explanation mostly with examples and without relying on the term “likelihood”.

Then XKCD can translate it into a cartoon poster and the world will be saved.

One other point. This seems very closely related to “saving the appearances” and similar observations from the history of science. Basically it looks like scientific revolutions change the priors of the scientists involved, but rarely change evidence that has been accepted by consensus up to then. Of course the resulting beliefs can change radically… but if everyone is talking in LRs this is less of a problem.

• steven

Aumann assumes common priors so the posterior estimates contain the same info as the likelihood ratios. I agree LRs are more arithmetically convenient in cases like this where background assumptions cause convergence in a single step, but what happens with many iterations?

• Greg

Another advantage of using likelihood ratios is that they’re well-defined even in situations where prior probabilities cannot be sensibly quantified.
‹can of worms›

(In more general examples, combining likelihood ratios may not lead to more extreme beliefs, but it almost always leads to more specific beliefs.)

What does “more specific beliefs” mean? “The Flying Spaghetti Monster personally directly causes every individual particle’s movement constantly” is a very specific belief.

I think it’s easier to see the role of likelihood ratios if you look at the Bayes’ formula expressed in the right way. Here I shamelessly plug an old introductory blog post of mine that contains those.

• Arthur B.

“James, to end up with a 39% posterior on X being heads-weighted, must have seen four heads and one tail:”

Or 81 heads and 46 tails ~ 39.45%

Your perfect Bayesian needs a prior on the number of trials seen by each participant.

• Arthur B.

(and since ln(2/3)/ln(2) is irrational there’s an infinite number of infinitely close approximations of 39%, you need to find a and b such that a ln(2/3) + b ln(2) ~ ln((1/0.39 -1) * 0.2/0.8)

(16 and 8 -> ~ 39.08% is even closer)

• Steve Rayhawk

steven: If there are common priors, and Jane and James want to know the value of θ and are communicating likelihood ratios or relative likelihood functions p(data|θ), and background assumptions do not cause convergence in a single step, then there must be another relevant variable ζ whose value Jane and James do not know, and Jane and James must be communicating marginal likelihoods of the data with uncertainty about ζ integrated out. What you have described is almost parallel Aumann updating of conditional beliefs about ζ for each value of θ as ζ relates to the data. We don’t know how to write that post yet. Until then, Jane and James should share conditional likelihood functions p(data|θ;ζ).

• Anna Salamon

Aaron, thanks. Fixed.

Arthur, if your Bayesians trade posterior beliefs (as Jane and James do in our initial example) then, as you point out, they need a prior on how many coins the other party has seen. (We hoped the “five times” would be read as part of the problem specification, but our writing was ambiguous, so it’s good you pointed it out.) Also, you may well know this already, but for anyone else: if Jane and James instead trade likelihood ratios (as they do in our second example), they don’t need to know how many coins the other party has seen. It’s another nice feature of working with likelihood ratios. Likelihood ratios combine “how much data have you seen?” and “how strongly did your data point to [the coin’s unfairness / Jack’s amazingness / whatever]?” into a single number.

Greg, we mean for example that the region of “how much amazingness Jack might plausibly have” will shrink as you pool more data about Jack (e.g., after a while, maybe we’re 90% certain that Jack’s amazingness is between the 73rd and 74th percentiles). The more data you pool, the more sharply your data can distinguish between differing theories, including theories that are fairly close to one another (e.g., “Jack is at the 73rd percentile of amazingness” vs “at the 74th”), and so, in that sense, the more sharp (which we were glossing as “specific”) your posterior is likely to be.

• Liron

A helpful and well-written post. Aumann’s agreement theorem gets mentioned so much, I’m surprised we haven’t had an example like this on OB earlier. In particular, I had been wondering whether the agreement theorem says that the two parties can end up with an estimate which is not between their two individual estimates.

I want to write an OB post too…

• Liron

Under what conditions does the naive “average posterior probability weighted by expertise” heuristic work?

• Z. M. Davis

Liron: “Aumann’s agreement theorem gets mentioned so much, I’m surprised we haven’t had an example like this on OB earlier.”

See Hal Finney’s “Coin Guessing Game” from two years ago.

>>> Likelihood ratios combine “how much data have you seen?” and “how strongly did your data point to [the coin’s unfairness / Jack’s amazingness / whatever] into a single number.

Suppose we have two sets of observations:

Set 1: 2000 heads, 1000 tails
Set 2: 2 heads, 1 tail

If I understand the term ‘likelihood ratio’ correctly, the likelihood ratios here are the same for both observation sets, 2:1. If so, I can’t tell “how much data have you seen” judging from the ratio alone. Yes, I can get a good guess for a ratio like 1562:1, but that won’t work with ratios like 2:1.

• Anna Salamon

A “likelihood ratio” is how likely your observations are under one theory relative to another. In the weighted coin example, your likelihood ratio for your first set of observations is P( Set 1 | weighted coin ) / P( Set 1 | fair coin ) = ( .75^2000 * .25^1000 ) / (.5^2000 * .5^1000) ≈ 1.4 × 10^51. (I.e., that first set of observations is *very* strong support for the theory that the coin is weighted.)

In contrast, the likelihood ratio for your second observation set is P( Set 2 | weighted coin ) / P( Set 2 | fair coin ) = (.75^2 * .25) / (.5^2 * .5) = 1.125, i.e. the second set of observations is 1.125 times as likely to occur if you have the weighted coin as if you have the fair coin.

That said, I did not mean to say you can infer “how much data I’ve seen” from my likelihood ratio. What I meant to say is that everything the two of you need, if you are to correctly update your posterior beliefs, is contained in your prior plus your and the other person’s likelihood ratios. That is, with likelihood ratios you do not need to keep separate track of “how much data have I seen?” and “how extreme was the data?” — a single number tells you the important part of both measurements.

• GreedyAlgorithm

Vladimir: Likelihood ratio isn’t “guess the probability the coin lands heads”, it’s “which is more likely, a fair coin or a 75% heads coin?”

P(Set1|75%) = (0.75)^2000*(0.25)^1000*C(3000,2000) and
P(Set1|50%) = (0.5)^3000*C(3000,2000)

so the likelihood ratio from Set 1 is P(Set1|75%)/P(Set1|50%) = 1.4 x 10^51, and from Set 2 it’s only 1.125. Set 1 is wildly improbable but it’s hugely more likely to result from a 75% coin than a 50% coin.
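A quick way to check these two figures, sketched in log space because terms like 0.5^3000 are far too small for double-precision floats. The function name is made up; the binomial coefficient is left out because it cancels in the ratio.

```python
import math

def log10_likelihood_ratio(heads, tails, p_weighted=0.75, p_fair=0.5):
    """log10 of P(data | weighted) / P(data | fair); the binomial coefficient cancels."""
    return (heads * math.log10(p_weighted / p_fair)
            + tails * math.log10((1 - p_weighted) / (1 - p_fair)))

log_lr_set1 = log10_likelihood_ratio(2000, 1000)
log_lr_set2 = log10_likelihood_ratio(2, 1)

# Split the large ratio into mantissa and exponent for printing.
exponent = math.floor(log_lr_set1)
mantissa = 10 ** (log_lr_set1 - exponent)

print(f"Set 1: about {mantissa:.1f} x 10^{exponent}")
print(f"Set 2: about {10 ** log_lr_set2:.3f}")
```

Both sets have the same 2:1 proportion of heads to tails, but the likelihood ratio grows with the amount of data, so the two sets carry very different weight.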

• Anna Salamon

Jed, Thanks for the encouragement and suggestions. I’ll play with your suggestions, re: scientists and re: a more general audience. Do you know any good writeups to draw from? By far the best I’ve found are Eliezer’s An intuitive explanation and A technical explanation (and Jaynes, if we include books, though I haven’t yet read most of it).

Frelkins, thanks. Could you point me to the explanations by Hal and by Chris Hibbert?

Liron, good question. One simple example where the heuristic roughly works is if you have a weighted coin, with a uniform prior over coin-weights, and you and your partner are each estimating the probability that the coin will come up heads on the next toss. (“More expert” here equates to the number of coins you have each seen). A second is if you and your partner are both estimating a random variable (e.g., a person’s “true math ability”) and each of your measurements is the sum of the person’s “true math ability” and a normally distributed random error term. (The “more expert” of you either has more of these measurements or a smaller error term). Anyone want to step in here with a general analysis?

• http://profile.typekey.com/robinhanson/ Robin Hanson

On the SAT score adjustment, most people do not know that male scores have a higher variance than female scores, nor do they know how much more variance, nor do they know how to combine gender means and variances with one or more particular scores to produce a posterior estimate. So in practice just publishing raw scores will mostly result in ignoring those differing distributions. Yes, you’d want to adjust an estimate based on multiple tests differently than one based on a single test, but it still seems to me that in practice we’d be better off if the testing agency did this math and published a single best estimate. After all, if you really knew what you are doing, you could use your knowledge of the means and variances to invert their calculation to obtain the likelihood ratio you seek.

• http://profile.typekey.com/halfinney/ Hal Finney

In a comment to my old posting on the coin guessing game linked to above by Z.M. Davis, I gave an example I’d like some help with (slightly modified here):

Jane and James each privately toss a coin. They want to guess the probability that both coins land heads. Let’s suppose the coins are in fact both heads.

The prior for both heads is 1/4, but having observed that their own coin is heads, each estimates the probability to be 1/2. So round one goes:

They seemingly agree. However, upon exchanging these values, each can immediately change their probability estimate to 1. That’s because hearing “1/2” from the other player means their coin must have landed heads, since if it had been tails they would have known the probability for both heads was 0. Round 2:

So this is another example where exchanging Bayesian estimates leads to an updated estimate outside the range of the two. And it’s even more curious to me, because they seemingly agreed from the first, and yet they both changed. In my article I raised the question of whether, when exchanging Bayesian estimates, one might see several rounds of disagreement, then agreement, then more disagreement? I also claimed that in the famous 3-hats puzzle, and its generalization to N hats, one might see multiple rounds of estimates that agree, followed by a change in estimate (but I haven’t tried to verify that claim). This leads to the question of how Bayesians can know that they have truly reached agreement.

I tried to work my problem with likelihood ratios, but I got the wrong answer. P(I observe heads | both are heads) = 1. P(I observe heads | not both are heads) = 1/3, because there are 3 non-both-heads possibilities: TH, HT, TT. This gives a likelihood ratio of 3. Now if they multiply their likelihood ratios, they get 9, but in fact the odds for two heads have changed from 1:3 before, to 1:0 or infinity after the exchange. What am I doing wrong?

This leads to the question of how Bayesians can know that they have truly reached agreement.

I think they know, when they could have predicted each others’ stated estimates with certainty. Because that means the estimates provided no new information. In this example, they couldn’t have predicted each others’ estimates in the first round, which could have been 0 or 1/2 with equal probability. But they could have predicted each others’ estimates in the second round.

Now if they multiply their likelihood ratios, they get 9, but in fact the odds for two heads have changed from 1:3 before, to 1:0 or infinity after the exchange. What am I doing wrong?

Independence is missing in this case, so you can’t just multiply them. If you want independence, you have to let Jane and James each observe a coin toss chosen randomly and independently from two coin tosses, instead of letting them each observe a different coin toss.

I think in general, it’s easier to constructively force likelihood ratios to be independent, than to know that two arbitrary likelihood ratios are independent. It’s a bit similar to how you can write a program with certain properties, but can’t know whether an arbitrary program has that property.

• http://profile.typekey.com/halfinney/ Hal Finney

Wei, thanks, that makes sense about convergence in Bayesian updating. That’s very surprising that Jane and James each observing a private coin flip is not independent! Of course their observations are not independent of the outcome, but then that would always be the case for relevant information. I certainly would have thought that observing a private coin flip would be independent information.

For the example you describe, we have 2 coins flipped out of sight, then each player is shown a randomly chosen coin, and they don’t know if they saw the same coin or different ones? Let’s assume again that both coins are heads. I think the likelihood ratio, upon seeing heads, is still 3. A priori odds are 1:3. This checks out, multiplying prior odds times likelihood gives odds of 1:1 or 50% for heads, which is correct. Exchanging likelihood ratios and multiplying them gives 9, for final odds of 3:1 in favor of both heads, or probability of 3/4.

But I get a different answer if I count. There are 16 possibilities for the two coins and each observing the Left or Right coin:
HHLL, HHLR, HHRL, HHRR, HTLL, HTLR, HTRL, HTRR, THLL, THLR, THRL, THRR, TTLL, TTLR, TTRL, TTRR.
After observing heads, Jane (the 1st player) knows it is one of:
HHLL, HHLR, HHRL, HHRR, HTLL, HTLR, THRL, THRR.
This gives P=1/2 for both heads, which is correct. After learning that James also saw heads (otherwise he would have said P=0), she knows it is one of:
HHLL, HHLR, HHRL, HHRR, HTLL, THRR.
Of these 6, 4 are both heads, giving P=2/3 for both heads, or odds ratio of 2:1, not the same as what I got before. This answer seems more likely to be correct.

Maybe I’m making a dumb mistake, or perhaps I misunderstood your example for independence?
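The count above can be reproduced by brute force. This is only a sketch of the setup as described in this comment (two hidden coins, each player shown one of them uniformly at random); the tuple encoding and helper names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Each world: (left coin, right coin, coin Jane is shown, coin James is shown).
worlds = list(product("HT", "HT", "LR", "LR"))

def seen(world, player):
    """Face observed by the given player (0 = Jane, 1 = James) in this world."""
    left, right = world[0], world[1]
    shown = world[2 + player]
    return left if shown == "L" else right

def p_both_heads(candidates):
    """Fraction of equally likely candidate worlds in which both coins are heads."""
    both = [w for w in candidates if w[0] == "H" and w[1] == "H"]
    return Fraction(len(both), len(candidates))

jane_heads = [w for w in worlds if seen(w, 0) == "H"]
both_saw_heads = [w for w in jane_heads if seen(w, 1) == "H"]

# What multiplying the two likelihood ratios of 3 would give instead.
naive_odds = Fraction(1, 3) * 3 * 3
naive = naive_odds / (1 + naive_odds)

print(len(worlds), len(jane_heads), len(both_saw_heads))
print(p_both_heads(jane_heads), p_both_heads(both_saw_heads), naive)
```

The enumeration agrees with the counting argument (2/3) rather than with the multiplied likelihood ratios (3/4), so the discrepancy is in the multiplication step, not in the count.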

Hal, you’re right and my example isn’t independent either. This is trickier than it seems. In order to multiply odds ratios, we need

In general,

So we need

which fails to hold in both examples. In Hal’s example, knowing you observed heads makes it less likely (actually impossible) for me to observe heads if not both are heads. In my example, knowing you observed heads makes it more likely for me to observe heads because it rules out the “both tails” possibility.

I certainly would have thought that observing a private coin flip would be independent information.

In order to multiply odds ratios, we need our individual observations to be independent conditional on the hypothesis being true, and independent conditional on the hypothesis being false. In Hal’s example, the observations are unconditionally independent, but not independent when conditioned on “not both heads”. (In my example, the two observations are just not independent, period. Don’t know what I was thinking!)
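One way to write that condition down, as a sketch: the symbols here are assumptions rather than the notation of the original comment, with A and B standing for the two people's observations and H for the hypothesis.

```latex
\begin{align*}
\frac{P(A, B \mid H)}{P(A, B \mid \neg H)}
  &= \frac{P(A \mid H)}{P(A \mid \neg H)} \cdot \frac{P(B \mid A, H)}{P(B \mid A, \neg H)}
  && \text{(always true, by the chain rule)} \\
\frac{P(B \mid A, H)}{P(B \mid A, \neg H)}
  &= \frac{P(B \mid H)}{P(B \mid \neg H)}
  && \text{(needed in order to multiply the two ratios)}
\end{align*}
```

The second line holds in particular when A and B are independent given H and independent given not-H. In Hal's example, with A = "Jane sees heads", B = "James sees heads" and H = "both heads", P(B | not-H) = 1/3 but P(B | A, not-H) = 0, so the second line fails.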

One example of multiplying odds ratios Eliezer gave in http://yudkowsky.net/rational/bayes is three independent tests for breast cancer. But in real life, it is impossible to find three lab tests that are independent, conditional on both breast cancer, and on no breast cancer. In the no breast cancer case, especially, getting one false positive should increase the probability of the lab being sloppy, or having one’s blood mixed up, or having a benign tumor, or something else that increases the probability of getting false positive on another test.

I’m not sure what lesson can be drawn from these examples, except “beware dependence”?

• http://rhollerith.com/blog Richard Hollerith

Great post! I would like to read more posts like this one dependent on math and less fiction.

• http://rhollerith.com/blog Richard Hollerith

Since people are asking for help, I’ll take the liberty of asking for help on the problem of the cab fare and the extra twenty.

• Joe from London

James, to end up with a 39% posterior on the coin being heads-weighted, must have seen four heads and one tail:
P(four heads and one tail| heads-weighted) = (0.75^4 ∙ 0.25^1) = 0.079. P(four heads and one tail | fair) = 0.031. P(heads-weighted | five heads) = (0.2∙0.079)/(0.2∙0.079 + 0.8∙0.031) = 0.39, which is the posterior belief James reports

I think most of these numbers should strictly speaking be multiplied by five. The end result is the same, but we should say e.g.
P(four heads and one tail| heads-weighted) = (0.75^4 ∙ 0.25^1 ∙ 5) = .396

I mention this only in case anyone else saw the maths and thought “huh, that’s not right”. Agree with the end product, though, and great post.

• Timo Timo

Whoa. Useful. 0o Thankyou.

• Sebastian Nickel

Typo in the maths footnotes: