When I think of Aumann's agreement theorem, my first reflex is to average. You think A is 80% likely; my initial impression is that it's 60% likely. After you and I talk, maybe we both should think 70%. "Average your starting beliefs", or perhaps "take a weighted average, weighted by expertise", is a common heuristic.

Typo in the maths footnotes: "P(heads-weighted | five heads)" should be P(heads-weighted | four heads and one tail).

Whoa. Useful. 0o Thank you.

James, to end up with a 39% posterior on the coin being heads-weighted, must have seen four heads and one tail: P(four heads and one tail | heads-weighted) = (0.75^4 ∙ 0.25^1) = 0.079. P(four heads and one tail | fair) = 0.5^5 = 0.031. P(heads-weighted | five heads) = (0.2∙0.079)/(0.2∙0.079 + 0.8∙0.031) = 0.39, which is the posterior belief James reports.

I think most of these numbers should, strictly speaking, be multiplied by five (the binomial coefficient C(5,1) for which flip was the tail). The end result is the same, but we should say e.g. P(four heads and one tail | heads-weighted) = (0.75^4 ∙ 0.25^1 ∙ 5) = 0.396.

I mention this only in case anyone else saw the maths and thought "huh, that's not right". Agree with the end product, though, and great post.
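If it helps, the cancellation is easy to check numerically. A quick sketch, using the 0.2 prior and 75% bias from the numbers above:

```python
from math import comb

# Assumed setup from the comment: prior P(heads-weighted) = 0.2,
# a heads-weighted coin lands heads 75% of the time, and James
# saw four heads and one tail in five flips.
prior_weighted = 0.2
k_heads, n = 4, 5

def posterior(include_binomial):
    # The binomial coefficient C(5, 4) = 5 multiplies BOTH
    # likelihoods, so it cancels out of the posterior.
    c = comb(n, k_heads) if include_binomial else 1
    lik_weighted = c * 0.75**k_heads * 0.25**(n - k_heads)
    lik_fair = c * 0.5**n
    return (prior_weighted * lik_weighted) / (
        prior_weighted * lik_weighted + (1 - prior_weighted) * lik_fair
    )

print(round(posterior(False), 2))  # 0.39
print(round(posterior(True), 2))   # 0.39 -- the factor of 5 cancels
```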

Since people are asking for help, I'll take the liberty of asking for help on the problem of the cab fare and the extra twenty.

Great post! I would like to read more math-heavy posts like this one, and less fiction.

Hal, you're right and my example isn't independent either. This is trickier than it seems. In order to multiply odds ratios, we need

P(we both observe heads | not both are heads) = P(I observe heads | not both are heads) * P(you observe heads | not both are heads)

In general,

P(we both observe heads | not both are heads) = P(I observe heads | not both are heads and you observe heads) * P(you observe heads | not both are heads)

So we need

P(I observe heads | not both are heads) = P(I observe heads | you observe heads and not both are heads)

which fails to hold in both examples. In Hal's example, knowing you observed heads makes it less likely (actually impossible) for me to observe heads if not both are heads. In my example, knowing you observed heads makes it more likely for me to observe heads because it rules out the "both tails" possibility.
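For anyone who wants to see the failure concretely, here is a small enumeration of Hal's setup (two fair coins, with each player privately observing a different one); it checks that the required equality fails:

```python
from fractions import Fraction

# Hal's example: two fair coins; "I" observe coin 1, "you" observe
# coin 2. The four equally likely outcomes, written (my coin, your coin):
states = ["HH", "HT", "TH", "TT"]

def prob(event, given):
    """P(event | given), by counting equally likely states."""
    cond = [s for s in states if given(s)]
    return Fraction(sum(event(s) for s in cond), len(cond))

not_both_heads = lambda s: s != "HH"
i_heads = lambda s: s[0] == "H"
you_heads = lambda s: s[1] == "H"

# P(I observe heads | not both are heads) = 1/3
p1 = prob(i_heads, not_both_heads)

# P(I observe heads | you observe heads and not both are heads) = 0,
# since the only state satisfying the condition is TH.
p2 = prob(i_heads, lambda s: not_both_heads(s) and you_heads(s))

print(p1, p2)  # 1/3 0 -- the required equality fails
```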

I certainly would have thought that observing a private coin flip would be independent information.

In order to multiply odds ratios, we need our individual observations to be independent conditional on the hypothesis being true, and independent conditional on the hypothesis being false. In Hal's example, the observations are unconditionally independent, but not independent when conditioned on "not both heads". (In my example, the two observations are just not independent, period. Don't know what I was thinking!)

One example of multiplying odds ratios Eliezer gave in http://yudkowsky.net/ration... is three independent tests for breast cancer. But in real life, it is practically impossible to find three lab tests that are independent conditional on breast cancer and conditional on no breast cancer. In the no-breast-cancer case especially, getting one false positive should increase the probability of the lab being sloppy, or of one's blood being mixed up, or of a benign tumor, or of something else that increases the probability of getting a false positive on another test.

I'm not sure what lesson can be drawn from these examples, except "beware dependence"?

Wei, thanks, that makes sense about convergence in Bayesian updating. That's very surprising that Jane and James each observing a private coin flip is not independent! Of course their observations are not independent of the outcome, but then that would always be the case for relevant information. I certainly would have thought that observing a private coin flip would be independent information.

For the example you describe, we have 2 coins flipped out of sight, then each player is shown a randomly chosen coin, and they don't know if they saw the same coin or different ones? Let's assume again that both coins are heads. I think the likelihood ratio, upon seeing heads, is still 3. A priori odds are 1:3. This checks out, multiplying prior odds times likelihood gives odds of 1:1 or 50% for heads, which is correct. Exchanging likelihood ratios and multiplying them gives 9, for final odds of 3:1 in favor of both heads, or probability of 3/4.

But I get a different answer if I count. There are 16 possibilities for the two coins and each player observing the Left or Right coin:

HHLL, HHLR, HHRL, HHRR, HTLL, HTLR, HTRL, HTRR, THLL, THLR, THRL, THRR, TTLL, TTLR, TTRL, TTRR.

After observing heads, Jane (the 1st player) knows it is one of:

HHLL, HHLR, HHRL, HHRR, HTLL, HTLR, THRL, THRR.

This gives P = 1/2 for both heads, which is correct. After learning that James also saw heads (otherwise he would have said P = 0), she knows it is one of:

HHLL, HHLR, HHRL, HHRR, HTLL, THRR.

Of these 6, 4 are both heads, giving P = 2/3 for both heads, or odds of 2:1, not the same as what I got before. This answer seems more likely to be correct.

Maybe I'm making a dumb mistake, or perhaps I misunderstood your example for independence?
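The counting above can be brute-forced. A short script (a sketch of the same 16-outcome enumeration) confirms the 1/2 and 2/3 figures:

```python
from itertools import product
from fractions import Fraction

# Two fair coins (Left, Right); Jane and James are each independently
# shown one of the two coins at random.
outcomes = list(product("HT", "HT", "LR", "LR"))  # (left, right, Jane's pick, James's pick)

def seen(left, right, pick):
    return left if pick == "L" else right

# Outcomes where Jane sees heads:
jane_h = [(l, r, j1, j2) for (l, r, j1, j2) in outcomes if seen(l, r, j1) == "H"]
p_both_given_jane = Fraction(sum(l == r == "H" for (l, r, _, _) in jane_h), len(jane_h))

# ...and where James also sees heads:
both_h = [(l, r, j1, j2) for (l, r, j1, j2) in jane_h if seen(l, r, j2) == "H"]
p_both_given_both = Fraction(sum(l == r == "H" for (l, r, _, _) in both_h), len(both_h))

print(p_both_given_jane)  # 1/2
print(p_both_given_both)  # 2/3, not the 3/4 that multiplying likelihood ratios gives
```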

This leads to the question of how Bayesians can know that they have truly reached agreement.

I think they know when each could have predicted the other's stated estimate with certainty, because that means the estimates provided no new information. In this example, they couldn't have predicted each other's estimates in the first round, which could have been 0 or 1/2 with equal probability. But they could have predicted each other's estimates in the second round.

Now if they multiply their likelihood ratios, they get 9, but in fact the odds for two heads have changed from 1:3 before, to 1:0 or infinity after the exchange. What am I doing wrong?

Independence is missing in this case, so you can't just multiply them. If you want independence, you have to let Jane and James each observe a coin toss chosen randomly and independently from two coin tosses, instead of letting them each observe a different coin toss.

I think in general, it's easier to constructively force likelihood ratios to be independent than to know whether two arbitrary likelihood ratios are independent. It's a bit similar to how you can write a program with a certain property, but can't decide whether an arbitrary program has that property.

In a comment to my old posting on the coin guessing game linked to above by Z.M. Davis, I gave an example I'd like some help with (slightly modified here):

Jane and James each privately toss a coin. They want to guess the probability that both coins land heads. Let's suppose the coins are in fact both heads.

The prior for both heads is 1/4, but having observed that their own coin is heads, each estimates the probability to be 1/2. So round one goes:

Jane: P(both heads) = 1/2
James: P(both heads) = 1/2

They seemingly agree. However, upon exchanging these values, each can immediately change their probability estimate to 1. That's because hearing "1/2" from the other player means their coin must have landed heads, since if it had been tails they would have known the probability for both heads was 0. Round 2:

Jane: P(both heads) = 1
James: P(both heads) = 1

So this is another example where exchanging Bayesian estimates leads to an updated estimate outside the range of the two. And it's even more curious to me, because they seemingly agreed from the first, and yet they both changed. In my article I raised the question of whether, when exchanging Bayesian estimates, one might see several rounds of disagreement, then agreement, then more disagreement. I also claimed that in the famous 3-hats puzzle, and its generalization to N hats, one might see multiple rounds of estimates that agree, followed by a change in estimate (but I haven't tried to verify that claim). This leads to the question of how Bayesians can know that they have truly reached agreement.

I tried to work my problem with likelihood ratios, but I got the wrong answer. P(I observe heads | both are heads) = 1. P(I observe heads | not both are heads) = 1/3, because there are 3 non-both-heads possibilities: TH, HT, TT. This gives a likelihood ratio of 3. Now if they multiply their likelihood ratios, they get 9, but in fact the odds for two heads have changed from 1:3 before, to 1:0 or infinity after the exchange. What am I doing wrong?
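A small script makes the failure visible (a sketch; states are written as Jane's coin then James's coin). The individual likelihood ratios of 3 are correct, but the joint likelihood under "not both heads" is zero, so their product doesn't apply:

```python
from fractions import Fraction

# Jane observes coin 1, James observes coin 2.
states = ["HH", "HT", "TH", "TT"]
not_hh = [s for s in states if s != "HH"]

# Individual conditional probability, as computed in the comment:
p_i_heads_given_not = Fraction(sum(s[0] == "H" for s in not_hh), len(not_hh))  # 1/3
individual_lr = Fraction(1) / p_i_heads_given_not                              # 1 / (1/3) = 3
naive_product = individual_lr ** 2
print(naive_product)  # 9

# But the JOINT probability that both players observe heads given
# "not both heads" is zero: no state other than HH has both heads showing.
p_joint_given_not = Fraction(sum(s[0] == "H" and s[1] == "H" for s in not_hh), len(not_hh))
print(p_joint_given_not)  # 0, not the (1/3)*(1/3) = 1/9 that independence would require
```

Since the joint likelihood ratio is 1/0, the true posterior odds are infinite (probability 1), not the 3:1 that the naive product of 9 gives.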

On the SAT score adjustment, most people do not know that male scores have a higher variance than female scores, nor do they know how much more variance, nor do they know how to combine gender means and variances with one or more particular scores to produce a posterior estimate. So in practice just publishing raw scores will mostly result in ignoring those differing distributions. Yes, you'd want to adjust an estimate based on multiple tests differently than one based on a single test, but it still seems to me that in practice we'd be better off if the testing agency did this math and published a single best estimate. After all, if you really knew what you were doing, you could use your knowledge of the means and variances to invert their calculation to obtain the likelihood ratio you seek.
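To make the kind of adjustment a testing agency could do concrete, here is a sketch with hypothetical numbers (not real SAT statistics): model true ability within a group as normally distributed, and the observed score as ability plus independent normal test noise. The standard normal-normal posterior mean then shrinks the raw score toward the group mean:

```python
# Hypothetical model: true ability ~ N(mu, tau2) within a group,
# observed score = true ability + N(0, sigma2) test noise.
def shrunk_estimate(score, mu, tau2, sigma2):
    # Standard normal-normal posterior mean: the raw score is pulled
    # toward the group mean, more strongly when the test noise is large
    # relative to the spread of true ability within the group.
    w = tau2 / (tau2 + sigma2)
    return w * score + (1 - w) * mu

# Made-up numbers: group mean 500, ability SD 90, test-noise SD 30.
print(shrunk_estimate(750, mu=500, tau2=90**2, sigma2=30**2))  # 725.0
```

The same formula shows why multiple tests should be adjusted differently: averaging k independent tests reduces sigma2 by a factor of k, which weakens the shrinkage toward the group mean.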

Jed, thanks for the encouragement and suggestions. I'll play with your suggestions, re: scientists and re: a more general audience. Do you know any good writeups to draw from? By far the best I've found are Eliezer's An intuitive explanation and A technical explanation (and Jaynes, if we include books, though I haven't yet read most of it).

Frelkins, thanks. Could you point me to the explanations by Hal and by Chris Hibbert?

Liron, good question. One simple example where the heuristic roughly works is if you have a weighted coin, with a uniform prior over coin-weights, and you and your partner are each estimating the probability that the coin will come up heads on the next toss. ("More expert" here equates to the number of coin flips each of you has seen.) A second is if you and your partner are both estimating a random variable (e.g., a person's "true math ability") and each of your measurements is the sum of the person's "true math ability" and a normally distributed random error term. (The "more expert" of you either has more of these measurements or a smaller error term.) Anyone want to step in here with a general analysis?
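For what it's worth, the first example can be sketched numerically. With a uniform prior the posterior mean after seeing h heads in n flips is (h+1)/(n+2), and pooling the raw data is close to a sample-size-weighted average of the two individual estimates (the flip counts below are made up for illustration):

```python
from fractions import Fraction

def posterior_mean(heads, flips):
    # Uniform prior = Beta(1,1), so the posterior after the flips is
    # Beta(heads+1, tails+1), with mean (heads+1)/(flips+2).
    return Fraction(heads + 1, flips + 2)

h1, n1 = 70, 100   # the "more expert" observer, with more flips
h2, n2 = 4, 10

you = posterior_mean(h1, n1)
partner = posterior_mean(h2, n2)
pooled = posterior_mean(h1 + h2, n1 + n2)   # combining the raw data

# Average of the two individual estimates, weighted by sample size:
weighted_avg = (n1 * you + n2 * partner) / (n1 + n2)

print(float(pooled), float(weighted_avg))  # nearly equal
```

So here "weighting by expertise" (sample size) approximates the exact pooled posterior, which is why the heuristic roughly works in this case.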

Vladimir: Likelihood ratio isn't "guess the probability the coin lands heads", it's "which is more likely, a fair coin or a 75% heads coin?"

P(Set1|75%) = (0.75)^2000 * (0.25)^1000 * C(3000,2000) and
P(Set1|50%) = (0.5)^3000 * C(3000,2000)

so the likelihood ratio from Set 1 is P(Set1|75%)/P(Set1|50%) = 1.4 x 10^51, and from Set 2 it's only 1.125. Set 1 is wildly improbable but it's hugely more likely to result from a 75% coin than a 50% coin.

Vladimir, a "likelihood ratio" is how likely your observations are under alternative theories. In the weighted coin example, your likelihood ratio for your first set of observations is P( Set 1 | weighted coin ) / P( Set 1 | fair coin ) = ( .75^2000 * .25^1000 ) / ( .5^2000 * .5^1000 ) ≈ 1.4 × 10^51. (I.e., that first set of observations is *very* strong support for the theory that the coin is weighted.)

In contrast, the likelihood ratio for your second observation set is P( Set 2 | weighted coin) / P( Set 2 | fair coin ) = (.75^2 * .25) / (.5^2 * .5) = 1.12, i.e. the second set of observations is 1.12 times as likely to occur if you have the weighted coin as if you have the fair coin.

That said, I did not mean to say you can infer "how much data I've seen" from my likelihood ratio. What I meant to say is that everything the two of you need, if you are to correctly update your posterior beliefs, is contained in your prior plus your and the other person's likelihood ratios. That is, with likelihood ratios you do not need to keep separate track of "how much data have I seen?" and "how extreme was the data?" -- a single number tells you the important part of both measurements.
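For anyone following along at home, the two ratios can be computed in log space to avoid floating-point underflow (a sketch using the 75%-heads-vs-fair hypotheses from the thread):

```python
from math import log10

def log10_lr(heads, tails, p_alt=0.75, p_null=0.5):
    """log10 of P(data | p_alt) / P(data | p_null) for i.i.d. coin flips.
    (The binomial coefficient cancels between numerator and denominator.)"""
    alt = heads * log10(p_alt) + tails * log10(1 - p_alt)
    null = heads * log10(p_null) + tails * log10(1 - p_null)
    return alt - null

lr1 = log10_lr(2000, 1000)   # Set 1: 2000 heads, 1000 tails
lr2 = log10_lr(2, 1)         # Set 2: 2 heads, 1 tail

print(round(lr1, 2))         # 51.15, i.e. a ratio of about 1.4e51
print(round(10 ** lr2, 3))   # 1.125
```

The single number lr1 encodes both that a lot of data was seen and that it pointed strongly one way, which is the point about not needing to track those two things separately.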

>>> Likelihood ratios combine "how much data have you seen?" and "how strongly did your data point to [the coin's unfairness / Jack's amazingness / whatever]?" into a single number.

Suppose we have two sets of observations:

Set 1: 2000 heads, 1000 tails
Set 2: 2 heads, 1 tail

If I understand the term "likelihood ratio" correctly, the likelihood ratios here are the same for both observation sets, 2:1. If so, I can't tell "how much data have you seen" from the ratio alone. Yes, I can make a good guess for a ratio like 1562:1, but that won't work with ratios like 2:1.

Liron: "Aumann's agreement theorem gets mentioned so much, I'm surprised we haven't had an example like this on OB earlier."

See Hal Finney's "Coin Guessing Game" from two years ago.

Under what conditions does the naive "average posterior probability weighted by expertise" heuristic work?