We frequently encounter competing estimates of politically salient magnitudes. One example is the number of attendees at the 1995 “Million Man March”. Such estimates often come from biased observers seeking to create or dispel an impression of strength. Someone interested in generating a more neutral estimate might consider applying what I would call the Malatesta Estimator, which I have named after its formulator, the 14th-century Italian mercenary captain Galeotto Malatesta of Rimini (d. abt. 1385). His advice was: “Take the mean between the maximum given by the exaggerators, and the minimum by detractors, and deduct a third” (Saunders 2004). This simplifies to the sum of the maximum and the minimum, divided by three. It adjusts for the fact that the minimum is bounded below by zero, while there is no bound on the maximum. Of course, it only works if the maximum is at least double the minimum; otherwise the estimate falls below the minimum itself.
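Malatesta's rule is short enough to write down directly. The following is a minimal Python sketch; the function names and the sample figures are illustrative assumptions, not taken from the source. It computes the estimate exactly as quoted (mean, less a third) and checks the caveat about the maximum being at least double the minimum.

```python
def malatesta_estimate(low: float, high: float) -> float:
    """Mean of the two claims, less a third; equivalent to (low + high) / 3."""
    if low < 0 or high < low:
        raise ValueError("expected 0 <= low <= high")
    mean = (low + high) / 2
    return mean - mean / 3


def within_claims(low: float, high: float) -> bool:
    """True when the estimate is not below the lowest claim, i.e. high >= 2 * low."""
    return high >= 2 * low


# Illustrative figures only: detractors say 300, exaggerators say 900.
print(malatesta_estimate(300, 900))  # 400.0
print(within_claims(300, 900))       # True
print(within_claims(300, 500))       # False: the estimate (about 267) is below 300
```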

As a mathematician, I think the obvious solution to this problem is to take the geometric mean of the max and min, rather than either the arithmetic mean or the Malatesta estimate. That solves the problem of the minimum being bounded below without the arbitrariness of subtracting a third or the awkwardness of not working when the maximum and minimum are "too close". It also gives more intuitive estimates if the maximum and minimum differ greatly on a logarithmic scale, e.g. if the maximum is a million and the minimum is 10,000, then the estimate according to the geometric mean is 100,000, which seems like a reasonable estimate given those two bounds, unlike the Malatesta estimate of 337k, which seems too high. Finally, I should point out that in the example you give, the geometric mean estimate does approximately as well as the Malatesta estimate (the geometric mean estimate would be 894k).
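The comparison in the paragraph above can be reproduced with a few lines of Python. This is only a sketch: the function names are assumptions chosen for illustration, and the inputs are the million and 10,000 figures used in the comment.

```python
import math


def malatesta_estimate(low: float, high: float) -> float:
    """Sum of the two claims divided by three."""
    return (low + high) / 3


def geometric_mean_estimate(low: float, high: float) -> float:
    """Square root of the product; needs low > 0 to be meaningful."""
    return math.sqrt(low * high)


low, high = 10_000, 1_000_000
print(round(geometric_mean_estimate(low, high)))  # 100000
print(round(malatesta_estimate(low, high)))       # 336667
```

Unlike the Malatesta figure, the geometric mean always lies between the two claims, however close together they are.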

No reason to defer to an estimate just because it has a fancy name attached.

Anders, you are probably right if we can limit our Max and Min estimates to people who are acting rationally. There are complications though. One is pronouncements by persons who seek to be dramatic and have no concern with being believed. A second issue is the problem of "many" or "gazillions". In the Bible and the Middle East one finds frequent use of the number 40, e.g., Ali Baba and the Forty Thieves, Moses wandering in the desert for 40 years, 40 paras equal one piastre, etc. Possibly in some ur-Semitic language the word for 40 and the word for myriad sounded alike, perhaps modulo vowels. Some size estimates may be nothing more than a loose way of saying many, many, many supporters. I suspect that in these cases something like Benford's Law will be at work, so that we get disproportionately many estimates that begin with the number 1, i.e., one hundred, one thousand, etc.

If I want to convince you that there were many supporters at my latest rally for protection of footnotes, I cannot name an arbitrarily high number. You have a prior estimate of how likely the size would be (given the issue), and at some point the probability that my claim is just a lie will become significantly larger than the probability that the claim is true. So I should ideally keep below this number if I want to be believed.

Maybe we can estimate my believability as the ratio between the probability of the claimed size and the probability of a lie. If the latter is constant, my most believable claim should be whatever maximizes your prior. But I want to maximize both the believability *and* the claimed number. A reasonable strategy might be to maximize the number times believability, i.e. x*P(x demonstrators|I claim y, your priors). If you deduce this, you will revise your estimate downwards, and so on. I think this can be solved for a given initial probability distribution of demonstration sizes (say a lognormal). Similarly for deliberate underestimations.
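The first round of this argument can be sketched numerically. Everything specific here is an assumption made for illustration: a lognormal prior with hypothetical parameters `MU` and `SIGMA`, a constant probability of lying, and believability taken as proportional to the prior density at the claimed size. Under those assumptions the most believable claim is the mode of the prior, exp(MU - SIGMA**2), while the claim that maximizes size times believability is the median, exp(MU). The listener's subsequent downward revision is not modeled.

```python
import math

# Hypothetical prior parameters: median crowd of 50,000, log-scale spread 0.8.
MU, SIGMA = math.log(50_000), 0.8


def lognormal_pdf(x: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """Density of a lognormal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


# Candidate claims on a logarithmic grid from 1,000 to 10,000,000.
grid = [10 ** (3 + 4 * i / 4000) for i in range(4001)]

# With a constant lie probability, believability is proportional to the prior.
most_believable = max(grid, key=lognormal_pdf)
best_tradeoff = max(grid, key=lambda x: x * lognormal_pdf(x))

# Each grid result is printed next to its closed-form counterpart.
print(round(most_believable), round(math.exp(MU - SIGMA**2)))
print(round(best_tradeoff), round(math.exp(MU)))
```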

David: I teach at a B-school and am responsible in a very small way for loosing those hordes. Unfortunately, the subject I teach, management, doesn't really lend itself to the dissemination of the Malatesta Estimator. I will have to proselytize among some of my colleagues so that I can drive the probability to 1.

Anders: I am uncomfortable using the term "probability distribution" with respect to the data to which one would apply the estimator. The data we observe are subjective estimates where the persons making the estimate have vested interests in the magnitude of their estimates, and are not using a neutral methodology. In the oil case you linked to, I noticed that in most cases the estimates would not have met the criterion that K2 be at least twice K1. In fact, they were often quite close, presumably because all parties were geologists using standard textbook methods. In the crowd estimation case the estimator worked, at least before some proponents decided to come up with absurd estimates. I guess the Malatesta requires biased estimates not totally out of touch with reality.

Googling, I just found one application, estimating the amount of Venezuelan oil: http://fallbackbelmont.blog...

It seems to work well. Given the initial numbers in this article (30-50,000 vs 200,000) and the estimate based on occupation density (60,000) the estimator does well (76,000). http://dir.salon.com/story/...

I wonder for what probability distributions of variables and claims the estimator works best? If the minimum is K1 times the variable and the maximum is K2 times, the estimator becomes unbiased when K1+K2=3. Do we have any reason to think that this is a common occurrence? In the above examples K1=~0.5, K2=~2 and K1=~0.5, K2=~6.
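The K1+K2=3 condition follows from the formula: if the minimum is K1 times the true value x and the maximum is K2 times x, the estimate is (K1 + K2) * x / 3. A small Python sketch makes the resulting bias visible; the function name `bias_ratio` is an assumption for illustration, and the K values are the approximate ones quoted above plus one unbiased case.

```python
def bias_ratio(k1: float, k2: float) -> float:
    """Ratio of the Malatesta estimate to the true value.

    With minimum = k1 * x and maximum = k2 * x, the estimate is
    (k1 + k2) * x / 3, so the ratio does not depend on x.
    """
    return (k1 + k2) / 3


for k1, k2 in [(1.0, 2.0), (0.5, 2.0), (0.5, 6.0)]:
    print(k1, k2, round(bias_ratio(k1, k2), 2))
# 1.0 2.0 1.0
# 0.5 2.0 0.83
# 0.5 6.0 2.17
```

A ratio of 1 means no bias; values below 1 understate the true figure and values above 1 overstate it.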

Shh. This is prime time recruiting for MBA grads, and the consultancies, I-banks, and fund managers are busily putting their candidates through "case" interviews and the like. "How many phone booths are there in New York? What's the market size for iPods in Mexico?" I'd estimate that the chance an MBA grad is going to use the phrase "Malatesta Estimator" in an interview tomorrow is about 100%.

PS - I conned a friend into giving me a guess of 25%. Yes, that means it's likely that there is a 42% probability of the phrase being used tomorrow.

Cheers!

David Rotor