Weighing Scientists

The latest issue of Nature (the top science journal) has an editorial on the need for better ways to communicate expert uncertainty on key topics like climate change, and a two-pager by Willy Aspinall on “More tractable expert advice”:

Of the many ways of gathering advice from experts, the Cooke method is, in my view, the most effective when data are sparse, unreliable or unobtainable. … Take as an example an elicitation I conducted in 2003, to estimate the strength of the thousands of small, old earth dams in the United Kingdom. Acting as facilitator, I first organized a discussion between a group of selected experts. … The experts were then asked individually to give their own opinion of the time-to-failure in a specific type of dam, once such leakage starts. They answered with both a best estimate and a ‘credible interval’, for which they thought there was only a 10% chance that the true answer was higher or lower.

I also asked each expert a set of eleven ‘seed questions’, for which answers are known, so that their proficiency could be calibrated. One seed question, for example, asked about the observed time-to-failure of the Teton Dam in Idaho in June 1976. Again the specialists answered with a best estimate and a credible interval. Their performance on these seed questions was used to ‘weight’ their opinion, and these weighted opinions were pooled to provide a ‘rational consensus’. For the UK dams, the performance-weighted solution indicated a best estimate of time-to-failure of 1,665 hours (70 days) — much higher than if all of the answers were pooled with equal weights (6.5 days).
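To make the weighting step concrete, here is a minimal Python sketch of performance-weighted pooling versus equal-weight pooling. It is not Aspinall's procedure: the actual Cooke scoring is more elaborate than what is shown here. As a stand-in, each expert's weight is simply the fraction of seed questions whose known answer falls inside that expert's credible interval. The experts, intervals, and estimates are all made up for illustration, and `hit_rate` and `pool` are hypothetical helper names.

```python
def hit_rate(intervals, truths):
    """Fraction of seed questions whose known answer lies inside the expert's interval."""
    hits = sum(1 for (low, high), truth in zip(intervals, truths) if low <= truth <= high)
    return hits / len(truths)


def pool(estimates, weights):
    """Weighted average of the experts' best estimates."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one expert needs a nonzero weight")
    return sum(w * e for w, e in zip(weights, estimates)) / total


# Hypothetical seed questions with known answers (illustrative numbers only).
seed_truths = [10.0, 20.0, 30.0, 40.0]

# Hypothetical credible intervals each expert gave for the seed questions.
seed_intervals = {
    "A": [(5, 15), (15, 25), (25, 35), (35, 45)],   # covers every known answer
    "B": [(11, 12), (15, 25), (31, 32), (41, 42)],  # covers one of four
    "C": [(0, 1), (0, 1), (0, 1), (0, 1)],          # covers none
}

# Hypothetical best estimates of time-to-failure, in hours.
best_estimates = {"A": 1600.0, "B": 200.0, "C": 100.0}

experts = sorted(best_estimates)
weights = [hit_rate(seed_intervals[name], seed_truths) for name in experts]
estimates = [best_estimates[name] for name in experts]

weighted = pool(estimates, weights)
equal = pool(estimates, [1.0] * len(experts))

print(f"performance-weighted: {weighted:.1f} hours")
print(f"equal weights:        {equal:.1f} hours")
```

With these made-up numbers the expert who did well on the seed questions dominates the weighted pool, so the two answers differ a lot, which is the same qualitative pattern as the 70 days versus 6.5 days in the dam example.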

So what evidence shows this is “most effective”?  First, other methods aren’t perfect:

The traditional committee still rules in many areas … Committees traditionally give all experts equal weight (one person, one vote). This assumes that experts are equally informed, equally proficient and free of bias. These assumptions are generally not justified.  … The Delphi method … involves getting ‘position statements’ from individual experts, circulating these, and allowing the experts to adjust their own opinions over multiple rounds. … What often happens is that participants revise their views in the direction of the supposed ‘leading’ experts, rather than in the direction of the strongest arguments. … Another, more recent elicitation method … involves asking each participating expert to predict the range of uncertainty estimates of any and every person in the field. This creates a huge spread of overall uncertainty, and sometimes physically implausible results: it once implied higher earthquake risk in Switzerland than in California.

Second, more folks are using and liking it:

We were able to provide useful guidance to the authorities, such as the percentage chance of a violent explosion, as quickly as within an hour or two. … More than 14 years on, volcano management in Montserrat stands as the longest-running application of the Cooke method. … The Cooke approach is starting to take root in similar problem areas, including climate-change impacts on fisheries, invasive-species threats and medical risks. … Participants found the elicitation “extremely useful”, and suggested that it would be helpful in assessing other public-health issues.

And, that’s it.  No lab or field trials comparing this method to others are offered, nor can I find them elsewhere, nor are they mentioned under future “research questions.”  Strangely, it is as if scientist Aspinall and his top science journal editor never considered scientific trials!

People complain that we don’t have enough kinds of trials comparing prediction markets to other methods – not enough different topics, methods, timescales, etc. And they aren’t always dramatically better.  But, hey, at least we have some trials.  Why doesn’t Aspinall even list prediction markets among competing methods?

My guess: scientists must manage a delicate balance.  Their customers mainly want to affiliate with prestigious academics, and so need official estimates to come attached to the names and affiliations of such academics.  They’d be fine with a simple vote of a committee of prestigious academics.  But the scientists want to estimate in more “scientific” ways, with numbers, procedures, etc.  Weighting votes by test scores fits that bill.

Prediction market estimates might be more accurate, but they less clearly come attached to prestigious academics, so scientist customers won’t like them.  Each scientist also prefers when possible to affiliate with other prestigious academics, and so also prefers to join prestigious committees, however weighted, rather than to trade in prediction markets.

Alas, only strong clear evidence that prediction markets are more accurate on such topics is likely to embarrass scientists and their customers enough to get them to prefer markets to committees.  And neither is eager to pay for trials that might produce such evidence.

  • They’d be fine with a simple vote of a committee of prestigious academics. But the scientists want to estimate in more “scientific” ways, with numbers, procedures, etc.
    That was actually part of Moldbug’s critique of the motivation for futarchy. Of course, he also thinks the committee is an example of faux-scientific legitimacy.

    The Cooke method sounds like it actually would be better than the other ones mentioned to me, though of course we could be more confident if we had good data.

  • And, that’s it. No lab or field trials comparing this method to others are offered, nor can I find them elsewhere, nor are they mentioned under future “research questions.” Strangely, it is as if scientist Aspinall and his top science journal editor never considered scientific trials!

    Nice. Scientifically formed opinions are great… except w/ respect to scientifically formed opinions. Compartmentalization rules.

  • Robert Bloomfield

    Here is a slightly different explanation for why prediction markets are not mentioned: a combination of physical scientists’ ignorance of and discomfort with economics. A Delphi process is a pretty straightforward algorithm that can be modeled in ways most physical scientists are used to. A prediction market is almost impossible to model–even economists have been unable to identify the dynamically optimal trading strategies in a double-auction market (the most common type of prediction market, now used almost universally in securities markets). Economists are comfortable with imposing very strong assumptions to get predictions on overall market behavior (even if we can’t understand individual behavior in markets very well), and the evidence suggests that those models work reasonably well. But Mr. Aspinall is probably not familiar with the models, which are very different from most in the physical sciences, probably wouldn’t care for them if he knew them a bit better, and is almost surely unaware of the experimental and empirical evidence supporting them.

    There is an old saying, “don’t attribute to bad intent what can easily be attributed to incompetence.” A bit strong in this case, but I think a corollary question is in order here: why attribute to desires for prestige and affiliation what can easily be attributed to (well, sorry Mr. Aspinall, the word is too strong) incompetence.

    • Jess Riedel


      At least in regards to the issue of motivation. Robin Hanson’s critique of Willy Aspinall’s not requiring field trials still stands.

    • Jess Riedel

      More constructively, how could we distinguish between these two explanations?

      I think the lack of scientific rigor (independent of which possibilities Willy Aspinall chose to consider) points toward incompetence.

    • Remember this is the top science journal! That makes mere incompetence rather unlikely as an explanation – very few incompetent articles make it into Nature.

      • Andrew Gelman

        Robin: I disagree with you there. Acceptance into Nature or Science is a bit of a crapshoot. My impression is that the social science papers they publish are sometimes pretty wacky.

      • Andrew, what about Nature statistics articles; are those often incompetent?

      • Andrew Gelman

        I didn’t know that Nature published statistics articles. My impression is that advances in statistics are published in stat journals or sometimes in journals in related fields (econ, poli sci, sociology, psychology, CS, or (in the case of computational methods) physics), but not in Science or Nature. If stat articles appear in Science or Nature, I’m not sure what they’re about.

  • Don’t these solve different problems? Prediction markets don’t provide a way of making predictions so much as providing an incentive for someone to find a good way to make predictions. Something like the Cooke method might still be used by a prediction market investor.

  • billb

    I don’t see how a prediction market is going to help you predict the eruption of a volcano or the failure of a dam. Can you explain how you’d set up such a market?

  • Darin Johnson

    What about this method in cases where the correct answer is never revealed? How would you have a prediction market for time to failure of a dam, to use Aspinall’s example, given that it’s unlikely the dam will ever fail? The market would never close.

    It seems to me we are stuck with eliciting expert opinion the old fashioned way for questions like this. Am I missing something?

    • @billb, @Darin

      If failures are rare, then you can run a market that asks about the probability of any one failing. The Foresight Exchange has had claims about earthquakes for quite a while. So far, they aren’t providing any information that isn’t already in the USGS estimates.

      But if you had situations where you had individual experts (or people with ability to do the research) and no consensus estimate, then setting up a market would elicit a prediction. If you use a variable payout claim based on date, you can get a probability even for unlikely events.

  • Bill

    Re: Weighing Scientists

    From my statistical research, scientists do not as a group weigh more than other individuals in the general population.

  • CJ

    For what it’s worth, the idea of weighing experts is used in machine and statistical learning as well. For example, in classification problems they will combine various classifiers by weighting each according to its effectiveness. (And yes, there are versions that use Bayes’ Rule to compute optimal weightings, but that’s really neither here nor there.)

  • Aron

    To hell with trials. Do you think the Wikipedia founders should have first started with trials to test the accuracy and breadth of their method?

    If prediction markets are so awesome, then there should be opportunities to use them all around the place. Get yourself a clever idea and do it.

    It’s not embarrassment you need to cultivate, it’s jealousy and greed.

    Your moonlighting gig counts for this I suppose.

  • gimli4thewest

    As one who spent many monotonous hours trying to measure the amount of chlorine in one gram of a chemical compound to within 0.001%, I find the enterprise of measuring the temperature of the entire Earth’s atmosphere to within a hundredth of a degree doubtful at best.

    Although I thank God I left science for business, I am applaud those who attempt to collect such lofty and tedious data.

  • gimli4thewest

    P.S. I am appalled, I do applaud, was meant as Yogism.

  • Northwest rain

    The “scientists” in Montserrat seem to ignore the need to use data to back up their opinions. Some are also using laws of physics invented ONLY for Montserrat.

    I was well trained in the Scientific Method — hypothesis testing, awareness of EB (Experimenter Bias) etc. Through the years I’ve read thousands of journal articles in dozens of research fields — Cognition, Learning, Neurology, Earth Sciences, Biology, and Behavioral Sciences. In comparison to these journal articles, the “stuff” published by the Montserrat SAC is more science fiction than real science.

    Errors of using probability numbers without DATA are found in the UN Climate Report — that bit about 90% probability that the Himalayan glaciers will melt by 2035 (or 2050). This is the same sort of “science” found in the Montserrat Scientific Advisory Committee’s reports. Why bother with data when “scientists” can come up with Wild a** guesses (SWAG) and present that as “science”?

    How can a massive population NOT have an effect on the climate — history of humans shows that we do impact the environment — Easter Island was a lab of sorts.

    When the truth should be enough — why have some Climate scientists chosen the SCARE science route? This sort of garbage makes scientists look bad.

    The problem with yelling Wolf — when the creature is perhaps only a mouse — is that fewer people will believe the ones calling the alert. So when a real scientist comes along, with a real warning about impending disaster — that person might not be believed.