Weighing Scientists

The latest issue of Nature, a top science journal, has an editorial on the need for better ways to communicate expert uncertainty on key topics like climate change, and a two-pager by Willy Aspinall on “More tractable expert advice”:

Of the many ways of gathering advice from experts, the Cooke method is, in my view, the most effective when data are sparse, unreliable or unobtainable. … Take as an example an elicitation I conducted in 2003, to estimate the strength of the thousands of small, old earth dams in the United Kingdom. Acting as facilitator, I first organized a discussion between a group of selected experts. … The experts were then asked individually to give their own opinion of the time-to-failure in a specific type of dam, once such leakage starts. They answered with both a best estimate and a ‘credible interval’, for which they thought there was only a 10% chance that the true answer was higher or lower.

I also asked each expert a set of eleven ‘seed questions’, for which answers are known, so that their proficiency could be calibrated. One seed question, for example, asked about the observed time-to-failure of the Teton Dam in Idaho in June 1976. Again the specialists answered with a best estimate and a credible interval. Their performance on these seed questions was used to ‘weight’ their opinion, and these weighted opinions were pooled to provide a ‘rational consensus’. For the UK dams, the performance-weighted solution indicated a best estimate of time-to-failure of 1,665 hours (70 days) — much higher than if all of the answers were pooled with equal weights (6.5 days).
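To make the weighting idea concrete, here is a minimal sketch of performance-weighted pooling. It is not Cooke's actual scoring rule, which the quoted passage does not spell out. As a stand-in, it assumes each expert's weight is simply the fraction of seed questions whose known answer fell inside that expert's credible interval. The names `Expert`, `hit_rate`, and `pooled_estimate`, and all of the numbers, are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Expert:
    name: str
    # (low, high) credible interval given for each seed question, in order
    seed_intervals: list[tuple[float, float]]
    # best estimate for the question we actually care about
    target_estimate: float


def hit_rate(intervals: list[tuple[float, float]], truths: list[float]) -> float:
    """Fraction of seed questions whose known answer lies inside the interval.

    ASSUMPTION: a crude stand-in for a real calibration score.
    """
    hits = sum(low <= truth <= high for (low, high), truth in zip(intervals, truths))
    return hits / len(truths)


def pooled_estimate(experts: list[Expert], truths: list[float], weighted: bool = True) -> float:
    """Weighted average of the experts' best estimates.

    With weighted=False every expert counts equally (one person, one vote).
    """
    if weighted:
        weights = [hit_rate(e.seed_intervals, truths) for e in experts]
    else:
        weights = [1.0] * len(experts)
    total = sum(weights)
    if total == 0:
        raise ValueError("no expert scored above zero on the seed questions")
    return sum(w * e.target_estimate for w, e in zip(weights, experts)) / total


# Hypothetical data: three seed questions with known answers, three experts.
seed_truths = [10.0, 50.0, 200.0]
experts = [
    Expert("A", [(5, 15), (40, 60), (150, 250)], target_estimate=1000.0),
    Expert("B", [(1, 5), (40, 60), (10, 20)], target_estimate=100.0),
    Expert("C", [(20, 30), (60, 70), (300, 400)], target_estimate=50.0),
]

print("equal weights:      ", pooled_estimate(experts, seed_truths, weighted=False))
print("performance weights:", pooled_estimate(experts, seed_truths, weighted=True))
```

In this toy data, expert A covers every seed answer and expert C covers none, so the weighted pool moves toward A's estimate while the equal-weight pool does not. That is the same kind of gap as the 70 days versus 6.5 days in the dam example, though the sketch says nothing about which pooled answer is closer to the truth.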

So what evidence shows this is “most effective”?  First, other methods aren’t perfect:

The traditional committee still rules in many areas … Committees traditionally give all experts equal weight (one person, one vote). This assumes that experts are equally informed, equally proficient and free of bias. These assumptions are generally not justified.  … The Delphi method … involves getting ‘position statements’ from individual experts, circulating these, and allowing the experts to adjust their own opinions over multiple rounds. … What often happens is that participants revise their views in the direction of the supposed ‘leading’ experts, rather than in the direction of the strongest arguments. … Another, more recent elicitation method … involves asking each participating expert to predict the range of uncertainty estimates of any and every person in the field. This creates a huge spread of overall uncertainty, and sometimes physically implausible results: it once implied higher earthquake risk in Switzerland than in California.

Second, more folks are using and liking it:

We were able to provide useful guidance to the authorities, such as the percentage chance of a violent explosion, as quickly as within an hour or two. … More than 14 years on, volcano management in Montserrat stands as the longest-running application of the Cooke method. … The Cooke approach is starting to take root in similar problem areas, including climate-change impacts on fisheries, invasive-species threats and medical risks. … Participants found the elicitation “extremely useful”, and suggested that it would be helpful in assessing other public-health issues.

And, that’s it.  No lab or field trials comparing this method to others are offered, nor can I find them elsewhere, nor are they mentioned under future “research questions.”  Strangely, it is as if scientist Aspinall and his top science journal editor never considered scientific trials!

People complain that we don’t have enough kinds of trials comparing prediction markets to other methods: not enough different topics, methods, timescales, etc. And markets aren’t always dramatically better in the trials we do have.  But, hey, at least we have some trials.  Why doesn’t Aspinall even list prediction markets among competing methods?

My guess: scientists must manage a delicate balance.  Their customers mainly want to affiliate with prestigious academics, and so need official estimates to come attached to the names and affiliations of such academics.  They’d be fine with a simple vote of a committee of prestigious academics.  But the scientists want to estimate in more “scientific” ways, with numbers, procedures, etc.  Weighting votes by test scores fits that bill.

Prediction market estimates might be more accurate, but they come less clearly attached to prestigious academics, so scientists’ customers won’t like them.  Each scientist also prefers, when possible, to affiliate with other prestigious academics, and so would rather join prestigious committees, however weighted, than trade in prediction markets.

Alas, only strong, clear evidence that prediction markets are more accurate on such topics is likely to embarrass scientists and their customers enough to get them to prefer markets to committees.  And neither group is eager to pay for trials that might produce such evidence.
