Keeping score

A while back Robin Hanson mentioned that somebody should design a mechanism to keep track of pundits’ ‘scores’. Robin notes that there is no feedback mechanism to help the public figure out how accurate pundits’ predictions have been. Moreover, given the way the media works, it’s not clear that outlets have an incentive to find accurate/reliable pundits rather than entertaining and/or provocative ones. Currently, pundits don’t suffer a reputational cost for their blunders. (Anybody recall weapons of mass destruction?)

An academic way to go, of course, is to write a critical journal article (also nice at tenure time). For example, the economist/philosopher Erik Angner has written a very nice paper (2006), ‘Economists as Experts: Overconfidence in theory and practice,’ Journal of Economic Methodology 13(1): 1-24 [Fulltext (subscription required); Penultimate draft], in which he analyzes the (somewhat uninspiring) track record of Anders Aslund. (Aslund, you may recall, was the Swedish economist who advocated ‘shock therapy’ while acting as an advisor to the Russian government between 1991 and 1994.) Of course, few pundits deserve such thorough treatment.

So we still await a nice mechanism to ‘score’ and aggregate pundits’ track records (in the way, say, eBay merchants are scored). Given the role pundits and talking heads play in validating (and creating a narrative for) important public policy decisions, such a mechanism could perform an important public service. Of course, there is a catch-22 lurking here, because it probably requires ‘experts’ to rate/score the ‘experts.’ But given that various foundations are willing to spend serious money on tracking media bias, why not fund a pundit score-keeping institute? In a future post, I’ll discuss the ‘serious’ business of science–and the need for score-keeping in it.
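One simple way such a score-keeping institute could formalize ‘scores’ is the Brier score: pundits state probabilities for yes/no claims, and once the claims resolve, each pundit’s average squared error becomes their track record. A minimal sketch follows; the pundit names, forecasts, and outcomes are invented for illustration.

```python
# Minimal sketch of a pundit score-keeping ledger using Brier scores.
# All pundit names and predictions below are hypothetical examples.

def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome.
    0.0 is a perfect score; 1.0 is maximally wrong."""
    return (forecast - outcome) ** 2

def pundit_scores(ledger):
    """Average Brier score per pundit; lower is better."""
    totals = {}
    for pundit, forecast, outcome in ledger:
        totals.setdefault(pundit, []).append(brier_score(forecast, outcome))
    return {p: sum(s) / len(s) for p, s in totals.items()}

# Hypothetical ledger: (pundit, forecast probability, resolved outcome)
ledger = [
    ("Alice", 0.9, 1),   # confident and right
    ("Alice", 0.8, 1),
    ("Bob",   0.9, 0),   # confident and wrong
    ("Bob",   0.5, 1),   # hedged
]

scores = pundit_scores(ledger)
# Alice: (0.01 + 0.04) / 2 = 0.025 ; Bob: (0.81 + 0.25) / 2 = 0.53
```

A nice property of the Brier score is that it is ‘proper’: a pundit minimizes their expected score by reporting their honest probability, so bluster carries a measurable cost.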

  • Anyone interested in this may want to check out Ken Waight’s “Lying in Ponds” blog, which primarily tracks pundit partisanship but also tracks any clear-cut predictions made by pundits.

  • Doug S.

    Science fiction author David Brin has frequently called for such “prediction registries” to judge the accuracy of various pundits.

  • Doug S.

    One quote from the “Lying in Ponds” Philosophy page caught my attention.

    “When two people agree on everything, it’s pretty certain that only one is doing the thinking.”

    Do we need to tell him about the mathematics of disagreement?

  • JMG3Y

    Hmmm. Maybe an assessment system could be established for professional politicians as well, under which their remuneration is proportional to the social benefits and costs of their individual actions in office. As many will pass on before a sufficiently precise assessment of their actions can be performed, this would have to include the wealth of their descendants. Having done “good”, you and your descendants flourish: you and your descendants keep your pension for some defined period. Having done sufficiently “bad”, you and your descendants have to give the value of it all back: all the salary, the perks and all the pension. With interest.

    Politicians could shift their responsibility/accountability onto the consultants they contract with. They could buy insurance. They could cast their net out and offer a prize for the best ideas (à la chapter 1 of Wikinomics, which I’ve only browsed as yet). They could devise a mechanism of real-time voting by their constituents, making use of this evolving technology for rapid communication. Lobbyists and their supporters would be roped in for a cut of responsibility, which might markedly increase the value of their advice. Get a “bad” tax break or government policy through? You gotta pay society back. Pundits could be evaluated by their willingness to step up and support the policy they are advocating, with some jeopardy, when the politician in the maelstrom calls. If they will only write columns for the New York Times, that says something. Sort of the ultimate in transparency and open source. Should be a really interesting market for everyone.

  • I propose a market-based system for keeping score. If the system works as intended, then an emergent property is scores for any pundit without the pundit:

    1. participating in the markets
    2. approving of the markets
    3. needing to be aware of the markets

    Interested to know if others think a system along these lines could work as intended. If not, why not. If so, how could the model be improved either theoretically or practically.

  • One exception to the lack of scoring is Robert Cringely, a computer columnist for PBS whose weekly column has been a source of insights for about a decade. At the beginning of each year he writes a prediction column, and at the end of the year he goes back and sees how he did. Historically he does pretty well (I think his average is 75%), considering that he’s speculating on the outcomes of a complex world and is always conservative in calculating his score (for example, he didn’t give himself a point for the iPhone, even though he predicted it for 2006, because it was announced one week too late).

    Alas, few people are so dedicated to accurate prediction; the rest need to have the cost of their errors put upon them.

  • Doug S.

    I don’t like the adjudication process in your Truth Markets. Quite simply, the voters have little incentive to pick the answer that corresponds with the evidence. There’s a big difference between asking what people believe and asking what is true. Sometimes, crowds can be damn stupid; a lot of people believe an awful lot of nonsense. For example, consider the struggle to keep creationism out of the public schools in the United States: evolution fares much worse in public opinion polls than it does in peer-reviewed scientific journals. Regardless of your personal beliefs about religion, it is clear that most people hold incorrect religious beliefs, because none of the many mutually incompatible religious belief systems is held by more than half of the world’s population. A crowd of experts will usually do better than a single expert chosen at random, but a crowd of ignorant people won’t do as well as choosing one expert at random.

    Richard Feynman once told a story that illustrates this point. Suppose you want to find out the length of the nose of the Emperor of China. However, nobody is allowed to actually see the Emperor, so you can’t go and measure his nose. Therefore, you instead ask as many people as you can what they think the length of the Emperor’s nose is, and then you take the average! Since you have so many measurements, the resulting average must be very accurate, right?
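    A tiny simulation makes Feynman’s point concrete: averaging many guesses made without evidence converges on whatever the crowd happens to believe, not on the truth. All the numbers below are invented for illustration.

```python
# Sketch of the "Emperor's nose" problem: a huge sample of uninformed
# guesses yields a very *precise* average that is still *inaccurate*,
# because it recovers the crowd's shared bias rather than the truth.
import random

random.seed(0)
TRUE_LENGTH = 4.2    # the (unknowable) true nose length, in cm
CROWD_BELIEF = 6.0   # what the crowd happens to believe, on average

guesses = [random.gauss(CROWD_BELIEF, 1.0) for _ in range(100_000)]
crowd_average = sum(guesses) / len(guesses)

# With 100,000 guesses the standard error is ~0.003 cm, so the average
# pins down CROWD_BELIEF very tightly -- and misses TRUE_LENGTH by ~1.8 cm.
```

    Aggregation shrinks the noise in independent guesses, but it cannot remove a bias that everyone shares; for that, someone has to actually measure the nose.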

  • You make a good point. What if you were to weight individual judgments by the judge’s own “truthiness” rating for past claims (i.e. their NAV value)? I can see a bootstrap issue, but here’s a proposed process: in phase one we only allow Futures claims (not Currents). During this phase, a base of expert predictors emerges. In phase two, we introduce Currents (which are more subjective), but at this point the experts will have a voice commensurate with their track record on prediction. Judges who are new to the system start with a baseline rating but can build a reputation over time.
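  A reputation-weighted adjudication along these lines could be sketched as follows. The weighting scheme, the 0.5 baseline for new judges, and the judge names are all illustrative assumptions, not a worked-out mechanism design.

```python
# Sketch of reputation-weighted adjudication: each judge's vote on a
# claim is weighted by their past track record, so proven predictors
# carry more voice than newcomers. All parameters are hypothetical.

def weighted_verdict(votes, reputation, baseline=0.5):
    """votes: {judge: 0 or 1}; reputation: {judge: weight in (0, 1]}.
    Judges absent from `reputation` get the baseline weight.
    Returns the reputation-weighted fraction voting 1."""
    total = sum(reputation.get(j, baseline) for j in votes)
    yes = sum(reputation.get(j, baseline) for j, v in votes.items() if v == 1)
    return yes / total

# Hypothetical example: two judges with strong "phase one" track
# records outvote three brand-new judges.
votes = {"expert1": 1, "expert2": 1, "new1": 0, "new2": 0, "new3": 0}
reputation = {"expert1": 0.9, "expert2": 0.9}
verdict = weighted_verdict(votes, reputation)
# 1.8 / (1.8 + 1.5) ~= 0.545, so the weighted majority says "true"
# even though a raw head-count (2 of 5) would say "false".
```

  The bootstrap issue shows up in the baseline: set it too high and ignorant newcomers swamp the experts; too low and the system never admits new talent.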

  • TGGP

    It might not be all that relevant, but the best piece on “shock therapy” I’ve read is Comparing Apples. Usually Russia is the only country discussed, but as that piece points out, it is something of an outlier when it comes to its development after the fall of the USSR. The term “shock therapy” originally came from other former Soviet countries, but it didn’t become well known until Russia tried to imitate them.

    This post has pretty much been copied from other ones I made at gnxp and Sailer, which I was reminded of by the mention of Aslund, who tries to defend his reputation with regard to Russia here.