The 80% Forecasting Solution

We often have remarkably poor forecasts, relative to the information available.   An exciting new way to do better is prediction markets: have people bet on your topic.  But honestly, I’d guess you can get 80% of the improvement that prediction markets offer by using a much simpler solution: collect track records.

Yup, it’s that simple.   When people make forecast-like statements, write them down in a clear standardized form, and then check back later to see who was more accurate.   Along the way, create a consensus forecast by averaging recent forecasts, perhaps weighted by accuracy or expert credentials.   If you collect enough forecasts to evaluate accuracy, and reward accuracy well enough, people will try hard to be right, and you’ll learn what kinds of people to listen to.
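The averaging step can be sketched in a few lines. This is a minimal illustration, not anything prescribed above: the Brier score and the inverse-score weighting are assumed choices for measuring "accuracy," and the forecasters are made up.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.

    Lower is better: a perfect forecaster scores 0.
    """
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def consensus(current_forecasts, track_records):
    """Average current forecasts, weighted by each forecaster's past accuracy.

    current_forecasts: {name: probability for the new question}
    track_records: {name: (past_forecasts, past_outcomes)}
    """
    weights = {}
    for name, (past, outcomes) in track_records.items():
        # Lower Brier score = more accurate = higher weight.
        weights[name] = 1.0 / (brier_score(past, outcomes) + 1e-9)
    total = sum(weights[n] for n in current_forecasts)
    return sum(p * weights[n] for n, p in current_forecasts.items()) / total

# Alice has been sharper than Bob on past yes/no questions...
records = {
    "alice": ([0.9, 0.2, 0.8], [1, 0, 1]),   # Brier = 0.03
    "bob":   ([0.6, 0.5, 0.4], [1, 0, 1]),   # Brier ≈ 0.26
}
# ...so the consensus leans toward Alice's 0.8 rather than Bob's 0.4.
print(consensus({"alice": 0.8, "bob": 0.4}, records))  # → ~0.76
```

The weighting scheme is the tunable part: one could just as well use only each person's most recent forecasts, or fold in credentials, as the post suggests.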

Now don’t get me wrong; prediction markets offer real advantages over simple track records.   Prediction markets can directly pay people with short track records for adding information.   They don’t require you to decide who is more accurate when, and they are probably updated more quickly.    On the other hand, complex trading environments discourage many people from contributing their information.   So some vendors, such as Newsfutures and HP, are experimenting with skipping the trading and just getting people to make comparable forecasts.

It is something of a puzzle that we don’t demand more track records from our advisers.   If you go to a cancer doctor, for example, he will not show you a record of his previous patients and how long they lived after his treatments.  In fact, he usually has no such records, as he has typically lost track of his old patients.   Similarly for the guy who repairs your car, or who treats your lawn, or who teaches your classes.   

Yes, it would be trouble to collect and evaluate track records.   But if customers cared enough, records would happen.   Consider that companies go through the bother of hiring auditors to check financial records because investors insist on it; why don’t we insist on track records?   Until we understand why people haven’t opted for the 80% solution of track records, I’m afraid we can’t be that optimistic about prediction markets either.

P.S.  Today I’m presenting at a Yahoo prediction markets meeting.

  • How much are you willing to bet on whether this prediction market idea would work or not?

  • Acad Ronin

    1) Ixnay on the weighting by expertise. We have over a quarter of a century of evidence that experts do not beat informed amateurs. For more on this point, see Philip Tetlock’s excellent (though flawed) book: Expert Political Judgment: How Good Is It? How Can We Know? (Princeton Univ. Press).

    2) We also need biased forecasters (Tetlock’s hedgehogs) to enable us to estimate risk.

3) Uncertainty, Sec. of Defense Rumsfeld’s “unknown unknowns”, is inescapable.

  • de

    my bet is that:
    1 – people generally learned that past performance does not guarantee future results
2 – the more you trust somebody’s opinions, the more you depend on them. so you hedge it

  • Spencer

    Maybe you can get 80% of the predictive power by being more careful about who you listen to (by paying attention to track records). But if collecting and tracking those track records is more effort, or more than 80% of the effort of setting up a predictive market, then why bother?

  • Perry E. Metzger

    One must be very careful in the way that one implements such a proposal. Consider that if you have 1024 people making random predictions about 50% probability events, one of them may get a remarkable ten of those predictions right in a row, even though their method of prediction is random. Even if someone has a track record for prediction, one should not necessarily believe them!

    In case some think this is a theoretical problem, consider how many people participate in certain prediction activities (like investing), and then consider how many of them might, by pure accident, seem to be brilliant analysts when they are in fact just lucky. Once you have millions of points in your sample space, the usual “95% confidence” people prefer in scientific studies is not sufficient.

(This reminds me — I’d love to see a blog posting about the fact that, of every 100 papers out there claiming 95% confidence, on average about 5 will be wrong. With the amount of science going on today, is 95% confidence actually a sufficient metric? One can expect rafts of incorrect papers even in quite prestigious journals purely on the basis of statistics, even ignoring the possibility of fraud…)
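    To put numbers on the 1024-forecasters example, here is a quick simulation. The setup (fair-coin forecasters scored against 50/50 outcomes) is assumed for illustration; the expected count of perfect records is 1024 × (1/2)¹⁰ = 1.

```python
import random

def perfect_records(n_forecasters=1024, n_predictions=10, seed=0):
    """Count forecasters who, by luck alone, get every prediction right."""
    rng = random.Random(seed)
    perfect = 0
    for _ in range(n_forecasters):
        # Each random prediction about a 50% event is correct with p = 1/2,
        # so a perfect 10-for-10 record happens with p = (1/2)**10 = 1/1024.
        if all(rng.random() < 0.5 for _ in range(n_predictions)):
            perfect += 1
    return perfect

# Averaged over many runs, about one forecaster per 1024 looks "brilliant"
# despite predicting at random.
trials = 500
avg = sum(perfect_records(seed=s) for s in range(trials)) / trials
print(f"average perfect records per 1024 random forecasters: {avg:.2f}")
```

    So roughly one person in each batch of 1024 ends up with a flawless record purely by chance, which is exactly why a short track record alone should not be believed.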

  • The collecting of such track records could be a profitable enterprise for people looking to get into prediction market arbitrage. They could capitalize on people who’re unwilling to participate in a betting market (because of other interests, or an aversion to gambling), but who have relevant and otherwise unrepresented knowledge. But I share your pessimism that this kind of product would be valued by consumers in general.

  • “But if customers cared enough, records would happen.”

    Is there yet a well-maintained registry of investment bank research analyst recommendations?

If not, this is at least somewhat related to the reluctance of management to run internal prediction markets; i.e., outsiders might overrate the demand for certain kinds of information in the power structure of the firm (or in a close-knit community of firms).

In the case of doctors there is a growing system of track records. In Great Britain, for example, there is an online system showing survival rates of heart surgery at different hospitals and for different doctors.
    Of course, presenting this kind of information right is tricky. Some doctors specialize in difficult cases and have a lower survival rate.
    But this is largely a reporting and information design problem.

Overall I think the information design aspect is worth pursuing (maybe I should make a post on this since it is a pet subject of mine). A good information design helps people judge tricky information like risk better, like the dartboards and roulette wheels used for medical risk communication, while bad design of course biases or obscures data. It is an interesting challenge to visualise track records well.

  • With abundant blogs and auto-caching, perhaps we’re getting close to such a day…
    Though to be sure, 1) people tend not to make concrete predictions, for fear of later scrutiny, as you’ve pointed out, and 2) CEOs and presidents don’t blog, for the most part.
    (Mark Cuban is an exception, and makes quite a few concrete predictions too.)
    Have fun with the Yahooligans!

  • jck

    jason:”Is there yet a well-maintained registry of investment bank research analyst recommendations?”
Yes, but it’s private. There is a fund in the UK that tracks around 200,000 analyst trade recommendations a year, and if they have access to all that research it’s because there is a good incentive for the banks: a piece of a commissions budget well north of $200 million a year. [They have been doing that for close to 10 years.]

  • Acad: Tetlock showed that there is an important domain for which weighting by expert credentials would work poorly. But he hints that the cause is the way news media reward these so-called experts for making dramatic forecasts.
    In different domains, experts face different incentives and get reputations in different ways, so I don’t expect Tetlock’s results to generalize as well as you imply.

    Perry, here’s a blog post explaining why the problem of incorrect papers is worse than you indicate, and a few ways to improve on it:

One interesting idea that came up in a discussion yesterday with Stuart Armstrong was to force people to state their predictions as negatives: not what they think will happen, but what will *not* happen or be the case at a certain point in time. Rather than saying “In 2003 workers who are not computer fluent will have a hard time finding a job” we might look at whether “In 2003 workers not fluent in computers can still easily find a job” is untrue. In this case it seems that the first prediction is still hard to judge, while the second is much sharper and probably untrue. I guess this is an application of falsifiability. Humans tend to state their predictions in an unfalsifiable positive form, but negating them often produces a sharper falsifiable form. Of course one can do a bad job of it and still fudge, but it seems a list of negative predictions might be more useful than a list of positive predictions for establishing a track record.

One thing to watch out for is that many people are reluctant to be put on the spot by having their forecasts recorded and judged. Tetlock ran into this with his study; he had to offer his subjects anonymity. (And it’s a good thing for them he did, since they did so badly.)

    The problem is that it might be that some of the best forecasters are most uncertain about their forecasts (Tetlock’s foxes) and therefore might be reluctant to be scored like this, while some of the worst forecasters are overconfident (hedgehogs) and would be overrepresented in the sample. While one might hope that this would be self-correcting to some extent (people who do badly get dropped) it could still reduce the average accuracy of forecasts.

  • Hal, it is hard to sympathize with people whose job it is to make forecasts, such as cancer doctors, who are reluctant to be “put on the spot.”
