New Scientist Contest

New Scientist magazine set up a contest between new prediction techniques, including prediction markets:

We decided to see how the latest techniques would stand up to the task of predicting what people will buy. … Over the past four months, we have set four teams the task of trying to predict the sales of each issue of New Scientist, using some of the most promising and innovative approaches available. …

We had each hone their techniques on historic data – sales of New Scientist between 2006 and 2010 in UK stores. We also provided images of all the magazine covers. … The forecasters were free to study any other data they deemed useful. …

Data scientists … looked at the numbers and scratched their heads. … Bollen … wanted to examine the connection between tweets about New Scientist and the magazine’s sales. … Yet none emerged. … [Others] started by identifying and extrapolating long-term trends in our sales. …

Our second entrant – a “prediction market” – didn’t fare much better. These markets date back to work in the 1990s by Robin Hanson … Hanson realised that this “wisdom of the crowd” could be used to forecast other events. … Consensus Point … set up a prediction market involving New Scientist staff. Around 25 of us used an online interface to express how much confidence we had in each edition of the magazine. If we thought a cover had big potential to drive sales, for example, we would buy shares in it. … For this task, as a crowd we did not prove wise. …

A different crowd turned out to have more smarts. … CrowdFlower intern Diyang Tang started by asking workers to rate old covers. … She asked if they would pay $10 – almost twice the actual price – to buy the corresponding issue. The fraction of workers that said yes correlated with historic sales, so she applied this approach in the contest. …

In the last days of the contest, the “Turkers” were battling it out for first place with our final contestant, Sebastian Wernicke, a former bioinformatics statistician, … [who] applied a statistical algorithm to the task. … He ran a pixel-by-pixel analysis of each cover that revealed the distribution of different colours. He also considered the topics, wording and image type. Details of public holidays were thrown into the mix on the assumption that time off may affect reading habits. (more)

I have two points to make here. First, while the article gives no statistics, I’ve been able to see all the forecasts for four of the contestants, and can say that at the 10% level there are no statistically significant accuracy differences between any pair of contestants. They averaged 8.5% to 10.7% error over 17 forecasts.

Second, and more important, this was a test of prediction markets as methods, not as forums:

Methods are ways to do things; forums are ways to pick who decides what to do. … Good forums induce people to find good methods. … To me, prediction markets are mostly interesting as forums, not methods. … Averaging popular opinion may be an interesting method, as is statistical analysis, but comparing these does not evaluate prediction markets as forums. (more)

While the New Scientist article mentions me, this contest seems mostly inspired by James Surowiecki’s “wisdom of crowds” concept. To many, this concept says that an almost mystical wisdom on most any topic can be found merely by averaging the opinions of random folks who’ve hardly considered the subject. Had New Scientist asked me (which they didn’t), I’d have expressed little confidence that averaging top-of-the-head opinions by 25 random distracted staffers would beat the concentrated attention of professional collectors and analysts of data.

In this forecasting situation, the forum question is: what is our best forecast of next week’s sales, given access to the forecasts of the various available experts and their methods? Just because one method barely beat others in a particular contest doesn’t mean we should give it all of our weight in future forecasts. Judgment must be used to weigh the different track records of methods in different past contexts. A prediction market would be a fine forum in which to make such judgements, but it would work best if its participants had access to the forecasts made by those different methods.

The idea that you can gain subtle wisdom just by asking your question to a few dozen random isolated people is not remotely as interesting or useful an idea. Even if you use a prediction market to do the asking.

  • billswift

    >To many, this concept says that an almost mystical wisdom on most any topic can be found merely by averaging the opinions of random folks who’ve hardly considered the subject.

    That is what I had heard about it, but when I finally got around to reading it I found a far more nuanced discussion of the idea, including when “crowd-sourcing” cannot work. I found it a useful summary of the research, though I wish the references were better.

  • Evan

    Speaking of prediction markets, Intrade shows that Ron Paul is in 1st place among all Republican candidates in terms of electability. His electability is currently at 93%, far greater than any of the other candidates.

    • I like your site, Evan. Useful, thanks. Note you might want to discuss the issue that correlation does not necessarily imply causation. For example, it could be that in a world where Ron Paul is nominated, some highly significant event favoring one of his causes happens.

    • This makes me wish I was in an area where the legality of Intrade was clear. This seems pretty obviously to be a product of wishful thinking and attempts to shove Paul’s name out there by his fans. I’d love to trade on this and bilk them out of a lot of cash. I’ll applaud anyone else who does so.

      • Evan

        I don’t think this has anything to do with “wishful thinking” from Ron Paul fans trying to get his name out there.

        There isn’t an “electability” contract at Intrade. “Electability” here means the odds that a candidate becomes President given that the candidate is nominated, computed from the prices of the other two contracts: Electability = Presidential Odds / Nomination Odds.
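Evan's 93% figure is just a ratio of two contract prices, read as an implied conditional probability. A quick sketch, using hypothetical prices (not actual Intrade quotes) chosen to reproduce that number:

```python
# "Electability" as an implied conditional probability:
#   P(president | nominated) = P(president) / P(nominated)
# The contract prices below are hypothetical, chosen to give ~93%.
p_president = 0.028   # price of the "wins the presidency" contract
p_nomination = 0.030  # price of the "wins the nomination" contract

electability = p_president / p_nomination
print(f"implied electability: {electability:.0%}")  # -> 93%
```

Note that this ratio is only meaningful to the extent that both underlying prices are meaningful, which is exactly what the next comment disputes.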

      • Douglas Knight

        The system is biased in favor of long shots, probably intentionally. If you parked $27k for a year betting against Paul’s nomination, you could only make $630. And you have to pay Intrade $60 for your account. If you only put up $1k, you wouldn’t move the price and you’d make the full 3%, but that wouldn’t cover account fees.

        So the numbers are useless for the calculation Evan wants to do. But I wouldn’t be surprised if Paul had high electability on a conditional market, for the reason David Pennock gives.
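The arithmetic behind Douglas's point, using the figures he gives (a $27k stake, $630 gross profit, $60 account fee), shows why a trader would not bother correcting the long-shot price:

```python
# Return on capital from betting against a long shot, using the figures
# in the comment above: $27k parked for a year, $630 gross, $60 fee.
stake = 27_000
gross_profit = 630
account_fee = 60

net_return = (gross_profit - account_fee) / stake
print(f"net annual return: {net_return:.2%}")  # -> 2.11%
```

A roughly 2% annual return on locked-up capital leaves little incentive to push a mispriced long shot back toward its true odds.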

  • Vlad Tarko

    “I’d have expressed little confidence that averaging top-of-the-head opinions by 25 random distracters staffers would beat the concentrated attention of professional collectors and analysts of data.”

    Do you by any chance know how the mode, rather than the average, fared?

    I ask this because a few months ago I organized a “wisdom of crowds” game with political science undergraduates in which they had to guess the vote percentages by which various proposed bills had passed in Parliament, and while the average of their answers was, well, average, the mode won almost all the rounds.

    • Did you mean mode or median? Median also seems to do better than average.

    • What if no two people gave the same answer?

  • Jim Giles

    I’m one of the reporters who worked on the New Scientist article. Thanks for your comments on our work.

    I’d like to clarify a couple of points for readers who might be confused about the market we used. Although we did not talk to Robin, we did ask the company at which he is chief scientist — Consensus Point — to run our prediction market. They agreed and very kindly put a lot of work into setting up and running the market. We chose Consensus Point partly because of Robin’s affiliation with the company and the role he played in the development of prediction markets.

    The Consensus Point system is quite sophisticated and gives traders a great deal of flexibility when it comes to making and recalibrating forecasts. So to describe the process as “averaging top-of-the-head opinions by 25 random distracters [sic] staffers” probably doesn’t do it justice.

    • Jim, I agree that Consensus Point software is sophisticated, and the approach it embodies is superior to other ways to “average” opinions. But the effect of such better averaging is weak compared to the effects of the participants’ degrees of expertise, incentives, and attention. (I corrected the typo in the post.)

      • Jim Giles

        These were all issues we discussed with Consensus Point prior to launch. To incentivize staff members, we gave a weekly prize to the person who profited most from their trades. We also offered a random-draw prize to encourage everyone to play, regardless of their success in previous weeks.

        And they were definitely not lacking in expertise! Everyone who played was a member of the core editorial team. They are all very familiar with the ups and downs of New Scientist sales. They also understand the process by which we try to create covers that appeal to our readers.

        The article is now up on our website. Registration required, but it’s free to access:

      • Jim, happy to accept that your market participants are not amateurs, and that they were given some incentives. But the other contest participants were world class professionals, competing for the valuable publicity prize of New Scientist’s implicit endorsement. Those seem to me like pretty substantial differences.

  • If you were marketing to a business, what would be an example of a potential use for a prediction market as a “forum” rather than a method?

    • TGGP, you’d want to allow and encourage broad participation, by all the different people and groups that make, consume, or inform forecasts.

  • A new site for tracking pundit predictions. But as my link indicates, pundits have suffered no consequences for terrible predictions, so maybe folks just don’t care.