New Scientist magazine set up a contest between new prediction techniques, including prediction markets:
We decided to see how the latest techniques would stand up to the task of predicting what people will buy. … Over the past four months, we have set four teams the task of trying to predict the sales of each issue of New Scientist, using some of the most promising and innovative approaches available. …
We had each hone their techniques on historic data – sales of New Scientist between 2006 and 2010 in UK stores. We also provided images of all the magazine covers. … The forecasters were free to study any other data they deemed useful. …
Data scientists … looked at the numbers and scratched their heads. … Bollen … wanted to examine the connection between tweets about New Scientist and the magazine’s sales. … Yet none emerged. … [Others] started by identifying and extrapolating long-term trends in our sales. …
Our second entrant – a “prediction market” – didn’t fare much better. These markets date back to work in the 1990s by Robin Hanson … Hanson realised that this “wisdom of the crowd” could be used to forecast other events. … Consensus Point … set up a prediction market involving New Scientist staff. Around 25 of us used an online interface to express how much confidence we had in each edition of the magazine. If we thought a cover had big potential to drive sales, for example, we would buy shares in it. … For this task, as a crowd we did not prove wise. …
A different crowd turned out to have more smarts. … CrowdFlower intern Diyang Tang started by asking workers to rate old covers. … She asked if they would pay $10 – almost twice the actual price – to buy the corresponding issue. The fraction of workers that said yes correlated with historic sales, so she applied this approach in the contest. …
In the last days of the contest, the “Turkers” were battling it out for first place with our final contestant, Sebastian Wernicke, a former bioinformatics statistician, … [who] applied a statistical algorithm to the task. … He ran a pixel-by-pixel analysis of each cover that revealed the distribution of different colours. He also considered the topics, wording and image type. Details of public holidays were thrown into the mix on the assumption that time off may affect reading habits. (more)
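The article doesn't spell out the mechanics of the staff market beyond "buy shares." Markets of this sort commonly run on a market scoring rule, Hanson's logarithmic rule being a frequent choice; the sketch below is purely illustrative (a two-outcome market with an assumed liquidity parameter b), not a description of the Consensus Point market actually used in the contest.

```python
import math

def lmsr_cost(q, b=100.0):
    # LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)),
    # where q_i is the number of shares sold of outcome i.
    return b * math.log(sum(math.exp(x / b) for x in q))

def lmsr_prices(q, b=100.0):
    # Instantaneous prices, which double as the market's implied probabilities.
    exps = [math.exp(x / b) for x in q]
    total = sum(exps)
    return [e / total for e in exps]

def buy(q, outcome, shares, b=100.0):
    # What a trader pays to buy `shares` of `outcome`: C(after) - C(before).
    q_after = list(q)
    q_after[outcome] += shares
    return lmsr_cost(q_after, b) - lmsr_cost(q, b), q_after

# Hypothetical two-outcome market on "this cover beats average sales"; starts at 50/50.
q = [0.0, 0.0]
cost, q = buy(q, 0, 30.0)      # a staffer confident in the cover buys 30 "yes" shares
print(round(cost, 2), [round(p, 3) for p in lmsr_prices(q)])  # price of "yes" rises above 0.5
```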
I have two points to make here. First, while the article gives no statistics, I’ve been able to see all the forecasts for four of the contestants, and can say that at the 10% level there are no statistically significant accuracy differences between any pair of contestants. They averaged 8.5% to 10.7% error over 17 forecasts.
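I won't detail the test here, but for readers who want to check something similar themselves, one standard approach is a paired signed-rank test on per-forecast absolute percentage errors, run over every pair of contestants. The contestant names and error numbers below are placeholders, not the actual contest data.

```python
from itertools import combinations
import numpy as np
from scipy.stats import wilcoxon

def pairwise_error_tests(errors, alpha=0.10):
    # `errors` maps contestant name -> per-forecast absolute percentage errors
    # (17 forecasts in this contest). Returns pairs whose accuracy difference
    # is significant at level `alpha` under a paired Wilcoxon signed-rank test.
    significant = []
    for a, b in combinations(errors, 2):
        stat, p = wilcoxon(errors[a], errors[b])
        if p < alpha:
            significant.append((a, b, p))
    return significant

# Placeholder data: four contestants, 17 forecasts, errors in the 8-11% range.
rng = np.random.default_rng(0)
fake = {name: 0.085 + 0.022 * rng.random(17)
        for name in ["trend", "market", "turkers", "statistical"]}
print(pairwise_error_tests(fake))   # with errors this close, typically an empty list
```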
Second, and more important, this was a test of prediction markets as methods, not as forums:
Methods are ways to do things; forums are ways to pick who decides what to do. … Good forums induce people to find good methods. … To me, prediction markets are mostly interesting as forums, not methods. … Averaging popular opinion may be an interesting method, as is statistical analysis, but comparing these does not evaluate prediction markets as forums. (more)
While the New Scientist article mentions me, this contest seems mostly inspired by James Surowiecki’s “wisdom of crowds” concept. To many, this concept says that an almost mystical wisdom on most any topic can be found merely by averaging the opinions of random folks who’ve hardly considered the subject. Had New Scientist asked me (which they didn’t), I’d have expressed little confidence that averaging top-of-the-head opinions by 25 random distracted staffers would beat the concentrated attention of professional collectors and analysts of data.
In this forecasting situation, the forum question is: what is our best forecast of next week's sales, given access to the forecasts of the various available experts and their methods? Just because one method barely beat others in a particular contest doesn't mean we should give it all of our weight in future forecasts. Judgment must be used to weigh the different track records of methods in different past contexts. A prediction market would be a fine forum in which to make such judgments, but it would work best if its participants had access to the forecasts made by those different methods.
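For contrast, here is what a purely mechanical weighing of track records might look like: an inverse-error-weighted average of the methods' forecasts. The method names, past error levels (echoing the 8.5-10.7% range above), and next-week forecasts are all invented for illustration; my point is that a market of informed traders who can see these forecasts should do this weighing better than any fixed formula.

```python
import numpy as np

def track_record_weights(past_errors):
    # Weight each method by the inverse of its mean absolute percentage error,
    # normalized so the weights sum to one.
    inv = {m: 1.0 / np.mean(errs) for m, errs in past_errors.items()}
    total = sum(inv.values())
    return {m: w / total for m, w in inv.items()}

def combined_forecast(forecasts, weights):
    # Weighted average of each method's forecast for next week's sales.
    return sum(weights[m] * forecasts[m] for m in forecasts)

# Hypothetical inputs, not the actual contest numbers.
past = {"market": [0.107] * 17, "turkers": [0.086] * 17, "statistical": [0.085] * 17}
next_week = {"market": 31500, "turkers": 33000, "statistical": 32400}
weights = track_record_weights(past)
print({m: round(w, 3) for m, w in weights.items()})
print(round(combined_forecast(next_week, weights)))
```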
The idea that you can gain subtle wisdom just by putting your question to a few dozen random, isolated people is not remotely as interesting or useful. Even if you use a prediction market to do the asking.