Big Impact Isn’t Big Data

Nov 10, 2016

A common heuristic for estimating the quality of something is: what has it done for me lately? For example, you could estimate the quality of a restaurant via a sum or average of how much you’ve enjoyed your meals there. Or you might weight recent visits more, since quality may change over time. Such methods are simple and robust, but they aren’t usually the best. For example, if you know of others who ate at that restaurant, their meal enjoyment is also data, data that can improve your quality estimate. Yes, those other people might have different meal priorities, and that may be a reason to give their meals less weight than your meals. But still, their data is useful.
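
The weighting idea above can be sketched in a few lines. This is only an illustration, not the post's own method: the function name `estimate_quality`, the exponential recency decay, and the parameters `half_life_days` and `others_weight` are all assumptions chosen for the example.

```python
def estimate_quality(meals, half_life_days=365.0, others_weight=0.5):
    """Weighted average of meal enjoyment scores.

    meals: list of (enjoyment, days_ago, is_mine) tuples.
    half_life_days and others_weight are hypothetical tuning knobs:
    a meal's weight halves every half_life_days, and meals eaten by
    other people are further discounted by others_weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for enjoyment, days_ago, is_mine in meals:
        weight = 0.5 ** (days_ago / half_life_days)  # recent visits count more
        if not is_mine:
            weight *= others_weight  # others' priorities may differ from mine
        weighted_sum += weight * enjoyment
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no usable meal data")
    return weighted_sum / total_weight


# One of my own meals today and one meal reported by someone else today.
print(estimate_quality([(10.0, 0, True), (4.0, 0, False)]))
```

Setting `others_weight=0.0` recovers the simple "what has it done for me lately?" heuristic, which makes it easy to see how much the extra data moves the estimate.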

15 Comments

Stephen Diamond

I formulated it as a two-tailed null for simplicity. Stating it as one-tailed is trivial. (The probability of a Trump win was not as assigned by the prediction market or less.)

But as you formulated it, rejecting the null here would be saying that it is untrue that both p(A)=market odds and p(B)=market odds. It doesn't tell you about the probability that at least one was right, for instance.

And that means that as a hypothesis test, it tells you not to assume that both of two predictions will be exactly correct. Is that what you claimed originally?

Stephen Diamond

The null hypothesis is that the probability assigned by the prediction market is the true probability. Since your concern is with the form of the null, we don't need to look at joint probability. If a prediction market assigned a .05 probability to a Trump win (this is hypothetical), and Trump won, that would mean that random fluctuations cannot account for the prediction market's failure to predict within standard levels of significance.

You're looking in the wrong place. If there's a problem with my argument, it concerns the universe to which the test pertains (the cherry-picking problem), not combining probabilities or the mechanics of hypothesis testing.

They were miscalibrated, which is different from incorrect. You'd want a proper scoring rule to differentiate between good and bad predictions, but my point was that hypothesis tests require a null hypothesis, and simply saying the joint probability is < 0.05 doesn't falsify anything.
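
As one concrete instance of a proper scoring rule, here is a minimal sketch of the Brier score. The choice of the Brier score is an assumption made for illustration; the comment does not name a particular rule. The probabilities used (.1 and .25) are the market figures quoted elsewhere in this thread, and the function names are hypothetical.

```python
def brier_score(forecast_probability, outcome):
    """Squared error between a forecast probability and a 0/1 outcome.

    Lower is better; a forecast of 0.5 always scores 0.25.
    """
    return (forecast_probability - outcome) ** 2


def mean_brier_score(forecasts):
    """Average Brier score over (forecast_probability, outcome) pairs."""
    return sum(brier_score(p, o) for p, o in forecasts) / len(forecasts)


# Market probabilities quoted in this thread; both events happened (outcome 1).
thread_forecasts = [(0.10, 1), (0.25, 1)]
print(mean_brier_score(thread_forecasts))

# Reference point: an uninformative 50/50 forecast for the same two events.
print(mean_brier_score([(0.5, 1), (0.5, 1)]))
```

A score like this ranks forecasters against each other on the same events; unlike a significance test, it needs no null hypothesis.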

OK, let’s specify this test a bit more clearly. What’s your null hypothesis? That the pair of predictions is no better than chance?

If you can show me a coherent null, specify what test you would use, and show how it can be applied reasonably to this case, I'll stop being "smug about my purported knowledge of probability."

If all of those happen, the predictions were not that good.

Overcoming Bias Commenter

Subsidize? No. Not by taxing me, thank you. Legalize? Yes! Let the "market" decide!

Stephen Diamond

OK, but however you combine the probabilities, it's going to be below .1. If my natural experiment argument is sound, significance at the .1 level is still cause for concern.

As someone who spends a lot of time trying to predict financial markets, let me assure you that assuming uncorrelated errors is basically never a good idea.

Stephen Diamond

Don't be so smug about your purported knowledge of probability. The very simple answer to your "counter-example" is that the appropriate test is one-tailed.

The logic is very simple and requires no deep knowledge of probability or philosophical commitment to frequentism. The only possible objection to my reasoning is that the errors are correlated. (I think uncorrelated errors is a due assumption if you accept the logic of prediction markets.)

We should subsidize markets until the marginal cost of adding new info equals the social marginal value. That should on average be larger subsidies for more important questions.

That's not how probability works, even if you accept the paradigm of null-hypothesis-significance-testing.

A simple counterexample: if markets had predicted 11 events as each 75% likely, and all of them happened, the conjunction is ~0.042. Your procedure says that invalidates them.
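
The arithmetic behind that figure can be checked directly. This is a small sketch that assumes the 11 events are independent, which is the assumption the multiplication relies on; the function name is hypothetical.

```python
def conjunction_probability(p, n):
    """Probability that n independent events, each with probability p, all occur."""
    return p ** n


joint = conjunction_probability(0.75, 11)
print(round(joint, 3))  # 0.042

# The joint probability falls below the conventional .05 threshold even
# though every individual forecast was 75%, i.e. each event was favored.
print(joint < 0.05)  # True
```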

Stephen Diamond

Brexit and Trump represent a natural experiment: the two most significant recent electoral events by far (as measured by popular interest - personally, I think their importance is overstated). The prediction markets gave Trump a .1 probability and Brexit a .25 probability. The conjunction of these probabilities is .025. This natural experiment rejects the null hypothesis beyond the scientifically standard .05 level. (I base prediction market stats on Predictwise, which draws on various betting markets.)
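
The .025 figure is the product of the two market probabilities, which holds only if the two outcomes are independent. As a hedged sketch (the function name is hypothetical), the code below computes that product alongside the widest range the joint probability could take if nothing is assumed about how the two events are related (the Fréchet bounds).

```python
def joint_probability_range(p_a, p_b):
    """Return (lower_bound, independent_value, upper_bound) for P(A and B).

    lower_bound and upper_bound are the Frechet bounds, valid for any
    dependence between A and B; independent_value assumes independence.
    """
    lower_bound = max(0.0, p_a + p_b - 1.0)
    upper_bound = min(p_a, p_b)
    independent_value = p_a * p_b
    return lower_bound, independent_value, upper_bound


# Market probabilities quoted in the comment above: Trump .1, Brexit .25.
low, independent, high = joint_probability_range(0.10, 0.25)
print(low, independent, high)
```

With these inputs the joint probability can be anywhere from 0 up to .1, so whether it falls under .05 depends on how correlated the two forecasting errors are taken to be.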

[By "natural experiment" I refer to the indications that these aren't cherry-picked outcomes. To the extent that an experiment is natural, it is less cherry-picked than published studies.]

The other key point is that binary outcomes aren't a very information-rich / useful way of judging a prediction model. Predicting 51%-49% when the result is 49%-51% is a better fit than predicting 44%-56%, even though the second got the "answer" correct.
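
A small sketch of the comparison in that comment, using the vote shares it quotes. Shares are for one side, in percentage points; the function names and the use of absolute error as the measure of fit are assumptions made for illustration.

```python
def share_error(predicted_share, actual_share):
    """Absolute gap, in percentage points, between predicted and actual vote share."""
    return abs(predicted_share - actual_share)


def called_winner(predicted_share, actual_share):
    """True if the prediction put this side on the same side of 50% as the result."""
    return (predicted_share > 50.0) == (actual_share > 50.0)


actual = 49.0  # the result was 49%-51%
for predicted in (51.0, 44.0):
    print(predicted, share_error(predicted, actual), called_winner(predicted, actual))
```

The 51% prediction misses by 2 points but calls the wrong winner, while the 44% prediction misses by 5 points and calls the right one, so judging by the binary outcome alone reverses the ranking given by closeness of fit.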

Overcoming Bias Commenter

Since, as you noted, people in practice weight "more important" events more heavily when judging the quality of a prediction machine, should we subsidize prediction markets whose outcomes are important to predict?
