Gambling Save Science?

The latest New Yorker:

All sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. … This phenomenon … is occurring across a wide range of fields, from psychology to ecology. … The most likely explanation for the decline is … regression to the mean. … Biologist Michael Jennions argues that the decline effect is largely a product of publication bias. Biologist Richard Palmer suspects that an equally significant issue is the selective reporting of results. … The disturbing implication … is that a lot of extraordinary scientific data is nothing but noise. (more)

Academics are trustees of one of our greatest resources – the accumulated abstract knowledge of our ancestors. Academics appear to spend most of their time trying to add to that knowledge, and such effort is mostly empirical – seeking new interesting data. Alas, for the purpose of intellectual progress, most of that effort is wasted. And one of the main wastes is academics being too gullible about their own and their allies’ findings, and too skeptical about rivals’ findings.

Academics can easily coordinate to be skeptical of the findings of non-academics and low-prestige academics. Beyond that, each academic has an incentive to be gullible about his own findings, and his colleagues, journals, institutions, etc. share in that incentive as they gain status by association with him. The main contrary incentive is a fear that others will at some point dislike a finding’s conclusions, methods, or conflicts with other findings.

Academics in an area can often coordinate to declare their conclusions reasonable, methods sound, and conflicts minimal. If they do this, the main anti-gullibility incentives are outsiders’ current or future complaints. And if an academic area is prestigious and unified enough, it can resist and retaliate against complaints from academics in other fields, the way medicine now easily resists complaints from economics. Conflicts with future evidence can be dismissed by saying they did their best using the standards of the time.

It is not clear that these problems hurt academics’ overall reputation, or that academics care much to coordinate to protect it. But if academics wanted to limit the gullibility of academics in other fields, their main tool would be simple clear social norms, like those now encouraging public written archives, randomized trials, controlled experiments, math-expressed theories, and statistically-significant estimates.

Such norms remain insufficient, as great inefficiency remains. How can we do better? The article above concludes by suggesting:

We like to pretend that our experiments define the truth for us. But … when the experiments are done, we still have to choose what to believe.

True, but of little use. The article’s only other suggestion:

Schooler says “Every researcher should have to spell out, in advance, how many subjects they’re going to use, and what exactly they’re testing, and what constitutes a sufficient level of proof.”

Alas this still allows much publication bias, and one just cannot anticipate all reasonable ways to learn from data before it is collected. Arnold Kling suggests:

An imperfect but workable fix would be to standardize on a lower significance level. I think that for most ordinary research, the significance level ought to be set at .001.

I agree this would reduce excess gullibility, though at the expense of increasing excess skepticism. My proposal naturally involves prediction markets:

When possible, a paper whose main contribution is “interesting” empirical estimates should give a description of a much better (i.e., larger later) study that, if funded, would offer more accurate estimates. There should be funding to cover a small (say 0.001) chance of actually doing that better study, and to subsidize conditional betting markets on its results, open to a large referee community with access to the paper for a minimum period (say a week). A paper should not gain prestigious publication mainly on the basis of “interesting” estimates if current market estimates of better estimates do not support those estimates.

Theory papers containing proofs might similarly offer bets on whether errors will be found in them, and might also offer conditional bets on whether more interesting and general results could be proven if sufficient resources were put to the task.

More quotes from that New Yorker article:

The study turned [Schooler] into an academic star. … It has been cited more than four hundred times. … [But] it was proving difficult to replicate. … his colleagues assured him that such things happened all the time. … “I really should stop talking about this. But I can’t.” That’s because he is convinced he has stumbled on a serious problem, one that afflicts many of the most exciting new ideas in psychology. ….

Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. … In recent years, publication bias has mostly been seen as a problem for clinical trials … But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology. …

“Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer told me. … “I had no idea how widespread it is.” … “Some – perhaps many – cherished generalities are at best exaggerated … at worst a collective illusion.” … John Ioannidis … says … “We waste a lot of money treating millions of patients and doing lots of follow up studies on other themes based on results that are misleading.”

  • Upon getting to know academia, I was quite astounded to learn that ideas would cluster locally around institutions and groups. This is so much the norm that ideologies are often named by location or institution. I was also amazed by the fact that no one thought this somewhat improbable in and of itself.

    I’d be interested to know if any Robber’s Cave Experiments … have been run on academics. If they were forced into situations that required co-operation, how would this affect their views?

    Encouraging or even requiring intimacy between competing professionals might be one way to increase the quality of results. But this would require the implementation of institutionally directed social ‘procedures’ that would be designed to disrupt tribal allegiances. This would be so alien culturally that I can’t imagine it ever being implemented.

  • Sounds like the very definition of the leadup to a paradigm shift.

  • Jonas

    Very interesting. Thank you for sharing these insights!

  • cournot

    But the US already has among the least insular of academic communities. It is much less inbred, in the sense that top colleges abroad are more likely to hire their own PhDs immediately upon graduation, whereas the US has more mixing, even if it is still biased in favor of institutional clumps centered on a few places. Relative to most other countries, it’s more competitive and the labor market in academia is more open.

    That’s hardly ideal, but much better than the historical norm.

  • Curt Adams

    In the current journal environment, publication bias and selective reporting of results are basically the same thing if the scientist is not being blatantly deceptive (cherry-picking or lemon tossing individual patients/trials rather than experiments). I think the driving force is the demand for positive, original results for publication. Experiments with negative results or confirmations of previous results don’t get published. So researchers are basically forced to keep re-doing experiments until they get a p value below 0.05, at which time they publish. This means that published ideas are at most 20 times more likely to be true than a random idea for an experiment, and the effect is much less if studies are underpowered (they usually are). The vast majority of hypotheses are wrong, and a filter of 5 to 20-fold is not enough to overcome that.

    Lowering the p-value threshold would help with inadvertent bias, but not eliminate it. I’ve gotten a p = 0.000057 result that turned out to be completely bogus. In addition, if the current requirement for positive original results continued, scientists would almost be forced to be deceptive, because honest scientists would go years without honest p = 0.001 results and that would end their careers.
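The 20-fold filter in the comment above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes a single honest test per idea (no re-running until significance, which would make things worse), and the 1% prior and the function names are hypothetical choices for illustration, not figures from the comment.

```python
def posterior_true(prior: float, power: float, alpha: float = 0.05) -> float:
    """P(hypothesis true | significant result), by Bayes' rule."""
    true_positives = prior * power            # true ideas that reach significance
    false_positives = (1.0 - prior) * alpha   # false ideas that reach it by chance
    return true_positives / (true_positives + false_positives)


def odds_multiplier(power: float, alpha: float = 0.05) -> float:
    """Factor by which a significant result multiplies the odds of being true."""
    return power / alpha


if __name__ == "__main__":
    prior = 0.01  # hypothetical: 1 in 100 tested ideas is true
    for power in (1.0, 0.5, 0.2):
        print(f"power={power:.1f}  filter={odds_multiplier(power):.0f}x  "
              f"posterior={posterior_true(prior, power):.3f}")
```

Under these assumptions, even a perfectly powered study at the 0.05 threshold leaves a published idea only about 17% likely to be true, and an underpowered one (power 0.2) leaves it under 4%.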

  • Zvi Mowshowitz

    Prediction markets are wonderful things, but markets without sufficient incentives simply won’t trade. A market that refunds all trades with probability .999 has insufficient returns on investments even at zero research cost, zero transaction cost and success rate of 1, so why should anyone looking to make money bother with it? As noted, only because of a massive subsidy, at which point I’m doing arbitrage of the two sides to collect the subsidy rather than predicting anything.

    • One can let traders convert a dollar into a thousand conditional dollars, for a condition with a 1/1000 random chance of occurring. Alternatively, a single dollar can support a thousand conditional trades of a dollar, if one coordinates to make those thousand conditions mutually exclusive.

      • Zvi Mowshowitz

        I’m missing it. How am I going to get a valid opinion about a thousand different markets? If I’m allowed to bet more than $1 conditionally on each market with $1, why don’t me and my friend bet opposite sides of each market at 0.5 and collect the subsidy? Why don’t I do that anyway? I’d like to see an explicitly laid out structure that both provides no incentive to game the subsidy and provides sufficient incentive to bother. I can say from experience that gamblers very much do not like the idea of a bet that usually gives them their money back.

      • You read a paper and disagree; its estimate is .6, while you think that can at most be .5, so you think on a straight bet you could spend $40, expect to gain $50, for a profit of $10, or a 25% return. But instead there’s this conditional bet; what to do? You convert your $40 into $40K conditional on the bigger study being done, which you expect to be worth $50K after a study. Your overall expected value is 0.001*$50K = $50, again for a 25% return.
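The arithmetic in the reply above can be spelled out as a small sketch. It assumes the trader buys contracts at $0.40 that each pay $1 if the larger study comes in below the paper's estimate, and that ordinary dollars convert to conditional dollars at exactly 1000 to 1; the function names are made up for illustration.

```python
def straight_bet(stake: float, price: float, belief: float) -> tuple[float, float]:
    """Expected payoff and expected return of an ordinary bet.

    stake:  dollars spent
    price:  cost of one contract that pays $1 if the trader is right
    belief: the trader's own probability of being right
    """
    contracts = stake / price
    expected_payoff = contracts * belief
    return expected_payoff, expected_payoff / stake - 1.0


def conditional_bet(stake: float, price: float, belief: float,
                    one_in: int = 1000) -> tuple[float, float]:
    """Same bet made with dollars that only count if the study is run.

    Each ordinary dollar becomes `one_in` conditional dollars, and the
    study is run with probability 1 / one_in.
    """
    conditional_stake = stake * one_in
    payoff_if_study, _ = straight_bet(conditional_stake, price, belief)
    expected_payoff = payoff_if_study / one_in  # weight by chance of the study
    return expected_payoff, expected_payoff / stake - 1.0


if __name__ == "__main__":
    print(straight_bet(40.0, 0.4, 0.5))
    print(conditional_bet(40.0, 0.4, 0.5))
```

Both calls give the same expected payoff and the same expected return, which is the point of the reply; the sketch says nothing about the spread of outcomes around that expectation.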

      • Jess Riedel

        I also don’t see how this solves the problem. Your expected return is $50, but don’t you lose $40 each of the 999 times that the study is not done? And the 1 time the study is done, you have zero payout 50% of the time and win $100K 50% of the time.

        So I can only even out my variance if I bet on around 1000 markets, which is infeasible. And the only way variance that high would be acceptable would be if the stakes were so low that I’m not risk averse, in which case it wouldn’t be worth it for me to do the research to make the bet in the first place.

        Or am I confused about this conditional money?
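The variance worry in this comment can be put in numbers. A minimal sketch, assuming the payoffs described in the thread ($40 staked, a 1-in-1000 chance the study is run, and a 50% chance of a $100K payout when it is), with independent markets and hypothetical function names:

```python
import math


def payoff_stats(outcomes: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and standard deviation of a payoff given (probability, payoff) pairs."""
    mean = sum(p * x for p, x in outcomes)
    variance = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, math.sqrt(variance)


def averaged_std(single_std: float, n_markets: int) -> float:
    """Std of the average payoff per market across n independent, identical markets."""
    return single_std / math.sqrt(n_markets)


# Straight bet: 100 contracts, each paying $1 with probability 0.5.
STRAIGHT = [(0.5, 100.0), (0.5, 0.0)]

# Conditional bet: study not run (0.999), run and won (0.0005), run and lost (0.0005).
CONDITIONAL = [(0.999, 0.0), (0.0005, 100_000.0), (0.0005, 0.0)]

if __name__ == "__main__":
    print(payoff_stats(STRAIGHT))
    mean, std = payoff_stats(CONDITIONAL)
    print(mean, std, averaged_std(std, 1000))
```

Under these assumptions both bets have the same $50 expected payoff, but the conditional bet's standard deviation is roughly $2,236 against $50 for the straight bet, and spreading across 1000 independent markets only brings the per-market figure down to about $71.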

  • cournot

    I agree with Zvi Mowshowitz. Robin likes to say markets for rare events are still better than nothing, but we do have prediction markets for some events that perform very poorly. A good example is betting on the Nobel for econ. This is a clearly defined outcome with an easy payoff, one that is easy to study and simpler to predict than 99% of the far-off disasters or medical issues Robin wants studied. Yet the prediction markets have done poorly in this (whether official ones or office betting pools). We should assume that prediction markets on many other questions would have a signal-to-noise ratio at least an order of magnitude worse, if not much lower still. And this assumes away transaction costs, political interference, and other institutional risks.

    At what point will Robin admit that some prediction markets are too weak to give us much benefit?

    • What other source gives more accurate estimates on the econ Nobel? And accuracy should increase with the subsidy; you get what you pay for. Prediction markets are a mechanism to pay for accuracy, and some problems are just hard for any mechanism.

  • I started skimming this post after the first paragraph or so, but the thought occurred to me that rival disciplines may help with the problem of peers having incentives to coordinate on gullibility.

    For example, the prestige of more rigorous neuroscience has risen relative to psychology, and, separately, psychology academics seem to have been a useful check on economics academics.

    Also, my sense is that the rise of computer science academics has served as a useful rival check on mathematical disciplines.

    In terms of institutional design, perhaps we should consciously encourage rival disciplines.

    Rival regional departments of the same disciplines might help too. For example, there seem to me to be rival economic academic schools of thought at least at Berkeley (history fundamentalism?), Chicago (rational agent-focused?), MIT (quant fundamentalism?) and GMU (libertarian shtick?). Not sure how productive regional rivalry is, to the extent it exists. I feel like something good had to have come out of MIT/Caltech hard science and engineering rivalry, and Harvard/MIT across the pond rivalry in a number of disciplines. But I can’t list the benefits as clearly as I can when disciplines invade each other.

    Pretty much every discipline should have a class of highly statistically literate skeptics (mini-Gelman clones), but they don’t seem particularly high profile to me across the board.

  • On prediction markets: they seem to give no estimate of their results’ credibility. If you try to predict the Econ Nobel Prize winner, you get a result that looks no different from applying markets to easy events. You don’t know that the Nobel prediction is hard from the prediction market result, whereas common sense provides a rough indication of whether the prediction is hard or easy. In itself this doesn’t constitute a problem, but it does when common sense can reach its verdict only through the investigation that relying on prediction market results causes you to forgo.

    On science’s reliability problem: I wonder why the problem isn’t handled by competition between different laboratories or university departments. Each department, it would seem, has an interest in obtaining reliable results. To some extent, this competitive process seems ongoing; it’s not surprising that cold fusion came out of the University of Utah, not Harvard or Stanford. These institutions have an interest in instituting quality control of their research projects. In that light, I wonder if the problem of unreliable results isn’t overstated, in that results that come from some labs just aren’t taken seriously. Then, the problem would be waste of resources, rather than unreliable results.

  • US anti-gambling laws would appear to be a problem. What can be done?

  • Paul Christiano

    Recently Scott Aaronson “bet” heavily against a proposed P != NP proof. The TCS community’s response is interesting and somewhat related to this discussion.

    It also prompted some discussion there about gambling as a tool to improve the quality of theoretical research.

  • RJB

    Paul Christiano’s link above has a relevant observation:

    More importantly (and in contrast to 98% of claimed P≠NP proofs), even if this attempt fails, it seems to introduce some thought-provoking new ideas, particularly a connection between statistical physics and the first-order logic characterization of NP.

    The original post and content it is based on characterize the entire contribution of a published article as the truth value of some testable assertion. But most articles point the way to follow-up research by introducing new methods, new connections, and new testable hypotheses. It is very rare that some important decision is made on the basis of the claimed findings of a handful of papers. In fields where they are (medicine), there is a strong push to require publication of null results. So the authors overstate the cost of the problem.

    Also, what is the alternative? Popper’s falsificationism is rejected wholesale by practicing scientists because null-result papers almost never point a way toward future research. Just like the purpose of a chicken is to make more chickens, the purpose of an academic study is usually to lead to more studies. Maybe in 20 years we learn something.

  • Should we have a prediction market in how much the prediction market idea can be successfully extended in the myriad ways you propose, and then fund the development of those markets accordingly?

    Of course it makes sense that you continue to develop the idea by flogging new and different applications of it. And part of the value of doing that is it gets others, including myself, thinking: what are the limits, and how can we determine them?

    So I ask you, how good at predicting corporate values is the stock market? Maybe this is already studied and reported?

    I can certainly speak qualitatively of the stock market’s failures. It did not predict the total explosion of mortgage-backed securities and the effect of that on numerous banks and other large tradable companies. It did not predict the over-valuation of, and over-investment in, internet companies in the 1990s.

    I’d love to start seeing some education from you on the limits and failures of prediction markets intermixed with the amazing stream of abstruse proposals for their use.

  • Rachel

    You don’t want to publish papers that are most likely to be true. You want to publish papers that change your Bayesian priors the most.
    This system would screen out all novel ideas.