12 Comments

I agree that this approach isn't very attractive unless one can find simple, standard, and useful ways to decide how to replicate a paper.

I see that could be useful, but it doesn't seem especially well-suited to the particular task discussed here of giving a quality measure on a replicable paper.

An alternative incentive structure for dealing with this issue is provided and enforced at http://kn-x.com. White papers are linked from the site's FAQ.

To put the worry I'm having more succinctly: I am concerned that interesting scientific hypotheses often don't offer easy answers as to what counts as a replication/falsification. As such, authors and markets will be forced to define bets far more narrowly than the question of true scientific interest, so rather than rewarding studies that are likely showing a deep and interesting generality, we reward studies that, when exactly duplicated, replicate.

For instance, consider an early article suggesting that some kind of antibiotic is cancer protective. Do we reward a challenger who can show that in some context it actually increases cancer rates? To avoid letting the challenger just cherry-pick some context where the effect fails, we would have to define the claim so narrowly (in such-and-such populations, given such-and-such dosages, etc.) that it no longer captures the main point of scientific interest: i.e., that generally speaking this chemical protects against cancer.

And this is a really simple claim. Consider claims about the minimum wage or immigration.

The big problem I see with doing this is defining the exact nature of the replication.

When we consider a journal article, what we are interested in isn't whether the effect will replicate under exactly the same conditions as the original study (indeed, we already know the answer, since that was the original study) but whether it will replicate in some broader range of interesting cases.

Now one might hope to solve this problem by having the author specify the replication conditions that count, or by having some market specify replication conditions to bid on. However, this then seems to push us back to asking experts to judge how strong a result the paper shows, since we will need experts to judge just how strong the conditions attached to the bet are before we can figure out what meaning to give high odds that a replication will succeed.

In short, the difficulty of determining how much a given bet is 'worth' actually leaves a betting market further removed from giving a simple signal of quality, since evaluating the strength of a given replication bet seems to require just as much expertise as evaluating the strength of the original article, if not more.

As for bets about citation counts and other attempts to bet on quality, I worry that by changing the profile of a paper these bets themselves affect the probabilities they are betting on. This is a problem because the probabilities then no longer track the original metric of how many citations the article would have gotten in the normal course of events. Why not just rely on actual citation counts instead? Unlike other betting markets, it doesn't seem as if the probabilities of reaching various citation counts will actually reveal the property (paper quality) we really want to know about.

Since you mentioned using citation counts as an indicator of quality, I thought I'd relay Andrew Gelman's post on a study that continues to get plenty of citations even after a replication failed (including lit reviews which cite it without mentioning the replication failure), while the replication paper gets comparatively few.

https://andrewgelman.com/20...

I don't see it as a problem if authors typically do other small-scale trials to check their results.

Because most bets will have to be pretty small, in which case, why bother making them?

But why is this a problem?

Let's price this out. Say 15% of studies won't replicate because there is no real effect. A cheap study will catch 2/3 of those, or 10%, as well as 15% more with real but small effects that don't replicate due to sample-size error. If you filter that 25% *again* with another small-scale study, you end up with 6.66% that have no effect and won't replicate when scaled up, and 2.5% that have a real but small effect and will replicate when scaled up.

Now -- you have run 125 small-scale tests (100 in the first round plus 25 in the second), and you've ended up with ~9 bets, for which you will net 4.16 units. So the bet amount can't be more than 125/4.16 = 30x the cost of the small-scale studies, or the author ends up losing money.
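
A minimal sketch of that arithmetic, per 100 original papers and assuming even-odds bets (the 2.5% figure for real effects that fail both small studies is taken as given from the numbers above):

```python
# Back-of-envelope check of the screening strategy above, per 100 original studies.

no_effect = 15.0                            # studies with no real effect (won't replicate)
caught_round1 = no_effect * 2 / 3           # 10.0 flagged by the first cheap study
real_but_flagged_round1 = 15.0              # real effects that fail the first cheap study anyway

flagged_round1 = caught_round1 + real_but_flagged_round1    # 25.0 sent to a second cheap study

caught_round2 = caught_round1 * 2 / 3       # ~6.66 no-effect studies flagged twice
real_but_flagged_round2 = 2.5               # real effects flagged twice (figure from the comment)

tests_run = 100 + flagged_round1            # 125 small-scale tests in total
bets_placed = caught_round2 + real_but_flagged_round2       # ~9 bets accepted
net_units_won = caught_round2 - real_but_flagged_round2     # ~4.16 units won at even odds

breakeven_multiple = tests_run / net_units_won              # ~30x the cost of one small study

print(f"tests run: {tests_run:.0f}")
print(f"bets placed: {bets_placed:.2f}")
print(f"net units won: {net_units_won:.2f}")
print(f"break-even bet size: {breakeven_multiple:.1f}x the cost of a small study")
```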

I don't see how you could prevent that, but I'm not sure you need to. This possibility does complicate interpreting the author's bet, however, in a way that applies less to market odds.

In the "author bets" scenario, how would you prevent the counterparty from running a secret+fast+cheap mini-replication attempt on mechanical turk before publicly accepting the bet and running a bigger, proper replication study?
