Tag Archives: Prediction Markets

Space Fund

At a space conference this past weekend, I was inspired to ponder the key problem I see regarding space colonization: how to harness the great passion so many feel today to support and somehow participate in the topic, while avoiding the vast waste that most likely results when that passion is directed toward greatly premature near-term projects.

Someday humans will colonize Antarctica, the tops of the Himalayas, and the bottoms of Earth's oceans. But this won't happen until such colonies are in the ballpark of cost-effective relative to more familiar locations. Quirky preferences or religious devotion can make a modest difference, but can't overcome huge cost differences.

The same applies to colonization of space, a place much harder to colonize. While extra passion and quirky preferences can make a modest difference, mostly space colonization just can’t happen until near when it would be feasible given more ordinary motives. Efforts spent well before that time are mostly wasted, unless they are especially well targeted toward easing later efforts when such colonization is nearly feasible.

Here’s my decision-market idea for tying current passion to useful future efforts:

  1. Create a space fund that passively reinvests its assets to grow over a long period, a fund to which anyone can donate.
  2. Define an ex-post measure of successful space colonization. For example, LNYD = log of the number N of people living in space for at least Y years by date D.
  3. For a modest fee, let anyone at any time submit a proposal for how to spend the entire space fund. Any proposal is fair game, including transferring all of this fund to a new fund managed in a new way.
  4. Create financial assets $LNYD that pay out in proportion to this measure LNYD. (This may require setting a min & max value for the measure.) Let people trade these assets for cash, creating a LNYD market price.
  5. Each proposal submission is evaluated via a LNYD-based decision market. That is, for each proposal, on a particular unique pre-announced date, market speculators may trade LNYD assets for cash, in trades that are called off if (or if not) this proposal is approved. If the LNYD price difference between approval and non-approval is clearly positive, the proposal is approved. (The price difference threshold used here should reflect the fact that this system should reject a great many proposals, and approve only one.)
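
To make the mechanics concrete, here is a minimal sketch in Python of the measure in step 2 and the approval rule in step 5. The function names, the threshold value, and the example prices are all my illustrative assumptions, not part of the proposal itself.

```python
import math

def lnyd_measure(num_people: int) -> float:
    """Ex-post measure from step 2: the log of the number N of people
    who have lived in space for at least Y years by date D
    (Y and D are fixed when the fund is created)."""
    return math.log(num_people) if num_people > 0 else 0.0

def approve_proposal(price_if_approved: float,
                     price_if_rejected: float,
                     threshold: float = 0.05) -> bool:
    """Approval rule from step 5. Each price is the market value of the
    $LNYD asset in trades that are called off unless (or if) this
    proposal is approved, normalized to [0, 1] via the min and max
    values set for the measure in step 4."""
    # The threshold should be set high, since this system should reject
    # a great many proposals and approve only one.
    return (price_if_approved - price_if_rejected) > threshold

# Example: speculators price $LNYD at 0.62 conditional on approval and
# 0.50 conditional on rejection, so this proposal clears the threshold.
assert approve_proposal(0.62, 0.50)
```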

Under this system, people today who want to feel involved with space colonization can do so in three ways: 1) donate to the space fund, 2) develop and submit proposals for approval, or 3) trade in the markets that decide whether to approve proposals. Later, when space colonization is nearly feasible, so that money spent can actually make a difference, these decision markets should make good choices about when and how to spend this fund to best create maximal colonization, according to the initially-chosen measure.

That’s the basic idea. Now here’s a variation, designed to avoid incentives for sabotage. When a donor donates $2 to the space fund, $1 goes into the fund, and this donor gets back a $LNYD asset whose value is guaranteed to fall within [$0,$1]. They can then trade this $LNYD asset in the decision markets. The remaining $(1-LNYD) asset is put into a new space fund tied to a new goal defined regarding some date D’ after date D. In this system, only this new fund holds the $(1-LNYD) assets that might tempt a holder to sabotage the space colonization effort.
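
Here is a minimal sketch of that donation split, assuming each $LNYD unit pays somewhere in [$0,$1]; all names and the per-unit accounting are my illustrative assumptions.

```python
def split_donation(amount: float) -> dict:
    """Sabotage-avoiding variation: for each $2 donated, $1 goes into
    the fund, the donor receives a $LNYD asset (paying in [$0, $1] per
    unit), and the matching $(1-LNYD) asset goes into a successor fund
    tied to a later date D' > D. No private party ever holds an asset
    that pays more when colonization fails."""
    units = amount / 2.0
    return {
        "cash_into_fund": units,              # dollars added to the space fund
        "lnyd_to_donor": units,               # $LNYD units returned to donor
        "one_minus_lnyd_to_new_fund": units,  # $(1-LNYD) units to the D' fund
    }

# Example: a $2 donation puts $1 in the fund and returns one $LNYD unit.
print(split_donation(2.0))
```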

Speculator-Chosen Immigrants

On immigration, the big political tug-o-war axis today is: more or fewer immigrants. But if you want to tug the rope sideways, both to oppose polarization and to have a better chance of adding value, you might do better to focus on a perpendicular axis. Such as transferable citizenship, crime liability insurance for immigrants, or the topic of this post: who exactly to admit.

Even if we disagree on how many immigrants we want, we should agree that we want better immigrants. For example, good immigrants pay lots of taxes, volunteer to help their communities, don’t greatly harm our political or social equilibria, are not criminals, and impose fewer burdens on government benefit systems. Yes, we may disagree on the relative weights to assign to such features, but these disagreements seem relatively modest; there’s plenty of room here to work together to make better choices.

Note that, for the foreseeable future, we aren’t likely to approve for immigration more than a small fraction of all the outsiders who’d be willing to apply, if we were likely to accept them. So as a practical matter our efforts to pick candidates should focus on estimating well at the high tail of the distribution, for the candidates most likely to be best.

Note also that while a better way to select immigrants might induce us to accept more immigrants, those who are wary of this outcome tend to feel risk averse about such changes. Thus we should be looking for ways to pick immigrants that seem especially good at assuring skeptics that any one person is a good candidate.

To achieve all this, I suggest that we look at the prices of new financial assets that we can create to track the net tax revenue from each immigrant, conditional on their being admitted. Let me explain.

For every immigrant that we admit, the government could track how much that person pays in taxes each year, and also how much the government spends on that person via benefits whose costs can be measured individually. We could probably assign individual costs for schools, Medicare and Medicaid, prison, etc. For types of costs or benefits that can’t be measured individually, we could attribute to each immigrant some average value across citizens of their location and demographic type. When there are doubts, let us err in the direction of estimating higher costs, so that our measures are biased against immigrants adding value.

Okay, so now we have a conservative net financial value number for each immigrant for each year, a number that can be positive or negative. From these numbers we can create financial assets that pay annual dividends proportional to these numbers. If we let many people trade such assets, their market prices should give us decent estimates of the current present financial value of this stream of future revenue. And if we allow trading in such assets regarding people who apply to immigrate, with those trades being conditional on that person being admitted and coming, then such prices would estimate the net financial value of an immigration candidate conditional on their immigrating.
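
As a rough illustration of what such a market price estimates, here is a sketch of the present value of one immigrant's net-revenue stream; the discount rate and the cash-flow numbers are made up for the example.

```python
def present_value(net_revenues, discount_rate=0.03):
    """Present value of a stream of annual net financial values
    (taxes paid minus individually-attributed benefit costs; any
    year's value may be negative)."""
    return sum(r / (1 + discount_rate) ** t
               for t, r in enumerate(net_revenues, start=1))

# Example: a candidate expected to cost $4k/year for two years of
# schooling, then pay +$12k/year net in taxes for thirty years.
stream = [-4_000] * 2 + [12_000] * 30
print(f"Implied asset value: ${present_value(stream):,.0f}")
```

A conditional market price is just many traders' aggregated estimate of this kind of quantity, given that the candidate is admitted and comes.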

We could then admit the candidates for whom such estimates are highest; using a high threshold could ensure a high confidence that each immigrant is a net financial advantage. Those who are skeptical about particular immigrants, or about immigration in general, could insure themselves against bad immigration choices via trades in these markets, trades from which they expect to profit if their skepticism is accurate.

As usual, there are some subtleties to consider. For example, traders must be given some info on each candidate, and market estimates are more accurate the more info traders are given. While I see no obvious legal requirement to do so, candidates could be assured some privacy. Immigration skeptics, however, might want to limit such privacy, to better ensure that each immigrant is a net gain.

Once immigrants become citizens, they of course have stronger privacy rights. While the government-calculated dividend values on them each year would reveal some info, there’s no need to reveal details of how that number was computed. To cut info revealed further, we might even wait and pay dividends as a single lump every five years.

In principle, a trader might acquire a large enough net negative stake in a particular immigrant that they have an incentive to hurt that immigrant, or at least to hurt that immigrant’s chances of achieving high net value. We might thus want to limit the size of negative stakes, at least after the immigrant comes, and among traders with opaque abilities to cause such harms.

The fact that net financial revenue can be both positive and negative complicates the asset creation. We might add some large constant to the financial numbers, to ensure that dividends paid have a positive sign. Or we might create two assets, one that pays dividends for the positive amounts, and one that pays for the negative amounts.
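
A sketch of the two-asset option: split each year's net number into two non-negative dividend legs. (The other option, adding a large constant, just shifts every dividend up by that constant.)

```python
def split_dividends(net_value: float) -> tuple:
    """Split a net financial value (positive or negative) into two
    non-negative dividends: one paid by the 'positive' asset, one
    paid by the 'negative' asset."""
    return max(net_value, 0.0), max(-net_value, 0.0)

# A year netting +$9k in taxes pays (9000, 0); a year costing $3k in
# benefits pays (0, 3000). Holding the first asset long and the second
# short reconstructs exposure to the raw net value.
assert split_dividends(9_000) == (9_000.0, 0.0)
assert split_dividends(-3_000) == (0.0, 3_000.0)
```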

Some groups of candidates, such as a church, family, or firm, might be worth more if admitted as a unit together. We might then have trades on packages of assets for a whole group of candidates, trades conditional on their all being admitted as a unit. With a high enough estimated value of the group, we might then just admit such groups as units, even when we have doubts about individual members.

And that’s it, another pull-the-rope-sideways proposal designed to improve policy on a hot-button topic without taking a side in the topic’s main dispute. Whether you want more or fewer immigrants, you should want better immigrants.

Added 1p 25Mar: If we could design individual measures of cultural assimilation and impact on cultural change, and assign dollar values to those measures, then we could include them in this proposed system.

Can We Trust Deliberation Priests?

In Science, academic “deliberation” experts offer a fix for our political ills:

Citizens to express their views … overabundance [of] … has been accompanied by marked decline in civility and argumentative complexity. Uncivil behavior by elites and pathological mass communication reinforce each other. How do we break this vicious cycle? …

All survey research … obtains evidence only about the capacity of the individual in isolation to reason about politics. … [But] even if people are bad solitary reasoners, they can be good group problem-solvers … Deliberative experimentation has generated empirical research that refutes many of the more pessimistic claims about the citizenry’s ability to make sound judgments.

Great huh? But there’s a catch:

Especially when deliberative processes are well-arranged: when they include the provision of balanced information, expert testimony, and oversight by a facilitator … These effects are not necessarily easy to achieve; good deliberation takes time and effort. Many positive effects are demonstrated most easily in face-to-face assemblies and gatherings, which can be expensive and logistically challenging at scale. Careful institutional design involv[es] participant diversity, facilitation, and civility norms …

A major improvement … might involve a randomly selected citizens’ panel deliberating a referendum question and then publicizing its assessments for and against a measure … problem is not social media per se but how it is implemented and organized. Algorithms for ranking sources that recognize that social media is a political sphere and not merely a social one could help. …

It is important to remain vigilant against incentives for governments to use them as symbolic cover for business as usual, or for well-financed lobby groups to subvert their operation and sideline their recommendations. These problems are recognized and in many cases overcome by deliberative practitioners and practice. … The prospects for benign deployment are good to the degree that deliberative scholars and practitioners have established relationships with political leaders and publics—as opposed to being turned to in desperation in a crisis.

So ordinary people are capable of fair and thoughtful deliberation, but only via expensive processes carefully managed in detail by, and designed well in advance by, proper deliberation experts with “established relationships with political leaders and publics.” That is, these experts must be free to pick the “balance” of info, experts, and participants included, and even who speaks when how, and these experts must be treated with proper respect and deference by the public and by political authorities.

No, they aren’t offering a simple well-tested mechanism (e.g., an auction) that we can apply elsewhere with great confidence that the deployed mechanism is the same as the one that they tested. Because what they tested instead was a mechanism with a lot of “knobs” that need context-specific turning; they tested the result of having particular experts use a lot of discretion to make particular political and info choices in particular contexts. They say that went well, and their academic peer reviewers (mostly the same people) agreed. So we shouldn’t worry that such experts would become corrupted if we gave them a lot more power.

This sure sounds like a priesthood to me. If we greatly empower and trust a deliberation priesthood, presumably overseen by these 20 high priest authors and their associates, they promise to create events wherein ordinary people talk much more reasonably, outputting policy recommendations that we could then all defer to with more confidence. At least if we trust them.

In contrast, I’ve been suggesting that we empower and trust prediction markets on key policy outcomes. We’ve tested such mechanisms a lot, including in contexts with strong incentives to corrupt them, and these mechanisms have far fewer knobs that must be set by experts with discretion. Which seems more trustworthy to me.

Replication Markets Team Seeks Journal Partners for Replication Trial

An open letter, from myself and a few colleagues:

Recent attempts to systematically replicate samples of published experiments in the social and behavioral sciences have revealed disappointingly low rates of replication. Many parties are discussing a wide range of options to address this problem.

Surveys and prediction markets have been shown to predict, at rates substantially better than random, which experiments will replicate. This suggests a simple strategy by which academic journals could increase the rate at which their published articles replicate. For each relevant submitted article, create a prediction market estimating its chance of replication, and use that estimate as one factor in deciding whether to publish that article.

We the Replication Markets Team seek academic journals to join us in a test of this strategy. We have been selected for an upcoming DARPA program to create prediction markets for several thousand scientific replication experiments, many of which could be based on articles submitted to your journal. Each market would predict the chance of an experiment replicating. Of the already-published experiments in the pool, approximately one in ten will be sampled randomly for replication. (Whether submitted papers could be included in the replication pool depends on other teams in the program.) Our past markets have averaged 70% accuracy; the work is listed at the Science Prediction Market Project page, and has been published in Science, PNAS, and Royal Society Open Science.

While details are open to negotiation, our initial concept is that your journal would tell potential authors that you are favorably inclined toward experiment article submissions that are posted at our public archive of submitted articles. By posting their article, authors declare that they have submitted their article to some participating journal, though they need not say which one. You tell us when you get a qualifying submission, we quickly tell you the estimated chance of replication, and later you tell us of your final publication decision.

At this point in time we seek only an expression of substantial interest that we can take to DARPA and other teams. Details that may later be negotiated include what exactly counts as a replication, whether archived papers reveal author names, how fast we respond with our replication estimates, what fraction of your articles we actually attempt to replicate, and whether you privately give us any other quality indicators obtained in your reviews to assist in our statistical analysis.

Please RSVP to: Angela Cochran, PM acochran@replicationmarkets.com 571 225 1450

Sincerely, the Replication Markets Team

Thomas Pfeiffer (Massey University)
Yiling Chen, Yang Liu, and Haifeng Xu (Harvard University)
Anna Dreber Almenberg & Magnus Johannesson (Stockholm School of Economics)
Robin Hanson & Kathryn Laskey (George Mason University)

Added 2p: We plan to forecast ~8,000 replications over 3 years, ~2,000 within the first 15 months.  Of these, ~5-10% will be selected for an actual replication attempt.

Toward An Honest Consensus

The Star Trek original series featured a smart computer that mostly only answered questions; humans made key decisions. Near the start of Nick Chater’s book The Mind Is Flat, which I recently started, he said early AI visions were based on the idea of asking humans questions, and then coding their answers into a computer, which might then answer the same range of questions when asked. But to the surprise of most, typical human beliefs turned out to be much too unstable, unreliable, incoherent, and just plain absent to make this work. So AI research turned to other approaches.

Which makes sense. But I’m still inspired by that ancient vision of an explicit accessible shared repository of what we all know, even if that isn’t based on AI. This is the vision that to varying degrees inspired encyclopedias, libraries, internet search engines, prediction markets, and now, virtual assistants. How can we all coordinate to create and update an accessible shared consensus on important topics?

Yes, today our world contains many social institutions that, while serving other functions, also function to create and update a shared consensus. While we don’t all agree with such consensus, it is available as a decent first estimate for those who do not specialize in a topic, facilitating an intellectual division of labor.

For example: search engines, academia, news media, encyclopedias, courts/agencies, consultants, speculative markets, and polls/elections. In many of these institutions, one can ask questions, find closest existing answers, induce the creation of new answers, induce elaboration or updates of older answers, induce resolution of apparent inconsistencies between existing answers, and challenge existing answers with proposed replacements. Allowed questions often include meta questions such as origins of, translations of, confidence in, and expected future changes in, other questions.

These existing institutions, however, often seem weak and haphazard. They often offer poor and biased incentives, use different methods for rather similar topics, leave a lot of huge holes where no decent consensus is offered, and tolerate many inconsistencies in the answers provided by different parts. Which raises the obvious question: can we understand the advantages and disadvantages of existing methods in different contexts well enough to suggest which ones we should use more or less where, or to design better variations, ones that offer stronger incentives, lower costs, and wider scope and integration?

Of course computers could contribute to such new institutions, but they needn’t be the only or even main parts. And of course the idea here is to come up with design candidates to test first at small scales, scaling up only when results look promising. Design candidates will seem more promising if we can at least imagine using them more widely, and if they are based on theories that plausibly explain failings of existing institutions. And of course I’m not talking about pressuring people to follow a consensus, just to make a consensus available to those who want to use it.

As usual, a design proposal should roughly describe what acts each participant can do when, what they each know about what others have done, and what payoffs they each get for the main possible outcomes of typical actions. All in a way that is physically, computationally, and financially feasible. Of course we’d like a story about why equilibria of such a system are likely to produce accurate answers fast and at low cost, relative to other possible systems. And we may need to also satisfy hidden motives, the unacknowledged reasons for why people actually like existing institutions.
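
As a minimal sketch of what such a proposal description might record, here is one possible skeleton; the field names simply mirror the checklist above and are my own, not a standard of any kind.

```python
from dataclasses import dataclass, field

@dataclass
class MechanismSpec:
    """Skeleton of a design proposal: what acts each participant can
    take and when, what each knows of others' acts, and what payoffs
    each gets for the main possible outcomes."""
    actions: dict = field(default_factory=dict)      # participant -> allowed acts, timing
    information: dict = field(default_factory=dict)  # participant -> visible past acts
    payoffs: dict = field(default_factory=dict)      # (participant, outcome) -> reward
    feasibility: str = ""  # physical, computational, financial constraints
```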

I have lots of ideas for proposals I’d like the world to consider here. But I realized that perhaps I’ve neglected calling attention to the problem itself. So I’ve written this post in the hope of inspiring some of you with a challenge: can you help design (or test) new robust ways to create and update a social consensus?

Choose: Allies or Accuracy

Imagine that person A tells you something flattering or unflattering about person B. All else equal, this should move your opinion of B in the direction of A’s claim. But how far? If you care mainly about accuracy, you’ll want to take into account base rates on claimers A and targets B, as well as more specific signs of the accuracy of A regarding B.

But what if you care mainly about seeming loyal to your allies? Well, if A is more of your ally than is B, as suggested by your listening now to A, then you’ll be more inclined to just believe A, no matter what. Perhaps if other allies give a different opinion, you’ll have to decide which of your allies to back. But if not, trying to be accurate on B mainly risks seeming disloyal to A and your other allies.

It seems that humans tend to just believe gossip like this, mostly ignoring signs of accuracy:

The trustworthiness of person-related information … can vary considerably, as in the case of gossip, rumors, lies, or “fake news.” …. Social–emotional information about the (im)moral behavior of previously unknown persons was verbally presented as trustworthy fact (e.g., “He bullied his apprentice”) or marked as untrustworthy gossip (by adding, e.g., allegedly), using verbal qualifiers that are frequently used in conversations, news, and social media to indicate the questionable trustworthiness of the information and as a precaution against wrong accusations. In Experiment 1, spontaneous likability, deliberate person judgments, and electrophysiological measures of emotional person evaluation were strongly influenced by negative information yet remarkably unaffected by the trustworthiness of the information. Experiment 2 replicated these findings and extended them to positive information. Our findings demonstrate a tendency for strong emotional evaluations and person judgments even when they are knowingly based on unclear evidence. (more; HT Rolf Degen)

I’ve toyed with the idea of independent juries to deal with Twitter mobs. Pay a random jury a modest amount to 1) read a fuller context and background on the participants, 2) talk a bit among themselves, and then 3) choose which side they declare as more reasonable. Sure, sometimes the jury would hang, but often they could give a voice of reason that might otherwise be drowned out by loud participants. I’d have been willing to pay for this a few times. And once juries became a standard thing, we could lower costs via prediction markets on jury verdicts, convening an actual jury only when a case is randomly chosen for jury evaluation.

But alas, I’m skeptical that most would care much about what an independent jury is estimated to say, or even about what it actually says. For that, they’d have to care more about truth than about showing support for allies.

Can Foundational Physics Be Saved?

Thirty-four years ago I left physics with a Master’s degree, to start a nine-year stint doing AI/CS at Lockheed and NASA, followed by 25 years in economics. I loved physics theory, and given how far physics had advanced over the previous two 34-year periods, I expected to be giving up many chances for glory. But though I didn’t entirely leave (I’ve since published two physics journal articles), I’ve felt like I dodged a bullet overall; physics theory has progressed far less in the last 34 years, mainly because data dried up:

One experiment after the other is returning null results: No new particles, no new dimensions, no new symmetries. Sure, there are some anomalies in the data here and there, and maybe one of them will turn out to be real news. But experimentalists are just poking in the dark. They have no clue where new physics may be to find. And their colleagues in theory development are of no help.

In her new book Lost in Math, theoretical physicist Sabine Hossenfelder describes just how bad things have become. Previously, physics foundations theorists were disciplined by a strong norm of respecting the theories that best fit the data. But with less data, theorists have turned to mainly judging proposed theories via various standards of “beauty” which advocates claim to have inferred from past patterns of success with data. Except that these standards (and their inferences) are mostly informal, change over time, differ greatly between individuals and schools of thought, and tend to label as “ugly” our actual best theories so far.

Yes, when data is truly scarce, theory must suggest where to look, and so we must choose somehow among as-yet-untested theories. The worry is that we may be choosing badly:

During experiments, the LHC creates about a billion proton-proton collisions per second. … The events are filtered in real time and discarded unless an algorithm marks them as interesting. From a billion events, this “trigger mechanism” keeps only one hundred to two hundred selected ones. … That CERN has spent the last ten years deleting data that hold the key to new fundamental physics is what I would call the nightmare scenario.

One bad sign is that physicists have consistently, confidently, and falsely told each other and the public that big basic progress was coming soon: Continue reading "Can Foundational Physics Be Saved?" »

How To Fund Prestige Science

How can we best promote scientific research? (I’ll use “science” broadly in this post.) In the usual formulation of the problem, we have money and status that we could distribute, and they have time and ability that they might apply. They know more than we do, but we aren’t sure who is how good, and they may care more about money and status than about achieving useful research. So we can’t just give things to anyone who claims they would use it to do useful science. What can we do? We actually have many options. Continue reading "How To Fund Prestige Science" »

Bottom Boss Prediction Market

Sheryl Sandberg and Rachel Thomas write:

Women continue to be vastly underrepresented at every level. For women of color, it’s even worse. Only about one in five senior leaders is a woman, and just one in twenty-five is a woman of color. Progress isn’t just slow—it’s stalled.

Women are doing their part. They’ve been earning more bachelor’s degrees than men for over 30 years. They’re asking for promotions and negotiating salaries as often as men. And contrary to conventional wisdom, women are not leaving the workforce at noticeably higher rates to care for children—or for any other reason. …

At the entry level, when one might expect an equal number of men and women to be hired, men get 54% of jobs, while women get 46%. At the next step, the gap widens. Women are less likely to be hired and promoted into manager-level jobs; for every 100 men promoted to manager, only 79 women are. As a result, men end up holding 62% of manager positions, while women hold only 38%.

The fact that men are far more likely than women to get that first promotion to manager is a red flag. It’s highly doubtful that there are significant enough differences in the qualifications of entry-level men and women to explain this degree of disparity. More probably, it’s because of performance bias. Research shows that both men and women overestimate men’s performance and underestimate women’s. …

By the manager level, women are too far behind to ever catch up. … Even if companies want to hire more women into senior leadership—and many do—there are simply far fewer of them with the necessary qualifications. The entire race has become rigged because of those unfair advantages at the start. …

Companies need to take bold steps to make the race fair. This begins with establishing clear, consistent criteria for hiring and reviews, because when they are based on subjective impressions or preferences, bias creeps in. Companies should train employees so they understand how unconscious bias can affect who’s hired and promoted—and who’s not. (more)

I can’t hold much hope for cutting all subjective judgements from hiring. Most jobs are just too complicated to reduce all useful candidate quality signals to objective measures. But I do have hopes of creating less biased subjective judgements, via (you guessed it) prediction markets. In the rest of this post, I’ll outline a vision for how that could work.

If the biggest problem is that not enough women are promoted to their first-level (bottom boss) management position, then let’s make prediction markets focused on that problem. For whatever consortium of firms join my proposed new system, let them post to that consortium a brief description of all candidates being considered for each of their open first-level management jobs. Include gender and color as two of the descriptors.

Then let all employees within that consortium bet, for any job candidate X, on the chance that, if candidate X is put into a particular management job, that candidate will be promoted to a one-level-higher management job within Y (five?) years. (Each firm decides which higher-level jobs count, at that firm or another firm. And perhaps the few employees likely to actually hire those higher-level managers should not be allowed to bet on anyone whom they might hire.)

Firms give each consortium employee say $100 to bet in these markets, and let them keep any winnings. (Firms perhaps also create a few specialist traders with much larger stakes and access to deep firm statistics on hiring and performance.) Giving participants their stake avoids anti-gambling law problems, and focusing on first level managers avoids insider trading law problems.

It would also help to give participants easy ways to bet on whole pools of job candidates with particular descriptors. Say all women, or all women older than thirty. Then participants who thought market odds were biased against identifiable classes of people could easily bet on such beliefs, and correct for such biases. Our long experience with prediction markets suggests that such biases would likely be eliminated; but if not, at least participants would be financially rewarded and punished for seeing versus not seeing the light.
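
A sketch of how such a pool bet might work: a trader who believes the market underprices some class of candidates buys every matching contract at once. All field names and prices here are illustrative.

```python
def buy_pool(candidates, predicate, budget):
    """Spread a budget evenly across contracts on all candidates
    matching a descriptor predicate. Each contract pays $1 if that
    candidate, once hired as a manager, is promoted again within Y
    years; `price` is the market's current probability estimate."""
    pool = [c for c in candidates if predicate(c)]
    stake = budget / len(pool)
    # Expected profit is positive exactly when the market's average
    # price understates the pool's true promotion rate, so traders who
    # correctly spot bias against a class are paid to remove it.
    return {c["name"]: stake / c["price"] for c in pool}  # contracts bought

candidates = [
    {"name": "A", "gender": "F", "age": 34, "price": 0.30},
    {"name": "B", "gender": "F", "age": 41, "price": 0.25},
    {"name": "C", "gender": "M", "age": 29, "price": 0.35},
]
print(buy_pool(candidates, lambda c: c["gender"] == "F" and c["age"] > 30, 100))
```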

It seems reasonable for these firms to apply modest pressure on those filling these positions to put substantial weight on these market price estimates about candidates. Yes, there may be hiring biases at higher levels, but if the biggest problem is at the bottom boss level then these markets should at least help. Yes, suitability for further promotions is not the only consideration in picking a manager, but it is an important one, and it subsumes many other important considerations. And it is a nice clearly visible indicator that is common across many divisions and firms. It is hard to see firms going very wrong because they hired managers a bit more likely to be promoted if hired.

In sum: if the hiring of bottom bosses is now biased against women, but a prediction market on promotion-if-hired would be less biased, then pushing hirers to put more weight on these market estimates should result in less bias against women. Compared to simply pushing hirers to hire more women, this approach should be easier for hirers to accept, as they’d more acknowledge the info value of the market estimates.

Bets As Signals of Article Quality

On October 15, I talked at the Rutgers Foundation of Probability Seminar on Uncommon Priors Require Origin Disputes. While visiting that day, I talked to Seminar host Harry Crane about how the academic replication crisis might be addressed by prediction markets, and by his related proposal to have authors offer bets supporting their papers. I mentioned to him that I’m now part of a project that will induce a great many replication attempts, set up prediction markets about them beforehand, and that we would love to get journals to include our market prices in their review process. (I’ll say more about this when I can.)

When the scheduled speaker for the next week slot of the seminar cancelled, Crane took the opening to give a talk comparing our two approaches (video & links here). He focused on papers for which it is possible to make a replication attempt and said “We don’t need journals anymore.” That is, he argued that we should not use which journal is willing to publish a paper as a signal of paper quality, but that we should use the signal of what bet authors offer in support of their paper.

That author betting offer would specify what would count as a replication attempt, and as a successful replication, and would include an escrowed amount of cash plus betting odds that set the amount a challenger must put up to try to win that escrowed amount. If the replication attempt fails, the challenger wins both amounts, minus the cost of performing the attempt; if it succeeds, the authors win the challenger’s stake.
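
The payout arithmetic of such an offer, with made-up numbers; `odds` here means odds-to-1 in favor of replication, so the escrow and odds jointly fix the challenger's stake.

```python
def author_bet_payouts(escrow: float, odds: float, replication_cost: float) -> dict:
    """Payouts for an author's replication bet offer: the author
    escrows `escrow` dollars at `odds`-to-1 in favor of replication,
    so a challenger must stake escrow / odds to accept."""
    challenger_stake = escrow / odds
    return {
        # Replication fails: challenger collects both stakes, but paid
        # for the replication attempt out of pocket.
        "challenger_nets_if_fail": escrow + challenger_stake - replication_cost,
        # Replication succeeds: authors keep their escrow and win the
        # challenger's stake.
        "authors_net_if_success": challenger_stake,
    }

# Example: $9,000 escrowed at 9-to-1 implies a $1,000 challenger stake;
# a failed replication costing $4,000 to run nets the challenger $6,000.
print(author_bet_payouts(9_000, 9, 4_000))
```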

In his talk, Crane contrasted his approach with an alternative in which the quality signal would be the odds of replication in an open prediction market, conditional on a replication attempt. In comparing the two, Crane seems to think that authors would not usually participate in setting market odds. He lists three advantages of author bets over betting market odds: 1) Author bets give authors better incentives to produce non-misleading papers. 2) Market odds are less informed, because market participants know less than paper authors about their paper. 3) Relying on market odds allows a mistaken consensus to suppress surprising new results. In the rest of this post, I’ll respond.

I am agnostic on whether journal quality should remain as a signal of article quality. If that signal goes away, then we are talking about which other signals can be useful, and how useful. And if that signal remains, then we can be talking about other signals that might be used by journals to make their decisions, and also by other observers to evaluate article quality. But whatever signals are used, I’m pretty sure that most observers will demand that a few simple easy-to-interpret signals be distilled from the many complex signals available. Tenure review committees, for example, will need signals nearly as simple as journal prestige.

Let me also point out that these two approaches of market odds or author bets can also be applied to non-academic articles, such as news articles, and also to many other kinds of quality signals. For example, we could have author or market bets on how many future citations or how much news coverage an article will get, whether any contained math proofs will be shown to be in error, whether any names or dates will be shown to have been misreported in the article, or whether coding errors will be found in supporting statistical analysis. Judges or committees might also evaluate overall article quality at some distant future date. Bets on any of these could be conditional on whether serious attempts were made in that category.

Now, on the comparison between author and market bets, an obvious alternative is to offer both author bets and market odds as signals, either to ultimate readers or to journals reviewing articles. After all, it is hard to justify suppressing any potentially useful signal. If a market exists, authors could easily make betting offers via that market, and those offers could easily be flagged for market observers to take as signals.

I see market odds as easier for observers to interpret than author bet offers. First, authors bets are more easily corrupted via authors arranging for a collaborating shill to accept their bet. Second, it can be hard for observers to judge how author risk-aversion influences author odds, and how replication costs and author wealth influences author bet amounts. For market odds, in contrast, amounts take care of themselves via opposing bets, and observers need only judge any overall differences in wealth and risk-aversion between the two sides, differences that tend to be smaller, vary less, and matter less for market odds.

Also, authors would usually participate in any open market on their paper, giving those authors bet incentives and making market odds include their info. The reason authors will bet is that other participants will expect authors to bet to puff up their odds, and so other participants will push the odds down to compensate. So if authors don’t in fact participate, the odds will tend to look bad for them. Yes, market odds will be influenced by views other than those of authors, but when evaluating papers we want our quality signals to be based on the views of people other than paper authors. That is why we use peer review, after all.

When there are many possible quality metrics on which bets could be offered, article authors are unlikely to offer bets on all of them. But in an open market, anyone could offer to bet on any of those metrics. So an open market could show estimates regarding any metric for which anyone made an offer to bet. This allows a much larger range of quality metrics to be available under the market odds approach.

While the simple market approach merely bets conditional on someone making a replication attempt, an audit lottery variation that I’ve proposed would instead use a small fixed percentage of amounts bet to pay for replication attempts. If the amount collected is insufficient, then it and all betting amounts are gambled, so that either a sufficient amount is created, or all these assets disappear.
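
A heavily simplified sketch of that variation, under my reading; the original leaves settlement details open, and the fair-odds gamble here is my illustrative assumption.

```python
import random

def audit_lottery(total_bets: float, fee_rate: float,
                  replication_cost: float, rng=random.random) -> float:
    """A fixed cut of all bets accrues to a replication pot. If the pot
    covers the cost, fund the replication directly. Otherwise gamble
    the pot plus all betting amounts at actuarially fair odds: either a
    full replication budget is created, or all these assets disappear.
    Returns the funded budget, or 0.0 if the gamble loses."""
    pot = fee_rate * total_bets
    if pot >= replication_cost:
        return replication_cost
    at_risk = pot + total_bets
    win_prob = min(at_risk / replication_cost, 1.0)  # fair odds: E[payout] = at_risk
    return replication_cost if rng() < win_prob else 0.0
```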

Just as 5% significance is treated as a threshold today for publication evaluation, I can imagine particular bet-reliability thresholds becoming important for evaluating article quality. News articles might even be filtered, or show simple icons, based on a reliability category. In this case the betting-offer and market approaches would tend to merge.

For example, an article might be considered “good enough” if it had no more than a 5% chance of being wrong, if checked. The standard for checking this might be if anyone was currently offering to bet at 19-1 odds in favor of reliability. For as long as the author or anyone else maintained such offers, the article would qualify as at least that reliable, and so could be shown via filters or icons as meeting that standard. For this approach we don’t need to support a market with varying prices; we only need to keep track of how much has been offered and accepted on either side of this fixed odds bet.
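
The odds-to-probability arithmetic here is simple: an offer at 19-1 in favor of reliability breaks even only if the offerer assigns at most a 1/(19+1) = 5% chance of the article being wrong. A minimal sketch of the threshold check (names hypothetical):

```python
def implied_failure_prob(odds_for: float) -> float:
    """Break-even failure probability implied by a standing offer at
    odds_for-to-1 in favor of the article's reliability."""
    return 1.0 / (odds_for + 1.0)

def meets_standard(open_offer_amount: float, min_required: float) -> bool:
    """An article qualifies as 'good enough' (at most a 5% chance of
    being wrong, if checked) while at least min_required dollars of
    unaccepted 19-1 offers in favor of reliability remain open."""
    return open_offer_amount >= min_required

assert implied_failure_prob(19) == 0.05
```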
