Tag Archives: Prediction Markets

Replication Markets Team Seeks Journal Partners for Replication Trial

An open letter, from myself and a few colleagues:

Recent attempts to systematically replicate samples of published experiments in the social and behavioral sciences have revealed disappointingly low rates of replication. Many parties are discussing a wide range of options to address this problem.

Surveys and prediction markets have been shown to predict, at rates substantially better than random, which experiments will replicate. This suggests a simple strategy by which academic journals could increase the rate at which their published articles replicate. For each relevant submitted article, create a prediction market estimating its chance of replication, and use that estimate as one factor in deciding whether to publish that article.

We the Replication Markets Team seek academic journals to join us in a test of this strategy. We have been selected for an upcoming DARPA program to create prediction markets for several thousand scientific replication experiments, many of which could be based on articles submitted to your journal. Each market would predict the chance of an experiment replicating. Of the already-published experiments in the pool, approximately one in ten will be sampled randomly for replication. (Whether submitted papers could be included in the replication pool depends on other teams in the program.) Our past markets have averaged 70% accuracy; the work is listed at the Science Prediction Market Project page and has been published in Science, PNAS, and Royal Society Open Science.

While details are open to negotiation, our initial concept is that your journal would tell potential authors that you are favorably inclined toward experiment article submissions that are posted at our public archive of submitted articles. By posting their article, authors declare that they have submitted their article to some participating journal, though they need not say which one. You tell us when you get a qualifying submission, we quickly tell you the estimated chance of replication, and later you tell us of your final publication decision.

At this point in time we seek only an expression of substantial interest that we can take to DARPA and other teams. Details that may later be negotiated include what exactly counts as a replication, whether archived papers reveal author names, how fast we respond with our replication estimates, what fraction of your articles we actually attempt to replicate, and whether you privately give us any other quality indicators obtained in your reviews to assist in our statistical analysis.

Please RSVP to: Angela Cochran, PM acochran@replicationmarkets.com 571 225 1450

Sincerely, the Replication Markets Team

Thomas Pfeiffer (Massey University)
Yiling Chen, Yang Liu, and Haifeng Xu (Harvard University)
Anna Dreber Almenberg & Magnus Johannesson (Stockholm School of Economics)
Robin Hanson & Kathryn Laskey (George Mason University)

Added 2p: We plan to forecast ~8,000 replications over 3 years, ~2,000 within the first 15 months.  Of these, ~5-10% will be selected for an actual replication attempt.


Toward An Honest Consensus

The original Star Trek series featured a smart computer that mostly only answered questions; humans made the key decisions. Near the start of Nick Chater’s book The Mind Is Flat, which I recently started, he says that early AI visions were based on the idea of asking humans questions, and then coding their answers into a computer, which might then answer the same range of questions when asked. But to the surprise of most, typical human beliefs turned out to be much too unstable, unreliable, incoherent, and just plain absent to make this work. So AI research turned to other approaches.

Which makes sense. But I’m still inspired by that ancient vision of an explicit accessible shared repository of what we all know, even if that isn’t based on AI. This is the vision that to varying degrees inspired encyclopedias, libraries, internet search engines, prediction markets, and now, virtual assistants. How can we all coordinate to create and update an accessible shared consensus on important topics?

Yes, today our world contains many social institutions that, while serving other functions, also function to create and update a shared consensus. While we don’t all agree with such consensus, it is available as a decent first estimate for those who do not specialize in a topic, facilitating an intellectual division of labor.

For example: search engines, academia, news media, encyclopedias, courts/agencies, consultants, speculative markets, and polls/elections. In many of these institutions, one can ask questions, find closest existing answers, induce the creation of new answers, induce elaboration or updates of older answers, induce resolution of apparent inconsistencies between existing answers, and challenge existing answers with proposed replacements. Allowed questions often include meta questions such as origins of, translations of, confidence in, and expected future changes in, other questions.

These existing institutions, however, often seem weak and haphazard. They often offer poor and biased incentives, use different methods for rather similar topics, leave a lot of huge holes where no decent consensus is offered, and tolerate many inconsistencies in the answers provided by different parts. Which raises the obvious question: can we understand the advantages and disadvantages of existing methods in different contexts well enough to suggest which ones we should use more or less where, or to design better variations, ones that offer stronger incentives, lower costs, and wider scope and integration?

Of course computers could contribute to such new institutions, but they needn’t be the only or even main parts. And of course the idea here is to come up with design candidates to test first at small scales, scaling up only when results look promising. Design candidates will seem more promising if we can at least imagine using them more widely, and if they are based on theories that plausibly explain failings of existing institutions. And of course I’m not talking about pressuring people to follow a consensus, just to make a consensus available to those who want to use it.

As usual, a design proposal should roughly describe what acts each participant can do when, what they each know about what others have done, and what payoffs they each get for the main possible outcomes of typical actions. All in a way that is physically, computationally, and financially feasible. Of course we’d like a story about why equilibria of such a system are likely to produce accurate answers fast and at low cost, relative to other possible systems. And we may need to also satisfy hidden motives, the unacknowledged reasons for why people actually like existing institutions.

I have lots of ideas for proposals I’d like the world to consider here. But I realized that perhaps I’ve neglected calling attention to the problem itself. So I’ve written this post in the hope of inspiring some of you with a challenge: can you help design (or test) new robust ways to create and update a social consensus?


Choose: Allies or Accuracy

Imagine that person A tells you something flattering or unflattering about person B. All else equal, this should move your opinion of B in the direction of A’s claim. But how far? If you care mainly about accuracy, you’ll want to take into account base rates on claimers A and targets B, as well as more specific signs of A’s accuracy regarding B.
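
As a toy illustration of that accuracy-first update, here is a minimal Bayesian sketch; the base rates and accuracy numbers are invented purely to show the calculation, not estimates of real gossip.

```python
# A toy Bayesian version of the accuracy-first update; all numbers are invented.

def posterior_bad(prior_bad, p_claim_if_bad, p_claim_if_good):
    """P(B really did it | A claims B did it), by Bayes' rule."""
    hit = p_claim_if_bad * prior_bad
    miss = p_claim_if_good * (1.0 - prior_bad)
    return hit / (hit + miss)

# Base rate: 10% of targets actually did the unflattering thing.
# A careful claimer: says it 80% of the time when true, 5% when false.
print(posterior_bad(0.10, 0.80, 0.05))  # ~0.64: a big update, but far from certainty

# A gossip-prone claimer: says it 80% of the time when true, 40% when false.
print(posterior_bad(0.10, 0.80, 0.40))  # ~0.18: the same claim should move you much less
```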

But what if you care mainly about seeming loyal to your allies? Well, if A is more of your ally than is B, as suggested by your listening now to A, then you’ll be more inclined to just believe A, no matter what. Perhaps if other allies give a different opinion, you’ll have to decide which of your allies to back. But if not, trying to be accurate on B mainly risks seeming disloyal to A and your other allies.

It seems that humans tend to just believe gossip like this, mostly ignoring signs of accuracy:

The trustworthiness of person-related information … can vary considerably, as in the case of gossip, rumors, lies, or “fake news.” …. Social–emotional information about the (im)moral behavior of previously unknown persons was verbally presented as trustworthy fact (e.g., “He bullied his apprentice”) or marked as untrustworthy gossip (by adding, e.g., allegedly), using verbal qualifiers that are frequently used in conversations, news, and social media to indicate the questionable trustworthiness of the information and as a precaution against wrong accusations. In Experiment 1, spontaneous likability, deliberate person judgments, and electrophysiological measures of emotional person evaluation were strongly influenced by negative information yet remarkably unaffected by the trustworthiness of the information. Experiment 2 replicated these findings and extended them to positive information. Our findings demonstrate a tendency for strong emotional evaluations and person judgments even when they are knowingly based on unclear evidence. (more; HT Rolf Degen)

I’ve toyed with the idea of independent juries to deal with Twitter mobs. Pay a random jury a modest amount to 1) read a fuller context and background on the participants, 2) talk a bit among themselves, and then 3) choose which side they declare as more reasonable. Sure, sometimes the jury would hang, but often they could give a voice of reason that might otherwise be drowned out by loud participants. I’d have been willing to pay for this a few times. And once juries became a standard thing, we could lower costs via prediction markets on jury verdicts, where only randomly chosen cases get an actual jury evaluation.
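
For a rough sense of the economics, here is a tiny back-of-the-envelope sketch with entirely made-up numbers: only a random fraction of disputes gets a paid jury, while market prices supply estimates for all of them.

```python
# Back-of-the-envelope cost of juries plus markets; every number is made up.

def yearly_jury_cost(disputes, jury_fraction, jurors_per_case, pay_per_juror):
    juried_cases = disputes * jury_fraction
    return juried_cases * jurors_per_case * pay_per_juror

# 10,000 online pile-ons a year, 5% randomly sent to a 12-person jury at $50 each;
# prediction markets on "what would the jury say?" then cover the other 95%.
print(yearly_jury_cost(10_000, 0.05, 12, 50))  # 300000.0
```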

But alas, I’m skeptical that most would care much about what an independent jury is estimated to say, or even about what it actually says. For that, they’d have to care more about truth than about showing support for allies.


Can Foundational Physics Be Saved?

Thirty-four years ago I left physics with a Master’s degree, to start a nine-year stint doing AI/CS at Lockheed and NASA, followed by 25 years in economics. I loved physics theory, and given how far physics had advanced over the previous two 34-year periods, I expected to be giving up many chances for glory. But though I didn’t entirely leave (I’ve since published two physics journal articles), I’ve felt like I dodged a bullet overall; physics theory has progressed far less in the last 34 years, mainly because the data dried up:

One experiment after the other is returning null results: No new particles, no new dimensions, no new symmetries. Sure, there are some anomalies in the data here and there, and maybe one of them will turn out to be real news. But experimentalists are just poking in the dark. They have no clue where new physics may be to find. And their colleagues in theory development are of no help.

In her new book Lost in Math, theoretical physicist Sabine Hossenfelder describes just how bad things have become. Previously, physics foundations theorists were disciplined by a strong norm of respecting the theories that best fit the data. But with less data, theorists have turned to mainly judging proposed theories via various standards of “beauty” which advocates claim to have inferred from past patterns of success with data. Except that these standards (and their inferences) are mostly informal, change over time, differ greatly between individuals and schools of thought, and tend to label as “ugly” our actual best theories so far.

Yes, when data is truly scarce, theory must suggest where to look, and so we must choose somehow among as-yet-untested theories. The worry is that we may be choosing badly:

During experiments, the LHC creates about a billion proton-proton collisions per second. … The events are filtered in real time and discarded unless an algorithm marks them as interesting. From a billion events, this “trigger mechanism” keeps only one hundred to two hundred selected ones. … That CERN has spent the last ten years deleting data that hold the key to new fundamental physics is what I would call the nightmare scenario.

One bad sign is that physicists have consistently, confidently, and falsely told each other and the public that big basic progress was coming soon: Continue reading "Can Foundational Physics Be Saved?" »


How To Fund Prestige Science

How can we best promote scientific research? (I’ll use “science” broadly in this post.) In the usual formulation of the problem, we have money and status that we could distribute, and researchers have time and ability that they might apply. They know more than we do, but we aren’t sure who is how good, and they may care more about money and status than about achieving useful research. So we can’t just give things to anyone who claims they would use it to do useful science. What can we do? We actually have many options. Continue reading "How To Fund Prestige Science" »


Bottom Boss Prediction Market

Sheryl Sandberg and Rachel Thomas write:

Women continue to be vastly underrepresented at every level. For women of color, it’s even worse. Only about one in five senior leaders is a woman, and just one in twenty-five is a woman of color. Progress isn’t just slow—it’s stalled.

Women are doing their part. They’ve been earning more bachelor’s degrees than men for over 30 years. They’re asking for promotions and negotiating salaries as often as men. And contrary to conventional wisdom, women are not leaving the workforce at noticeably higher rates to care for children—or for any other reason. …

At the entry level, when one might expect an equal number of men and women to be hired, men get 54% of jobs, while women get 46%. At the next step, the gap widens. Women are less likely to be hired and promoted into manager-level jobs; for every 100 men promoted to manager, only 79 women are. As a result, men end up holding 62% of manager positions, while women hold only 38%.

The fact that men are far more likely than women to get that first promotion to manager is a red flag. It’s highly doubtful that there are significant enough differences in the qualifications of entry-level men and women to explain this degree of disparity. More probably, it’s because of performance bias. Research shows that both men and women overestimate men’s performance and underestimate women’s. …

By the manager level, women are too far behind to ever catch up. … Even if companies want to hire more women into senior leadership—and many do—there are simply far fewer of them with the necessary qualifications. The entire race has become rigged because of those unfair advantages at the start. …

Companies need to take bold steps to make the race fair. This begins with establishing clear, consistent criteria for hiring and reviews, because when they are based on subjective impressions or preferences, bias creeps in. Companies should train employees so they understand how unconscious bias can affect who’s hired and promoted—and who’s not. (more)

I can’t hold much hope for cutting all subjective judgements from hiring. Most jobs are just too complicated to reduce all useful candidate quality signals to objective measures. But I do have hopes of creating less biased subjective judgements, via (you guessed it) prediction markets. In the rest of this post, I’ll outline a vision for how that could work.

If the biggest problem is that not enough women are promoted to their first-level (bottom boss) management position, then let’s make prediction markets focused on that problem. For whatever consortium of firms join my proposed new system, let them post to that consortium a brief description of all candidates being considered for each of their open first-level management jobs. Include gender and color as two of the descriptors.

Then let all employees within that consortium bet, for any job candidate X, on the chance that if candidate X is put into a particular management job, then that candidate will be promoted to a one-level-higher management job within Y (five?) years. (Each firm decides what higher-level jobs count, at that firm or another firm. And perhaps the few employees likely to actually hire those higher-level managers should not be allowed to bet on anyone whom they might hire.)
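
As a sketch of how such a conditional claim might settle, here is a minimal Python version. The class, field names, and the refund-if-not-hired rule are my own illustrative choices for how conditional markets are commonly implemented, not a spec from this proposal.

```python
# A sketch of the conditional claim described above: bets pay only if the
# candidate is actually hired into the job; otherwise all stakes are refunded.

from dataclasses import dataclass

@dataclass
class ConditionalPromotionBet:
    trader: str
    candidate: str          # e.g. "X"
    stake: float            # amount risked
    prob: float             # market price per $1 of payout, i.e. the market probability
    side: str               # "yes" = promoted within Y years, "no" = not promoted

    def settle(self, hired: bool, promoted_within_y: bool) -> float:
        """Return what the trader gets back when the market resolves."""
        if not hired:
            return self.stake                    # condition failed: refund the stake
        won = (self.side == "yes") == promoted_within_y
        if not won:
            return 0.0
        price = self.prob if self.side == "yes" else 1.0 - self.prob
        return self.stake / price                # $1 of payout per `price` staked

# Example: $10 on "yes" at a 25% market price pays $40 if X is hired and promoted.
bet = ConditionalPromotionBet("alice", "X", 10.0, 0.25, "yes")
print(bet.settle(hired=True, promoted_within_y=True))    # 40.0
print(bet.settle(hired=False, promoted_within_y=False))  # 10.0 (refund)
```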

Firms give each consortium employee say $100 to bet in these markets, and let them keep any winnings. (Firms perhaps also create a few specialist traders with much larger stakes and access to deep firm statistics on hiring and performance.) Giving participants their stake avoids anti-gambling law problems, and focusing on first level managers avoids insider trading law problems.

It would also help to give participants easy ways to bet on all pools of job candidates with particular descriptors. Say, all women, or all women older than thirty. Then participants who thought market odds were biased against identifiable classes of people could easily bet on such beliefs, and correct for such biases. Our long experience with prediction markets suggests that such biases would likely be eliminated; but if not, at least participants would be financially rewarded and punished for seeing versus not seeing the light.
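
Here is one minimal way such a pool bet could be computed from the individual candidate markets; the candidate records and numbers are placeholders, not data.

```python
# Illustrative pool bet: take the same side of every candidate market matching a filter.

candidates = [
    {"id": "c1", "gender": "F", "age": 34, "market_prob": 0.22},
    {"id": "c2", "gender": "F", "age": 28, "market_prob": 0.30},
    {"id": "c3", "gender": "M", "age": 41, "market_prob": 0.35},
]

def pool(cands, predicate):
    """All candidates matching a descriptor filter."""
    return [c for c in cands if predicate(c)]

women_over_30 = pool(candidates, lambda c: c["gender"] == "F" and c["age"] > 30)
avg_price = sum(c["market_prob"] for c in women_over_30) / len(women_over_30)
print(f"pool size {len(women_over_30)}, implied promotion rate {avg_price:.0%}")
# A trader who thinks this pool is underpriced can buy a little "yes" in every
# member market, pushing the whole group's odds up if others agree.
```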

It seems reasonable for these firms to apply modest pressure on those filling these positions to put substantial weight on these market price estimates about candidates. Yes, there may be hiring biases at higher levels, but if the biggest problem is at the bottom boss level then these markets should at least help. Yes, suitability for further promotions is not the only consideration in picking a manager, but it is an important one, and it subsumes many other important considerations. And it is a nice clearly visible indicator that is common across many divisions and firms. It is hard to see firms going very wrong because they hired managers a bit more likely to be promoted if hired.

In sum: if the hiring of bottom bosses is now biased against women, but a prediction market on promotion-if-hired would be less biased, then pushing hirers to put more weight on these market estimates should result in less bias against women. Compared to simply pushing hirers to hire more women, this approach should be easier for hirers to accept, as they’d more acknowledge the info value of the market estimates.


Bets As Signals of Article Quality

On October 15, I talked at the Rutgers Foundation of Probability Seminar on Uncommon Priors Require Origin Disputes. While visiting that day, I talked to Seminar host Harry Crane about how the academic replication crisis might be addressed by prediction markets, and by his related proposal to have authors offer bets supporting their papers. I mentioned to him that I’m now part of a project that will induce a great many replication attempts, set up prediction markets about them beforehand, and that we would love to get journals to include our market prices in their review process. (I’ll say more about this when I can.)

When the scheduled speaker for the next week slot of the seminar cancelled, Crane took the opening to give a talk comparing our two approaches (video & links here). He focused on papers for which it is possible to make a replication attempt and said “We don’t need journals anymore.” That is, he argued that we should not use which journal is willing to publish a paper as a signal of paper quality, but that we should use the signal of what bet authors offer in support of their paper.

That author betting offer would specify what would count as a replication attempt, and as a successful replication, and include an escrowed amount of cash and betting odds, which set the amount a challenger must put up to try to win that escrowed amount. If the replication fails, the challenger wins these two amounts minus the cost of doing a replication attempt; if it succeeds, the authors win the challenger’s stake.
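
To make the payoff structure concrete, here is a small sketch of how such an author bet might settle, under my added assumption that the challenger pays for the replication attempt; the dollar figures are purely illustrative.

```python
# Illustrative net payoffs of an author replication bet.

def settle_author_bet(escrow, odds, replication_cost, replicated):
    """Odds of k:1 mean the challenger must stake escrow / k to accept the bet."""
    challenger_stake = escrow / odds
    if replicated:
        # Authors keep their escrow and win the challenger's stake.
        return {"authors": +challenger_stake,
                "challenger": -challenger_stake - replication_cost}
    else:
        # Challenger takes both amounts, minus what the replication attempt cost.
        return {"authors": -escrow,
                "challenger": +escrow - replication_cost}

# Authors escrow $5,000 at 4:1; a replication attempt costs $2,000.
print(settle_author_bet(5000, 4, 2000, replicated=False))
# {'authors': -5000, 'challenger': 3000}
print(settle_author_bet(5000, 4, 2000, replicated=True))
# {'authors': 1250.0, 'challenger': -3250.0}
```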

In his talk, Crane contrasted his approach with an alternative in which the quality signal would be the odds of replication in an open prediction market, conditional on a replication attempt. In comparing the two, Crane seems to think that authors would not usually participate in setting market odds. He lists three advantages of author bets over betting market odds: 1) Author bets give authors better incentives to produce non-misleading papers. 2) Market odds are less informed because market participants know less than paper authors about their paper. 3) Relying on market odds allows a mistaken consensus to suppress surprising new results. In the rest of this post, I’ll respond.

I am agnostic on whether journal quality should remain as a signal of article quality. If that signal goes away, then the question becomes which other signals can be useful, and how useful. And if that signal remains, then we can discuss other signals that journals might use in making their decisions, and that other observers might use to evaluate article quality. But whatever signals are used, I’m pretty sure that most observers will demand that a few simple easy-to-interpret signals be distilled from the many complex signals available. Tenure review committees, for example, will need signals nearly as simple as journal prestige.

Let me also point out that these two approaches of market odds or author bets can also be applied to non-academic articles, such as news articles, and also to many other kinds of quality signals. For example, we could have author or market bets on how many future citations or how much news coverage an article will get, whether any contained math proofs will be shown to be in error, whether any names or dates will be shown to have been misreported in the article, or whether coding errors will be found in supporting statistical analysis. Judges or committees might also evaluate overall article quality at some distant future date. Bets on any of these could be conditional on whether serious attempts were made in that category.

Now, on the comparison between author and market bets, an obvious alternative is to offer both author bets and market odds as signals, either to ultimate readers or to journals reviewing articles. After all, it is hard to justify suppressing any potentially useful signal. If a market exists, authors could easily make betting offers via that market, and those offers could easily be flagged for market observers to take as signals.

I see market odds as easier for observers to interpret than author bet offers. First, author bets are more easily corrupted via authors arranging for a collaborating shill to accept their bet. Second, it can be hard for observers to judge how author risk-aversion influences author odds, and how replication costs and author wealth influence author bet amounts. For market odds, in contrast, amounts take care of themselves via opposing bets, and observers need only judge any overall differences in wealth and risk-aversion between the two sides, differences that tend to be smaller, vary less, and matter less for market odds.

Also, authors would usually participate in any open market on their paper, giving those authors bet incentives and making market odds include their info. The reason authors will bet is that other participants will expect authors to bet to puff up their odds, and so other participants will push the odds down to compensate. So if authors don’t in fact participate, the odds will tend to look bad for them. Yes, market odds will be influenced by views other than those of authors, but when evaluating papers we want our quality signals to be based on the views of people other than the paper’s authors. That is why we use peer review, after all.

When there are many possible quality metrics on which bets could be offered, article authors are unlikely to offer bets on all of them. But in an open market, anyone could offer to bet on any of those metrics. So an open market could show estimates regarding any metric for which anyone made an offer to bet. This allows a much larger range of quality metrics to be available under the market odds approach.

While the simple market approach merely bets conditional on someone making a replication attempt, an audit lottery variation that I’ve proposed would instead use a small fixed percentage of the amounts bet to pay for replication attempts. If the amount collected is insufficient, then it and all betting amounts are gambled so that either a sufficient amount is created, or all these assets disappear.
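
Here is a minimal sketch of that audit lottery, under my added assumption that the “sufficient amount” means enough to fund the replication and still settle the bets; scaling the pot up with probability equal to assets over target keeps expected values unchanged.

```python
# Illustrative audit lottery: either the replication gets fully funded, or the assets vanish.

import random

def audit_lottery(bet_pool, fee_rate, replication_cost, rng=random.random):
    fee = fee_rate * bet_pool
    target = replication_cost + bet_pool      # enough to replicate and settle the bets
    assets = fee + bet_pool
    if assets >= target:
        return assets                         # fee already covers the replication
    if rng() < assets / target:
        return target                         # lottery won: replication is funded
    return 0.0                                # lottery lost: all these assets disappear

# Example: $10,000 bet at a 2% fee, facing a $3,000 replication cost.
print(audit_lottery(10_000, 0.02, 3_000, rng=lambda: 0.5))  # 13000 at this draw
```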

Just as 5% significance is treated as a threshold today for publication evaluation, I can imagine particular bet-reliability thresholds becoming important for evaluating article quality. News articles might even be filtered, or show simple icons, based on a reliability category. In this case the betting-offer and market approaches would tend to merge.

For example, an article might be considered “good enough” if it had no more than a 5% chance of being wrong, if checked. The standard for checking this might be if anyone was currently offering to bet at 19-1 odds in favor of reliability. For as long as the author or anyone else maintained such offers, the article would qualify as at least that reliable, and so could be shown via filters or icons as meeting that standard. For this approach we don’t need to support a market with varying prices; we only need to keep track of how much has been offered and accepted on either side of this fixed odds bet.
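
As a sketch of how little bookkeeping this fixed-odds approach needs, here is a toy ledger; the class and field names are mine, and the only state tracked is how much backing has been offered and how much challengers have accepted.

```python
# A toy ledger for the fixed-odds reliability threshold described above.

from dataclasses import dataclass

@dataclass
class ReliabilityBook:
    odds: float = 19.0      # 19-1 in favor of reliability ~ a 5% chance-of-error threshold
    capacity: float = 0.0   # challenger-side dollars the backers' escrow will cover
    taken: float = 0.0      # challenger-side dollars already accepted

    def offer(self, backer_escrow):
        # A backer escrowing $19 covers $1 of challenger action at 19-1.
        self.capacity += backer_escrow / self.odds

    def accept(self, challenger_stake):
        take = min(challenger_stake, self.capacity - self.taken)
        self.taken += take
        return take

    def qualifies(self) -> bool:
        # The article keeps its reliability icon while unmatched backing remains.
        return self.capacity > self.taken

book = ReliabilityBook()
book.offer(1900.0)        # $1,900 escrowed covers $100 of challenges
book.accept(40.0)         # a challenger puts up $40 against reliability
print(book.qualifies())   # True: $60 of backing is still standing
```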


Challenge Coins

Imagine you are a king of old, afraid of being assassinated. Your king’s guard tells you that they’ve got you covered, but too many kings have been killed in your area over the last century for you to feel that safe. How can you learn of your actual vulnerability, and of how to cut it?

Yes, you might make prediction markets on whether you will be killed, and make such markets conditional on various policy changes, to find out which policies cut your chance of being killed. But in this post I want to explore a different solution.

I suggest that you auction off challenge coins at some set rate, say one a month. Such coins can be resold privately to others, so that you don’t know who holds them. Each coin gives the holder the right to try a mock assassination. If a coin holder can get within X meters of you, with a clear sight of a vulnerable part of you, then they need only raise their coin into the air and shout “Challenge Coin”, and they will be given N gold coins in exchange for that challenge coin, and then set free. And if they are caught where they should not be then they can pay the challenge coin to instead be released from whatever would be the usual punishment for that intrusion. If authorities can find the challenge coin, such as on their person, this trade can be required.

Now for a few subtleties. Your usual staff and people you would ordinarily meet are not eligible to redeem challenge coins. Perhaps you’d also want to limit coin redeemers to people who’d be able to kill someone; perhaps if requested they must kill a cute animal with their bare hands. If a successful challenger can explain well enough how they managed to evade your defenses, then they might get 2N gold coins or more. Coin redeemers may be suspected of being tied to a real assassin, and so they must agree to opening themselves to being investigated in extra depth, and if still deemed suspicious enough they might be banned from ever using a challenge coin again. But they still get their gold coins this time. Some who issue challenge coins might try to hide transmitters in them, but holders could just wrap coins in aluminum foil and dip them in plastic to limit odor emissions. I estimate that challenge coins are legal, and not prohibited by asset or gambling regulations.

This same approach could be used by the TSA to show everyone how hard it is to slip unapproved items past TSA security. Just reveal your coin and your unapproved item right after you exit TSA security. You could also use this approach to convince an audience that your accounting books are clean; anyone with a coin can point to any particular item in your books, and demand an independent investigation of that item, paid for at the coin-issuer’s expense. If the item is found to not be as it should, the coin holder gets the announced prize; otherwise they just lose their coin.

In general, issuing challenge coins is a way to show an audience what rate of detection success (or security failure) results from what level of financial incentives. (The audience will need to see data on the rates of coin sales and successful vs. unsuccessful redemptions.) We presume that the larger the payoff to a successful challenge, the higher the fraction of coins that successfully result in a detection (or security failure).
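
A small sketch of the statistic such an audience would want to see, with invented numbers: for each payoff level, the fraction of coins sold that ended in a successful challenge.

```python
# Illustrative challenge-coin statistics; all numbers are made up.

coin_data = [
    # (payoff in gold coins, coins sold, successful redemptions)
    (10, 120, 2),
    (50, 80, 7),
    (200, 30, 9),
]

for payoff, sold, successes in coin_data:
    print(f"payoff {payoff:>4}: success rate {successes / sold:.1%} per coin sold")
# A rising curve here tells the king roughly how much an assassin would need to
# spend to have a good chance of getting through.
```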


News Accuracy Bonds

Fake news is a type of yellow journalism or propaganda that consists of deliberate misinformation or hoaxes spread via traditional print and broadcast news media or online social media. This false information is mainly distributed by social media, but is periodically circulated through mainstream media. Fake news is written and published with the intent to mislead in order to damage an agency, entity, or person, and/or gain financially or politically, often using sensationalist, dishonest, or outright fabricated headlines to increase readership, online sharing, and Internet click revenue. (more)

One problem with news is that sometimes readers who want truth instead read (or watch) and believe news that is provably false. That is, a news article may contain claims that others are capable of proving wrong to a sufficiently expert and attentive neutral judge, and some readers may be fooled against their wishes into believing such news.

Yes, news can have other problems. For example, there can be readers who don’t care much about truth, and who promote false news and its apparent implications. Or readers who do care about truth may be persuaded by writing whose mistakes are too abstract or subtle to prove wrong now to a judge. I’ve suggested prediction markets as a partial solution to this; such markets could promote accurate consensus estimates on many topics which are subtle today, but which will eventually become sufficiently clear.

In this post, however, I want to describe what seems to me the simple obvious solution to the more basic problem of truth-seekers believing provably-false news: bonds. Those who publish or credential an article could offer bonds payable to anyone who shows their article to be false. The larger the bond, the higher their declared confidence in their article. With standard icons for standard categories of such bonds, readers could easily note the confidence associated with each news article, and choose their reading and skepticism accordingly.

That’s the basic idea; the rest of this post will try to work out the details.

While articles backed by larger bonds should be more accurate on average, the correlation would not be exact. Statistical models built on the dataset of bonded articles, some of which eventually pay bonds, could give useful rough estimates of accuracy. To get more precise estimates of the chance that an article will be shown to be in error, one could create prediction markets on the chance that an individual article will pay a bond, with initial prices set at statistical model estimates.

Of course the same article should have a higher chance of paying a bond when its bond amount is larger. So even better estimates of article accuracy would come from prediction markets on the chance of paying a bond, conditional on a large bond amount being randomly set for that article (for example) a week after it is published. Such conditional estimates could be informative even if only one article in a thousand is chosen for such a very large bond. However, since there are now legal barriers to introducing prediction markets, and none to introducing simple bonds, I return to focusing on simple bonds.

Independent judging organizations would be needed to evaluate claims of error. A limited set of such judging organizations might be certified to qualify an article for any given news bond icon. Someone who claimed that a bonded article was in error would have to submit their evidence, and be paid the bond only after a valid judging organization endorsed their claim.

Bond amounts should be held in escrow or guaranteed in some other way. News firms could limit their risk by buying insurance, or by limiting how many bonds they’d pay on all their articles in a given time period. Say no more than two bonds paid on each day’s news. Another option is to have the bond amount offered be a function of the (posted) number of readers of an article.

As a news article isn’t all true or false, one could distinguish degrees of error. A simple approach could go sentence by sentence. For example, a bond might pay according to some function of the number of sentences (or maybe sentence clauses) in an article shown to be false. Alternatively, sentence level errors might be combined to produce categories of overall article error, with bonds paying different amounts to those who prove each different category. One might excuse editorial sentences that do not intend to make verifiable newsy claims, and distinguish background claims from claims central to the original news of the article. One could also distinguish degrees of error, and pay proportional to that degree. For example, a quote that is completely made up might be rated as completely false, while a quote that is modified in a way that leaves the meaning mostly the same might count as a small fractional error.
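
Here is one possible payout function along these lines; the category weights, severity scores, and the five-central-claims cap are my own illustrative assumptions, not part of the proposal.

```python
# Illustrative sentence-level bond payout with weighted error categories.

def bond_payout(bond_amount, findings):
    """findings: list of (kind, severity) with kind in {'central','background','editorial'}
    and severity in [0, 1] as judged."""
    weights = {"central": 1.0, "background": 0.4, "editorial": 0.0}
    score = sum(weights[kind] * severity for kind, severity in findings)
    # Cap at the full bond; here five fully-wrong central claims exhaust the bond.
    return min(bond_amount, bond_amount * score / 5.0)

findings = [("central", 1.0),      # a quote that was completely made up
            ("central", 0.2),      # a quote altered, meaning mostly preserved
            ("background", 0.5),   # a background figure off by a lot
            ("editorial", 1.0)]    # opinion sentence: excused
print(bond_payout(1000.0, findings))  # 280.0
```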

To the extent that it is possible to verify partisan slants across large sets of articles, for example in how people or organizations are labeled, publishers might also offer bonds payable to those who can show that a publisher has taken a consistent partisan slant.

A subtle problem is: who pays the cost to judge a claim? On the one hand, judges can’t just offer to evaluate all claims presented to them for free. But on the other hand, we don’t want to let big judging fees stop people from claiming errors when errors exist. To make a reasonable tradeoff, I suggest a system wherein claim submissions include a fee to pay for judging, a fee that is refunded double if that claim is verified.

That is, each bond specifies a maximum amount it will pay to judge that bond, and which judging organizations it will accept. Each judging organization specifies a max cost to judge claims of various types. A bond is void if no acceptable judge’s max is below that bond’s max. Each submission asking to be paid a bond then includes this max judging fee. If the judges don’t spend all of that fee evaluating the case, the remainder is refunded to the submitter. It is the amount of the fee that the judges actually spend that is refunded double if the claim is supported. A public dataset of past bonds and their actual judging fees could help everyone to estimate future fees.
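
To check that these fee rules balance, here is a small sketch of a single claim’s cash flow from the claimant’s side, with illustrative numbers.

```python
# Illustrative cash flow for one error claim under the fee rules above.

def settle_claim(bond_amount, max_judging_fee, fee_actually_spent, claim_upheld):
    """Return the claimant's net cash flow (ignoring their time)."""
    spent = min(fee_actually_spent, max_judging_fee)
    refund_unused = max_judging_fee - spent
    paid_in = -max_judging_fee
    if claim_upheld:
        # The spent fee comes back double, plus the bond itself.
        return paid_in + refund_unused + 2 * spent + bond_amount
    return paid_in + refund_unused        # claim rejected: lose only what judging cost

print(settle_claim(bond_amount=2000, max_judging_fee=500,
                   fee_actually_spent=300, claim_upheld=True))   # 2300
print(settle_claim(bond_amount=2000, max_judging_fee=500,
                   fee_actually_spent=300, claim_upheld=False))  # -300
```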

Those are the main subtleties that I’ve considered. While there are ways to set up such a system better or worse, the basic idea seems robust: news publishers who post bonds payable if their news is shown to be wrong thereby credential their news as more accurate. This can allow readers to more easily avoid believing provably-false news.

A system like the one I’ve just proposed has long been feasible; why hasn’t it been adopted already? One possible theory is that publishers don’t offer bonds because doing so would remind readers of typical high error rates:

The largest accuracy study of U.S. papers was published in 2007 and found one of the highest error rates on record — just over 59% of articles contained some type of error, according to sources. Charnley’s first study [70 years ago] found a rate of roughly 50%. (more)

If bonds paid mostly for small errors, then bond amounts per error would have to be very small, and calling reader attention to a bond system would mostly remind them of high error rates, and discourage them from consuming news.

However, it seems to me that it should be possible to aggregate individual article errors into measures of overall article error, and to focus bond payouts on the most mistaken “fake news” type articles. That is, news error bonds should mostly pay out on articles that are wrong overall, or at least quite misleading regarding their core claims. Yes, a bit more judgment might be required to set up a system that can do this. But it seems to me that doing so is well within our capabilities.

A second possible theory to explain the lack of such a system today is the usual idea that innovation is hard and takes time. Maybe no one ever tried this with sufficient effort, persistence, or coordination across news firms. So maybe it will finally take some folks who try this hard, long, and wide enough to make it work. Maybe, and I’m willing to work with innovation attempts based on this second theory.

But we should also keep a third theory in mind: that most news consumers just don’t care much for accuracy. As we discuss in our book The Elephant in the Brain, the main function of news in our lives may be to offer “topics in fashion” that we each can all riff on in our local conversations, to show off our mental backpacks of tools and resources. For that purpose, it doesn’t much matter how accurate is such news. In fact, it might be easier to show off with more fake news in the mix, as we can then show off by commenting on which news is fake. In this case, news bonds would be another example of an innovation designed to give us more of what we say we want, which is not adopted because we at some level know that we have hidden motives and actually want something else.


My Market Board Game

From roughly 1989 to 1992, I explored the concept of prediction markets (which I then called “idea futures”) in part via building and testing a board game. I thought I’d posted details on my game before, but searching I couldn’t find anything. So here is my board game.

The basic idea is simple: people bet on “who done it” while watching a murder mystery. So my game is an add-on to a murder mystery movie or play, or a game like How to Host a Murder. While watching the murder mystery, people stand around a board where they can reach in with their hands to directly and easily make bets on who done it. Players start with the same amount of money, and in the end whoever has the most money wins (or maybe wins in proportion to their winnings).

Together with Ron Fischer (now deceased) I tested this game a half-dozen times with groups of about a dozen. People understood it quickly and easily, and had fun playing. I looked into marketing the game, but was told that game firms do not listen to proposals by strangers, as they fear being sued later if they came out with a similar game. So I set the game aside.

All I really need to explain here is how mechanically to let people bet on who done it. First, you give all players 200 in cash, and from then on they have access to a “bank” where they can always make “change”:

Poker chips of various colors can represent various amounts, like 1, 5, 10, 25, or 100. In addition, you make similar-sized cards that read things like “Pays 100 if Andy is guilty.” There are different cards for different suspects in the murder mystery, each suspect with a different color card. The “bank” allows exchanges like trading two 5 chips for one 10 chip, or trading 100 in chips for a set of all the cards, one for each suspect.

Second, you make a “market board”, which is an array of slots, each of which can hold either chips or a card. If there were six suspects, an initial market board could look like this:

For this board, each column is about one of the six suspects, and each row is about one of these ten prices: 5,10,15,20,25,30,40,50,60,80. Here is a blow-up of one slot in the array:

Every slot holds either the kind of card for that column, or it holds the amount of chips for that row. The one rule of trading is: for any slot, anyone can swap the right card for the right amount of chips, or can make the opposite swap, depending on what is in the slot at the moment. The swap must be immediate; you can’t put your hand over a slot to reserve it while you get your act together.

This could be the market board near the end of the game:

Here the players have settled on Pam as most likely to have done it, and Fred as least likely. At the end, players compute their final score by combining their cash in chips with 100 for each winning card; losing cards are worth nothing. And that’s the game!

For the initial board, fill a row with chips when the number of suspects times the price for that row is less than 100, and fill that row with cards otherwise. Any number of suspects can work for the columns, and any ordered set of prices between 0 and 100 can work for the rows. I made my boards by taping together clear-color M512 boxes from Tap Plastics, and taping printed white paper on tops around the edge.
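
For concreteness, here is a minimal sketch of the board setup, the one trading rule, and the final scoring, in Python; the rules just restate the ones above, and the suspect names beyond Andy, Fred, and Pam are placeholders.

```python
# A sketch of the six-suspect, ten-price board and its single trading rule.

SUSPECTS = ["Andy", "Fred", "Pam", "Gail", "Hugo", "Iris"]
PRICES = [5, 10, 15, 20, 25, 30, 40, 50, 60, 80]

def initial_board():
    """Each slot holds either chips worth the row price, or a card for that suspect."""
    board = {}
    for price in PRICES:
        for suspect in SUSPECTS:
            if len(SUSPECTS) * price < 100:
                board[(price, suspect)] = ("chips", price)
            else:
                board[(price, suspect)] = ("card", suspect)
    return board

def swap(board, price, suspect, player):
    """The one rule: swap the right card for the chips in a slot, or vice versa."""
    kind, _ = board[(price, suspect)]
    if kind == "chips":
        # Player sells: puts a card for this suspect into the slot, takes the chips.
        player["cards"].remove(suspect)
        player["chips"] += price
        board[(price, suspect)] = ("card", suspect)
    else:
        # Player buys: pays the row price in chips, takes the card out of the slot.
        player["chips"] -= price
        player["cards"].append(suspect)
        board[(price, suspect)] = ("chips", price)

def final_score(player, guilty):
    """Chips in hand plus 100 for each card naming the guilty suspect."""
    return player["chips"] + 100 * player["cards"].count(guilty)

player = {"chips": 200, "cards": []}   # everyone starts with 200
board = initial_board()
swap(board, 20, "Pam", player)         # buy a Pam card for 20
print(final_score(player, "Pam"))      # 280 if Pam did it
```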

Added 30Aug: Here are a few observations about game play. 1) Many, perhaps most, players were so engaged by “day trading” in this market that they neglected to watch and think enough about the murder mystery. 2) You can allow players to trade directly with each other, but players show little interest in doing this. 3) Players found it more natural to buy than to sell. As a result, prices drifted upward, and often the sum of the buy prices for all the suspects was over 100. An electronic market maker could ensure that such arbitrage opportunities never arise, but in this mechanical version some players specialized in noticing and correcting this error.

Added 31Aug: A twitter poll picked a name for this game: Murder, She Bet.

Added 9Sep: Expert gamer Zvi Mowshowitz gives a detailed analysis of this game. He correctly notes that incentives for accuracy are lower in the endgame, though I didn’t notice substantial problems with endgame accuracy in the trials I ran.
