Tag Archives: Prediction Markets

Could Gambling Save Psychology?

A new PNAS paper:

Prediction markets set up to estimate the reproducibility of 44 studies published in prominent psychology journals and replicated in The Reproducibility Project: Psychology predict the outcomes of the replications well and outperform a survey of individual forecasts. … Hypotheses being tested in psychology typically have low prior probabilities of being true (median, 9%). … Prediction markets could be used to obtain speedy information about reproducibility at low cost and could potentially even be used to determine which studies to replicate to optimally allocate limited resources into replications. (more; see also coverage at 538, Atlantic, Science, Gelman)

We’ve had enough experiments with prediction markets over the years, both lab and field experiments, to not be at all surprised by these findings of calibration and superior accuracy. If so, you might ask: what is the intellectual contribution of this paper?
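
One common way to compare the accuracy of two sets of probability forecasts is the Brier score, the mean squared error between forecasts and 0/1 outcomes. The sketch below is only an illustration of that comparison; the numbers are made up for the example and are not data from the paper, and the paper may use different accuracy measures.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.

    Lower is better; a constant forecast of 0.5 always scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical numbers for four imaginary studies (NOT from the paper).
market_prices = [0.8, 0.3, 0.1, 0.6]   # final market price that the study replicates
survey_means = [0.6, 0.5, 0.4, 0.5]    # average survey forecast for the same studies
replicated = [1, 0, 0, 1]              # 1 if the replication succeeded, else 0

market_brier = brier_score(market_prices, replicated)
survey_brier = brier_score(survey_means, replicated)
print(f"market: {market_brier:.3f}, survey: {survey_brier:.3f}")
```

In this toy case the market forecasts score lower (better) than the survey averages, which is the kind of pattern the paper reports.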

When one is trying to persuade groups to try prediction markets, one encounters consistent skepticism about experiment data that is not on topics very close to the proposed topics. So one value of this new data is to help persuade academic psychologists to use prediction markets to forecast lab experiment replications. Of course for this purpose the key question is whether enough academic psychologists were close enough to the edge of making such markets a continuing practice that it was worth the cost of a demonstration project to create closely related data, and so push them over the edge.

I expect that most ordinary academic psychologists need stronger incentives than personal curiosity to participate often enough in prediction markets on whether key psychology results will be replicated (conditional on such replication being attempted). Such additional incentives could come from:

  1. direct monetary subsidies for market trading, such as via subsidized market makers,
  2. traders with higher than average trading records bragging about it on their vitae, and getting hired etc. more because of that, or
  3. prediction market prices influencing key decisions such as what articles get published where, who gets what grants, or who gets what jobs.

For example, imagine that one or more top psychology journals used prediction market chances that an empirical paper’s main result(s) would be confirmed (conditional on an attempt) as part of deciding whether to publish that paper. In this case the authors of a paper and their rivals would have incentives to trade in such markets, and others could be enticed to trade if they expected trades by insiders and rivals alone to produce biased estimates. This seems a self-reinforcing equilibrium; if good people think hard before participating in such markets, others could see those market prices as deserving of attention and deference, including in the journal review process.

However, the existing equilibrium also seems possible, where there are few or small markets on such topics off to the side, markets that few pay much attention to and where there are few resources and little status to be won. This equilibrium arguably results in less intellectual progress for any given level of research funding, but of course progress-inefficient academic equilibria are quite common.

Bottom line: someone is going to have to pony up some substantial scarce academic resources to fund an attempt to move this part of academia to a better equilibrium. If whoever funded this study didn’t plan on funding this next step, I could have told them ahead of time that they were mostly wasting their money in funding this study. This next move won’t happen without a push.


Intelligence Futures

For many purposes, such as when choosing whether to admit someone to a college, we care about both temporary features, who they are now, and permanent features, who they have the ultimate potential to become. One of those features is intelligence; we care about how smart they are now, and about how smart they have the potential to become.

A standard result in intelligence research is that intelligence as measured late in life, such as at age fifty, is a much better indicator of ultimate potential than is intelligence measured at early ages. That is, environments have a stronger influence over measured intelligence of the young, relative to the old.

So if you want a measure of an ultimate potential, such as to use in college admissions, then instead of using current tests like SAT scores, you’d do better to use a good prediction of future test scores, such as predictions of related tests at age fifty.

Now of course colleges could try to do this prediction themselves. They could collect a dataset of people where they have late life test scores and also many possible early predictors of those future test scores, and then fit a statistical model to all that. But such data is hard to collect, this approach limits you to predictors available in your dataset, and the world changes, so that models that work on old data may not predict new data.

Let me propose a prediction market solution: create prediction markets on late life test scores. To make sure people try hard enough later, collect a fund to pay out to the person later in proportion to their late life test score. Then open (and subsidize) a market today in that future test score, and post any associated info that this person will allow. Speculators could then use that info, and anything else they could figure out, to guess the future test score. Finally, use market prices as estimates of future test scores, and thus of ultimate potential, in college admissions.
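
One way to subsidize such a market is an automated market maker. The sketch below uses a logarithmic market scoring rule (LMSR) over score buckets; the proposal above does not say which mechanism to use, so the choice of LMSR, the bucket midpoints, and the liquidity parameter `b` are all assumptions made for illustration.

```python
import math


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * log(sum_i exp(q_i / b)), computed stably."""
    m = max(quantities)
    return m + b * math.log(sum(math.exp((q - m) / b) for q in quantities))


def lmsr_prices(quantities, b):
    """Current price of each bucket; prices are positive and sum to 1."""
    m = max(quantities)
    weights = [math.exp((q - m) / b) for q in quantities]
    total = sum(weights)
    return [w / total for w in weights]


def buy(quantities, bucket, shares, b):
    """Return (cost charged to the trader, new outstanding quantities)."""
    after = list(quantities)
    after[bucket] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b), after


def score_estimate(quantities, b, midpoints):
    """Price-weighted average of bucket midpoints: the market's score estimate."""
    return sum(p * s for p, s in zip(lmsr_prices(quantities, b), midpoints))


# Hypothetical setup: five score buckets and a liquidity parameter of 100.
midpoints = [85, 95, 105, 115, 125]
b = 100.0
q = [0.0] * len(midpoints)

before = score_estimate(q, b, midpoints)      # uniform prices at the start
cost, q = buy(q, bucket=3, shares=50.0, b=b)  # a speculator backs the 115 bucket
after = score_estimate(q, b, midpoints)
max_subsidy = b * math.log(len(midpoints))    # worst-case loss for the sponsor
print(f"estimate {before:.1f} -> {after:.1f}, trade cost {cost:.2f}, "
      f"max subsidy {max_subsidy:.1f}")
```

The sponsor's worst-case loss is bounded by `b * log(n)` for `n` buckets, so the size of the subsidy can be chosen up front.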

This approach could of course also be used by employers and other individuals or organizations that care about potential. A single market on a future test score could inform many audiences at once. And this approach could also be used for any other measures of potential where late life measures are more reliable than early life measures.


Elite Evaluator Rents

The elite evaluator story discussed in my last post is this: evaluators vary in the perceived average quality of the applicants they endorse. So applicants seek the highest ranked evaluator willing to endorse them. To keep their reputation, evaluators can’t consistently lie about the quality of those they evaluate. But evaluators can charge a price for their evaluations, and higher ranked evaluators can charge more. So evaluators who, for whatever reason, end up with a better pool of applicants can sustain that advantage and extract continued rents from it.

This is a concrete plausible story to explain the continued advantage of top schools, journals, and venture capitalists. On reflection, it is also a nice concrete story to help explain who resists prediction markets and why.

For example, within each organization, some “elites” are more respected and sought after as endorsers of organization projects. The better projects look first to get endorsement of elites, allowing those elites to sustain a consistently higher quality of projects that they endorse, and to extract higher rents from those who apply to them. If such an organization were instead to use prediction markets to rate projects, elite evaluators would lose such rents. So such elites naturally oppose prediction markets.

For a more concrete example, consider that in 2010 the movie industry successfully lobbied the US Congress to outlaw the Hollywood Stock Exchange, a real money market just then approved by the CFTC for predicting movie success, and about to go live. Hollywood is dominated by a few big studios. People with movie ideas go to these studios first with proposals, to gain a big studio endorsement, to be seen as higher quality. So top studios can skim the best ideas, and leave the rest to marginal studios. If people were instead to look to prediction markets to estimate movie quality, the value of a big studio endorsement would fall, as would the rents that big studios can extract for their endorsements. So studios have a reason to oppose prediction markets.

While I find this story as stated pretty persuasive, most economists won’t take it seriously until there is a precise formal model to illustrate it. So without further ado, let me present such a model. Math follows.


SciCast Contest

SciCast is holding a new contest:

We’ll be offering $16,000 in prizes for conditional forecasts only made from April 23 to May 22.


Show Outside Critics

Worried that you might be wrong? That you might be wrong because you are biased? You might think that your best response is to study different kinds of biases, so that you can try to correct your own biases. And yes, that can help sometimes. But overall, I don’t think it helps much. The vast depths of your mind are quite capable of tricking you into thinking you are overcoming biases, when you are doing no such thing.

A more robust solution is to seek motivated and capable critics. Real humans who have incentives to find and explain flaws in your analysis. They can more reliably find your biases, and force you to hear about them. This is of course an ancient idea. The Vatican has long had “devil’s advocates”, and many other organizations regularly assign critics to evaluate presented arguments. For example, academic conferences often assign “discussants” tasked with finding flaws in talks, and journals assign referees to criticize submitted papers.

Since this idea is so ancient, you might think that the people who talk the most about trying to overcome bias would apply this principle far more often than do others. But from what I’ve seen, you’d be wrong.

Oh, almost everyone circulates drafts among close associates for friendly criticism. But that criticism is mostly directed toward avoiding looking bad when they present to a wider audience. Which isn’t at all the same as making sure they are right. That is, friendly local criticism isn’t usually directed at trying to show a wider audience flaws in your arguments. If your audience won’t notice a flaw, your friendly local critics have little incentive to point it out.

If your audience cared about flaws in your arguments, they’d prefer to hear you in a context where they can expect to hear motivated capable outside critics point out flaws. Not your close associates or friends, or people from shared institutions via which you could punish them for overly effective criticism. Then when the flaws your audience hears about are weak, they can have more confidence that your arguments are strong.

And even if your audience only cared about the appearance of caring about flaws in your argument, they’d still want to hear you matched with apparently motivated capable critics. Or at least have their associates hear that such matching happens. Critics would likely be less motivated and capable in this case, but at least there’d be a fig leaf that looked like good outside critics matched with your presented arguments.

So when you see people presenting arguments without even a fig leaf of the appearance of outside critics being matched with presented arguments, you can reasonably conclude that this audience doesn’t really care much about appearing to care about hidden flaws in your argument. And if you are the one presenting arguments, and if you didn’t try to ensure available critics, then others can reasonably conclude that you don’t care much about persuading your audience that your argument lacks hidden flaws.

Now this criticism approach is often muddled by the question of which kinds of critics are in fact motivated and capable. So often “critics” are used who don’t in fact have much relevant expertise, or who have incentives that are opaque to the audience. And prediction markets can be seen as a robust solution to this problem. Every bet is an interaction between two sides who each implicitly criticize the other. Both are clearly motivated to be accurate, and have clear incentives to only participate if they are capable. Of course prediction market critics typically don’t give as much detail to explain the flaws they see. But they do make clear that they see a flaw.


Me At NIPS Workshop

Tomorrow I’ll present on prediction markets and disagreement, in Montreal at the NIPS Workshop on Transactional Machine Learning and E-Commerce. A video will be available later.


Policy vs. Meta-Policy

What is our main problem, bad policy or bad meta-policy? That is, do our collective choices go wrong mainly because we make a few key mistakes in choosing particular policies? Or do they go wrong mainly because we use the wrong institutions to choose these policies?

I would have thought meta-policy was the obvious answer. But CATO asked 51 scholars/pundits this question:

If you could wave a magic wand and make one or two policy or institutional changes to brighten the U.S. economy’s long-term growth prospects, what would you change and why?

And out of the 29 answers now visible, only four (or 14%) of us picked meta-policy changes:

Michael Strain says to increase fed data agency budgets:

BLS data on gross labor market flows … are not available at the state and MSA level, they do not have detailed industry breakdowns, and they do not break down by occupation or by job task. … We also need better “longitudinal” data — data that track individuals every year (or even more frequently) for a long period of time. … The major federal statistical agencies need larger budgets to collect the data we need to design policies to increase workforce participation and to strength future growth. … My second policy suggestion is to expand the … EITC.

Lee Drutman says to increase Congress staff policy budgets:

I would triple the amount the Congress spends on staff (keeping it still at just under 0.1% of the total federal budget). I’d also concentrate that spending in the policy committees. I’d give those committees the resources to be leading institutions for expertise on the issues on which they deal. I’d also give these committees the resources to hire their own experts — economists, lawyers, consultants, etc. But I’d also make sure that these committees were not explicitly partisan.

Eli Dourado says to pay Congress a bonus if the economy does well:

A performance bonus would help to overcome some of Congress’s complacency and division in the face of decades-long economic stagnation. … One good performance metric would be total factor productivity (TFP). … Fernald adjusts his TFP estimate for cyclical labor and capital utilization changes, making his series a better measure. … Members of Congress would earn a $200,000 bonus if the two-year period in which they serve averages 2 percent TFP growth. (more)

Robin Hanson says to use decision markets to choose policies:

First, I propose that our national legislatures pass bills to define national welfare, and fund and authorize an agency to collect statistics to measure this numerical quantity after the fact. … Second, … create an open bounty system for proposing policies to increase national welfare. … Third, … create two open speculative decision markets for each official proposal, to estimate national welfare given that we do or do not adopt this proposal. … If over the decision day the average if-adopted price is higher than the average if-not-adopt price (plus average bid-ask spread), then the proposal … becomes a new law of the land.
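
The adoption rule in that last step can be written as a short comparison. This is only a sketch of the rule as quoted; how prices and spreads are sampled over the decision day, and the use of a simple unweighted average, are assumptions, and the function and argument names are hypothetical.

```python
from statistics import mean


def proposal_adopted(if_adopted_prices, if_not_adopted_prices, bid_ask_spreads):
    """Apply the quoted rule to prices sampled over the decision day.

    The proposal passes only if the average if-adopted price exceeds the
    average if-not-adopted price by more than the average bid-ask spread.
    Unweighted averaging over samples is an assumption of this sketch.
    """
    threshold = mean(if_not_adopted_prices) + mean(bid_ask_spreads)
    return mean(if_adopted_prices) > threshold


# Hypothetical welfare-index prices sampled three times during the day.
if_adopted = [102.0, 103.0, 104.0]
if_not_adopted = [100.0, 101.0, 102.0]

print(proposal_adopted(if_adopted, if_not_adopted, [0.5, 0.5, 0.5]))  # tight spreads
print(proposal_adopted(if_adopted, if_not_adopted, [2.5, 2.5, 2.5]))  # wide spreads
```

Adding the spread to the threshold means a proposal cannot pass on a price gap that is small relative to the market's own uncertainty about each price.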

It seems to me that Michael, Lee, and Eli wave pretty weak wands. Surely, if they thought their wands strong enough to cast any policy or meta-policy spell, they’d pick meta-policy spells a bit stronger than these. (And why is it always more spending, not less?)

By focusing on policy instead of meta-policy, it seems to me that the other 25 writers show either an unjustified faith in existing policy institutions, or a lack of imagination on possible alternatives. Both of which are somewhat surprising for 51 scholars chosen by CATO.

Added Dec 3: Three of the 25 remaining proposals were in the meta-policy direction:

Susan Dudley:

[Regulatory] agencies should be required to present evidence that they have identified a material failure of competitive markets or public institutions that requires a federal regulatory solution, and provide an objective evaluation of alternatives.

Michael Mandel:

The Regulatory Improvement Commission … would have a limited period of time to come up with a package of regulations to be eliminated or fixed, drawing on public suggestions. The package would then be sent to Congress for an up-or-down vote, and then onto the President for signing.

Megan McArdle:

Instead of analyzing whether the [cost-benefit] calculations in a regulatory ledger sum to a positive or a negative number, we need to set a level of [regulatory] complexity that we’re willing to live with, and then decide which positive sum regulations we’re willing to discard in order to stay within that budget. … Crude rules which might well serve, like capping the number of laws and regulations, allowing a new one to be implemented only if an older one is repealed.

Added 30Sept2015: There are now 51 of these proposals, collected into a book. I found no more that are plausibly meta-proposals.


SciCast Pays HUGE

I’ve posted twice before when SciCast paid out big. The first time we just paid for activity. The second time, we paid for accuracy, but weakly, as it was measured only a few weeks after each trade. Now we are paying HUGE, for longer-term accuracy. We’ll pay out $86,000 to the most accurate participants, as measured from November 7 to March 6:

SciCast is running a new special! The most accurate forecasters during the special will receive Amazon gift cards:

• The top 15 participants will win $2250 to spend at Amazon.com

• The other 135 of the top 150 participants will win $225 to spend at Amazon.com

Participants will be ranked according to their total expected and realized points from their forecasts during the special. Be sure to use SciCast from November 7 through March 6! (more)

Added: At any one time about half the questions will be eligible for this contest. We of course hope to compare accuracy between eligible and ineligible questions.


Why Not Egg Futures?

Older women often find themselves too old to have kids, and regretting it. Such women would have gained by freezing some eggs when they were younger. But when younger, they didn’t think they’d ever want kids, or thought the issue could wait.

Such women might be helped by an egg futures business, paid to take on this risk for them. Such a business could buy eggs from women when young, freeze them, and sell them back to these same women when old.

Of course, to compensate for the wait and risk that the women wouldn’t want eggs later, this business would have to sell eggs back at a high price. But still, if the women bought the egg later, that would show they expected to gain from the deal.

Also, not all women would make equally good prospects. So such a business would focus on women likely to wait too long, be well off, and want kids later. So this business would “discriminate” by class in its purchases, paying more to upper class women. A lot like we now discriminate when we pay more for used clothes, cars, or houses from richer people.

Several people have told me that, while they were not personally offended, they expect others to be offended by such a business. Especially if men were involved in the business – a female only business would offend less. I’m somewhat mystified, which is partly why I’m writing this post. Maybe others can help me understand the objection.

Interestingly, we could add some personal prediction markets, which would probably be legal. For each possible young woman, there could be a market where one buys and sells conditional shares in an egg from that customer. If you owned a conditional share, you’d own a share of the profit from later selling that customer her egg. And you’d owe a share of the cost to buy her egg from her, freeze it, and store it. Imagine the fun buying and selling conditional shares regarding the young women that you know. And the fact that this is a share of a real physical object should make it legal.

Ok, I can see how people might be offended at this last suggestion. After all, there’s a risk that people might have fun on something that is supposed to be serious! 😉


SciCast Pays Big Again

Back in May I said that while SciCast hadn’t previously been allowed to pay participants, we were finally running a four week experiment to reward random activities. That experiment paid big and showed big effects; we saw far more activity on days when we paid cash.

In the next four weeks we’ll run another experiment that pays even more:

SciCast is running a new special! For four weeks, you can win prizes on some days of the week:

  • On Tuesdays, win a $25 Amazon gift card with activity.
  • On Wednesdays, win an activity badge for your profile.
  • On Thursdays, win a $25 Amazon gift card with accurate forecasting.
  • On Fridays, win an accuracy badge for your profile.

On each activity prize day, up to 80 valid forecasts and comments made that day will be randomly selected to win. On each accuracy prize day, your chance of winning any of 80 prizes is proportional to your forecasting accuracy. Be sure to use SciCast from July 22 to August 15!
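
A lottery whose odds are proportional to accuracy can be sketched as a weighted random draw. The announcement does not say how accuracy is scored or whether one person can win several prizes, so the non-negative accuracy scores, the independent draws with replacement, and all names below are assumptions for illustration.

```python
import random


def draw_accuracy_winners(accuracy_by_user, num_prizes, seed=None):
    """Draw prize winners with probability proportional to accuracy score.

    Assumes non-negative scores. Each prize is drawn independently, so a
    user can win more than one prize in this sketch.
    """
    users = list(accuracy_by_user)
    weights = [accuracy_by_user[user] for user in users]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("scores must be non-negative with a positive total")
    rng = random.Random(seed)
    return rng.choices(users, weights=weights, k=num_prizes)


# Hypothetical users and accuracy scores.
scores = {"alice": 3.0, "bob": 1.0, "carol": 0.0}
winners = draw_accuracy_winners(scores, num_prizes=80, seed=1)
print({user: winners.count(user) for user in scores})
```

On average alice wins about three times as many prizes as bob here, and carol, with a score of zero, never wins.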

So this time we’ll compare activity incentives to accuracy incentives. Will we get more activity on days when we reward activity, and more accuracy on days when we reward accuracy? Now our accuracy incentives are admittedly weak, in that we’ll evaluate the accuracy of each trade/edit via price changes over only a few weeks after the trade. But hey, it’s something. Hopefully we can do a better experiment next year.

SciCast now has 532 questions on science and technology, and you can make conditional forecasts on most of them. Come!
