Tag Archives: Prediction Markets

Needed: Social Innovation Adaptation

This is the point during the electoral cycle when people are most willing to consider changing political systems. The nearly half of voters whose candidates just lost are now most open to changes that might have let their side win. But even in an election this acrimonious, that interest is paper thin, and blows away in the slightest breeze. Because politics isn’t about policy – what we really want is to feel part of a political tribe via talking with them about the same things. So if the rest of your tribe isn’t talking about system change, you don’t want to talk about that either.

So I want to tell or remind everyone that if you actually did care about outcomes instead of feeling part of a big tribe, large social gains wait untapped in better social institutions. In particular, very large gains await detailed field trials of institutional innovations. Let me explain.

Long ago when I was a physicist turned computer researcher who started to study economics, I noticed that it seemed far easier to design new better social institutions than to design new better computer algorithms or physical devices. This helped inspire me to switch to economics.

Once I was in a graduate program with a thesis advisor who specialized in institution/mechanism design, I seemed to see a well established path for social innovations, from vague intuitions to theoretical analysis to lab experiments to simplified field experiments to complex practice. Of course as with most innovation paths, as costs rose along the path most candidates fell by the wayside. And yes, designing social institutions was harder than it looked at first, though it still seems easier than for computers and physical devices.

But it took me a long time to learn that this path is seriously broken near the end. Organizations with real problems do in fact sometimes allow simplified field trials of institutional alternatives that social scientists have proposed, but only in a very limited range of areas. And usually they mainly just do this to affiliate with prestigious academics; most aren’t actually much interested in adopting better institutions. (Firms mostly outsource social innovation to management consultants, who don’t actually endorse much. Yes startups explore some innovations, but relatively few.)

So by now academics have accumulated a large pile of promising institution ideas, many of which have supporting theory, lab experiments, and even simplified field trials. In addition, academics have even larger literatures that measure and theorize about existing social institutions. But even after promising results from simplified field experiments, much work usually remains to adapt such new proposals to the many complex details of existing social worlds. Complex worlds can’t usefully digest abstract academic ideas without such adaptation.

And the bottom line is that we very much lack organizations willing to do that work for social innovations. Organizations do this work more often for computer or device innovations, and sometimes social innovations get smuggled in via that route. A few organizations sometimes work on social innovations directly, but mostly to affiliate with prestigious academics, so if you aren’t such an academic you mostly can’t participate.

This is the point where I’ve found myself stuck with prediction & decision markets. There has been prestige and funding to prove theorems, do lab experiments, analyze field datasets, and even do limited simplified field trials. But there is little prestige or funding for that last key step of adapting academic ideas to complex social worlds. It’s hard to apply rigorous general methods in such efforts, and so hard to publish on that academically. (Even the interested blockchain folks have mainly been writing general code, not working with messy organizations.)

So if you want to make clubs, firms, cities, nations, and the world more effective and efficient, a highly effective strategy is to invest in widening the neglected bottleneck of the social innovation pathway. Get your organization to work on some ideas, or pay other organizations to work on them. Yes some ideas can only be tried out at large scales, but for most there are smaller scale analogues that it makes sense to work on first. I stand ready to help organizations do this for prediction & decision markets. But alas to most organizations I lack sufficient prestige for such associations.


Big Impact Isn’t Big Data

A common heuristic for estimating the quality of something is: what has it done for me lately? For example, you could estimate the quality of a restaurant via a sum or average of how much you’ve enjoyed your meals there. Or you might weight recent visits more, since quality may change over time. Such methods are simple and robust, but they aren’t usually the best. For example, if you know of others who ate at that restaurant, their meal enjoyment is also data, data that can improve your quality estimate. Yes, those other people might have different meal priorities, and that may be a reason to give their meals less weight than your meals. But still, their data is useful.

Consider an extreme case where one meal, say your wedding reception meal, is far more important to you than the others. If you weigh your meal experiences in proportion to meal importance, your whole evaluation may depend mainly on one meal. Yes, if meals of that important type differ substantially from other meals then using this method best avoids biases from using unimportant types of meals to judge important types. But the noise in your estimate will be huge; individual restaurant meals can vary greatly for many random reasons even when the underlying quality stays the same. You just won’t know much about meal quality.

I mention all this because many seem eager to give the recent presidential election (and the recent Brexit vote) a huge weight in their estimate of the quality of various prediction sources. Sources that did poorly on those two events are judged to be poor sources overall. And yes, if these were by far the most important events to you, this strategy avoids the risk that familiar prediction sources have a different accuracy on events like this than they do on other events. Even so, this strategy mostly just puts you at the mercy of noise. If you use a small enough set of events to judge accuracy, you just aren’t going to be able to see much of a difference between sources; you will have little reason to think that those sources that did better on these few events will do much better on other future events.
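To put rough numbers on the noise point, here is a minimal sketch. It assumes two hypothetical sources that call binary events correctly with probability 0.75 and 0.65, independently of each other; those accuracies and the function names are illustrative assumptions, not measurements of any real source. It computes the exact chance that the truly better source scores strictly higher over n events.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k correct calls out of n, each correct with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_better_source_wins(n, p_good=0.75, p_bad=0.65):
    """Chance the truly better source gets strictly more of n calls right.

    Assumes the two sources err independently (a simplifying assumption).
    """
    total = 0.0
    for k_good in range(n + 1):
        for k_bad in range(k_good):  # only cases where the good source is ahead
            total += binom_pmf(k_good, n, p_good) * binom_pmf(k_bad, n, p_bad)
    return total


for n in (2, 20, 200):
    print(n, round(prob_better_source_wins(n), 3))
```

Under these assumptions, a comparison on two events picks out the better source well under half the time (ties are common), while a comparison on a couple hundred events almost always does.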

Me, I don’t see much reason to think that familiar prediction sources have an accuracy that is very different on the most important events, relative to other events, and so I mainly trust comparisons that use a lot of data. For example, on large datasets prediction markets have shown a robustly high accuracy compared to other sources. Yes, you might find other particular sources that seem to do better in particular areas, but you have to worry about selection effects – how many similar sources did you look at to find those few winners? And if prediction market participants became convinced that these particular sources had high accuracy, they’d drive market prices to reflect those predictions.


Regulating Self-Driving Cars

Warning: I’m sure there’s a literature on this, which I haven’t read. This post is instead based on a conversation with some folks who have read more of it. So I’m “shooting from the hip” here, as they say.

Like planes, boats, submarines, and other vehicles, self-driving cars can be used in several modes. The automation can be turned off. It can be turned on and advisory only. It can be driving, but with the human watching carefully and ready to take over at any time. Or it can be driving with the human not watching very carefully, so that the human would take a substantial delay before being able to take over. Or the human might not be capable of taking over at all; perhaps a remote driver would stand ready to take over via teleoperation.

While we might mostly trust vehicle owners or passengers to decide when to use which modes, existing practice suggests we won’t entirely trust them. Today, after a traffic accident, we let some parties sue others for damages. This can improve driver incentives to drive well. But we don’t trust this to fully correct incentives. So in addition, we regulate traffic. We don’t just suggest that you stop at a red light, keep in one lane, or stay below a speed limit. We require these things, and penalize detected violations. Similarly, we’ll probably want to regulate the choice of self-driving mode.

Consider a standard three-color traffic light. When the light is red, you are not allowed to go. When it is green you are allowed, but not required, to go; sometimes it is not safe to go even when a light is green. When the light is yellow, you are supposed to pay extra attention to a red light coming soon. We could similarly use a three color system as the basis of a three-mode system of regulating self-driving cars.

Imagine that inside each car is a very visible light, which regulators can set to be green, yellow or red. When your light is red you must drive your car yourself, even if you get advice from automation. When the light is yellow you can let the automation take over if you want, but you must watch carefully, ready to take over. When the light is green, you can usually ignore driving, such as by reading or sleeping, though you may watch or drive if you want.

(We might want a standard way to alert drivers when their color changed away from green. Of course we could imagine adding more colors, to distinguish more levels of attention and control. But a three level system seems a reasonable place to start.)

Under this system, the key regulatory choice is the choice of color. This choice could in principle be set differently for each car at each moment. But early on the color would probably be set the same for all cars and drivers of a type, in a particular geographic area at a particular time. The color might come in part from a broadcast signal, with the light perhaps defaulting to red if it can’t get a signal.

One can imagine a very bureaucratic system to set the color, with regulators sitting in a big room filled with monitors, like NASA mission control. That would probably be too conservative and fail to take local circumstances enough into account. Or one might imagine empowering fancy statistical or machine learning algorithms to make the choice. But most any algorithm would make a lot of mistakes, and the choice of algorithm might be politicized, leading to a poor choice.

Let me suggest using prediction markets for this choice. Regulators would have to choose a large set of situation buckets, such that the color must be the same for all situations in the same bucket. Then for each bucket we’d have three markets, estimating the accident rate conditional on a particular color. Assuming that drivers gain some direct benefit from paying less attention to driving, we’d set the color to green unless the expected difference between the green and yellow accident rate became high enough. Similarly for the choice between red and yellow.

Work on combinatorial prediction markets suggests that it is feasible to have billions or more such buckets at a time. We might use audit lotteries and only actually estimate accident rates for some small fraction of these buckets, using bets conditional on such auditing. But even with a much smaller number of buckets, our experience with prediction markets suggests that such a system would work better than either a bureaucratic or statistical system with a similar number of buckets.
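The per-bucket rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the bucket names, the accident rates, and the two tolerance values are all hypothetical, and a real system would take the three rates from conditional market prices rather than a hard-coded table.

```python
def choose_color(rate_green, rate_yellow, rate_red,
                 green_tolerance, yellow_tolerance):
    """Pick the least restrictive color whose extra accident risk is tolerable.

    Rates are market estimates of the accident rate conditional on each color.
    The tolerances are hypothetical regulator-chosen numbers, not from the post.
    """
    if rate_green - rate_yellow <= green_tolerance:
        return "green"
    if rate_yellow - rate_red <= yellow_tolerance:
        return "yellow"
    return "red"


# Hypothetical buckets mapped to (green, yellow, red) conditional estimates,
# in accidents per million miles.
market_estimates = {
    "highway, clear, daytime": (1.0, 0.9, 0.9),
    "city, rain, night": (4.0, 2.0, 1.8),
    "school zone, snow": (9.0, 6.0, 3.0),
}

colors = {
    bucket: choose_color(*rates, green_tolerance=0.5, yellow_tolerance=0.5)
    for bucket, rates in market_estimates.items()
}
print(colors)
```

The tolerances stand in for the direct benefit drivers get from paying less attention; the larger that benefit, the larger the accident-rate gap one would accept before moving to a more restrictive color.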

Added 1p: My assumptions were influenced by the book Our Robots, Ourselves on the history of automation.


Merkle’s Futarchy

My futarchy paper, Shall We Vote on Values But Bet on Beliefs?, made public in 2000 but officially “published” in 2013, has gotten more attention lately as some folks talk about using it to govern blockchain organizations. In particular, Ralph Merkle (co-inventor of public key cryptography) has a recent paper on using futarchy within “Decentralized Autonomous Organizations.”

I tried to design my proposal carefully to avoid many potential problems. But Merkle seems to have thrown many of my cautions to the wind. So let me explain my concerns with his variations.

First, I had conservatively left existing institutions intact for Vote on Values; we’d elect representatives to oversee the definition and measurement of a value metric. Merkle instead has each citizen each year report a number in [0,1] saying how well their life has gone that year:

Annually, all citizens are asked to rank the year just passed between 0 and 1 (inclusive). .. it is intended to provide information about one person’s state of satisfaction with the year that has just passed. .. Summed over all citizens and divided by the number of citizens, this gives us an annual numerical metric between 0 and 1 inclusive. .. An appropriately weighted sum of annual collective welfares, also extending indefinitely into the future, would then give us a “democratic collective welfare” metric. .. adopting a discount rate seems like at least a plausible heuristic. .. To treat their death: .. ask the person who died .. ask before they die. .. [this] eliminates the need to evaluate issues and candidates. The individual citizen is called upon only to determine whether the year has been good or bad for themselves. .. We’ve solved .. the need to wade through deceptive misinformation.

Yes, it could be easy to decide how your last year has gone, even if it is harder to put that on a scale from worst to best possible. But reporting that number is not your best move here! Your optimal strategy here is almost surely “bang-bang”, i.e., reporting either 0 or 1. And you’ll probably want to usually give the same consistent answer year after year. So this is basically a vote, except on “was this last year a good or a bad year?”, which in practice becomes a vote on “has my life been good or bad over the last decades.” Each voter must pick a threshold where they switch their vote from good to bad, a big binary choice that seems ripe for strong emotional distortions. That might work, but it is pretty far from what voters have done before, so a lot of voter learning is needed.
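One way to see the bang-bang point is the following sketch. It assumes a voter who cares only about how strongly the metric favors the outcomes they prefer; the discount rate of 0.95, the four-citizen population, and the satisfaction numbers are hypothetical, and the function names are mine rather than Merkle’s. Since the annual metric is a plain average, each citizen’s pull on it is linear in their report, so the pull is largest at the endpoints 0 and 1.

```python
def annual_welfare(reports):
    """Mean of all citizens' reports, each in [0, 1], as in the quoted metric."""
    return sum(reports) / len(reports)


def collective_welfare(annual_values, discount=0.95):
    """Discounted sum of annual welfares (the discount rate is hypothetical)."""
    return sum(value * discount**year for year, value in enumerate(annual_values))


def metric_gap(my_report_if_a, my_report_if_b, others):
    """How much my reporting plan makes the annual metric favor outcome A over B."""
    return (annual_welfare([my_report_if_a] + others)
            - annual_welfare([my_report_if_b] + others))


others = [0.5, 0.5, 0.5]  # three other citizens, indifferent between A and B

# Honest plan: my year really would be 0.6 under A and 0.5 under B.
honest_gap = metric_gap(0.6, 0.5, others)

# Bang-bang plan: report 1 if A happened and 0 if B happened.
bang_bang_gap = metric_gap(1.0, 0.0, others)

print(honest_gap, bang_bang_gap)
```

In this toy case the bang-bang plan gives the voter ten times the influence of the honest plan, which is why honest interior reports are unlikely to survive once voters notice.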

I’m much more comfortable with futarchy that uses value metrics tied to the reason an organization exists. Such as using the market price of investment to manage an investment, attendance to manage a conference, or people helped (& how much) to manage a charity.

If there are too many bills on the table at any one time for speculators to consider, many bad ones can slip through and have effects before bills to reverse them can be proposed and adopted. So I suggested starting with a high bar for bills, but allowing new bills to lower the bar. Merkle instead starts with a very low bar that could be raised, and I worry about all the crazy bills that might pass before the bar rises:

Initially, anyone can propose a bill. It can be submitted at any time. .. At any time, anyone can propose a new method of adopting a bill. It is evaluated and put into effect using the existing methods. .. Suppose we decided that it would improve the stability of the system if all bills had a mandatory minimum consideration period of three months before they could be adopted. Then we would pass a bill modifying the DAO to include this provision.

I worried that the basic betting process could bias the basic rules, so I set basic voting and process rules off limits from bet changes, and set an independent judiciary to judge if rules are followed. Merkle instead allows this basic bet process to change all the rules, and all the judges, which seems to me to risk self-supporting rule changes:

How the survey is conducted, and what instructions are provided, and the surrounding publicity and environment, will all have a great impact on the answer. .. The integrity of the annual polls would be protected only if, as a consequence, it threatened the lives or the well-being of the citizens. .. The simplest approach would be to appoint, as President, that person the prediction market said had the highest positive impact on the collective welfare if appointed as President. .. Similar methods could be adopted to appoint the members of the Supreme Court.

Finally, I said explicitly that when the value formula changes then all the previous definitions must continue to be calculated to pay off past bets. It isn’t clear to me that Merkle adopts this, or if he allows the bet process to change value definitions, which also seems to me to risk self-supporting changes:

We leave the policy with respect to new members, and to births, to our prediction market. .. difficult to see how we could justify refusing to adopt a policy that accepts some person, or a new born child, as a member, if the prediction market says the collective welfare of existing members will be improved by adopting such a policy. .. Of greater concern are changes to the Democratic Collective Welfare metric. Yet even here, if the conclusion reached by the prediction market is that some modification of the metric will better maximize the original metric, then it is difficult to make a case that such a change should be banned.

I’m happy to see the new interest in futarchy, but I’m also worried that sloppy design may cause failures that are blamed on the overall concept instead of on implementation details. As recently happened to the DAO concept.


Against Prestige

My life has been, in part, a series of crusades. First I just wanted to understand as much as possible. Then I focused on big problems, wondering how to fix them. Digging deeper I was persuaded by economists: our key problems are institutional. Yes we can have lamentable preferences and cultures. But it is hard to find places to stand and levers to push to move these much, or even to understand the effects of changes. Institutions, in contrast, have specific details we can change, and economics can say which changes would help.

I learned that the world shows little interest in the institutional changes economists recommend, apparently because they just don’t believe us. So I focused on an uber institutional problem: what institutions can we use to decide together what to believe? A general solution to this problem might get us to believe economists, which could get us to adopt all the other economics solutions. Or to believe whomever happens to be right, when economists are wrong. I sought one ring to rule them all.

Of course it wasn’t obvious that a general solution exists, but amazingly I did find a pretty general one: prediction markets. And it was also pretty simple. But, alas, mostly illegal. So I pursued it. Trying to explain it, looking for everyone who had said something similar. Thinking and hearing of problems, and developing fixes. Testing it in the lab, and in the field. Spreading the word. I’ve been doing this for 28 years now. (Began at age 29.)

And I will keep at it. But I gotta admit it seems even harder to interest people in this one uber solution than in more specific solutions. Which leads me to think that most who favor specific solutions probably do so for reasons other than the ones economists give; they are happy to point to economist reasons when it supports them, and ignore economists otherwise. So in addition to pursuing this uber fix, I’ve been studying human behavior, trying to understand why we seem so disinterested.

Many economist solutions share a common feature: a focus on outcomes. This feature is shared by experiments, incentive contracts, track records, and prediction markets, and people show a surprising disinterest in all of them. And now I finally think I see a common cause: an ancient human habit of strong deference to the prestigious. As I recently explained, we want to affiliate with the prestigious, and feel that an overly skeptical attitude toward them taints this affiliation. So we tend to let the prestigious in each area X decide how to run area X, which they tend to arrange more to help them signal than to be useful. This happens in school, law, medicine, finance, research, and more.

So now I enter a new crusade: I am against prestige. I don’t yet know how, but I will seek ways to help people doubt and distrust the prestigious, so they can be more open to focusing on outcomes. Not to doubt that the prestigious are more impressive, but that letting them run the show produces good outcomes. I will be happy if other competent folks join me, though I’m not especially optimistic. Yet. Yet.


Does Money Ruin Everything?

Imagine someone said:

The problem with paying people to make shoes is that then they get all focused on the money instead of the shoes. People who make shoes just because they honestly love making shoes, and who aren’t paid anything at all, make better shoes. Once money gets involved people lie about how good their shoes are, and about which shoes they like how much. But without money involved, everyone is nice and honest and efficient. That’s the problem with capitalism; money ruins everything.

Pretty sad argument, right? Now read Tyler Cowen on betting:

This episode is a good example of what is wrong with betting on ideas. Betting tends to lock people into positions, gets them rooting for one outcome over another, it makes the denouement of the bet about the relative status of the people in question, and it produces a celebratory mindset in the victor. That lowers the quality of dialogue and also introspection, just as political campaigns lower the quality of various ideas — too much emphasis on the candidates and the competition. Bryan, in his post, reaffirms his core intuition that labor markets usually return to normal pretty quickly, at least in the United States. But if you scrutinize the above diagram, as well as the lackluster wage data, that is exactly the premise he should be questioning. (more)

Sure, relative to ideal people who only discuss and think about topics with a full focus on and respect for the truth and their disputants, what could be the advantage of bets? Money will only distract them from studying truth, right?

But just because people don’t bet doesn’t mean they don’t have plenty of other non-truth-oriented incentives and interests. They are often rooting for positions, and celebrating some truths over others, due to these other interests. Bet incentives are at least roughly oriented toward speaking truth; the other incentives, not so much. Don’t let the fictional best be the enemy of the feasible-now good. For real people with all their warts, bets promote truth. But for saints, yeah, maybe not so much.


Could Gambling Save Psychology?

A new PNAS paper:

Prediction markets set up to estimate the reproducibility of 44 studies published in prominent psychology journals and replicated in The Reproducibility Project: Psychology predict the outcomes of the replications well and outperform a survey of individual forecasts. … Hypotheses being tested in psychology typically have low prior probabilities of being true (median, 9%). … Prediction markets could be used to obtain speedy information about reproducibility at low cost and could potentially even be used to determine which studies to replicate to optimally allocate limited resources into replications. (more; see also coverage at 538, Atlantic, Science, Gelman)

We’ve had enough experiments with prediction markets over the years, both lab and field experiments, to not be at all surprised by these findings of calibration and superior accuracy. If so, you might ask: what is the intellectual contribution of this paper?

When one is trying to persuade groups to try prediction markets, one encounters consistent skepticism about experiment data that is not on topics very close to the proposed topics. So one value of this new data is to help persuade academic psychologists to use prediction markets to forecast lab experiment replications. Of course for this purpose the key question is whether enough academic psychologists were close enough to the edge of making such markets a continuing practice that it was worth the cost of a demonstration project to create closely related data, and so push them over the edge.

I expect that most ordinary academic psychologists need stronger incentives than personal curiosity to participate often enough in prediction markets on whether key psychology results will be replicated (conditional on such replication being attempted). Such additional incentives could come from:

  1. direct monetary subsidies for market trading, such as via subsidized market makers,
  2. traders with higher than average trading records bragging about it on their vitae, and getting hired etc. more because of that, or
  3. prediction market prices influencing key decisions such as what articles get published where, who gets what grants, or who gets what jobs.

For example, imagine that one or more top psychology journals used prediction market chances that an empirical paper’s main result(s) would be confirmed (conditional on an attempt) as part of deciding whether to publish that paper. In this case the authors of a paper and their rivals would have incentives to trade in such markets, and others could be enticed to trade if they expected trades by insiders and rivals alone to produce biased estimates. This seems a self-reinforcing equilibrium; if good people think hard before participating in such markets, others could see those market prices as deserving of attention and deference, including in the journal review process.

However, the existing equilibrium also seems possible, where there are few or small markets on such topics off to the side, markets that few pay much attention to and where there are few resources and little status to be won. This equilibrium arguably results in less intellectual progress for any given level of research funding, but of course progress-inefficient academic equilibria are quite common.

Bottom line: someone is going to have to pony up some substantial scarce academic resources to fund an attempt to move this part of academia to a better equilibrium. If whoever funded this study didn’t plan on funding this next step, I could have told them ahead of time that they were mostly wasting their money in funding this study. This next move won’t happen without a push.


Intelligence Futures

For many purposes, such as when choosing whether to admit someone to a college, we care about both temporary features, who they are now, and permanent features, who they have the ultimate potential to become. One of those features is intelligence; we care about how smart they are now, and about how smart they have the potential to become.

A standard result in intelligence research is that intelligence as measured late in life, such as at age fifty, is a much better indicator of ultimate potential than is intelligence measured at early ages. That is, environments have a stronger influence over measured intelligence of the young, relative to the old.

So if you want a measure of an ultimate potential, such as to use in college admissions, then instead of using current tests like SAT scores, you’d do better to use a good prediction of future test scores, such as predictions of related tests at age fifty.

Now of course colleges could try to do this prediction themselves. They could collect a dataset of people where they have late life test scores and also many possible early predictors of those future test scores, and then fit a statistical model to all that. But such data is hard to collect, this approach limits you to predictors available in your dataset, and the world changes, so that models that work on old data may not predict new data.

Let me propose a prediction market solution: create prediction markets on late life test scores. To make sure people try hard enough later, collect a fund to pay out to the person later in proportion to their late life test score. Then open (and subsidize) a market today in that future test score, and post any associated info that this person will allow. Speculators could then use that info, and anything else they could figure out, to guess the future test score. Finally, use market prices as estimates of future test scores, and thus of ultimate potential, in college admissions.
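The payoffs in this proposal are simple enough to write down. The sketch below assumes one particular contract design, which the proposal does not specify: each share pays the fraction of the maximum score that the person later achieves. The maximum score of 1600, the fund size, and all function names are hypothetical choices for illustration.

```python
def settlement_value(score, max_score):
    """Per-share payout once the late life test is taken (assumed contract design)."""
    return score / max_score


def implied_score(price, max_score):
    """Read the current market price as an estimate of the future test score."""
    return price * max_score


def trader_profit(shares, price_paid, score, max_score):
    """Speculator's gain or loss when the contract finally settles."""
    return shares * (settlement_value(score, max_score) - price_paid)


def payout_to_person(fund, score, max_score):
    """Effort incentive: the person's share of the fund, proportional to their score."""
    return fund * score / max_score


MAX_SCORE = 1600  # hypothetical test scale

print(implied_score(0.75, MAX_SCORE))            # admissions office reads the price
print(trader_profit(10, 0.75, 1400, MAX_SCORE))  # speculator who bought at 0.75
print(payout_to_person(1000, 1400, MAX_SCORE))   # what the test taker collects
```

A speculator who thinks the current price understates the person’s potential buys, and profits only if the late life score comes in above what the price implied, which is what pushes the price toward a good estimate.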

This approach could of course also be used by employers and other individuals or organizations that care about potential. A single market on a future test score could inform many audiences at once. And this approach could also be used for any other measures of potential where late life measures are more reliable than early life measures.


Elite Evaluator Rents

The elite evaluator story discussed in my last post is this: evaluators vary in the perceived average quality of the applicants they endorse. So applicants seek the highest ranked evaluator willing to endorse them. To keep their reputation, evaluators can’t consistently lie about the quality of those they evaluate. But evaluators can charge a price for their evaluations, and higher ranked evaluators can charge more. So evaluators who, for whatever reason, end up with a better pool of applicants can sustain that advantage and extract continued rents from it.

This is a concrete plausible story to explain the continued advantage of top schools, journals, and venture capitalists. On reflection, it is also a nice concrete story to help explain who resists prediction markets and why.

For example, within each organization, some “elites” are more respected and sought after as endorsers of organization projects. The better projects look first to get endorsement of elites, allowing those elites to sustain a consistently higher quality of projects that they endorse. And to extract higher rents from those who apply to them. If such an organization were instead to use prediction markets to rate projects, elite evaluators would lose such rents. So such elites naturally oppose prediction markets.

For a more concrete example, consider that in 2010 the movie industry successfully lobbied the US Congress to outlaw the Hollywood Stock Exchange, a real money market just then approved by the CFTC for predicting movie success, and about to go live. Hollywood is dominated by a few big studios. People with movie ideas go to these studios first with proposals, to gain a big studio endorsement, to be seen as higher quality. So top studios can skim the best ideas, and leave the rest to marginal studios. If people were instead to look to prediction markets to estimate movie quality, the value of a big studio endorsement would fall, as would the rents that big studios can extract for their endorsements. So studios have a reason to oppose prediction markets.

While I find this story as stated pretty persuasive, most economists won’t take it seriously until there is a precise formal model to illustrate it. So without further ado, let me present such a model. Math follows. Continue reading "Elite Evaluator Rents" »


SciCast Contest

SciCast is holding a new contest:

We’ll be offering $16,000 in prizes for conditional forecasts only made from April 23 to May 22.
