Tag Archives: Prediction Markets

Regulating Self-Driving Cars

Warning: I’m sure there’s a literature on this, which I haven’t read. This post is instead based on a conversation with some folks who have read more of it. So I’m “shooting from the hip” here, as they say.

Like planes, boats, submarines, and other vehicles, self-driving cars can be used in several modes. The automation can be turned off. It can be turned on and advisory only. It can be driving, but with the human watching carefully and ready to take over at any time. Or it can be driving with the human not watching very carefully, so that the human would take a substantial delay before being able to take over. Or the human might not be capable of taking over at all; perhaps a remote driver would stand ready to take over via teleoperation.

While we might mostly trust vehicle owners or passengers to decide when to use which modes, existing practice suggests we won't entirely trust them. Today, after a traffic accident, we let some parties sue others for damages. This improves driver incentives to drive well. But we don't trust this to fully correct incentives. So in addition, we regulate traffic. We don't just suggest that you stop at a red light, keep in one lane, or stay below a speed limit. We require these things, and penalize detected violations. Similarly, we'll probably want to regulate the choice of self-driving mode.

Consider a standard three-color traffic light. When the light is red, you are not allowed to go. When it is green you are allowed, but not required, to go; sometimes it is not safe to go even when a light is green. When the light is yellow, you are supposed to pay extra attention to a red light coming soon. We could similarly use a three color system as the basis of a three-mode system of regulating self-driving cars.

Imagine that inside each car is a very visible light, which regulators can set to be green, yellow or red. When your light is red you must drive your car yourself, even if you get advice from automation. When the light is yellow you can let the automation take over if you want, but you must watch carefully, ready to take over. When the light is green, you can usually ignore driving, such as by reading or sleeping, though you may watch or drive if you want.

(We might want a standard way to alert drivers when their color changes away from green. Of course we could imagine adding more colors, to distinguish more levels of attention and control. But a three-level system seems a reasonable place to start.)

Under this system, the key regulatory choice is the choice of color. This choice could in principle be set differently for each car at each moment. But early on the color would probably be set the same for all cars and drivers of a type, in a particular geographic area at a particular time. The color might come in part from a broadcast signal, with the light perhaps defaulting to red if it can't get a signal.
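The fail-safe logic just described can be sketched in a few lines; the function and color names here are illustrative assumptions, not a real protocol:

```python
# Sketch of the in-car light logic described above: the color comes from a
# broadcast signal, and the light fails safe to red when no valid signal
# is received. All names here are illustrative assumptions.

VALID_COLORS = {"green", "yellow", "red"}

def light_color(broadcast_signal):
    """Return the mandated driving-mode color, defaulting to red."""
    if broadcast_signal in VALID_COLORS:
        return broadcast_signal
    return "red"  # no signal, or a garbled one: require manual driving
```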

One can imagine a very bureaucratic system to set the color, with regulators sitting in a big room filled with monitors, like NASA mission control. That would probably be too conservative, and fail to take local circumstances sufficiently into account. Or one might imagine empowering fancy statistical or machine learning algorithms to make the choice. But most any algorithm would make a lot of mistakes, and the choice of algorithm might be politicized, leading to a poor choice.

Let me suggest using prediction markets for this choice. Regulators would have to choose a large set of situation buckets, such that the color must be the same for all situations in the same bucket. Then for each bucket we’d have three markets, estimating the accident rate conditional on a particular color. Assuming that drivers gain some direct benefit from paying less attention to driving, we’d set the color to green unless the expected difference between the green and yellow accident rate became high enough. Similarly for the choice between red and yellow.
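The decision rule above can be sketched as follows. The market-price interface and threshold values are made-up assumptions for illustration, not from the proposal:

```python
# Illustrative sketch of the proposed decision rule. For each bucket we read
# three market-estimated accident rates (one per color), and allow less
# driver attention only when the expected extra accidents are small enough.
# The threshold values below are made-up assumptions.

GREEN_VS_YELLOW_THRESHOLD = 0.0001  # tolerable extra accidents per mile
YELLOW_VS_RED_THRESHOLD = 0.0001

def choose_color(rate_if_green, rate_if_yellow, rate_if_red):
    """Pick the most permissive color whose extra accident risk is acceptable."""
    if rate_if_green - rate_if_yellow <= GREEN_VS_YELLOW_THRESHOLD:
        return "green"
    if rate_if_yellow - rate_if_red <= YELLOW_VS_RED_THRESHOLD:
        return "yellow"
    return "red"
```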

Work on combinatorial prediction markets suggests that it is feasible to have billions or more such buckets at a time. We might use audit lotteries and only actually estimate accident rates for some small fraction of these buckets, using bets conditional on such auditing. But even with a much smaller number of buckets, our experience with prediction markets suggests that such a system would work better than either a bureaucratic or statistical system with a similar number of buckets.
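The audit-lottery idea can be sketched as follows: bets are conditional on a bucket being audited, and are simply called off otherwise. The settlement odds and audit fraction are illustrative assumptions:

```python
import random

# Sketch of the audit-lottery idea: accident rates are actually measured only
# for a random fraction of buckets, and bets are conditional on that audit,
# so unaudited bets are called off. Details here are illustrative assumptions.

AUDIT_FRACTION = 0.01  # fraction of buckets actually audited

def settle_bet(stake, bucket_audited, bet_won):
    """Return the bettor's payout for a bet conditional on auditing."""
    if not bucket_audited:
        return stake  # bet called off: stake returned
    return 2 * stake if bet_won else 0  # even-odds settlement, for illustration

def pick_audited_buckets(bucket_ids, rng=random):
    """Randomly select the small fraction of buckets to audit."""
    return [b for b in bucket_ids if rng.random() < AUDIT_FRACTION]
```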

Added 1p: My assumptions were influenced by the book Our Robots, Ourselves on the history of automation.


Merkle’s Futarchy

My futarchy paper, Shall We Vote on Values But Bet on Beliefs?, made public in 2000 but officially “published” in 2013, has gotten more attention lately as some folks talk about using it to govern blockchain organizations. In particular, Ralph Merkle (co-inventor of public key cryptography) has a recent paper on using futarchy within “Decentralized Autonomous Organizations.”

I tried to design my proposal carefully to avoid many potential problems. But Merkle seems to have thrown many of my cautions to the wind. So let me explain my concerns with his variations.

First, I had conservatively left existing institutions intact for Vote on Values; we’d elect representatives to oversee the definition and measurement of a value metric. Merkle instead has each citizen each year report a number in [0,1] saying how well their life has gone that year:

Annually, all citizens are asked to rank the year just passed between 0 and 1 (inclusive). .. it is intended to provide information about one person’s state of satisfaction with the year that has just passed. .. Summed over all citizens and divided by the number of citizens, this gives us an annual numerical metric between 0 and 1 inclusive. .. An appropriately weighted sum of annual collective welfares, also extending indefinitely into the future, would then give us a “democratic collective welfare” metric. .. adopting a discount rate seems like at least a plausible heuristic. .. To treat their death: .. ask the person who died .. ask before they die. .. [this] eliminates the need to evaluate issues and candidates. The individual citizen is called upon only to determine whether the year has been good or bad for themselves. .. We’ve solved .. the need to wade through deceptive misinformation.

Yes, it could be easy to decide how your last year has gone, even if it is harder to put that on a scale from worst to best possible. But reporting that number is not your best move here! Your optimal strategy here is almost surely “bang-bang”, i.e., reporting either 0 or 1. And you’ll probably want to usually give the same consistent answer year after year. So this is basically a vote, except on “was this last year a good or a bad year?”, which in practice becomes a vote on “has my life been good or bad over the last decades.” Each voter must pick a threshold where they switch their vote from good to bad, a big binary choice that seems ripe for strong emotional distortions. That might work, but it is pretty far from what voters have done before, so a lot of voter learning is needed.
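Merkle's metric, as quoted above, can be sketched as a discounted sum of annual average reports. The discount rate value here is an illustrative assumption; Merkle only calls discounting "a plausible heuristic":

```python
# Sketch of the "democratic collective welfare" metric quoted above: each
# year's collective welfare is the mean of citizen reports in [0, 1], and
# years are combined with a discount rate (value assumed for illustration).

def annual_welfare(reports):
    """Mean of all citizens' [0, 1] reports for one year."""
    return sum(reports) / len(reports)

def collective_welfare(yearly_reports, discount=0.95):
    """Discounted sum of annual welfares, nearest year weighted most."""
    return sum(
        (discount ** t) * annual_welfare(reports)
        for t, reports in enumerate(yearly_reports)
    )
```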

I’m much more comfortable with futarchy that uses value metrics tied to the reason an organization exists. Such as using the market price of investment to manage an investment, attendance to manage a conference, or people helped (& how much) to manage a charity.

If there are too many bills on the table at any one time for speculators to consider, many bad ones can slip through and have effects before bills to reverse them can be proposed and adopted. So I suggested starting with a high bar for bills, but allowing new bills to lower the bar. Merkle instead starts with a very low bar that could be raised, and I worry about all the crazy bills that might pass before the bar rises:

Initially, anyone can propose a bill. It can be submitted at any time. .. At any time, anyone can propose a new method of adopting a bill. It is evaluated and put into effect using the existing methods. .. Suppose we decided that it would improve the stability of the system if all bills had a mandatory minimum consideration period of three months before they could be adopted. Then we would pass a bill modifying the DAO to include this provision.

I worried that the basic betting process could bias the basic rules, so I set basic voting and process rules off limits from bet changes, and set an independent judiciary to judge if rules are followed. Merkle instead allows this basic bet process to change all the rules, and all the judges, which seems to me to risk self-supporting rule changes:

How the survey is conducted, and what instructions are provided, and the surrounding publicity and environment, will all have a great impact on the answer. .. The integrity of the annual polls would be protected only if, as a consequence, it threatened the lives or the well-being of the citizens. .. The simplest approach would be to appoint, as President, that person the prediction market said had the highest positive impact on the collective welfare if appointed as President. .. Similar methods could be adopted to appoint the members of the Supreme Court.

Finally, I said explicitly that when the value formula changes then all the previous definitions must continue to be calculated to pay off past bets. It isn’t clear to me that Merkle adopts this, or if he allows the bet process to change value definitions, which also seems to me to risk self-supporting changes:

We leave the policy with respect to new members, and to births, to our prediction market. .. difficult to see how we could justify refusing to adopt a policy that accepts some person, or a new born child, as a member, if the prediction market says the collective welfare of existing members will be improved by adopting such a policy. .. Of greater concern are changes to the Democratic Collective Welfare metric. Yet even here, if the conclusion reached by the prediction market is that some modification of the metric will better maximize the original metric, then it is difficult to make a case that such a change should be banned.

I’m happy to see the new interest in futarchy, but I’m also worried that sloppy design may cause failures that are blamed on the overall concept instead of on implementation details. As recently happened to the DAO concept.


Against Prestige

My life has been, in part, a series of crusades. First I just wanted to understand as much as possible. Then I focused on big problems, wondering how to fix them. Digging deeper I was persuaded by economists: our key problems are institutional. Yes we can have lamentable preferences and cultures. But it is hard to find places to stand and levers to push to move these much, or even to understand the effects of changes. Institutions, in contrast, have specific details we can change, and economics can say which changes would help.

I learned that the world shows little interest in the institutional changes economists recommend, apparently because they just don’t believe us. So I focused on an uber institutional problem: what institutions can we use to decide together what to believe? A general solution to this problem might get us to believe economists, which could get us to adopt all the other economics solutions. Or to believe whomever happens to be right, when economists are wrong. I sought one ring to rule them all.

Of course it wasn’t obvious that a general solution exists, but amazingly I did find a pretty general one: prediction markets. And it was also pretty simple. But, alas, mostly illegal. So I pursued it. Trying to explain it, looking for everyone who had said something similar. Thinking and hearing of problems, and developing fixes. Testing it in the lab, and in the field. Spreading the word. I’ve been doing this for 28 years now. (Began at age 29.)

And I will keep at it. But I gotta admit it seems even harder to interest people in this one uber solution than in more specific solutions. Which leads me to think that most who favor specific solutions probably do so for reasons other than the ones economists give; they are happy to point to economist reasons when it supports them, and ignore economists otherwise. So in addition to pursuing this uber fix, I’ve been studying human behavior, trying to understand why we seem so disinterested.

Many economist solutions share a common feature: a focus on outcomes. This feature is shared by experiments, incentive contracts, track records, and prediction markets, and people show a surprising disinterest in all of them. And now I finally think I see a common cause: an ancient human habit of strong deference to the prestigious. As I recently explained, we want to affiliate with the prestigious, and feel that an overly skeptical attitude toward them taints this affiliation. So we tend to let the prestigious in each area X decide how to run area X, which they tend to arrange more to help them signal than to be useful. This happens in school, law, medicine, finance, research, and more.

So now I enter a new crusade: I am against prestige. I don’t yet know how, but I will seek ways to help people doubt and distrust the prestigious, so they can be more open to focusing on outcomes. Not to doubt that the prestigious are more impressive, but that letting them run the show produces good outcomes. I will be happy if other competent folks join me, though I’m not especially optimistic. Yet. Yet.


Does Money Ruin Everything?

Imagine someone said:

The problem with paying people to make shoes is that then they get all focused on the money instead of the shoes. People who make shoes just because they honestly love making shoes, and who aren’t paid anything at all, make better shoes. Once money gets involved people lie about how good their shoes are, and about which shoes they like how much. But without money involved, everyone is nice and honest and efficient. That’s the problem with capitalism; money ruins everything.

Pretty sad argument, right? Now read Tyler Cowen on betting:

This episode is a good example of what is wrong with betting on ideas. Betting tends to lock people into positions, gets them rooting for one outcome over another, it makes the denouement of the bet about the relative status of the people in question, and it produces a celebratory mindset in the victor. That lowers the quality of dialogue and also introspection, just as political campaigns lower the quality of various ideas — too much emphasis on the candidates and the competition. Bryan, in his post, reaffirms his core intuition that labor markets usually return to normal pretty quickly, at least in the United States. But if you scrutinize the above diagram, as well as the lackluster wage data, that is exactly the premise he should be questioning. (more)

Sure, relative to ideal people who only discuss and think about topics with a full focus on and respect for the truth and their disputants, what could be the advantage of bets? Money will only distract them from studying truth, right?

But just because people don’t bet doesn’t mean they don’t have plenty of other non-truth-oriented incentives and interests. They are often rooting for positions, and celebrating some truths over others, due to these other interests. Bet incentives are at least roughly oriented toward speaking truth; the other incentives, not so much. Don’t let the fictional best be the enemy of the feasible-now good. For real people with all their warts, bets promote truth. But for saints, yeah, maybe not so much.


Could Gambling Save Psychology?

A new PNAS paper:

Prediction markets set up to estimate the reproducibility of 44 studies published in prominent psychology journals and replicated in The Reproducibility Project: Psychology predict the outcomes of the replications well and outperform a survey of individual forecasts. … Hypotheses being tested in psychology typically have low prior probabilities of being true (median, 9%). … Prediction markets could be used to obtain speedy information about reproducibility at low cost and could potentially even be used to determine which studies to replicate to optimally allocate limited resources into replications. (more; see also coverage at 538, Atlantic, Science, Gelman)

We’ve had enough experiments with prediction markets over the years, both lab and field experiments, to not be at all surprised by these findings of calibration and superior accuracy. So you might ask: what is the intellectual contribution of this paper?

When one is trying to persuade groups to try prediction markets, one encounters consistent skepticism about experiment data that is not on topics very close to the proposed topics. So one value of this new data is to help persuade academic psychologists to use prediction markets to forecast lab experiment replications. Of course for this purpose the key question is whether enough academic psychologists were close enough to the edge of making such markets a continuing practice that it was worth the cost of a demonstration project to create closely related data, and so push them over the edge.

I expect that most ordinary academic psychologists need stronger incentives than personal curiosity to participate often enough in prediction markets on whether key psychology results will be replicated (conditional on such replication being attempted). Such additional incentives could come from:

  1. direct monetary subsidies for market trading, such as via subsidized market makers,
  2. traders with higher than average trading records bragging about it on their vitae, and getting hired etc. more because of that, or
  3. prediction market prices influencing key decisions such as what articles get published where, who gets what grants, or who gets what jobs.
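The first option above, a subsidized automated market maker, is often implemented with a logarithmic market scoring rule (LMSR). Here is a minimal sketch under that assumption; the liquidity parameter `b` sets the subsidy, since the sponsor's worst-case loss is `b * log(number of outcomes)`:

```python
import math

# Minimal sketch of a subsidized automated market maker using a logarithmic
# market scoring rule (LMSR). The liquidity parameter b bounds the sponsor's
# subsidy: worst-case loss is b * log(number of outcomes).

def lmsr_cost(quantities, b=100.0):
    """Cost function C(q) = b * log(sum_i exp(q_i / b))."""
    return b * math.log(sum(math.exp(q / b) for q in quantities))

def lmsr_price(quantities, outcome, b=100.0):
    """Current price (probability estimate) of one outcome."""
    total = sum(math.exp(q / b) for q in quantities)
    return math.exp(quantities[outcome] / b) / total

def lmsr_trade_cost(quantities, outcome, shares, b=100.0):
    """What a trader pays to buy `shares` of `outcome` from the market maker."""
    new_q = list(quantities)
    new_q[outcome] += shares
    return lmsr_cost(new_q, b) - lmsr_cost(quantities, b)
```

With no trades yet, a two-outcome market prices each outcome at 0.5; buying shares of an outcome raises its price, moving the market estimate.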

For example, imagine that one or more top psychology journals used prediction market chances that an empirical paper’s main result(s) would be confirmed (conditional on an attempt) as part of deciding whether to publish that paper. In this case the authors of a paper and their rivals would have incentives to trade in such markets, and others could be enticed to trade if they expected trades by insiders and rivals alone to produce biased estimates. This seems a self-reinforcing equilibrium; if good people think hard before participating in such markets, others could see those market prices as deserving of attention and deference, including in the journal review process.

However, the existing equilibrium also seems possible, where there are few or small markets on such topics off to the side, markets that few pay much attention to and where there is little resources or status to be won. This equilibrium arguably results in less intellectual progress for any given level of research funding, but of course progress-inefficient academic equilibria are quite common.

Bottom line: someone is going to have to pony up some substantial scarce academic resources to fund an attempt to move this part of academia to a better equilibrium. If whoever funded this study didn’t plan on funding this next step, I could have told them ahead of time that they were mostly wasting their money in funding this study. This next move won’t happen without a push.


Intelligence Futures

For many purposes, such as when choosing whether to admit someone to a college, we care about both temporary features, who they are now, and permanent features, who they have the ultimate potential to become. One of those features is intelligence; we care about how smart they are now, and about how smart they have the potential to become.

A standard result in intelligence research is that intelligence as measured late in life, such as at age fifty, is a much better indicator of ultimate potential than is intelligence measured at early ages. That is, environments have a stronger influence over measured intelligence of the young, relative to the old.

So if you want a measure of an ultimate potential, such as to use in college admissions, then instead of using current tests like SAT scores, you’d do better to use a good prediction of future test scores, such as predictions of related tests at age fifty.

Now of course colleges could try to do this prediction themselves. They could collect a dataset of people where they have late life test scores and also many possible early predictors of those future test scores, and then fit a statistical model to all that. But such data is hard to collect, this approach limits you to predictors available in your dataset, and the world changes, so that models that work on old data may not predict new data.

Let me propose a prediction market solution: create prediction markets on late life test scores. To make sure people try hard enough later, collect a fund to pay out to the person later in proportion to their late life test score. Then open (and subsidize) a market today in that future test score, and post any associated info that this person will allow. Speculators could then use that info, and anything else they could figure out, to guess the future test score. Finally, use market prices as estimate of future test scores, and thus of ultimate potential, in college admissions.

This approach could of course also be used by employers and other individuals or organizations that care about potential. A single market on a future test score could inform many audiences at once. And this approach could also be used for any other measures of potential where late life measures are more reliable than early life measures.


Elite Evaluator Rents

The elite evaluator story discussed in my last post is this: evaluators vary in the perceived average quality of the applicants they endorse. So applicants seek the highest ranked evaluator willing to endorse them. To keep their reputation, evaluators can’t consistently lie about the quality of those they evaluate. But evaluators can charge a price for their evaluations, and higher ranked evaluators can charge more. So evaluators who, for whatever reason, end up with a better pool of applicants can sustain that advantage and extract continued rents from it.

This is a concrete plausible story to explain the continued advantage of top schools, journals, and venture capitalists. On reflection, it is also a nice concrete story to help explain who resists prediction markets and why.

For example, within each organization, some “elites” are more respected and sought after as endorsers of organization projects. The better projects look first to get endorsement of elites, allowing those elites to sustain a consistently higher quality of projects that they endorse. And to extract higher rents from those who apply to them. If such an organization were instead to use prediction markets to rate projects, elite evaluators would lose such rents. So such elites naturally oppose prediction markets.

For a more concrete example, consider that in 2010 the movie industry successfully lobbied the US congress to outlaw the Hollywood Stock Exchange, a real money market just then approved by the CFTC for predicting movie success, and about to go live. Hollywood is dominated by a few big studios. People with movie ideas go to these studios first with proposals, to gain a big studio endorsement, to be seen as higher quality. So top studios can skim the best ideas, and leave the rest to marginal studios. If people were instead to look to prediction markets to estimate movie quality, the value of a big studio endorsement would fall, as would the rents that big studios can extract for their endorsements. So studios have a reason to oppose prediction markets.
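The story so far can be illustrated with a toy simulation (not the formal model, which follows in the full post): when applicants try the highest-ranked evaluator first, a top evaluator with a higher endorsement bar keeps a better endorsed pool, sustaining the reputation that lets it charge higher prices. All names and numbers below are made up:

```python
import random

# Toy illustration of the elite evaluator story: applicants apply to the
# highest-ranked evaluator that will endorse them, so an evaluator with a
# higher bar retains a higher-quality endorsed pool. Names and numbers here
# are made-up assumptions, not from the formal model.

random.seed(0)
applicants = [random.random() for _ in range(10_000)]  # quality in [0, 1]

# Evaluators listed from top-ranked to bottom, each with an endorsement bar.
evaluators = [("top studio", 0.8), ("mid studio", 0.5), ("small studio", 0.0)]

pools = {name: [] for name, _ in evaluators}
for quality in applicants:
    for name, bar in evaluators:  # applicants try the highest-ranked first
        if quality >= bar:
            pools[name].append(quality)
            break

avg_quality = {name: sum(pool) / len(pool) for name, pool in pools.items()}
```

The average quality of the top evaluator's endorsed pool comes out highest, which is the advantage it can extract rents from.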

While I find this story as stated pretty persuasive, most economists won’t take it seriously until there is a precise formal model to illustrate it. So without further ado, let me present such a model. Math follows. Continue reading "Elite Evaluator Rents" »


SciCast Contest

SciCast is holding a new contest:

We’ll be offering $16,000 in prizes for conditional forecasts only made from April 23 to May 22.


Show Outside Critics

Worried that you might be wrong? That you might be wrong because you are biased? You might think that your best response is to study different kinds of biases, so that you can try to correct your own biases. And yes, that can help sometimes. But overall, I don’t think it helps much. The vast depths of your mind are quite capable of tricking you into thinking you are overcoming biases, when you are doing no such thing.

A more robust solution is to seek motivated and capable critics. Real humans who have incentives to find and explain flaws in your analysis. They can more reliably find your biases, and force you to hear about them. This is of course an ancient idea. The Vatican has long had “devil’s advocates”, and many other organizations regularly assign critics to evaluate presented arguments. For example, academic conferences often assign “discussants” tasked with finding flaws in talks, and journals assign referees to criticize submitted papers.

Since this idea is so ancient, you might think that the people who talk the most about trying to overcome bias would apply this principle far more often than do others. But from what I’ve seen, you’d be wrong.

Oh, almost everyone circulates drafts among close associates for friendly criticism. But that criticism is mostly directed toward avoiding looking bad when they present to a wider audience. Which isn’t at all the same as making sure they are right. That is, friendly local criticism isn’t usually directed at trying to show a wider audience flaws in your arguments. If your audience won’t notice a flaw, your friendly local critics have little incentive to point it out.

If your audience cared about flaws in your arguments, they’d prefer to hear you in a context where they can expect to hear motivated capable outside critics point out flaws. Not your close associates or friends, or people from shared institutions via which you could punish them for overly effective criticism. Then when the flaws your audience hears about are weak, they can have more confidence that your arguments are strong.

And even if your audience only cared about the appearance of caring about flaws in your argument, they’d still want to hear you matched with apparently motivated capable critics. Or at least have their associates hear that such matching happens. Critics would likely be less motivated and capable in this case, but at least there’d be a fig leaf that looked like good outside critics matched with your presented arguments.

So when you see people presenting arguments without even a fig leaf of the appearance of outside critics being matched with presented arguments, you can reasonably conclude that this audience doesn’t really care much about appearing to care about hidden flaws in your argument. And if you are the one presenting arguments, and if you didn’t try to ensure available critics, then others can reasonably conclude that you don’t care much about persuading your audience that your argument lacks hidden flaws.

Now this criticism approach is often muddled by the question of which kinds of critics are in fact motivated and capable. So often “critics” are used who don’t in fact have much relevant expertise, or who have incentives that are opaque to the audience. And prediction markets can be seen as a robust solution to this problem. Every bet is an interaction between two sides who each implicitly criticize the other. Both are clearly motivated to be accurate, and have clear incentives to only participate if they are capable. Of course prediction market critics typically don’t give as much detail to explain the flaws they see. But they do make clear that they see a flaw.


Me At NIPS Workshop

Tomorrow I’ll present on prediction markets and disagreement, in Montreal at the NIPS Workshop on Transactional Machine Learning and E-Commerce. A video will be available later.
