Investment advisor Michael Covel interviewed me on prediction markets for his podcast show here. I couldn’t be very encouraging about his main strategy of trend-following, but we covered many interesting issues.
When I first got into prediction markets twenty-five years ago, I called them “idea futures”, and I focused on using them to reform how we deal with controversies in science and academia (see here, here, here, here). Lately I’ve focused on what I see as the much higher value application of advising decisions and reforming governance (see here, here, here, here). I’ve also talked a lot lately about what I see as the main social functions of academia (see here, here, here, here). Since prediction markets don’t much help to achieve these functions, I’m not optimistic about the demand for using prediction markets to reform academia.
But periodically people do consider using prediction markets to reform academia, as did Andrew Gelman a few months ago. And a few days ago Scott Alexander, who I once praised for his understanding of prediction markets, posted a utopian proposal for using prediction markets to reform academia. These discussions suggest that I revisit the issue of how one might use prediction markets to reform academia, if in fact enough people cared enough about gaining accurate academic beliefs. So let me start by summarizing and critiquing Alexander’s proposal.
Alexander proposes prediction markets where anyone can post any “theory” broadly conceived, like “grapes cure cancer.” (Key quotes below.) Winning payouts in such markets suffer a roughly 10% tax to fund experiments to test their theories, and in addition some such markets are subsidized by science patron orgs like the NSF. Bettors in each market vote on representatives who then negotiate to pick someone to pay to test the bet-on theory. This tester, who must not have a strong position on the subject, publishes a detailed test design, at which point bettors could leave the market and avoid the test tax. “Everyone in the field” must make a public prediction on the test. Then the test is done, winners paid, and a new market set up for a new test of the same question. Somewhere along the line private hedge funds would also pay for academic work in order to learn where they should bet.
That was the summary; here are some critiques. First, people willing to bet on theories are not a good source of revenue to pay for research. There aren’t many of them and they should in general be subsidized not taxed. You’d have to legally prohibit other markets to bet on these without the tax, and even then you’d get few takers.
Second, Alexander says to subsidize markets the same way they’d be taxed, by adding money to the betting pot. But while this can work fine to cancel the penalty imposed by a tax, it does not offer an additional incentive to learn about the question. Any net subsidy could be taken by anyone who put money in the pot, regardless of their info efforts. As I’ve discussed often before, the right way to subsidize info efforts for a speculative market is to subsidize a market maker to have a low bid-ask spread.
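One concrete way to implement such a subsidized market maker is a logarithmic market scoring rule, where the patron’s subsidy is exactly the market maker’s bounded worst-case loss. A minimal sketch (the liquidity parameter `b` and trade sizes are illustrative, not a full implementation):

```python
import math

def lmsr_cost(q, b):
    """Cost function of a logarithmic market scoring rule market maker.

    q[i] is the number of shares sold so far on outcome i; b sets the
    patron's subsidy (worst-case loss is b * ln(number of outcomes)).
    """
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def lmsr_price(q, b, i):
    """Instantaneous price of outcome i, i.e. the market probability."""
    denom = sum(math.exp(qi / b) for qi in q)
    return math.exp(q[i] / b) / denom

def trade_cost(q, b, i, shares):
    """What a trader pays the market maker to buy `shares` of outcome i."""
    q_new = list(q)
    q_new[i] += shares
    return lmsr_cost(q_new, b) - lmsr_cost(q, b)

# A fresh two-outcome market with liquidity parameter b = 100:
q, b = [0.0, 0.0], 100.0
print(lmsr_price(q, b, 0))        # 0.5: market starts undecided
print(trade_cost(q, b, 0, 50.0))  # cost of buying 50 "yes" shares
print(b * math.log(2))            # patron's worst-case subsidy bound
```

Because the market maker always quotes a price, traders face no bid-ask spread at all, and anyone who moves the price toward the truth is, in expectation, paid from the subsidy rather than from other traders.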
Third, Alexander’s plan to have bettors vote to agree on a question tester seems quite unworkable to me. It would be expensive, rarely satisfy both sides, and seems easy to game by buying up bets just before the vote. More important, most interesting theories just don’t have very direct ways to test them, and most tests are of whole bundles of theories, not just one theory. Fourth, for most claim tests there is no obvious definition of “everyone in the field,” nor is it obvious that everyone should have an opinion on those tests. Forcing a large group to all express a public opinion seems a huge cost with unclear benefits.
OK, now let me review my proposal, the result of twenty-five years of thinking about this. The market maker subsidy is a very general and robust mechanism by which research patrons can pay for accurate info on specified questions, at least when answers to those questions will eventually be known. It allows patrons to vary subsidies by questions, answers, time, and conditions.
Of course this approach does require that such markets be legal, and it doesn’t do well at the main academic function of credentialing some folks as having the impressive academic-style mental features with which others like to associate. So only the customers of academia who mainly want accurate info would want to pay for this. And alas such customers seem rare today.
For research patrons using this market-maker subsidy mechanism, their main issues are about which questions to subsidize how much when. One issue is topic. For example, how much does particle physics matter relative to anthropology? This mostly seems to be a matter of patron taste, though if the issue were what topics should be researched to best promote economic growth, decision markets might be used to set priorities.
The biggest issue, I think, is abstraction vs. concreteness. At one extreme one can ask very specific questions like what will be the result of this very specific experiment or future empirical measurement. At the other extreme, one can ask very abstract questions like “do grapes cure cancer” or “is the universe infinite”.
Very specific questions offer bettors the most protection against corruption in the judging process. Bettors need worry less about how a very specific question will be interpreted. However, subsidies of specific questions also target specific researchers pretty directly for funding. For example, subsidizing bets on the results of a very specific experiment mainly subsidizes the people doing that experiment. Also, since the interest of research patrons in very specific questions mainly results from their interest in more general questions, patrons should prefer to target directly the more general questions that actually interest them.
Fortunately, compared to other areas where one might apply prediction markets, academia offers especially high hopes for using abstract questions. This is because academia tends to house society’s most abstract conversations. That is, academia specializes in talking about abstract topics in ways that let answers be consistent and comparable across wide scopes of time, space, and discipline. This offers hope that one could often simply bet on the long term academic consensus on a question.
That is, one can plausibly just directly express a claim in direct and clear abstract language, and then bet on what the consensus will be on that claim in a century or two, if in fact there is any strong consensus on that claim then. Today we have a strong academic consensus on many claims that were hotly debated centuries ago. And we have good reasons to believe that this process of intellectual progress will continue long into the future.
Of course future consensus is hardly guaranteed. There are many past debates that we’d still find hard to judge today. But for research patrons interested in creating accurate info, the lack of a future consensus would usually be a good sign that info efforts in that area were less valuable than in other areas. So by subsidizing markets that bet on future consensus conditional on such a consensus existing, patrons could more directly target their funding at topics where info will actually be found.
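In betting terms, “conditional on such a consensus existing” means a called-off bet: if the future judges find no strong consensus, stakes are simply returned. A sketch of the settlement rule (the flat-odds payout and argument names are illustrative):

```python
def settle_consensus_bet(stake, odds, consensus_found, consensus_is_yes, bet_yes):
    """Settle a bet on the future consensus about a claim, conditional
    on a strong consensus existing at the judging date.

    If judges find no strong consensus, the bet is called off and the
    stake refunded, so traders need not price in 'never resolved' risk.
    """
    if not consensus_found:
        return stake  # called off: stake returned
    if consensus_is_yes == bet_yes:
        return stake * (1.0 + odds)  # won: stake plus winnings
    return 0.0  # lost

# Bet $10 at even odds that the consensus will say "yes":
print(settle_consensus_bet(10.0, 1.0, True, True, True))    # 20.0: won
print(settle_consensus_bet(10.0, 1.0, True, False, True))   # 0.0: lost
print(settle_consensus_bet(10.0, 1.0, False, None, True))   # 10.0: called off
```

Since a called-off bet returns the stake, market prices reflect only opinion about which way a consensus would go, not about whether one will form.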
Large subsidies for market-makers on abstract questions would indirectly result in large subsidies on related specific questions. This is because some bettors would specialize in maintaining coherence relationships between the prices on abstract and specific questions. And this would create incentives for many specific efforts to collect info relevant to answering the many specific questions related to the fewer big abstract questions.
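These coherence relationships are just the laws of probability applied across markets. For example, by the law of total probability, the price of an abstract claim should equal its conditional prices weighted by the price of a related specific question; any gap is free money for an arbitrageur, which is what transmits the subsidy. A sketch (all prices are illustrative):

```python
def coherence_gap(p_abstract, p_specific, p_given_yes, p_given_no):
    """Gap between the market price of an abstract claim and the price
    implied, via the law of total probability, by a related specific
    question plus two conditional markets. A nonzero gap is an
    arbitrage opportunity for traders keeping the markets coherent.
    """
    implied = p_given_yes * p_specific + p_given_no * (1.0 - p_specific)
    return p_abstract - implied

# Abstract claim priced at 0.32; a specific experiment priced at 0.40,
# with conditional prices 0.60 (if it succeeds) and 0.10 (if it fails):
gap = coherence_gap(0.32, 0.40, 0.60, 0.10)
print(round(gap, 2))  # 0.02: the abstract market is overpriced
```

A trader who spots such a gap profits by selling the overpriced side and buying the implied portfolio, and in doing so pulls the specific-question prices into line with the subsidized abstract one.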
Yes, we’d probably end up with some politics and corruption on who qualifies to judge later consensus on any given question – good judges should know the field of the question as well as a bit of history to help them understand what the question meant when it was created. But there’d probably be less politics and lobbying than if research patrons choose very specific questions to subsidize. And that would still probably be less politics than with today’s grant-based research funding.
Of course the real problem, the harder problem, is how to add mechanisms like this to academia in order to please the customers who want accuracy, while not detracting from or interfering too much with the other mechanisms that give the other customers of academia what they want. For example, should we subsidize prediction market participants with high relevant prestige, or tax those with low prestige?
Those promised quotes:
Alexander Berger from GiveWell interviewed me on prediction markets, and has posted his notes here. Alex and I seem to disagree about the importance of this topic:
Organizational obstacles: The main barrier to wider-scale adoption of prediction markets is that most organizations are reluctant to use them. It is unclear why this is the case. Those currently in power within firms may resist prediction markets because the markets would spread previously privileged information across the company and change perceptions of what is knowable and who knows.
I tried to emphasize this topic, but Alex devotes only 60 out of 1800 words to it.
In a column, Andrew Gelman and Eric Loken note that academia has a problem:
Unfortunately, statistics—and the scientific process more generally—often seems to be used more as a way of laundering uncertainty, processing data until researchers and consumers of research can feel safe acting as if various scientific hypotheses are unquestionably true.
They consider prediction markets as a solution, but largely reject them for reasons both bad and not so bad. I’ll respond here to their article in unusual detail. First the bad:
Would prediction markets (or something like them) help? It’s hard to imagine them working out in practice. Indeed, the housing crisis was magnified by rampant speculation in derivatives that led to a multiplier effect.
Yes, speculative market estimates were mistaken there, as were most other sources, and mistaken estimates caused bad decisions. But speculative markets were the first credible source to correct the mistake, and no other stable source had consistently more accurate estimates. Why should the most accurate source be blamed for mistakes made by all sources?
Allowing people to bet on the failure of other people’s experiments just invites corruption, and the last thing social psychologists want to worry about is a point-shaving scandal.
What about letting researchers who compete for grants, jobs, and publications write critical referee reports and publish criticism – doesn’t that invite corruption too? If you are going to forbid all conflicts of interest because they invite corruption, you won’t have much left that you will allow. Surely you need to argue that bet incentives are more corrupting than other incentives.
It looks bad for a manager to have one of his projects fail. So to “cover his ass”, such a manager often tries to prevent any records showing that people saw failure coming. After a failure, he wants to say “this was just random bad luck; no one could have foreseen it.” His bosses up the chain of command tend to allow this, because they also want to avoid being held responsible for failures during their watch. So they also prefer the random bad luck story.
Unfortunately, this approach tends to prevent organizations from getting signals that would let them mitigate failures, such as by quitting projects earlier. For example, most startup firms don’t fail until they have spent nearly all of the cash they were given. It is rare for a startup to admit it isn’t going to work out, and give some cash back to investors. Similarly, government agencies created to achieve some purpose rarely recommend to legislatures that they be eliminated when they find that they aren’t achieving their intended purposes.
Of course bosses don’t want to be too obvious about silencing possible signals of failure. They find it hard to silence what have become standard signals, like cost accounting measures.
A great application of prediction markets is to give better and clearer warnings of upcoming failure, to enable better mitigation, such as quitting. Of course project bosses anticipate this, and oppose prediction markets on their projects, for exactly this reason. But we can still hope that prediction market warnings may someday become a standard signal, and thus hard to silence:
I hope prediction markets within firms may someday gain a status like cost accounting today. In a world where no one else did cost accounting, proposing that your firm do it would basically suggest that someone was stealing there. Which would look bad. But in a world where everyone else does cost accounting, suggesting that your firm not do it would suggest that you want to steal from it. Which also looks bad.
Similarly, in a world where few other firms use prediction markets, suggesting that your firm use them on your project suggests that your project has an unusual problem in getting people to tell the truth about it via the usual channels. Which looks bad. But in a world where most firms use prediction markets on most projects, suggesting that your project not use prediction markets would suggest you want to hide something. (more)
Long ago our primate ancestors learned to be “political.” That is, instead of just acting independently, we learned to join into coalitions for mutual advantage, and to switch coalitions for private advantage. Our human ancestors added social norms, i.e., rules enforced by feelings of outrage in broad coalitions. Foragers used norms and coalitions to manage bands of roughly thirty members, and farmers applied similar behaviors to village communities of roughly a thousand.
In ancient politics, people learned to attract allies, to judge who else was reliable as an ally, to gossip about who was allied with who, and to help allies and hurt rivals. In particular we learned to say good things about allies and bad things about rivals, such as accusing rivals of violating key social norms, and praising allies for upholding them.
Today many people consider themselves to be very “political”, and they treat this aspect of themselves as central to their identity. They spend lots of time talking about related views, associating with those who share them, and criticizing those who disagree. They often feel especially proud of how boldly and freely they do these things, relative to their ancestors and those in “backward” cultures.
Trouble is, such folks are mostly “political” about national or international politics. Their interest fades as the norms and coalitions at stake focus on smaller scales, such as regions, cities, or neighborhoods. The politics of firms, clubs, and families hardly engage them at all. Of course such people are members of local coalitions, and do sometimes voice support for enforcing related norms. So they are political there to some extent. But they are much less bold, self-righteous, and uncompromising about local politics, and don’t consider related views to be central to their identity. Such folks are eager to associate with those who sacrifice to improve world politics, but are only mildly interested in associating with those who sacrifice to improve local politics.
This focus on politics at the largest scale is both relatively safe, and relatively useless. On the one hand, your efforts to take sides and support norm enforcement at very local levels are far more likely to benefit you personally via better local outcomes. On the other hand, such efforts are far more likely to bother opposing coalitions, leaving you vulnerable to retaliation. Given these risks, and the greater praise given to those who push politics at the largest scales, it is understandable if people tend to focus on safe large-scale politics, unlikely to cause them personal troubles.
Near-far theory predicts that we’d tend to focus our ideals and moral outrage and praise more on the largest social scales. But a net result of this tendency is that we seem far less effective today than were our ancestors at enforcing very-local-level social norms, and at discouraging related harms from local coalitions. We chafe at the idea of letting our nation be dominated by a king, but we easily and quietly submit to local kings in firms, clubs, and families.
Our political instincts and efforts are largely wasted, because we are just much less able to coordinate to identify and right wrongs on the largest scales. Now to some extent this is healthy. There was a lot of destructive waste when most political efforts were directed at very local politics. But many wrongs were also detected and righted. The human political instinct does serve some positive functions. After all, human bands were much larger than other primate bands, suggesting that human politics was less destructive than other primate politics.
I’ve suggested that organizations use decision markets to help advise key decisions. And to illustrate the idea, I’ve discussed the example of how it could apply to national politics. I’ve done this because people seem far more interested in reforming national politics, relative to reforming local small organizations. But honestly, I see much bigger gains overall from smaller scale applications. And small scale application is where the idea needs to start, to work out the kinks. And such trials are feasible now. If only I could get some small orgs to try. Sigh.
I posted back in ’07 on a hero of local politics:
A colleague of my wife was a nurse at a local hospital, and was assigned to see if doctors were washing their hands enough. She identified and reported the worst offender, whose patients were suffering as a result. That doctor had her fired; he still works there not washing his hands. (more)
I’d admire you much more if you acted like this, relative to your marching on Washington, soliciting door-to-door for a presidential candidate, or posting ever so many political rants on Facebook. Shouldn’t you admire such folks far more as well?
Most of us live in worlds of conversation, like books or blogs or chats, where we tend to give many others the benefit of the doubt that they are mostly talking “in good faith.” We don’t just talk to show off or to support allies and knock rivals – we hold ourselves to higher standards. But let me explain why that may often be wishful thinking.
I’ve previously suggested that coalition politics infuses a lot of human behavior. That is, we tend to use all available means to try to help “us” and hurt “them”, even if on average these games hurt us all. Coalition politics is a dirt that regularly accumulates in most any corner that is not vigorously and regularly cleaned.
This view predicts that coalition politics also infuses a lot of how writings (and speeches, etc.) are evaluated. That is, when we evaluate the writings of others, we attend to how such evaluations may help our coalitions and hurt rival coalitions. Especially for writings on subjects that have little direct relevance for how we live our lives. Like most topics in most blogs, magazines, journals, books, speeches, etc.
However, while we may find such cynicism plausible as a theory of rivals, we are reluctant to consciously embrace it as theory of ourselves. We instead want to say that we mostly evaluate the writings of others using different criteria. And when we are part of a group that evaluates writings similarly, we want to say this is because our group shares key evaluation criteria beyond “us good, them bad.”
Now some groups can offer concrete evidence for their claims to be relatively clean of coalition politics. These are groups who declare specific “objective” standards to judge writing. That is, they use standards that are relatively easy for outsiders to check. For example, outsiders can relatively easily check groups who evaluate writings based on word count, or on correctness of spelling and grammar. Yes, a commitment to such standards may favor some groups over others, such as good spellers over bad spellers. But it can’t be adjusted very easily to shifting coalitions. Which makes it a poor tool for supporting coalition politics.
Some groups say they judge writings based on their popularity in some audience. And yes, it can be pretty easy to evaluate the popularity of writings. However, it could easily be the audience that is using coalition politics to decide what is popular. Thus using popularity to evaluate writings doesn’t at all ensure that coalition politics doesn’t dominate evaluations.
Some groups claim to evaluate written “maps” based on how well they match intended “territories”. And when it is easy for many clearly-neutral outsiders to visit a territory, it can be easy for outsiders to check that territory-matching is actually how this group evaluates maps. But the harder it is for outsiders to see territories, or to read their supposedly matching maps, and the more easily that outside critics can be credibly accused of political bias, the more easily a group could pretend to evaluate maps based on territory matches, but actually evaluate them via coalition politics. For example, anthropologists watching the private lives of the very rich might write descriptions of those lives that pander to academic presumptions about the very rich, since few academics ever see those lives directly, and the few who do can be accused of bias by association.
Some groups use objective criteria for evaluations, but don’t give those criteria enough weight to stop coalition politics from dominating evaluations. For example, economic theory journals can claim that they only publish articles containing proofs without obvious errors. And the ability of readers to seek errors may ensure that this criterion is usually satisfied. But such journals may still reject most submissions that meet this criterion, allowing coalition politics to dominate which articles are accepted. Winning coalitions may be constrained to include only members capable of constructing proofs without obvious errors, but this need not be very constraining to them.
Another approach is to only use objective evaluation criteria, but to use many such criteria and to be unclear about their relative weights. The more such criteria, the greater the chance of finding criteria to reach whatever evaluation one wants. For example, in many legal areas there is wide agreement on the relevant factors, and on which directions each factor points to in a final decision. Nevertheless, given enough relevant factors, courts may usually have enough discretion to favor either side.
For any one group and their declared criteria of evaluation, it can be hard for outsiders to judge just how much leeway that group has left for coalition politics to influence evaluations. We tend to give the benefit of the doubt to our own groups, but not to rivalrous groups. For example, pro-science anti-religion folks may presume that peer review in scientific journals is mainly used to enforce good evidence norms, but that religious leaders mainly use their discretion in interpreting scriptures to favor their allies.
If they were honest, each group would either declare objective evaluation criteria that leave little room for coalition politics, or accept that outsiders can reasonably presume that coalition politics probably dominates their evaluations. And everyone should expect that even if their group now seems an exception where other criteria dominate, it will probably not remain so for long. Because these are in fact reasonable assumptions in a world where coalition politics is a dirt that regularly and rapidly accumulates in any corner not vigorously and regularly cleaned.
Hey there reader, I really am talking about you and the worlds of writing where you live. Do you presume that your worlds are mostly dominated by politics, where different coalitions vie to support allies and knock rivals? Or do you see the groups you hang with as holding themselves to higher standards? If higher standards, are they standards that outsiders can easily check on? Or do you in practice mostly have to trust a small group of insiders to judge if standards are met? And if you have to trust insiders, how sure can you be their choices aren’t mostly driven by coalition politics?
Years ago I struggled with this issue, and wondered what evaluation criteria a group could adopt to robustly induce their writings to roughly track truth on a wide range of topics, and resist the corrupting pressures of coalition politics to say what key audiences want or expect to hear. I was delighted to find that for a wide range of topics open prediction markets offer such robust criteria. Each trade can be an “edit” of the highly-evaluated “writing” that is the current market odds on each topic. Such edits are rewarded or punished via cash for moving the consensus toward or away from the truth.
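With a market scoring rule, the payoff for each such “edit” can be made explicit: a trader who moves the consensus probability is paid the change in (say) the log score once the truth is known, so only moves toward the truth profit in expectation. A minimal sketch of that payoff rule:

```python
import math

def edit_payoff(p_old, p_new, claim_true):
    """Payoff to a trader whose 'edit' moved the market probability of
    a claim from p_old to p_new, under a logarithmic scoring rule.

    The expected payoff is positive only if the edit moved the
    consensus toward the truth, regardless of coalition loyalties.
    """
    if claim_true:
        return math.log(p_new / p_old)
    return math.log((1.0 - p_new) / (1.0 - p_old))

# Moving the consensus from 0.50 to 0.80 pays if the claim turns out
# true, and costs the trader if it turns out false:
print(edit_payoff(0.5, 0.8, True) > 0)    # True
print(edit_payoff(0.5, 0.8, False) < 0)   # True
```

This is the sense in which the criterion is objective: the reward depends only on the eventual resolution of the claim, which outsiders can check, not on who the editor’s allies are.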
I had hoped that many groups would be anxious to avoid the appearance that coalition politics may dirty their evaluations, and thus be eager to adopt new standards that can avoid such an appearance. So I hoped that many groups would want to adopt prediction markets, once they were clearly shown to be feasible and practical. Alas, that seems to not be so.
Today’s winning coalitions seem to prefer to let coalition politics continue to determine who wins in each group. This seems like how police departments would like to appear free from corruption, but not enough to actually make their internal affairs departments report to someone other than the chief of police. We are fond of tarring rival groups with the accusation that coalition politics dominates their evaluations, and we are fond of pretending that we are different. But not enough to visibly block that politics.
I hereby offer Robin Hanson (only) 2-to-1 odds on the following statement:
“There will, by 1 January 2010, exist a robotic system capable of the cleaning an ordinary house (by which I mean the same job my current cleaning service does, namely vacuum, dust, and scrub the bathroom fixtures). This system will not employ any direct copy of any individual human brain. Furthermore, the copying of a living human brain, neuron for neuron, synapse for synapse, into any synthetic computing medium, successfully operating afterwards and meeting objective criteria for the continuity of personality, consciousness, and memory, will not have been done by that date.”
Since I am not a bookie, this is a private offer for Robin only, and is only good for $100 to his $50. –JoSH
At the time I replied that my estimate for the chance of this was in the range 1/5 to 4/5, so we didn’t disagree. But looking back I think I was mistaken – I could and should have known better, and accepted this bet.
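For the record, here is the bet arithmetic, under the natural reading that I would have risked $50 against JoSH’s $100 by betting against his statement:

```python
def breakeven_prob(stake, win_amount):
    """Minimum probability of winning at which accepting a bet has
    nonnegative expected value, for a risk-neutral bettor risking
    `stake` to win `win_amount`.
    """
    return stake / (stake + win_amount)

# Risking $50 to win $100 (the other side of JoSH's 2-to-1 offer):
print(breakeven_prob(50.0, 100.0))  # ~0.333: accept if P(win) > 1/3
```

So taking the $50 side is profitable whenever one assigns the statement a probability below 2/3, which much of my stated 1/5 to 4/5 range satisfies.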
I’ve posted on how AI researchers with twenty years of experience tend to see slow progress over that time, which suggests continued future slow progress. Back in ’91 I’d had only seven years of AI experience, and should have thought to ask more senior researchers for their opinions. But like most younger folks, I was more interested in hanging out and chatting with other young folks. While this might sometimes be a good strategy for finding friends, mates, and same-level career allies, it can be a poor strategy for learning the truth. Today I mostly hear rapid AI progress forecasts from young folks who haven’t bothered to ask older folks, or who don’t think those old folks know much relevant.
I’d guess we are still at least two decades away from a situation where over half of US households use robots to do over half of the house cleaning (weighted by time saved) that people do today.