We Can Do Low-Treewidth Combinatorial Prediction Markets!

In my last post I said I hoped prediction markets would become

an “our answers” institution with easily-found accurate answers on as many questions as possible.

Today prediction markets’ main problem is laws banning them (and customs limiting firm interest in internal markers). Alas as an academic, I can’t do much to change such laws. But I can work to improve the basic tech, for the day when prediction markets are legal. Yesterday I also said:

[Here’s] one way to expand the range of questions prediction markets can cheaply answer: start with a set of base questions, and then let users ask and answer questions from the vast space of combinations of those base questions. For example, starting with a base consisting of all the specific future readings of all weather stations, users could ask most any weather question of interest, such as whether this next winter will be colder where they are living now, or in the particular city where they are thinking of moving. In my next post I’ll talk about a big advance my research group has achieved in the implementation of such combinatorial prediction markets.

The DAGGRE project that I’ve been part of for over a year now has been working to advance the theory and practice of combinatorial prediction markets. Within a few months we will field an edit-based system where users can browse current answer estimates, and for each estimate can:

  • Edit the value. After you change an estimate to a new value, estimates that users see on all questions are Bayes-rule updates from that new value.
  • Assume a value. After you assume a value for this estimate, all estimates you see on all questions are conditional on this assumption.

To support this interface, we have three computing tasks:

  1. When a user has made assumptions A and browses to a possible question answer T, compute and show the current value v = P(T|A).
  2. To see how far this user could change this number, compute the edit values, v-, v+, in each direction that give him or her zero assets in some state.
  3. To show the user if he or she is currently long or short on this topic, compute and compare his or her expected assets given A&T, and given A&notT.

The problem is, even though a simple math formula says how a user’s (state-dependent) assets change when he or she makes such an edit, it is in general infeasible to quickly calculate the above numbers for more than a few dozen base questions.

To make computing feasible, we must somehow limit the space of allowable question answers. So the big question is: what limits will allow as many as possible of the combinations that user will typically want to edit, while still enabling accurate computation for allowed combinations.

One approach is to group base questions into sets of roughly twenty or less, and allow arbitrary combinations within each group, but no combinations between different groups. This is feasible, but we can do better, via Bayesian (or Markov) nets.

These nets limit the space of possible answers by imposing conditional independence assumptions. In a net of (directly) connected variables (i.e., questions), all variables are assumed independent of unconnected variables, conditional on connected variables. A standard (junction tree) algorithm allows exact computation of conditional probabilities for nets with a low “treewidth” (roughly, the number of variables you’d have to merge to make the net into a tree).

Of course that only covers task #1 above; what about the other two? My DAGGRE group has just published a paper (also here), to be presented at a conference (UAI) in August, showing how to exactly (well, up to machine precision) compute tasks #2,3 in this same situation. (Task #2 needs a low treewidth, but #3 works on any net.) We’ve implemented this algorithm and shown that an ordinary laptop can handle a thousand variables and a treewidth of ten in a fraction of a second. Within a few months this will be the (public domain) backend of our public edit-based combinatorial prediction market with hundreds of questions and active users. I’ll announce it here on this blog when it is ready to show.

Of course our real best-answers don’t actually fit in a low treewidth net. So we have more work to do: finding efficient approximations for tasks #1,2 in more realistic nets. There is already a large literature on ways to compute conditional probabilities in high treewidth Markov nets; we just need to study and choose among them. And I already know of a very promising general way to do task #2 well enough; we just need to try it.

Even when we can handle more realistic nets, we will still have to limit their size to make computing feasible. So we’ll need ways to let users edit the net structures – to tell us where to add and delete connections. I have some ideas here, but we are far from a satisfactory solution.

Even so, progress has been surprisingly rapid, and we have good reasons to expect continued rapid progress. We seem just a few years away from having the tech to field general robust combinatorial prediction markets! Then we’ll just have to figure out how to make it all legal.

GD Star Rating
Tagged as: ,
Trackback URL:
  • Ilya Shpitser

    Robin, thanks for the very interesting post.  The camera ready draft does not appear to be up at the url you linked.

    One issue with this approach is it’s unclear the independence assumptions in your graph always make sense.  For example you probably group questions by topic somehow.  But it is unclear why people’s answers are independent of the rest of their answers given answers to questions in neighboring topics.  It may be that answers are correlated across topics due to some broad confounders.  Have you considered using some sort of latent model?

    Are you going to this year’s UAI?  I would love to chat about this more in person.

    • Scott

      In machine learning, naive Bayes predictors make even stronger independence assumptions (that are trivially incorrect!) and still perform well. To the extent that this approach makes incorrect assumptions, it will likely still produce better results than a computationally infeasible approach.

    • Yes, I’ll be there at UAI. For the full paper, see the alternate link I give. A latent model would still be a Markov net.

  • explodicle

    We don’t necessarily need to wait for these to become legal. By converting fiat currency into cryptocurrency (like Bitcoin), you can run and participate in free online bets anonymously. I’ve made a small post on one way to do it here:

    My simplified proposal isn’t combinatorial, but allows for customized smart contracts that could be. Of course, I’m not saying we should count on prediction markets being suppressed for a long time, but they might be.

    • There is a huge demand for sports betting, and a large black market in it. Yet none of that uses Bitcoin, even though the profit potential would seem huge. So clearly many expect large costs to actually trying to use Bitcoin for that purpose. I don’t have a reason to disagree with them.

      • explodicle

        There is sports betting with Bitcoin already: https://en.bitcoin.it/wiki/Trade#Gambling

        IMHO gambling is one of the biggest current uses for Bitcoin, after of course the illegal drug trade and speculation on the currency itself. But due to its near-anonymous nature, measuring exactly how much of what goes on is impossible. And with its volatile exchange rate, I’d imagine most sports gamblers would just use it as a means to transfer US dollars to/from larger international sites instead of Bitcoin-denominated startups.

  • Stuart Armstrong

    Minor query: in practice, when you make people become consistent in their bets (through arbitrage), does this improve their accuracy?

  • Doug

    I need to read through this more closely, but a question comes to mind. Why can’t these values just be estimated numerically, instead of analytically, through Monte Carlo.

    If the market has historical data and you want to estimate your position’s expected delta relative to the movement of some other combinatorial movement then just sample historically with a bias towards those return periods that were closer to the combination.

    This is what’s done in finance to model complex derivatives exposures to arbitrary market factors or riskiness in historically stressed markets.

    • Matthew Graves

      This could work, and a lot of high tree-width Bayes net applications do use Monte Carlo sampling. For prediction markets, one wouldn’t have historical data to go off of, but you can still get this from the conditional estimates that you do have. One way is to calculate the conditional probability for one arbitrary node given naively chosen starting values (which is cheap), randomly set it according to those probabilities, and then move to another relevant node, again varying just it- after a while, you have a sample of the relevant distribution.

      It’s still worth avoiding if possible: generally, if you can figure out a way to make your system deterministic instead of stochastic, you should take it.

  • andagain

    “Today prediction markets’ main problem is laws banning them ”

    Aren’t they legal in the UK? Are they much used there?

    • UK gambling markets are heavily regulated.

      • andagain

         There is a big difference between “regulated” and “banned”.

  • This is all well and good, but WHAT practical application has it outside the classroom? 

    Not one economist “predicted” the 2008 economic collapse despite mountains of data in their academic warrens. 
    No matter what they said – the day after…

    • explodicle

      Lots of economists predicted a collapse years ahead of time, but most couldn’t say exactly when and no one listened anyways. Maybe next time when they can put their money where their mouth is, we will listen.

      The practical applications once widespread would be immense – we would have the best possible way to assess the likelihood of important future events. Imagine if in 2003 we heard “but Mr. President, over X million dollars says there are no WMDs in Iraq. Would you bet some of your personal fortune that there are WMDs?”

  • Matthew Graves

    “Even when we can handle more realistic nets, we will still have to
    limit them to make computing feasible. So we’ll need ways to let users
    edit the net structures – to tell us where to add and delete
    connections. I have some ideas here, but we are a way from a
    satisfactory solution.”

    The first two things that came to mind: giving users private conditional independence networks, so that they can add or subtract arcs to their heart’s content without polluting the general network, and relatedly, treating conditional independence like any other piece of knowledge in the system- one that has an attached price structure. (It would be difficult to resolve a bet on whether or not two things are conditionally independent, though, and so there might need to be a different type of price. A tax to cover the computational cost?)

  • When the Good Judgement ended its first season, I didn’t sign up again because I wanted to try DAGGRE (I had already joined Tetlock’s group before I heard of yours). But I haven’t been able to sign in to that and haven’t gotten a response when I emailed the contact given at the DAGGRE site.

    Firepower, the Efficient Markets Hypothesis says that an economic collapse will be difficult to predict. If the information is available, speculators will start shorting/selling to bring prices down, and voila you’ve got a recession. Also, you seem quite confident in your beliefs even when they are wrong.

    • Charles R. Twardy

      @a527a356c2e9dc6fbc6067ff0648e6f3:disqus I just re-tested the “Join!” button on http://daggre.org, and it sent the registration email.  If you’re not getting the email, please check spam. If that doesn’t work, please email or call Marva Tokhi: daggre_at_c4i.gmu.edu (replace _at_ with @) or at 703-993-2021.  We had a serious backlog but you should now get a response within 24h.  If not, please escalate to “daggre_admin@c4i….”

  • Robert Koslover

    I can’t claim to understand the technical details, but I thought you had previously made very clear that the only way prediction markets can work is if real people bet with real wealth.  Since you apparently aren’t doing that in this combinatorial market thingy that you are discussing here, what exactly are you testing and/or accomplishing?  Is the purpose of the present effort to test software, user-interfaces, computer speeds, and/or various algorithms for their basic functionality, while assuming that the value of any/all of the predictions generated is zero?  And…. if so, is that for the purpose that you’ll be all ready to go if/when prediction markets become legal?  Or something else?

    • While stronger incentives give more accuracy, weaker incentives do not give zero accuracy. And yes, we do need to test software, interfaces, etc.

  • I don’t think there is any question that a combinatorial prediction market model has the potential to yield better predictions for some outcomes.  The question is whether there exists sufficient information completeness to achieve a reasonable (and consistent) prediction. 

    If no one has sufficient information on which to base an accurate prediction, there is no way any market is going to provide the missing information.  We can provide incentives to search for information, but that’s about it.  At least a combinatorial market utilizes as much of the available information as possible.  Still, it may not be enough for the most important questions that we wish to predict.

    • We will usually start the market prices assuming that all variables are independent, and that is how they will stay until a user makes an edit to induce a correlation. It is hard to see how this is worse than a market that implicitly  requires all variables to always remain independent.

      • Daniel

         It is hard to see how this is worse, but it is easy to see that this can still be bad. For example, if the topic is US elections, variables are ridiculously non-independent, and just letting users add correlations until the available treewidth is exhausted can lead to some really weird models, I suspect.

  • Silent Cal

    When will results DAGGRE’s performance be available?

  • daedalus2u

     Aren’t credit default swaps a specific type of prediction market?   A prediction market based on who was going to default?  

    Wasn’t the problem with credit default swaps that once the value of a credit event to those playing with the credit default swaps becomes large compared to the value of the actual credit event, there are strong incentives to game the system, plus the fact that many of the players are running on margin?  

    If you are betting in a prediction market (or any market) using margin, you are not using your “own” money. 

  • Just a thought: until it’s legal to use actual money, you could try some sort of fake currency (like Reddit’s “karma”, for example). Make it a bit of a game, even.

    Then hook it up to social networks, get some word-of-mouth going…

  • Pingback: Overcoming Bias : AI Progress Estimate()

  • Pingback: Overcoming Bias : Wanted: Elite Crowds()

  • Pingback: Overcoming Bias()

  • Pingback: Overcoming Bias : Regulating Self-Driving Cars()

  • Pingback: Overcoming Bias : A Post-Em-Era Hint()

  • Pingback: Overcoming Bias : Compare Institutions To Institutions, Not To Perfection()

  • Pingback: Overcoming Bias : MRE Futures, To Not Starve()

  • Pingback: Overcoming Bias : Prediction Markets Update()

  • Pingback: Overcoming Bias : Markets That Explain, Via Markets To Pick A Best()