Surprising Popularity

This week Nature published some empirical data on a surprising-popularity consensus mechanism (a previously published mechanism, e.g., Science in 2004, with variations going by the name “Bayesian Truth Serum”). The idea is to ask people to pick from several options, and also to have each person forecast the distribution of opinion among others. The options that are picked surprisingly often, compared to what participants on average expected, are suggested as more likely true, and those who pick such options as better informed.
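The selection rule described above is simple to sketch in code. Below is a minimal illustration, not the paper's actual implementation: each respondent supplies an answer plus a forecast of the vote shares, and the rule picks the option whose actual share most exceeds its average predicted share (all function and variable names here are my own).

```python
from collections import Counter

def surprisingly_popular(answers, forecasts):
    """Pick the surprisingly popular option.

    answers: list of each respondent's chosen option.
    forecasts: list of dicts, one per respondent, mapping
               option -> that respondent's predicted vote share.
    """
    n = len(answers)
    # Actual vote share of each option.
    actual = {opt: count / n for opt, count in Counter(answers).items()}
    # Average predicted share for each option across respondents.
    predicted = {opt: sum(f.get(opt, 0.0) for f in forecasts) / len(forecasts)
                 for opt in actual}
    # The winner is the option chosen most often relative to expectations.
    return max(actual, key=lambda opt: actual[opt] - predicted[opt])

# Example with assumed numbers: 60% vote "yes", but everyone
# predicts a 75% "yes" share, so "no" is surprisingly popular.
answers = ["yes"] * 6 + ["no"] * 4
forecasts = [{"yes": 0.75, "no": 0.25}] * 10
winner = surprisingly_popular(answers, forecasts)  # "no"
```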

Compared to prediction markets, this mechanism doesn’t require that those who run the mechanism actually know the truth later. Which is indeed a big advantage. This mechanism can thus be applied to most any topic, such as the morality of abortion, the existence of God, or the location of space aliens. Also, incentives can be tied to this method, as you can pay people based on how well they predict the distribution of opinion. The big problem with this method, however, is that it requires that learning the truth be the cheapest way to coordinate opinion. Let me explain.

When you pay people for better predicting the distribution of opinion, one way they can do this prediction task is to each look for and report their best estimate of the truth. If everyone does this, and if participant errors and mistakes are pretty random, then those who do this task better will in fact have a better estimate of the distribution of opinion.
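One standard way to pay people for distribution forecasts is a quadratic (Brier-style) scoring rule, which rewards forecasts that land close to the realized vote shares. This is a generic sketch of that idea, not the specific scoring used by Prelec's Bayesian Truth Serum; the function and its parameters are my own illustration.

```python
def prediction_payment(forecast, actual, scale=1.0):
    """Quadratic-scoring payment for a forecast of vote shares.

    forecast: dict mapping option -> predicted vote share.
    actual:   dict mapping option -> realized vote share.
    Returns a payment that is maximized when forecast == actual.
    """
    options = set(forecast) | set(actual)
    # Squared error between predicted and realized shares.
    loss = sum((forecast.get(opt, 0.0) - actual.get(opt, 0.0)) ** 2
               for opt in options)
    return scale * (1.0 - loss)

# A forecast matching the realized shares earns the full payment;
# a miscalibrated forecast earns less.
exact = prediction_payment({"yes": 0.6, "no": 0.4}, {"yes": 0.6, "no": 0.4})
off = prediction_payment({"yes": 0.9, "no": 0.1}, {"yes": 0.6, "no": 0.4})
```

A quadratic rule like this is "proper": a respondent maximizes expected payment by reporting their honest best estimate of the distribution, which is the incentive property the post relies on.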

For example, imagine you are asked which city is the capital of a particular state. Imagine you are part of a low-incentive one-time survey, and you don’t have an easy way to find and communicate with other survey participants. In this case, your best strategy may well be to think about which city is actually the capital.

Of course even in this case your incentive is to report the city that most sources would say is the capital. If you (and a few others) in fact know that according to the detailed legal history another city is rightfully the capital, not the city that the usual records give, your incentive is still to go with usual records.

More generally, you want to join the largest coalition who can effectively coordinate to give the same answers. If you can directly talk with each other, then you can agree on a common answer and report that. If not, you can try to use prearranged Schelling points to figure out your common answer from the context.

If this mechanism were repeated, say daily, then a safe way to coordinate would be to report the same answer as yesterday. But since everyone can easily do this too, it doesn’t give your coalition much of a relative advantage. You only win against those who make mistakes in implementing this obvious strategy. So you might instead coordinate to change your group’s answer each day based on some commonly observed changing signal.

To encourage this mechanism to better track truth, you’d want to make it harder for participants to coordinate their answers. You might ask random people at random times to answer quickly, put them in isolated rooms where they can’t talk to others, and ask your questions in varying and unusual styles that make it hard to guess how others will frame those questions. Prefer participants with more direct personal reasons to care about telling related truth, and prefer those who used different ways to learn about a topic. Perhaps ask different people for different overlapping parts and then put the final answer together yourself from those parts. I’m not sure how far you could get with these tricks, but they seem worth a try.

Of course these tricks are nothing like the way most of us actually consult experts. We are usually eager to ask standard questions to standard experts who coordinate heavily with each other. This is plausibly because we usually care much more to get the answers that others will also get, so that we don’t look foolish when we parrot those answers to others. That is, we care more about getting a coordinated standard answer than a truthful answer.

Thus I actually see a pretty bright future for this surprisingly-popular mechanism. I can see variations on it being used much more widely to generate standard safe answers that people can adopt with less fear of seeming strange or ignorant. But those who actually want to find true answers, even when such answers are contrarian, will need something closer to prediction markets.

  • Thanks for sharing this! I’m reading Prelec’s papers on the Bayesian Truth Serum now, and they are very interesting. I have two concerns:

    First, in his 2004 paper, Prelec says, “In actual applications of the method, one would not teach respondents the mathematics of scoring or explain the notion of equilibrium. Rather, one would like to be able to tell them that truthful answers will maximize their expected scores, and that in arriving at their personal true answer they are free to ignore what other respondents might say.” This is the same thing we do in experimental studies using the Karni (2009) mechanism. What if all these elaborate mechanisms are too complex for most people to understand, and they just tell us their truthful beliefs because an authority figure told them to?

    Second, I worry there’s a flat maximum problem. If I participate in this mechanism N times, how high does N have to be for the expected payoff of truthful play to be noticeably higher than the expected payoff of playing randomly?

    • Most people know that when an authority tells them to “just tell the truth”, that means the truth that the authority wants to hear.

  • sflicht

    I wonder if this method can be usefully combined with prediction markets.

    It also suggests potentially interesting applications in machine learning. Train lots of cat-recognizing neural nets both on true labels and on one another’s predicted labels. (You’d need to do this recursively, I suppose.) Then on the test data, ask each model for its own prediction and what it thinks the most popular prediction will be. Apply surprisingly-popular and see if it beats voting or other aggregation techniques to get a consensus label from the model ensemble. This is an empirical question, and it’s actually a little surprising to me that the Nature reviewers didn’t ask Prelec to attempt to answer it. (He’s at MIT; it should be easy for him to find and employ machine learning graduate students to do the part he doesn’t know how to do himself.)

    • To get an ML system to predict what others systems will predict, it will need data on what those other systems have predicted in the past in similar situations.

      • sflicht

        Yes but that can be readily arranged. There’s no shortage of cat pictures.

• Readily but not cheaply. You’d have to run many systems in parallel, each learning about a base area but also about each other’s predictions on that area.

    • UWIR

If you tell me that you’ve created a perfect copy of me in a simulation, and ask me to predict what it will say is the capital of Nebraska, my best strategy is to simply say whatever I think the capital of Nebraska is. How will training neural networks to predict other neural networks be different?

      • sflicht

        The whole point is that the copy is imperfect.

      • UWIR

But MY point is that the first-order strategy is still for the neural net to simply answer the object-level question to the best of its ability, and if you think that there is some second-order strategy, it’s incumbent on you to explain why. Even if the copy isn’t perfect, and I know that its answer will vary from mine, unless I have some reason to expect it to vary in a particular direction, my best strategy is to simply give whatever answer I think is correct, and figure that any variances among copies of me are clustered around that answer (or, more precisely, all copies of me employing that strategy will cluster around that being the correct strategy).

  • dat_bro06

    The first part of this is the basis of the television show, ‘Family Feud.’ Just sayin’.

  • jhertzli

    This could be used as an argument for Trump’s election.

  • arqiduka

So, you would agree that the practice of having jurors sit together and deliberate at length with each other is counter-productive, at least on this account?

    • It has a downside, in addition to the upsides. Hard judgment call to guess overall effect.

    • UWIR

      Compared to what? Have each juror make a separate vote, and if the first vote is inconclusive, have a mistrial?

      • arqiduka

        Why the mistrial part? If the independent votes do not tally to the required quorum, the defendant is not guilty. The required quorum might need to be amended though.

      • UWIR

        The point is you can’t evaluate the value of deliberation other than with respect to the other factors, and what your goals are.

      • arqiduka

True, even if we ran a long-term controlled experiment it wouldn’t be clear what we would be looking for in the resulting data, or how to decide which system had outperformed the other.

  • marshall bolton

Trying to understand this… In a roomful of liars, how can you get to the “truth”? Everybody is going to be lying. Most are experienced and good at it – but some are not very good at it. Thus the average answers (from oneself and about others) are but skillful lies, whilst the surprises are the amateurs and autistics stumbling on the truth. Ak, yeah: The naked ape does prefer to have clothes on – and even without there are always vested interests.

  • Grant

This sounds similar to the consensus mechanism used by . It’s supposed to be decentralized truth-finding, though I suspect it really just finds Schelling points. These should be the truth in situations where finding the truth is not costly, and there is little collusion.

    You can read about the algorithm on page 12:

  • UWIR

    “This mechanism can thus be applied to most any topic, such as the morality of abortion, the existence of God, or the location of space aliens.”

    Nonsense. The first one is not a factual question, and the second is not well formed, not verifiable, is in a completely different reference class, and has only two main answers. Only the last comes close to being a case where this is applicable.

    • The first one is not a factual question…

It is for a moral realist, and in another posting, Robin reports that the “expert consensus” supports moral realism.

      [Of course, belief in moral realism is actually as absurd as belief in deities.]

  • Stuart Armstrong

    >Of course even in this case your incentive is to report the city that most sources would say is the capital. If you in fact know that according to the detailed legal history another city is rightfully the capital, not the city that the usual records give, your incentive is still to go with usual records.

    I believe that’s incorrect. In fact, from the Nature paper:

    > Imagine that there are two possible worlds, the actual one in which Philadelphia is not the capital of Pennsylvania, and the counterfactual one in which Philadelphia is the capital. It is plausible that in the actual world fewer people will vote yes than in the counterfactual world. This can be formalized by the toss of a biased coin where, say, the coin comes up yes 60% of the time in the actual world and 90% of the time in the counterfactual world. Majority opinion favours yes in both worlds. People know these coin biases but they do not know which world is actual. Consequently, their predicted frequency of yes votes will be between 60% and 90%. However, the actual frequency of yes votes will converge to 60% and no will be the surprisingly popular, and correct, answer.

So if you’re the only one who knows that a certain city is the true capital, then the method doesn’t work; but if there is a reasonable minority that knows that, then the method works.
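The quoted two-worlds example is easy to check numerically. Here is a small simulation using the coin biases from the quote; the sample size and the 0.75 predicted share are assumed values I chose for illustration.

```python
import random

random.seed(0)
n = 100_000

# Actual world: each respondent votes "yes" with probability 0.60.
votes_yes = sum(random.random() < 0.60 for _ in range(n)) / n

# Respondents know the two coin biases (0.60 and 0.90) but not which
# world is actual, so their predicted "yes" share lies between them.
predicted_yes = 0.75  # assumed value in that range

# Surprise = actual share minus predicted share, for each option.
surprise_yes = votes_yes - predicted_yes
surprise_no = (1 - votes_yes) - (1 - predicted_yes)

# "no" ends up surprisingly popular, matching the quote's conclusion.
winner = "yes" if surprise_yes > surprise_no else "no"
```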

• If only a few people knew the answer and everyone else answered randomly, then yes the mechanism would work. But if many people answer randomly, a large group coordinates on an incorrect answer, and a small group coordinates on a correct answer, this mechanism will select and reward the large group’s incorrect answer.

      • Stuart Armstrong

        I think that’s still wrong. The majority wrong answer will have the most answers, but it won’t be surprisingly popular.

        The minority answer will be surprisingly popular, since the majority won’t consider it a likely answer.

  • Stuart Armstrong

    >This mechanism can thus be applied to most any topic, such as the morality of abortion

    That question will reduce to something like: “according to widely shared moral criteria, is abortion moral?”

    • Of course.

      • To which I say: of course not.

        The morality of abortion doesn’t reduce to “widely shared moral criteria” for most ostensible experts. Moral realism is alive and well, however misguided. As I understand him, the founder of Less Wrong is a moral realist, for whom morality doesn’t reduce to “widely shared moral criteria.”

  • Gunnar Zarncke

I’d like to repost a highly upvoted comment from LW about an interesting failure mode of the SP heuristic:

    > whpearson 03 February 2017 08:35:51PM 9 points

    > I wonder how well it would work on questions like.

    > “Does homeopathy cure cancer”.

    > Or in general where there are people in the minority that know the majority won’t side with them, but the majority might not know how many believe the fringe view.