Extraordinary Claims ARE Extraordinary Evidence

There is a common saying used to dismiss surprising claims: "extraordinary claims require extraordinary evidence."  This idea is used to justify holding controversial claims to a higher standard of evidence than uncontroversial claims.   

Now the saying is obviously true in a simple Bayesian sense:  the lower your pre-evidence probability for a claim, the stronger your evidence must be (in likelihood ratio terms) to raise your post-evidence probability above any given threshold.  But this saying can be a misleading way to think about testimonial evidence. 
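In odds form, this Bayesian core is just arithmetic; here is a minimal sketch (all numbers are invented for illustration):

```python
# Minimal sketch of the Bayesian point above: the lower the prior,
# the larger the likelihood ratio the evidence must carry to push
# the posterior past a fixed threshold. All numbers are illustrative.

def posterior(prior, likelihood_ratio):
    """Posterior probability after evidence with the given
    likelihood ratio P(evidence | claim) / P(evidence | not claim)."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

def ratio_needed(prior, threshold):
    """Likelihood ratio required to raise this prior to the threshold."""
    return (threshold / (1 - threshold)) / (prior / (1 - prior))

# An ordinary claim (prior 0.1) vs. an extraordinary one (prior 1e-6):
print(ratio_needed(0.10, 0.95))   # ~171
print(ratio_needed(1e-6, 0.95))   # ~1.9e7
```

The second claim needs evidence roughly a hundred thousand times stronger, which is the sense in which the proverb is trivially true.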

Consider that in ordinary conversation we commonly believe claims with very low pre-evidence probabilities.  Imagine that I were to tell you that my children had just died in a horrible freak accident involving a cell phone, a plane and a gas truck, or that I would meet you next Tuesday at 8:47am at 11 feet NW of the smaller statue in a certain square.  You would probably just believe me, as you usually believe things I tell you, even though you would have assigned a very low probability to those claims before you heard my statement.   

Are we gullible to believe such unlikely claims without asking for extra evidence?  No; the fact that I make such an extraordinary claim is usually itself extraordinary evidence (with a very high likelihood ratio); I would be very unlikely to make such claims in situations where I did not have good reasons to think them true. 

The times to be more skeptical of unlikely claims are when there is a larger than usual chance that someone would make such a claim even if it were not true.  That is, if there is a kind of "wild" claim and "wild" person, such that this type of person tends to be more rewarded for making this kind of claim, relative to silence or other claims, even when they do not have good reasons to think them true, then we are justified in holding such claims to a higher standard of evidence.   

On the other hand, if there are kinds of claims and types of people such that these people are rewarded less for making such claims, relative to silence or other claims, then we should hold these claims to a lower standard of evidence.   So while we should be extra skeptical of hard-to-check claims that would bring media attention to media hogs, we should be extra trusting of embarrassing claims from shy people, or of claims that associates will interpret as betrayal or lunacy. 

Since the right standard of evidence depends on the claimer’s incentives, it is appropriate to consider these incentives.  But it is not true in general that extraordinary claims require extraordinary evidence, beyond the extraordinary evidence already embodied in the claims themselves.

Summary:  Be skeptical about any claim people tend to make without enough evidence, but not otherwise skeptical of extraordinary claims.

  • I usually hesitate to get into this type of philosophical discussion, but I’ll take a brief crack at some of it anyway.

    Your example statement, “I would meet you next Tuesday at 8:47am at 11 feet NW of the smaller statue in a certain square.” is really not a factual “claim”. It can’t be “factual”, because I can falsify it by simply failing to attend the asserted meeting.

    What it *is* is an assertion of intent, and is neither objectively true – unless and until you instantiate that intent with action towards the stated goal – nor false – unless you are lying as to your true intent or are thwarted in achieving the stated goal.

    Implicit in the title statement “Extraordinary claims are extraordinary evidence” is an unacknowledged redefinition of terms that have clear, concise meanings. I believe the original popular statement originated with – or was popularized by – Carl Sagan, with reference to science, the scientific method, and (implicitly) fringe beliefs and assertions. Not “intent”. As such, the terms “claims” and “evidence” must be viewed in terms of their rigorous scientific interpretation. On that basis, the article’s arguments appear to fail – if for no other reason than false context and application.

  • Stuart Armstrong

    Hum, these seem to pose a problem to a strict Bayesian.

    To avoid worrying about intent, consider X saying “I met a friend last Tuesday at 8:47am at 11 feet NW of the smaller statue in a certain square.” It is an extraordinary claim only in that it is one of a vast class of claims that make up the “I was somewhere last Tuesday at 8:47am”, a claim that is certainly true. I can get a probability distribution about where X was, and update it with information about the honesty of X.

    “I have invented an infinite energy machine” is again, part of “I have, or have not, invented an infinite energy machine”, again strictly true. The a priori estimate about whether an infinite energy machine has been invented would be very low. Then it is updated whatever way is appropriate.

    Both of these claims are extraordinary because their a priori inverses are much more likely than them.

    However, suppose I equip my friend X with a GPS receiver, and ask him to state his position precisely at the time. He may choose to tell me whatever degree of precision he desires, to be determined randomly.

    Then the more precise his announcement, the more unlikely I should treat his statement! Given more precise instruments, it certainly seems that we could make the a posteriori statement “I was at a certain place,” more unlikely than “I have invented an infinite energy machine” (ignore quantum effects for the moment).

    Yet this seems most unsatisfactory – my friend is either lying or not, and the degree of precision of his lie should not be the issue.

    Am I missing something obvious here? Some measure theoretic issues might resolve this (the universe omega of possibilities being changed after my friend has spoken) but this seems a fudge.

  • Stuart: in order to figure out whether or not your friend is lying, you need three pieces of information:

    1. your prior probability that he is lying
    2. your likelihood distribution for statements given that he is telling the truth, and
    3. your likelihood distribution for statements given that he is lying.

    (3) is the one you didn’t mention in your post.
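The three quantities above are all Bayes' theorem needs; here is a hypothetical numerical sketch (every probability below is invented):

```python
# Hypothetical numbers illustrating the three quantities above:
# a prior that the friend is lying, and the probability of hearing
# this exact statement under truth vs. under a lie.

p_lie = 0.01                 # (1) prior probability he is lying
p_stmt_truth = 1e-6          # (2) P(this statement | telling the truth)
p_stmt_lie = 1e-6            # (3) P(this statement | lying)

# Bayes: posterior probability of lying, given the statement.
p_lie_given_stmt = (p_lie * p_stmt_lie) / (
    p_lie * p_stmt_lie + (1 - p_lie) * p_stmt_truth)
print(p_lie_given_stmt)  # 0.01: when (2) equals (3), the statement's
                         # sheer improbability doesn't change the
                         # odds that he is lying at all
```

Only when a lie would be more likely than the truth to produce that particular statement does the posterior probability of lying rise above the prior.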

  • I think the source of any confusion is semantics and rigor of context.

    The statement, “I can get a probability distribution about where X was, and update it with information about the honesty of X.” is semantically inaccurate because “a probability distribution about where X was” is – in context – false. The accurate wording would be “could have been” rather than “was”.

    “X” either “was” (past tense) at a certain spot, or was not. The probability is (now) either 1 or 0 that that descriptive statement is objectively accurate, independent of X’s honesty.

    Similarly, the claim, “It is an extraordinary claim only in that it is one of a vast class of claims that make up the “I was somewhere last Tuesday at 8:47am”, a claim that is certainly true.” misses the point.

    There is an infinite set of possible claims one can make “now” as to one’s location at a particular instant in the past. But only one of them is objectively true. There is nothing implicitly extraordinary about claiming to have existed at a particular location in the past, ignoring the obvious disqualifying assertions like “I was on the surface of the sun” and the like, and granting that when one asserts one’s past position one does not mean “to the nearest superstring radius”. Perceiving an assertion of past location as “perfectly” accurate in every possible way is to ignore context.

  • Seems like what is needed here is a distinction between “extraordinary” and merely “bizarre.” Furthermore, I do not think the usual usage of this argument has to do with a request or claim about future actions, that is this peculiar way of describing (very precisely) where to meet somebody. This latter is just silly or game playing, not “extraordinary.”

    Going back to academic publishing issues, the argument is clearly correct. If by “extraordinary” we mean going contrary to widely accepted or believed ideas, then clearly it is correct to some degree, especially if we substitute the word “strong” or maybe “convincing” rather than “extraordinary” to describe the evidence needed for the claim.

    The flip side is that at least some journals/outlets are positively attracted by arguments that go against the accepted or the mainstream or the ordinary. This is clearly something “new” or “innovative,” and lots of outlets at least claim to want at least the “new,” within some limits. However, it is also clearly the case that the more “extraordinary” the claim, then the stronger had better be the arguments or evidence that supports the claim, if one wants to get published in a reasonably respectable outlet anyway.

  • Paul Gowder

    Doesn’t it run in both directions? We might take the fact that someone makes a wild claim as evidence that they’re a wild person.

  • Robin, You are assuming that scientists incorporate Bayesian priors into their experiments, but is this true? For example, in another post you wrote that “As an undergraduate I helped Riley Newman measure the strength of gravity at short distances.” Presumably Newman was testing whether Newton’s laws hold at very small distances. Did he incorporate Bayesian priors about the probability of Newton’s laws being correct into his calculations or error terms? (If so what priors did he use?) If he did not incorporate Bayesian priors then other people looking at his data should give less weight to extraordinary claims, that is, claims which contradict their priors.

  • I think there’s some confusion here about the basic Bayesian view of probability; I would recommend http://yudkowsky.net/bayes/technical.html at the section beginning “Suppose I flip a coin twenty times.”

    I would say that an “extraordinary claim” is one which violates a previously inducted qualitative generalization – in the most extreme case, it violates an apparently absolutely universal and mathematically precise generalization, such as the laws of physics. It is not merely a small prior probability. A random-looking sequence of 100 coin flips has a prior probability of 2^-100 on the fair coin hypothesis, yet we don’t disbelieve our eyes.

  • Suppose someone flips a coin 100 times, and comes to me and reports on the sequence of outcomes (reading from a list he wrote down). The probability of that sequence occurring is only one in 2^100. But wouldn’t the probability that a person’s brain makes a mistake and misreports a result be substantially greater than one in 2^100? That would be an extraordinary and probably unreachable degree of reliability in any mechanism. So why do we believe his report is probably accurate?

  • rcriii

    I don’t buy this, Robin. First of all, the extraordinary claim is not necessarily evidence of the claim itself. If I say that my dog is purple, the statement tells you nothing of the truth of the claim. It may be evidence that I have a pet, or that I am unreliable, or that I am given to dyeing animals, but the statement is not evidence of its own truthfulness. The fact that I am the one making the statement _is_ evidence (one way or the other), but that is not the same thing.

    I also don’t buy your meeting place analogy. The fact that there is a huge number of possible meeting places does not make a meeting less likely. If we are going to meet then we have to meet somewhere. This is the same switch Creationists often pull – grossly inflating the number of conceivable outcomes, then saying “See, it couldn’t possibly be by chance.”

  • Hal:

    A 2^-100 outcome is only evidence that the person is misreporting if, given that the person did make a mistake, the probability of them reporting that specific outcome would be greater than 2^-100.

  • Hal: Even if you think that the person has a per-item error probability in excess of 1%, the particular coin sequence he reports is still the maximum-likelihood hypothesis given that evidence. The report will certainly be helpful – after seeing the report, and even taking into account the possibility of error with an appropriate distribution, you will still assign vastly more probability to the true sequence (whatever it is) than you did before you heard the report.

    To amplify on de Blanc’s remark: The probability of making a mistake is indeed much higher than 2^-100, but this “mistake” probability is spread (unevenly) across many possible evidences you could have seen – it is not all concentrated into the one sequence the person reported. If that sequence were *always* reported whenever someone made a mistake, you would indeed be nearly certain that they had in fact made a mistake (because the probability of seeing it honestly is, after all, only 2^-100).

    So we believe the report is probably accurate because, even conditioning on the hypothesis that the person made a mistake, the probability of seeing that *exact* evidence is still exactly 2^-100. (Unless you know something about the kind of mistakes people make!) So if a person comes to you and reports a long, random-looking coin sequence, your posterior probability that they made “some kind of mistake” should exactly equal your prior probability – the particular coin sequence reported doesn’t tell you anything about it – unless you have knowledge about particular mistakes that people make.

  • [I’ve been away at a conference all day.]

    Biscuit and Barkley, nothing prevents us from formally applying standard Bayesian analysis to future events, even ones that people can influence, and I see no reason not to so apply.

    Stuart, the more precise his statement, the more a priori unlikely it was, but the stronger the evidence of his statement is, and so you still believe him afterward.
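    That reply can be illustrated with a toy model (the lying model and all numbers here are invented): if a liar names one of the possible position cells uniformly at random, then finer precision lowers the prior of the named cell and raises the evidential strength of naming it by exactly the same factor, so the posterior is unchanged.

```python
# Toy model of the GPS-precision puzzle: a friend reports his position
# to one of n_cells equally likely cells. Assume (hypothetically) that
# a liar would name a cell uniformly at random.

def posterior_honest(n_cells, p_honest=0.99):
    """Posterior that the friend is honest after he names one cell."""
    p_report_honest = 1 / n_cells   # the true cell, named exactly
    p_report_liar = 1 / n_cells     # a random cell, named exactly
    num = p_honest * p_report_honest
    return num / (num + (1 - p_honest) * p_report_liar)

for cells in (10, 10_000, 10**9):          # ever finer GPS precision
    print(cells, posterior_honest(cells))  # posterior stays 0.99
```

    The degree of precision drops out, matching the intuition that the friend is either lying or not, regardless of how finely he reports.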

    James, my claim here is not about the format scientists use to report their experiments, and I don’t see how that is relevant.

    Eliezer, the analysis I described would apply just as well to a “qualitative generalization” as to any other claim, so I don’t see why that distinction is relevant.

    Rcriii, evidence that might depend on other evidence is still evidence.

  • Robin, my point is that when we have a generalization of improbability – “On any given day, most 35-year-olds don’t die of heart attacks” – then when someone says, “My 35-year-old husband died of a heart attack three days ago”, the improbability of her saying this, if her husband is alive, or if he died on any other particular day, overbalances the prior improbability of the event itself. Thus, in the posterior, we expect that most of the time someone says this, it’s because it happened.

    But when someone says, “Hey, this tennis ball doesn’t obey conservation of angular momentum!” – well, as compared to the woman claiming her husband died, I don’t think it’s particularly less likely that someone would make this claim, given that the claim were true; and I don’t think that someone is particularly more likely to lie about that claim, given that the statement were false. In fact, my prior for the woman lying about her husband, if her husband is alive, is substantially higher than my prior for someone lying about the tennis ball if it does in fact conserve angular momentum. I can’t think of any good motive for someone to lie about the tennis ball. *Nonetheless*, my prior probability for the tennis ball disobeying conservation is so incredibly low that verbal evidence fails to be extraordinary *enough*.

    This is what I mean by violating a qualitative universal generalization – it’s one of the chief circumstances in which the extraordinariness of verbal evidence is just not extraordinary enough.

  • Eliezer, I think what you are saying is that people are particularly likely to falsely claim violations of universal generalizations, compared to other false things they are likely to say. If true my analysis agrees that you should require higher standards of evidence for such claims. But it raises the question of why such false claims are more likely than other false claims.

  • Marc Resnick

    I think there is a piece of the puzzle that you are missing. In this kind of probability estimate, the potential gains and losses an individual would experience also matter. If you are going to meet a friend at whatever address, I can’t think of a reason for you to lie, so I will believe you. On the other hand, that email I received this morning about a hot stock tripling by the end of the year is not that improbable, as many small companies have the potential to triple. But I believe the statement less because of the benefits to stock pump and dump schemes.

  • Marc, your example is exactly what my analysis supports; because you believe they have a higher incentive to lie, you hold them to a higher standard of evidence.

  • Robin,

    The format scientists use to report their experiments is relevant because it tells you whether you can trust their extraordinary claims.

    For example, imagine that although Newton’s law of gravity has never been tested for objects only X nanometers apart physicists are very confident that it should hold at this distance. Physicists have no objective scientific way of coming up with a good estimate of how confident they are of this, however. As a result, they don’t use Bayesian analysis when evaluating experimental evidence on gravity. (I’m only guessing that these last two sentences are accurate. Please forgive me if they are not.)

    Now imagine that some scientist finds a way of testing gravity at X nanometers. He comes up with evidence showing that Newton’s law of gravity doesn’t hold. He reports his data, however, in a way that doesn’t take into account of Bayesian priors.

    I learn about this data. To properly evaluate it I should use Bayesian priors. Consequently, the more extraordinary the finding, the lower my Bayesian priors of the finding being true, and the more likely I should think that Newton’s law of gravity does hold at X nanometers.

    In short, many scientists don’t properly use Bayesian analysis because they don’t have good means of coming up with priors. (I think.) As a result, consumers of scientific reports should use some Bayesian analysis themselves and be distrustful of extraordinary scientific claims.

  • Doug S.

    It’s not so much that people who make “extraordinary” claims are lying as that they often believe things that are, well, wrong. Perhaps a person visits a “psychic medium” who performs a cold reading, and then tells you that his dead mother told him to follow his dreams. I simply conclude that the person was fooled by the “psychic medium” and was not deliberately being dishonest. It’s not that hard to fool people, since, as you said, when someone says something happened, they usually are saying what they think actually did happen.


    A: “I was abducted by aliens last night!”
    B: “No, you weren’t. You were dreaming and your brain’s mechanism for telling the difference between dreaming and waking failed.”
    A: “But I saw them!”
    B: “Yes, you did, but what you saw wasn’t real. If you read the literature on sleep paralysis and hallucinations, you’d agree with me.”
    A: “What do scientists know, anyway? What happened to me was real!”
    B: “I could continue to argue with you, but it is clear that further discussion will be fruitless because we assign different weight to different kinds of evidence. Your method of weighing evidence is wrong, but I do not believe that I can convince you of that. Goodbye.”
    A: “You’re just one of those stupid scientists who refuses to have an open mind. Goodbye.”

    Sadly, this kind of conversation probably happens all the time…

  • James, now I understand. If they just claimed “here is evidence, take it into account” there would be nothing to be skeptical about. But if they claimed “based on all the evidence so far, this weird effect looks real” then to treat that skeptically you’d have to believe people in that situation tend to be biased to claim more than their evidence justifies.

    Doug, yes, the key problem is that on certain topics people too easily believe strange things. The more clearly we could identify such topics, the better we could adjust the evidence level we require to the topic.

  • Robin, I think you’re misunderstanding Eliezer.

    “people are particularly likely to falsely claim violations of universal generalizations, compared to other false things they are likely to say.”

    This would imply that we should have higher prior probabilities of people lying about some things rather than other things. But Eliezer was saying that our prior probabilities in the things *themselves* should be different. Our belief in Newtonian mechanics should be so much stronger than our belief in a person’s likely whereabouts or the conditions of a friend’s children’s death that verbal evidence (given the same strength in both cases) doesn’t raise our probability estimation of the former enough to where it’s even plausible, let alone probable.

    I confess that I don’t understand how this addresses the coin scenario. Given a coin-flip sequence long enough that the probability of a given sequence being true is roughly the same as the probability of Newtonian mechanics being wrong, how is it that verbal evidence is stronger in the one case than the other? I know you’ve tried to address this Eliezer, but I still don’t get it.

  • Doug S.

    The way I see it, we can justifiably assume that the coin flip sequence is accurate to the extent that we can accurately report coin flip sequences. There would be some errors, but we don’t know what they are; for any given coin flip in the sequence, the reported outcome is still far more likely to be correct than incorrect, so we would be justified in accepting the sequence as the best approximation to the real sequence that we can find.

  • Robin,

    Upon cogitating upon this further, it strikes me that while in principle Bayesian analysis can be applied here, there will be several further stages or modifiers that will affect the priors, many of which would fall into the context of “framing effects” notorious in behavioral econ and psych, some of which will boil down to details of the presentation of the extraordinary claim.

    So, once in a while I show up at GMU seminar. If you were to walk in and with your usual pixieish grin declare that you had just seen a black swan outside walking freely about, I might lower the prior given your history of prankish delight in posing philosophical conundra, and the notoriously widespread use of the existence or nonexistence of black swans in this or that location as a problem among philosophers.

    However, if you were to walk in with blood streaming down your face and a very serious and upset look on your face and declared that you had just been attacked by a black swan that had been walking about on campus (perhaps after annoying it with asking it inane philosophical questions about its presence in front of you on the GMU campus), I might raise the prior, although I would be aware that you might be fully capable of really faking people out for something that you could post on this notorious blog by faking the blood and putting on an act, and so forth. No end to this one, I guess, maybe even an infinite regress into absurdity…

  • Barkley, sure, whether I was grinning or seriously upset would be relevant information for whether you should believe a claim of mine. Bayesian analysis has little problem taking such things into account.

  • Doug, sure, the evidence would raise the probability of that particular coin-flip sequence to higher than it would otherwise be. But would it raise it from 2^-100 to .5? Or only to 2^-80?

  • DaveL

    Robin writes: “I would be very unlikely to make such claims in situations where I did not have good reasons to think them true.”

    While that might be true of the examples you gave, and even of the coin-flipping example introduced later, in most real-life situations where the rule of thumb applies, the “good reasons” are often good only in the view of the person making the claim. The person who claims he was abducted by aliens, or was spoken to by God, often does so for the good reason that it will make him feel better about himself, or attract attention, or for any number of other reasons which are only good from his point of view. Some of these reasons may even be held unconsciously.

    So, in general, unless you know the “good reasons” you can’t assume they carry any weight whatsoever.

  • pdfs2ds:

    Let ‘I’ denote the prior information that I flipped a 2-sided fair coin 100 times, and let ‘S’ indicate some string of heads and tails. Then p(S | I) = 2^-100. Also, let’s say you consider it about 50% likely that I will truthfully and correctly report the string that I saw. We’ll let R indicate the event that I come to you and report, ‘the coin-flips produced the string S.’ Then p(R | SI) = 0.5. So after I make the report R, how certain are you that S was the actual string?

    By Bayes’ Theorem: p(S | RI) = p(S | I) * p(R | SI) / p(R | I). We’ve already given values for p(S | I) and p(R | SI), so p(S | RI) = 2^-101 / p(R | I). That thing in the denominator is what many people have been missing in this thread. A maxentropy distribution would give p(R | I) = 2^-100, since there are 2^100 possible reports you could give. Thus p(S | RI) = 2^-101 / 2^-100 = 0.5.

    Notice that (as Eli mentioned earlier), this is exactly equal to p(R | SI). If we knew something about the sorts of lies people tell, then these would not be exactly equal. For instance, you may consider it a priori fairly likely that I would claim to have gotten 100 heads, so when I do in fact make this claim, you don’t consider it very strong evidence. That’s because the denominator would become relatively large.
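    The computation above can be checked with exact arithmetic (a direct transcription of the comment's numbers, using Python's Fraction to avoid floating-point underflow):

```python
from fractions import Fraction

# Exact-arithmetic check of the Bayes computation above.
p_S = Fraction(1, 2**100)      # p(S | I): prior of the exact string
p_R_given_S = Fraction(1, 2)   # p(R | SI): chance of a truthful report
p_R = Fraction(1, 2**100)      # p(R | I): maxentropy over 2^100 reports

p_S_given_R = p_S * p_R_given_S / p_R
print(p_S_given_R)             # 1/2, matching the comment: the posterior
                               # equals p(R | SI) when lies are uniform
```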

  • I think my view here can be summed up in one line:

    “Extraordinary claims are always extraordinary evidence but sometimes they are not extraordinary enough.”

    I think people are *less* likely to lie about conservation-violating coffee than about dead relatives – and in fact, I’ve never even heard of such a lie being told. *Nonetheless*, the prior probability is *so much* lower that the claim is no longer evidence *enough*. As the alleged facts themselves become more extraordinary, the claim may become more extraordinary evidence, but it becomes more extraordinary at a rate much less than the facts themselves.

  • Eliezer, yes, conscious lying is less the problem than other mental tendencies to make dramatic claims without enough evidential support.

  • Robin, there are three factors contributing to the posterior probability:

    1) The prior probability of the hypothesis.
    2) The likelihood that someone would make the claim, if the hypothesis were true.
    3) The likelihood that someone would make the claim, if the hypothesis were false.

    By thinking about whether or not someone seems especially likely to lie or self-deceive in a particular case, you’re attending to (3), which is naturally important. But one must also attend to (1), and it is this, more than (3), that leads us to reject e.g. claims of low-level ESP detectable only by statistical means.

    In particular, the proverb “Extraordinary claims require extraordinary evidence” is meant to be invoked when (1) is unusually low, not when (3) is unusually high.

  • Eliezer, my claim is instead that appropriate invocations of the proverb are in fact primarily when your (3) is unusually high. (1) and (3) are correlated to be sure, but it is (3) that drives the problem.

  • Well, I think we’ve located the disagreement, at least.

    Of course (3) has to be tiny *enough* to compensate for a tiny (1). So saying that (3) is “too large” and (1) is “too small” is equivalent.

    However, if we consider the absolute value, then I would hold that “Extraordinary claims require extraordinary evidence” is reflective of an unusually low (1), not an unusually high (3).

    When we find that studies funded by industry sources tend to find conclusions which industry would prefer to be true (negative findings of detrimental effects, positive findings of beneficial effects), we say, “He who pays the piper calls the tune.”

    When someone publishes a paper claiming that human beings can predict an LED driven by thermal noise, we say, “Extraordinary claims require extraordinary evidence.”

  • On one hand, it is a very inspiring reading. On the other hand, would you agree that all these statements how much evidence XY requires to make us certain at a UV level can be expressed much more accurately, e.g. in terms of the formulae behind the Bayesian reasoning (which I don’t particularly like, but OK)?

    I didn’t quite understand in what sense a statement itself is its own evidence.

    A statement about children who died under some unusual circumstances needs some evidence. Of course, the evidence doesn’t have to be directly related to the children and a good track record of someone saying true things can be strong enough.

    Well, I can tell you a real story of this kind that happened to me, and what it means. One hour before my PhD defense, I woke up in my former office where I spent a night ;-), got a shower, and found a projector. Half an hour before the defense started, I read an e-mail in Czech. It was 9 a.m. and the e-mail contained an extraordinary statement that one – and 5 minutes later, two – large airplanes crashed into some rather well-known buildings 50 miles from the place where I defended.

    It was extraordinary and a priori unlikely but I – probably in agreement with your analysis – immediately knew that the message was almost certainly true. But this belief was still based on some strong evidence, albeit an indirect one: the message seemed to be copied from the Czech Press Agency. It sounded strange that the author would fake such a serious message because this exact kind of tough fake message wasn’t usual for him. And it sounded even more unlikely that the Czech Press Agency would create such a silly black joke. So I immediately decided that it was almost certainly true. And of course, it was.

    On the other hand, I received a lot of similar extraordinary messages from some other sources during different days than 9/11/2001 that I didn’t believe simply because they were extraordinary while the evidence (of the phenomena themselves or the integrity of the messenger) was not. And I was right.

  • Lubos, yes, a statement is not its own evidence, but the fact that a person claims the statement is evidence, even if sometimes weak evidence.
