When Error is High, Simplify

We often use Bayesian analysis to identify human biases, by looking for systematic deviations between what humans and Bayesians would believe.  Many, however, are reluctant to accept this Bayesian standard; they prefer to collect more specific criteria about what beliefs are reasonable or justified.    For example, Nicholas Shackel recently commented:

It is no less reasonable, and perhaps more reasonable, to start from the premiss that people do reasonably disagree … and if Bayesianism conflicts with that, so much the worse for Bayesianism.

This choice of Bayesian vs. more specific epistemic judgments is an example of a common choice we face.  We often must choose between a strong “simple” framework with relatively few degrees of freedom, and a weak “complex” framework with many more degrees of freedom.  We see similar choices in law, between a few simple general laws and many complex context-dependent legal judgments. 

We also see similar choices in morality, such as between a simple Utilitarianism and more complex context-dependent moral rules, like the rule that we should distribute basic medicine, but not movies, equitably within a nation. In a paper on this moral choice, I used the following figure to make an analogy with Bayesian curve-fitting.


Imagine that one has a collection of data points, such as a sequence of temperatures driven in part by global warming. In general one thinks of these points as determined both by some underlying trend one wants to understand, and by other distracting “noise” processes that obscure this underlying trend.

In choosing a curve to describe this underlying trend, one can pick either a complex line which gets close to most points, or a simple line which deviates further from the data. The Bayesian analysis of curve-fitting says that whether the complex or simple line is better depends in part on how strong the noise process is. When there is little noise, a complex line will extract more useful details about the underlying trend. But when noise is large, a complex line will mostly just fit the noise, and so will predict new data points badly.
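To make the tradeoff concrete, here is a minimal Python sketch. It is not a Bayesian calculation; it only compares how well a simple fit and a complex fit recover a known trend at two noise levels. The sine-shaped `trend`, the noise levels, and every function name are assumptions invented for this illustration.

```python
import math
import random

def trend(x):
    # Hypothetical underlying trend (an assumption); any smooth curve would do.
    return math.sin(2 * math.pi * x)

def fit_line(xs, ys):
    # The "simple" model: an ordinary least-squares straight line.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_connect_the_dots(xs, ys):
    # The "complex" model: a piecewise-linear curve through every data point.
    def predict(x):
        for i in range(len(xs) - 1):
            if xs[i] <= x <= xs[i + 1]:
                t = (x - xs[i]) / (xs[i + 1] - xs[i])
                return (1 - t) * ys[i] + t * ys[i + 1]
        raise ValueError("x outside the fitted range")
    return predict

def mean_sq_error(model, grid):
    # Average squared distance between a fitted curve and the true trend.
    return sum((model(x) - trend(x)) ** 2 for x in grid) / len(grid)

def compare(noise_sd, n_points=50, trials=20, seed=0):
    # Returns (simple_error, complex_error), averaged over several noisy samples.
    rng = random.Random(seed)
    xs = [i / (n_points - 1) for i in range(n_points)]
    grid = [(i + 0.5) / 200 for i in range(200)]
    simple_err = 0.0
    complex_err = 0.0
    for _ in range(trials):
        ys = [trend(x) + rng.gauss(0, noise_sd) for x in xs]
        simple_err += mean_sq_error(fit_line(xs, ys), grid)
        complex_err += mean_sq_error(fit_connect_the_dots(xs, ys), grid)
    return simple_err / trials, complex_err / trials

if __name__ == "__main__":
    for sd in (0.05, 3.0):
        simple, complex_ = compare(sd)
        print(f"noise sd {sd}: simple {simple:.3f}, complex {complex_:.3f}")
```

Under these assumptions, the connect-the-dots curve should recover the trend better when noise is small, and the straight line should do better when noise is large, even though the line can never match the shape of the trend exactly.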

Returning to the subject of human biases, we have many context-specific intuitions about what beliefs seem reasonable in various contexts.   But we expect those intuitions to be clouded and polluted by error.  If we expect just a little error, our best judgment about epistemic criteria should stay close to those intuitions.   But if we expect a lot of error, we are better off choosing a simple general approach like Bayesian analysis, since the context-dependent details of our intuitions are most likely to reflect error. 

In curve-fitting, if one has enough data one can estimate the error rate by looking at how well some parts of the data can predict other parts.   We might do well to consider a similar exercise to calibrate the error rates in our intuitions about reasonable beliefs.   
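One simple way to let parts of the data predict other parts is to predict each point from the average of its two neighbors and look at how far off those predictions are. The sketch below assumes that the trend is smooth relative to the spacing of the points and that the noise is independent with constant variance; the function names and the simulated sine trend are hypothetical, chosen only for illustration.

```python
import math
import random

def estimate_noise_sd(ys):
    # Predict each interior point from the average of its two neighbors.
    residuals = [ys[i] - 0.5 * (ys[i - 1] + ys[i + 1])
                 for i in range(1, len(ys) - 1)]
    mean_sq = sum(r * r for r in residuals) / len(residuals)
    # With independent noise of variance s^2 and a locally straight trend,
    # each residual has variance s^2 + s^2 / 2 = 1.5 * s^2.
    return math.sqrt(mean_sq / 1.5)

def simulate(noise_sd, n_points=1000, seed=1):
    # Hypothetical data: a smooth sine trend plus Gaussian noise.
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * i / (n_points - 1)) + rng.gauss(0, noise_sd)
            for i in range(n_points)]

if __name__ == "__main__":
    for true_sd in (0.05, 0.5):
        print(true_sd, estimate_noise_sd(simulate(true_sd)))
```

If the assumptions hold, the estimate should land close to the noise level used in the simulation without the estimator ever being told the shape of the trend. Doing the same for intuitions about reasonable beliefs would need some analogue of "neighboring" cases, which this sketch does not supply.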

Today philosophy, literature, and parts of sociology tend to favor many context-dependent epistemic criteria, while statistics, economics, physics, and computer science tend to prefer simple standard closer-to-Bayesian criteria.  My knee also tends to jerk in this second direction.   

  • I just completed a course on approximation of functions, and this reminds me of a common phenomenon there. Consider, for example, a continuous function f that we wish to approximate by a polynomial p of degree n. If f is unknown but we have a set of points from it (as is often the case with functions describing some new physical process), we can choose p to pass through up to n + 1 of those points. The trouble is that as we increase the number of interpolation points (and thereby increase n), p can oscillate wildly, and unless f is periodic, that’s generally not a good thing. So we can always get a function that better fits the data, but it’s another matter as to whether that function is a better approximation to the unknown function.

  • conchis

    It’s interesting to think how this fits with Philip Tetlock’s work (which has been commented on previously on this blog) suggesting that it’s those who are more willing to err on the side of context-specific explanatory models (foxes) who are typically more accurate, compared with those who try to apply simple “one-size-fits-all” frameworks to every problem (hedgehogs). My knee tends to jerk with Robin’s, but the challenge is to figure out when simplification is likely to be useful and when not. Context-dependence is context dependent. Or something.

  • Conchis, perhaps hedgehogs place excess confidence in the predictions of their single model, while foxes are appropriately uncertain about the implications of their eclectic mix of models? Perhaps in other areas the relative confidence of those with simple and complex models is different?

  • Robin: I understand the issue you are raising like this: it is not one of restricted methodology, that is, when are Bayesian methods appropriate, but the general claim that we ought to be Bayesian believers, or that rational belief is believing in accordance with probabilistic belief as determined by Bayesian methods. What you offer is essentially an argument by analogy.
    1. Given noisy data, it is more truth conducive to use a simpler rather than more complex curve fitting strategy.
    2. Our beliefs are like noisy data
    3. Therefore it is more truth conducive to use a simpler strategy for rationalising our beliefs.
    4. Bayesianism is simpler than other modes of non-deductive reasoning.
    5. Therefore we ought to be Bayesian believers.
    I note that the first premiss conceals a huge and very interesting methodological issue in its own right, and the fourth premiss could bear considerable discussion, but that is not what I want to discuss.

    There are three main points I would make. First, I would suggest that Bayesianism is ill suited to justifying why we should believe the second premiss, since what we need is an argument for why beliefs are relevantly like noisy data. You have offered some reasons along that line, and my point is that you have engaged in standard non-deductive reasoning rather than offered a calculation of its probability. If this point is correct then the conclusion must be false and all we can talk about are which circumstances are those in which Bayesianism is the right guide. But that is not what you want. Secondly, the second premiss looks like a contingent claim, so the conclusion is too strong and could only apply when beliefs are in fact relevantly like noisy data. Finally, the conclusion has to be amplified in terms of justified belief being a matter of belief formed in accordance with Bayesian principles, but what are they and what exactly would that mean for how our beliefs ought to be? As a matter of fact we do not reason by calculating probabilities except when we are reasoning with *full* beliefs about probabilistic propositions. But your Bayesian wishes to apply these principles to our beliefs in general, including a priori beliefs such as philosophical doctrines (I see that Paul has brought up the class of a priori beliefs that are moral beliefs in his querying the use of Bayesianism). The way you have applied this is by formulating general principles of reasoning by reflecting on the outcomes of the mathematical results, e.g. ‘take account of disagreement’. But that, by being only supplemental, is to acknowledge the priority of the standard modes of non-deductive reasoning.

  • Neel Krishnaswami

    As it happens, I am not a (classical) Bayesian, because I don’t see any reason for the requirement that the probabilities in a probability distribution must sum to one. If I have a low belief that something is the case, then it doesn’t follow that I have a high belief that it is not the case: if P(X) = 0.1, then it doesn’t follow that P(not X) = 0.9. It can certainly be the case that I don’t have strong beliefs about the subject at all.

    This works perfectly sensibly as a probability theory too. The main change is that instead of building probability theory over a boolean algebra (ie, a model of classical logic), you build it up over a Heyting algebra (ie, a model of constructive logic). The constructive failure of the law of the excluded middle becomes the rule that P(X) + P(not X) <= 1. Now, you can do decision and game theory in a standard way over this nonstandard probability theory, though I haven't analyzed any theorems to see how they decompose constructively. (I need to graduate....) This choice about the appropriate standard of rationality will be motivated by your opinions about the proper foundations of mathematics (eg, constructive or classical). This, in turn, means that you cannot choose the foundations rationally, because how you rationally update your beliefs depends on your beliefs about the foundations.

  • Nick, regarding your three points: I grant that our degree of error may be context dependent, and so the attractiveness of Bayesian analysis may vary with context in that way. I will elaborate in future posts more about how to apply Bayesian analysis to more types of belief.

    Finally, I suggest Bayesian beliefs as a normative standard of reference, not as an exact procedure. So it would be a problem if I could not show you Bayesian arguments that our initial inclinations of beliefs about epistemic criteria are full of error. But it is not problematic that I did not exactly calculate Bayesian probabilities when I formed those beliefs. In fact, we have many kinds of data suggesting (to a Bayesian) large errors in our beliefs about what kinds of beliefs are reasonable. Is this really in any doubt?

  • Neel, is your concept of rational beliefs integrated into a concept of rational decisions, similar to the way ordinary probabilities are integrated into ordinary decision theory?

  • Neel Krishnaswami

    Hi Robin, the answer is “probably”. I haven’t seen any serious obstacles to turning intuitionistic probability theory into intuitionistic decision theory, though as I mentioned before I haven’t carried this program through in any detail. There are a couple of places where things are likely to go very differently from classical Bayesianism — one in particular is that constraining utility functions to constructive functions will mean that you have a different class of utility functions (relative to classical logic). A second question is the interpretation of conditional probabilities — while you can go ahead and define P(A|B) = P(A and B)/P(B) just as before, it could be a better idea to interpret conditionality as a modal operator in the logic. (That is, the sentences that get assigned probabilities are like A and B, A or B, A|B, A implies B, etc.)

    Finally, there are still computational objections to even intuitionism: interpreted computationally, intuitionistic logic limits you to computable functions, but a strongly finitist view might be that “computable” is still too generous, since no one can actually compute a function that takes (say) hyper-exponential space. While I have a good deal of sympathy for this point of view, I think that the logics here are still too immature to try to base a decision theory on.

  • JMG3Y

    Reading Robin’s paper on what in our ancient brain might be driving our health care choices, from the consumption level to the health policy and research expenditure allocation level, and other posts on this blog brings to mind a basic (and likely naive) question:

    How strong is the empirical evidence that an understanding of the cognitive problems that can result in decision-making errors, such as understanding the sources and effects of the many forms of bias, improves an individual’s decision making significantly? Or, instead of improving metacognition, does the evidence show that more benefit comes from improving the process itself? Or both?

    Is this different for group decision making as opposed to individual decision making? In other words, what provides the decisions with the least error: design and execution of the process, or training the individual members? Would everyone, from individual consumers to national politicians, actually make significantly different choices if they better understood what aspect of this?

    A related question. Does a sound understanding of the human learning process, such as it is, improve a learner’s performance significantly? Or should the focus remain on the design of the instruction process itself that in turn drives and controls the learner’s behavior much of the time?

  • JMG3Y, there has been a great deal of research on “debiasing”, attempts to reduce various perceptual and judgmental biases in different ways. I’ve looked at a few of these papers, and it seems that the consensus is that debiasing is extremely difficult and usually doesn’t work. However, it is not usually done simply by explaining the reality of Bayesian inference or probability theory, then turning people loose on problems. Rather, various tricks are used, such as getting them to consider alternatives, or imagine themselves in certain scenarios, or rewording the problems to try to reduce biasing effects. And as I said, usually these don’t help much.

    Tetlock told an amusing story of his debiasing experiment that backfired, in his book I reviewed earlier. He attempted to get participants to explicitly consider a wide range of alternative scenarios in making a forecast, to try to overcome a common bias of focusing too soon in analysis. But his single-minded “hedgehogs” refused to take the scenarios seriously since they thought they already knew exactly what was going to happen; their scores didn’t change. And his open-minded “foxes” wasted so much time delightedly exploring the intricacies of the new scenarios that they lost track of the bigger picture and ended up doing worse in the exercises.

    In general there seems to be something of an unstated assumption that just teaching people Bayesian decision theory would be uselessly abstract; I don’t know if this is due to earlier failed experiments, or perhaps reflects experimenters’ judgment that the theory is too complex for average subjects to grasp.

  • JMG3Y, as Hal notes simple attempts to “debias” usually fail. But anytime someone uses statistical techniques to draw a conclusion, they are implicitly acknowledging that just eye-balling the data would be biased. I’d call that a typically successful attempt to overcome bias.