Category Archives: Bayesian

Rarity Anomalies Remain

Our choices apparently under-weigh rare events when we experience track records, even though we accurately estimate the frequencies of those events.  We over-weigh rare events, however, when we are told their probabilities.  Simple explanations of these anomalies are shot down in a recent Psychological Science:

When making decisions involving risky outcomes on the basis of verbal descriptions of the outcomes and their associated probabilities, people behave as if they overweight small probabilities. In contrast, when the same outcomes are instead experienced in a series of samples, people behave as if they underweight small probabilities. We present two experiments showing that the existing explanations of the underweighting observed in decisions from experience are not sufficient to account for the effect. Underweighting was observed when participants experienced representative samples of events, so it cannot be attributed to undersampling of the small probabilities. In addition, earlier samples predicted decisions just as well as later samples did, so underweighting cannot be attributed to recency weighting. Finally, frequency judgments were accurate, so underweighting cannot be attributed to judgment error. Furthermore, we show that the underweighting of small probabilities is also reflected in the best-fitting parameter values obtained when prospect theory, the dominant model of risky choice, is applied to the data.

The Pascal’s Wager Fallacy Fallacy

Today at lunch I was discussing interesting facets of second-order logic, such as the (known) fact that first-order logic cannot, in general, distinguish finite models from infinite models.  The conversation branched out, as such things do, to why you would want a cognitive agent to think about finite numbers that were unboundedly large, as opposed to boundedly large.

So I observed that:

  1. Although the laws of physics as we know them don't allow any agent to survive for infinite subjective time (do an unboundedly long sequence of computations), it's possible that our model of physics is mistaken.  (I go into some detail on this possibility below the cutoff.)
  2. If it is possible for an agent – or, say, the human species – to have an infinite future, and you cut yourself off from that infinite future and end up stuck in a future that is merely very large, this one mistake outweighs all the finite mistakes you made over the course of your existence.

And the one said, "Isn't that a form of Pascal's Wager?"

I'm going to call this the Pascal's Wager Fallacy Fallacy.

You see it all the time in discussion of cryonics.  The one says, "If cryonics works, then the payoff could be, say, at least a thousand additional years of life."  And the other one says, "Isn't that a form of Pascal's Wager?"

The original problem with Pascal's Wager is not that the purported payoff is large.  This is not where the flaw in the reasoning comes from.  That is not the problematic step.  The problem with Pascal's original Wager is that the probability is exponentially tiny (in the complexity of the Christian God) and that equally large tiny probabilities offer opposite payoffs for the same action (the Muslim God will damn you for believing in the Christian God).

Continue reading "The Pascal’s Wager Fallacy Fallacy" »


Share likelihood ratios, not posterior beliefs

When I think of Aumann's agreement theorem, my first reflex is to average.  You think A is 80% likely; my initial impression is that it's 60% likely.  After you and I talk, maybe we both should think 70%.  "Average your starting beliefs", or perhaps "do a weighted average, weighted by expertise" is a common heuristic.

But sometimes, not only is the best combination not the average, it's more extreme than either original belief.

Let's say Jane and James are trying to determine whether a particular coin is fair.  They both think there's an 80% chance the coin is fair.  They also know that if the coin is unfair, it is the sort that comes up heads 75% of the time.

Jane flips the coin five times, performs a perfect Bayesian update, and concludes there's a 65% chance the coin is unfair.  James flips the coin five times, performs a perfect Bayesian update, and concludes there's a 39% chance the coin is unfair.  The averaging heuristic would suggest that the correct answer is between 65% and 39%.  But a perfect Bayesian, hearing both Jane's and James's estimates – knowing their priors, and deducing what evidence they must have seen – would infer that the coin was 83% likely to be unfair.  [Math footnoted.]

Perhaps Jane and James are combining this information in the middle of a crowded tavern, with no pen and paper in sight.  Maybe they don't have time or memory enough to tell each other all the coin flips they observed.  So instead they just tell each other their posterior probabilities – a nice, short summary for a harried rationalist pair.  Perhaps this brevity is why we tend to average posterior beliefs.

However, there is an alternative.  Jane and James can trade likelihood ratios.  Like posterior beliefs, likelihood ratios are a condensed summary; and, unlike posterior beliefs, sharing likelihood ratios actually works.
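The arithmetic behind the Jane and James example can be sketched in a few lines of Python. This is only an illustration: the flip counts (five heads for Jane, four heads and one tail for James) are not stated in the excerpt; they are an assumption, chosen because they reproduce the quoted 65% and 39%. The function names are likewise made up for this sketch.

```python
from fractions import Fraction

# Figures taken from the example: 80% prior that the coin is fair,
# and an unfair coin lands heads 75% of the time.
PRIOR_UNFAIR = Fraction(1, 5)
P_HEADS_UNFAIR = Fraction(3, 4)
P_HEADS_FAIR = Fraction(1, 2)


def likelihood_ratio(heads, flips):
    """P(observed flips | unfair) / P(observed flips | fair)."""
    tails = flips - heads
    unfair = P_HEADS_UNFAIR ** heads * (1 - P_HEADS_UNFAIR) ** tails
    fair = P_HEADS_FAIR ** flips
    return unfair / fair


def posterior_unfair(*ratios):
    """Multiply the prior odds by each independent likelihood ratio."""
    odds = PRIOR_UNFAIR / (1 - PRIOR_UNFAIR)
    for ratio in ratios:
        odds *= ratio
    return odds / (1 + odds)


# ASSUMPTION: these counts are inferred, not given in the text.
jane_lr = likelihood_ratio(heads=5, flips=5)
james_lr = likelihood_ratio(heads=4, flips=5)

print(round(float(posterior_unfair(jane_lr)), 2))            # Jane alone
print(round(float(posterior_unfair(james_lr)), 2))           # James alone
print(round(float(posterior_unfair(jane_lr, james_lr)), 2))  # both combined
```

Under those assumed counts the three printed values are 0.65, 0.39 and 0.83. The combined figure comes from multiplying the two likelihood ratios into a single prior, which is why it can land outside the range spanned by the two individual posteriors.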

Continue reading "Share likelihood ratios, not posterior beliefs" »


Different meanings of Bayesian statistics

I had a discussion with Christian Robert about the mystical feelings that seem to be sometimes inspired by Bayesian statistics.  The discussion originated with an article by Eliezer so it seemed appropriate to put the discussion here on Eliezer's blog.  As background, both Christian and I have done a lot of research on Bayesian methods and computation, and we've also written books on the topic, so in some ways we're perhaps too close to the topic to be the best judge of how a newcomer will think about Bayes.

Christian began by describing Eliezer's article about constructing Bayes’ theorem for simple binomial outcomes with two possible causes as "indeed funny and entertaining (at least at the beginning) but, as a mathematician, I [Christian] do not see how these many pages build more intuition than looking at the mere definition of a conditional probability and at the inversion that is the essence of Bayes’ theorem. The author agrees to some level about this . . . there is however a whole crowd on the blogs that seems to see more in Bayes’s theorem than a mere probability inversion . . . a focus that actually confuses—to some extent—the theorem [two-line proof, no problem, Bayes' theorem being indeed tautological] with the construction of prior probabilities or densities [a forever-debatable issue]."

I replied that there are several different points of fascination about Bayes:

Continue reading "Different meanings of Bayesian statistics" »


Beliefs Require Reasons, or: Is the Pope Catholic? Should he be?

In the early days of this blog, I would pick fierce arguments with Robin about the no-disagreement hypothesis.  Lately, however, reflection on things like public reason has brought me toward agreement with Robin, or at least moderated my disagreement.  To see why, it’s perhaps useful to take a look at the newspapers:

the pope said the book “explained with great clarity” that “an interreligious dialogue in the strict sense of the word is not possible.” In theological terms, added the pope, “a true dialogue is not possible without putting one’s faith in parentheses.”

What are we to make of a statement like this?

Continue reading "Beliefs Require Reasons, or: Is the Pope Catholic? Should he be?" »


Worse Than Random

Previously in series: Lawful Uncertainty

You may have noticed a certain trend in recent posts:  I’ve been arguing that randomness hath no power, that there is no beauty in entropy, nor yet strength from noise.

If one were to formalize the argument, it would probably run something like this: that if you define optimization as previously suggested, then sheer randomness will generate something that seems to have 12 bits of optimization, only by trying 4096 times; or 100 bits of optimization, only by trying 10^30 times.
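The relationship between bits and tries is just powers of two. A minimal sketch, assuming that "n bits of optimization" means hitting a target occupying 1/2^n of the search space (the function names are invented for this illustration):

```python
import math


def expected_random_tries(bits):
    """Expected number of uniform random draws needed to hit a target
    that occupies a 1 / 2**bits fraction of the search space."""
    return 2 ** bits


def bits_from_tries(tries):
    """Bits of apparent optimization that `tries` random draws can buy."""
    return math.log2(tries)


print(expected_random_tries(12))            # 4096
print(f"{expected_random_tries(100):.2e}")  # about 1.27e+30, i.e. roughly 10^30
print(bits_from_tries(4096))                # 12.0
```

Each additional bit doubles the number of random tries required, which is why blind search gets expensive so quickly.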

This may not sound like a profound insight, since it is true by definition.  But consider – how many comic books talk about “mutation” as if it were a source of power?  Mutation is random.  It’s the selection part, not the mutation part, that explains the trends of evolution.

Or you may have heard people talking about “emergence” as if it could explain complex, functional orders.  People will say that the function of an ant colony emerges – as if, starting from ants that had been selected only to function as solitary individuals, the ants got together in a group for the first time and the ant colony popped right out.  But ant colonies have been selected on as colonies by evolution.  Optimization didn’t just magically happen when the ants came together.

And you may have heard that certain algorithms in Artificial Intelligence work better when we inject randomness into them.

Is that even possible?  How can you extract useful work from entropy?

But it is possible in theory, since you can have things that are anti-optimized.  Say, the average state has utility -10, but the current state has an unusually low utility of -100.  So in this case, a random jump has an expected benefit.  If you happen to be standing in the middle of a lava pit, running around at random is better than staying in the same place.  (Not best, but better.)

A given AI algorithm can do better when randomness is injected, provided that some step of the unrandomized algorithm is doing worse than random.
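The lava-pit point can be checked with a toy calculation. The list of utilities below is hypothetical, chosen only so that the average is -10 and the current state sits at -100, as in the example above:

```python
from statistics import mean

# HYPOTHETICAL utilities for every state the agent might occupy.
# They average -10, and the agent currently sits in the worst one.
state_utilities = [-100, 20, 10, 0, -5, 15]
current_utility = -100

# A uniformly random jump lands, in expectation, on the average state.
expected_after_random_jump = mean(state_utilities)

# A jump chosen deliberately could reach the best state instead.
best_available = max(state_utilities)

print(expected_after_random_jump - current_utility)  # expected gain from jumping at random
print(best_available - current_utility)              # gain from choosing the best state
```

The random jump gains 90 in expectation, while a deliberate choice gains 120: better than staying put, but not best. The random jump only helps because the starting state is worse than average.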

Continue reading "Worse Than Random" »


Lawful Uncertainty

Previously in series: Lawful Creativity

From Robyn Dawes, Rational Choice in an Uncertain World:

"Many psychological experiments were conducted in the late 1950s and early 1960s in which subjects were asked to predict the outcome of an event that had a random component but yet had base-rate predictability – for example, subjects were asked to predict whether the next card the experimenter turned over would be red or blue in a context in which 70% of the cards were blue, but in which the sequence of red and blue cards was totally random.

In such a situation, the strategy that will yield the highest proportion of success is to predict the more common event.  For example, if 70% of the cards are blue, then predicting blue on every trial yields a 70% success rate.

What subjects tended to do instead, however, was match probabilities – that is, predict the more probable event with the relative frequency with which it occurred.  For example, subjects tended to predict 70% of the time that the blue card would occur and 30% of the time that the red card would occur.  Such a strategy yields a 58% success rate, because the subjects are correct 70% of the time when the blue card occurs (which happens with probability .70) and 30% of the time when the red card occurs (which happens with probability .30); .70 * .70 + .30 * .30 = .58.

In fact, subjects predict the more frequent event with a slightly higher probability than that with which it occurs, but do not come close to predicting its occurrence 100% of the time, even when they are paid for the accuracy of their predictions…  For example, subjects who were paid a nickel for each correct prediction over a thousand trials… predicted [the more common event] 76% of the time."

(Dawes cites:  Tversky, A. and Edwards, W.  1966.  Information versus reward in binary choice.  Journal of Experimental Psychology, 71, 680-683.)
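The success rates in the Dawes passage follow from one formula. A small sketch, assuming predictions are made independently of the actual card sequence (the function name is made up for this illustration):

```python
def success_rate(p_common, p_predict_common):
    """Probability of a correct guess when the common event occurs with
    probability `p_common` and the subject independently predicts it
    with probability `p_predict_common`."""
    hit_common = p_common * p_predict_common
    hit_rare = (1 - p_common) * (1 - p_predict_common)
    return hit_common + hit_rare


print(round(success_rate(0.70, 1.00), 3))  # always predict blue
print(round(success_rate(0.70, 0.70), 3))  # probability matching
print(round(success_rate(0.70, 0.76), 3))  # the paid subjects' 76%
```

This prints 0.7, 0.58 and 0.604. Because the formula is linear in the prediction rate, any strategy that predicts blue less than 100% of the time gives up accuracy whenever blue is the more common card.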

Do not think that this experiment is about a minor flaw in gambling strategies.  It compactly illustrates the most important idea in all of rationality.

Continue reading "Lawful Uncertainty" »


Recognizing Intelligence

Previously in series: Building Something Smarter

Humans in Funny Suits inveighed against the problem of "aliens" on TV shows and movies who think and act like 21st-century middle-class Westerners, even if they have tentacles or exoskeletons.  If you were going to seriously ask what real aliens might be like, you would try to make fewer assumptions – a difficult task when the assumptions are invisible.

I previously spoke of how you don’t have to start out by assuming any particular goals, when dealing with an unknown intelligence.  You can use some of your evidence to deduce the alien’s goals, and then use that hypothesis to predict the alien’s future achievements, thus making an epistemological profit.

But could you, in principle, recognize an alien intelligence without even hypothesizing anything about its ultimate ends – anything about the terminal values it’s trying to achieve?

This sounds like it goes against my suggested definition of intelligence, or even optimization.  How can you recognize something as having a strong ability to hit narrow targets in a large search space, if you have no idea what the target is?

And yet, intuitively, it seems easy to imagine a scenario in which we could recognize an alien’s intelligence while having no concept whatsoever of its terminal values – having no idea where it’s trying to steer the future.

Continue reading "Recognizing Intelligence" »


Complexity and Intelligence

Followup to: Building Something Smarter, Say Not "Complexity", That Alien Message

One of the Gödel-inspired challenges to the idea of self-improving minds is based on the notion of "complexity".

Now "complexity", as I’ve previously mentioned, is a dangerous sort of word.  "Complexity" conjures up the image of a huge machine with incomprehensibly many gears inside – an impressive sort of image.  Thanks to this impressiveness, "complexity" sounds like it could be explaining all sorts of things – that all sorts of phenomena could be happening because of "complexity".

It so happens that "complexity" also names another meaning, strict and mathematical: the Kolmogorov complexity of a pattern is the size of the program code of the shortest Turing machine that produces the pattern as an output, given unlimited tape as working memory.

I immediately note that this mathematical meaning is not the same as that intuitive image that comes to mind when you say "complexity".  The vast impressive-looking collection of wheels and gears?  That’s not what the math term means.

Suppose you ran a Turing machine with unlimited tape, so that, starting from our laws of physics, it simulated our whole universe – not just the region of space we see around us, but all regions of space and all quantum branches.  (There are strong indications our universe may be effectively discrete, but if not, just calculate it out to 3^^^3 digits of precision.)

Then the "Kolmogorov complexity" of that entire universe – throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations…

…would be 500 bits, or whatever the size of the true laws of physics when written out as equations on a sheet of paper.

The Kolmogorov complexity of just a single planet, like Earth, would of course be much higher than the "complexity" of the entire universe that contains it.
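A loose analogy in code may help with the whole-versus-part point. Kolmogorov complexity itself is uncomputable, so the sketch below does not measure it; it only shows that a very short program can enumerate every binary string, while singling out one particular string requires an index that grows with the string. The function names are invented for this illustration.

```python
from itertools import count, islice, product


def all_binary_strings():
    """Enumerate every finite binary string, shortest first.
    The generator is a few lines long no matter how far it runs."""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def index_of(target):
    """0-based position of `target` in the enumeration above."""
    shorter = 2 ** len(target) - 2  # number of strings shorter than target
    return shorter + int(target, 2)


target = "1011"
position = index_of(target)

# Picking the target back out needs the generator plus its position.
assert next(islice(all_binary_strings(), position, None)) == target

print(position)               # where the target sits in the enumeration
print(position.bit_length())  # bits needed to write that position down
```

In this toy setting the "everything" program has a fixed size, while the pointer to one specific item costs about as many bits as the item itself. That mirrors the claim that a single planet can take more bits to specify than the universe containing it.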

Continue reading "Complexity and Intelligence" »


Economic Definition of Intelligence?

Followup to: Efficient Cross-Domain Optimization

Shane Legg once produced a catalogue of 71 definitions of intelligence.  Looking it over, you’ll find that the 18 definitions in dictionaries and the 35 definitions of psychologists are mere black boxes containing human parts.

However, among the 18 definitions from AI researchers, you can find such notions as

"Intelligence measures an agent’s ability to achieve goals in a wide range of environments" (Legg and Hutter)

"Intelligence is the ability to optimally use limited resources – including time – to achieve goals" (Kurzweil)

or even

"Intelligence is the power to rapidly find an adequate solution in what appears a priori (to observers) to be an immense search space" (Lenat and Feigenbaum)

which is about as close as you can get to my own notion of "efficient cross-domain optimization" without actually measuring optimization power in bits.

But Robin Hanson, whose AI background we’re going to ignore for a moment in favor of his better-known identity as an economist, at once said:

"I think what you want is to think in terms of a production function, which describes a system’s output on a particular task as a function of its various inputs and features."

Economists spend a fair amount of their time measuring things like productivity and efficiency.  Might they have something to say about how to measure intelligence in generalized cognitive systems?

This is a real question, open to all economists.  So I’m going to quickly go over some of the criteria-of-a-good-definition that stand behind my own proffered suggestion on intelligence, and what I see as the important challenges to a productivity-based view.  It seems to me that this is an important sub-issue of Robin’s and my persistent disagreement about the Singularity.

Continue reading "Economic Definition of Intelligence?" »
