# Monthly Archives: February 2008

## Conditional Independence, and Naive Bayes


Followup to: Searching for Bayes-Structure

Previously I spoke of mutual information between X and Y, I(X;Y), which is the difference between the sum of the entropies of the marginal distributions, H(X) + H(Y), and the entropy of the joint probability distribution, H(X,Y).

I gave the example of a variable X, having eight states 1..8 which are all equally probable if we have not yet encountered any evidence; and a variable Y, with states 1..4, which are all equally probable if we have not yet encountered any evidence.  Then if we calculate the marginal entropies H(X) and H(Y), we will find that X has 3 bits of entropy, and Y has 2 bits.

However, we also know that X and Y are both even or both odd; and this is all we know about the relation between them.  So for the joint distribution (X,Y) there are only 16 possible states, all equally probable, for a joint entropy of 4 bits.  This is a 1-bit entropy defect, compared to 5 bits of entropy if X and Y were independent.  This entropy defect is the mutual information – the information that X tells us about Y, or vice versa, so that we are not as uncertain about one after having learned the other.
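This calculation can be sketched numerically; a minimal Python example, enumerating the 16 equally probable joint states described above:

```python
from collections import Counter
from math import log2

# The 16 equally probable joint states: X in 1..8, Y in 1..4, same parity.
joint = [(x, y) for x in range(1, 9) for y in range(1, 5) if x % 2 == y % 2]

def entropy(outcomes):
    """Shannon entropy (bits) of the empirical distribution over `outcomes`."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in counts.values())

H_xy = entropy(joint)                  # 4.0 bits: joint entropy
H_x = entropy([x for x, _ in joint])   # 3.0 bits: marginal entropy of X
H_y = entropy([y for _, y in joint])   # 2.0 bits: marginal entropy of Y
print(H_x + H_y - H_xy)                # 1.0 bit of mutual information
```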

Suppose, however, that there exists a third variable Z.  Z has two states, “even” and “odd”, perfectly correlated to the evenness or oddness of (X,Y).  In fact, we’ll suppose that Z is just the question “Are X and Y even or odd?”

If we have no evidence about X and Y, then Z itself necessarily has 1 bit of entropy on the information given.  There is 1 bit of mutual information between Z and X, and 1 bit of mutual information between Z and Y.  And, as previously noted, 1 bit of mutual information between X and Y.  So how much entropy for the whole system (X,Y,Z)?  You might naively expect that

H(X,Y,Z) = H(X) + H(Y) + H(Z) - I(X;Z) - I(Z;Y) - I(X;Y)

but this turns out not to be the case.
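We can check this numerically; a minimal sketch, enumerating the equally probable states of (X,Y,Z) as described above:

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(outcomes):
    """Shannon entropy (bits) of the empirical distribution over `outcomes`."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# The 16 equally probable joint states: X and Y share a parity, and Z names it.
states = [(x, y, "even" if x % 2 == 0 else "odd")
          for x, y in product(range(1, 9), range(1, 5)) if x % 2 == y % 2]

xs, ys, zs = ([s[i] for s in states] for i in range(3))

def I(a, b):  # mutual information via I(A;B) = H(A) + H(B) - H(A,B)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

naive = entropy(xs) + entropy(ys) + entropy(zs) - I(xs, zs) - I(zs, ys) - I(xs, ys)
print(entropy(states), naive)  # 4.0 3.0: the naive formula undercounts by a bit
```

The naive formula subtracts the shared parity bit three times when it is only counted in three times, so it arrives at 3 bits when the true joint entropy is 4.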


## Searching for Bayes-Structure

Followup to: Perpetual Motion Beliefs

"Gnomish helms should not function.  Their very construction seems to defy the nature of thaumaturgical law.  In fact, they are impossible.  Like most products of gnomish minds, they include a large number of bells and whistles, and very little substance.  Those that work usually have a minor helm contained within, always hidden away, disguised to appear innocuous and inessential."
— Spelljammer campaign set

We have seen that knowledge implies mutual information between a mind and its environment, and we have seen that this mutual information is negentropy in a very physical sense:  If you know where molecules are and how fast they’re moving, you can turn heat into work via a Maxwell’s Demon / Szilard engine.

We have seen that forming true beliefs without evidence is the same sort of improbability as a hot glass of water spontaneously reorganizing into ice cubes and electricity.  Rationality takes "work" in a thermodynamic sense, not just the sense of mental effort; minds have to radiate heat if they are not perfectly efficient.  This cognitive work is governed by probability theory, of which thermodynamics is a special case.  (Statistical mechanics is a special case of statistics.)

If you saw a machine continually spinning a wheel, apparently without being plugged into a wall outlet or any other source of power, then you would look for a hidden battery, or a nearby broadcast power source – something to explain the work being done, without violating the laws of physics.

So if a mind is arriving at true beliefs, and we assume that the second law of thermodynamics has not been violated, that mind must be doing something at least vaguely Bayesian – at least one process with a sort-of Bayesian structure somewhere – or it couldn’t possibly work.


## Perpetual Motion Beliefs

Yesterday’s post concluded:

To form accurate beliefs about something, you really do have to observe it. It’s a very physical, very real process: any rational mind does "work" in the thermodynamic sense, not just the sense of mental effort…  So unless you can tell me which specific step in your argument violates the laws of physics by giving you true knowledge of the unseen, don’t expect me to believe that a big, elaborate clever argument can do it either.

One of the chief morals of the mathematical analogy between thermodynamics and cognition is that the constraints of probability are inescapable; probability may be a "subjective state of belief", but the laws of probability are harder than steel.

People learn under the traditional school regimen that the teacher tells you certain things, and you must believe them and recite them back; but if a mere student suggests a belief, you do not have to obey it.  They map the domain of belief onto the domain of authority, and think that a certain belief is like an order that must be obeyed, but a probabilistic belief is like a mere suggestion.

They look at a lottery ticket, and say, "But you can’t prove I won’t win, right?"  Meaning:  "You may have calculated a low probability of winning, but since it is a probability, it’s just a suggestion, and I am allowed to believe what I want."

Here’s a little experiment:  Smash an egg on the floor.  The rule that says that the egg won’t spontaneously reform and leap back into your hand is merely probabilistic.  A suggestion, if you will.  The laws of thermodynamics are probabilistic, so they can’t really be laws, the way that "Thou shalt not murder" is a law… right?

So why not just ignore the suggestion?  Then the egg will unscramble itself… right?


When do people listen to advice?  I teach my health econ students about studies showing no effect from randomized trials giving (or not giving) advice to teens about smoking, to heart attack victims about healthy living, and to new mothers about caring for their low birth weight babies.   Here is a new related result:

Affari Tuoi is the Italian prototype of the television show Deal or No Deal …114 television episodes … with large monetary stakes. When faced with a decision problem in Affari Tuoi, a contestant may seek advice from the audience, which comes in the form of vote results. While there is a positive trend between contestants’ decisions and advice, this relation is not statistically significant. … When contestants do not have an opportunity to use advice or when the option of advice is available but not used, they make ex post "wrong" decisions in 52.9% and 54.6% of cases respectively. However, when they choose to consult the audience, the fraction of ex post "wrong" decisions decreases to 36.1%. Moreover, … by following advice contestants increase their earnings (Table 1). Subjects make ex post "wrong" decisions in 46.2% of cases when they neglect the advice and only in 30.4% of cases when they follow the advice.

However, the literature does show that in some situations people seem to listen too much to advice:

Schotter (2003) surveys several laboratory studies on advice when nonoverlapping “generations” of subjects play ultimatum and coordination games. In these studies (e.g. Schotter and Sopher, 2004, 2007) subjects often rely on the advice of naïve advisers … who hardly possess more expertise or knowledge than we do.

So why do we not listen sometimes and listen other times?


## The Second Law of Thermodynamics, and Engines of Cognition

Followup to: Superexponential Conceptspace, and Simple Words

The first law of thermodynamics, better known as Conservation of Energy, says that you can’t create energy from nothing: it prohibits perpetual motion machines of the first type, which run and run indefinitely without consuming fuel or any other resource.  According to our modern view of physics, energy is conserved in each individual interaction of particles.  By mathematical induction, we see that no matter how large an assemblage of particles may be, it cannot produce energy from nothing – not without violating what we presently believe to be the laws of physics.

This is why the US Patent Office will summarily reject your amazingly clever proposal for an assemblage of wheels and gears that cause one spring to wind up another as the first runs down, and so continue to do work forever, according to your calculations.  There’s a fully general proof that at least one wheel must violate (our standard model of) the laws of physics for this to happen.  So unless you can explain how one wheel violates the laws of physics, the assembly of wheels can’t do it either.

A similar argument applies to a "reactionless drive", a propulsion system that violates Conservation of Momentum.  In standard physics, momentum is conserved for all individual particles and their interactions; by mathematical induction, momentum is conserved for physical systems whatever their size.  If you can visualize two particles knocking into each other and always coming out with the same total momentum that they started with, then you can see how scaling it up from particles to a gigantic complicated collection of gears won’t change anything.  Even if there’s a trillion quadrillion atoms involved, 0 + 0 + … + 0 = 0.

But Conservation of Energy, as such, cannot prohibit converting heat into work.  You can, in fact, build a sealed box that converts ice cubes and stored electricity into warm water.  It isn’t even difficult.  Energy cannot be created or destroyed:  The net change in energy, from transforming (ice cubes + electricity) to (warm water), must be 0.  So it couldn’t violate Conservation of Energy, as such, if you did it the other way around…

Perpetual motion machines of the second type, which convert warm water into electrical current and ice cubes, are prohibited by the Second Law of Thermodynamics.

The Second Law is a bit harder to understand, as it is essentially Bayesian in nature.

Yes, really.


## Leave a Line of Retreat

"When you surround the enemy
Always allow them an escape route.
They must see that there is
An alternative to death."
— Sun Tzu, The Art of War, Cloud Hands edition

"Don’t raise the pressure, lower the wall."
— Lois McMaster Bujold, Komarr

Last night I happened to be conversing with a nonrationalist who had somehow wandered into a local rationalists’ gathering.  She had just declared (a) her belief in souls and (b) that she didn’t believe in cryonics because she believed the soul wouldn’t stay with the frozen body.  I asked, "But how do you know that?"  From the confusion that flashed on her face, it was pretty clear that this question had never occurred to her.  I don’t say this in a bad way – she seemed like a nice person with absolutely no training in rationality, just like most of the rest of the human species.  I really need to write that book.

Most of the ensuing conversation was on items already covered on Overcoming Bias – if you’re really curious about something, you probably can figure out a good way to test it; try to attain accurate beliefs first and then let your emotions flow from that – that sort of thing.  But the conversation reminded me of one notion I haven’t covered here yet:

"Make sure," I suggested to her, "that you visualize what the world would be like if there are no souls, and what you would do about that.  Don’t think about all the reasons that it can’t be that way, just accept it as a premise and then visualize the consequences.  So that you’ll think, ‘Well, if there are no souls, I can just sign up for cryonics’, or ‘If there is no God, I can just go on being moral anyway,’ rather than it being too horrifying to face.  As a matter of self-respect you should try to believe the truth no matter how uncomfortable it is, like I said before; but as a matter of human nature, it helps to make a belief less uncomfortable, before you try to evaluate the evidence for it."


## More Referee Bias

Analyzing the neutrality of referees during 12 German premier league (1. Bundesliga) soccer seasons, this paper documents evidence that social forces influence agents’ decisions. Referees, who are appointed to be impartial, tend to favor the home team by systematically awarding more stoppage time in close matches in which the home team is behind. They also favor the home team in decisions to award goals and penalty kicks. Crowd composition affects the size and the direction of the bias, and the crowd’s proximity to the field is related to the quality of refereeing.

That is from Economic Inquiry


## Superexponential Conceptspace, and Simple Words

Followup to: Mutual Information, and Density in Thingspace

Thingspace, you might think, is a rather huge space.  Much larger than reality, for where reality only contains things that actually exist, Thingspace contains everything that could exist.

Actually, the way I "defined" Thingspace to have dimensions for every possible attribute – including correlated attributes like density and volume and mass – Thingspace may be too poorly defined to have anything you could call a size.  But it’s important to be able to visualize Thingspace anyway.  Surely, no one can really understand a flock of sparrows if all they see is a cloud of flapping cawing things, rather than a cluster of points in Thingspace.

But as vast as Thingspace may be, it doesn’t hold a candle to the size of Conceptspace.

"Concept", in machine learning, means a rule that includes or excludes examples.  If you see the data 2:+, 3:-, 14:+, 23:-, 8:+, 9:- then you might guess that the concept was "even numbers".  There is a rather large literature (as one might expect) on how to learn concepts from data… given random examples, given chosen examples… given possible errors in classification… and most importantly, given different spaces of possible rules.

Suppose, for example, that we want to learn the concept "good days on which to play tennis".  The possible attributes of Days are:

Sky:      {Sunny, Cloudy, Rainy}
AirTemp:  {Warm, Cold}
Humidity: {Normal, High}
Wind:     {Strong, Weak}

We’re then presented with the following data, where + indicates a positive example of the concept, and – indicates a negative classification:

+   Sky: Sunny;  AirTemp: Warm;  Humidity: High;  Wind: Strong.
-   Sky: Rainy;  AirTemp: Cold;  Humidity: High;  Wind: Strong.
+   Sky: Sunny;  AirTemp: Warm;  Humidity: High;  Wind: Weak.

What should an algorithm infer from this?
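One classic answer (not necessarily the only one) comes from the concept-learning literature: the Find-S algorithm, which starts with the most specific hypothesis and generalizes just enough to cover each positive example. A minimal sketch, assuming hypotheses are attribute-value constraints where "?" matches any value:

```python
# Find-S: learn the most specific hypothesis consistent with the positive
# examples.  Negative examples are ignored by this algorithm.

examples = [
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "High", "Wind": "Strong"}, True),
    ({"Sky": "Rainy", "AirTemp": "Cold", "Humidity": "High", "Wind": "Strong"}, False),
    ({"Sky": "Sunny", "AirTemp": "Warm", "Humidity": "High", "Wind": "Weak"}, True),
]

def find_s(examples):
    hypothesis = None
    for attrs, positive in examples:
        if not positive:
            continue  # Find-S skips negative examples
        if hypothesis is None:
            hypothesis = dict(attrs)  # first positive: maximally specific
        else:
            for name, value in attrs.items():
                if hypothesis[name] != value:
                    hypothesis[name] = "?"  # generalize conflicting attributes
    return hypothesis

print(find_s(examples))
# {'Sky': 'Sunny', 'AirTemp': 'Warm', 'Humidity': 'High', 'Wind': '?'}
```

On the three examples above, it infers "Sunny and Warm and High humidity, any Wind": the two positive days agree on everything except Wind, so only Wind gets generalized.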


## My Favorite Liar

[the following recounts an exceptionally powerful teaching technique employed by an economics professor of mine at university; teaching fact-checking and skepticism by salting it into the content of his delivery]

One of my favorite professors in college was a self-confessed liar.

I guess that statement requires a bit of explanation.

The topic of Corporate Finance/Capital Markets is, even within the world of the Dismal Science, an exceptionally dry and boring subject matter, encumbered by complex mathematical models and obscure economic theory.

What made Dr. K memorable was a gimmick he employed that began with his introduction at the beginning of his first class:

"Now I know some of you have already heard of me, but for the benefit of those who are unfamiliar, let me explain how I teach. Between today until the class right before finals, it is my intention to work into each of my lectures … one lie. Your job, as students, among other things, is to try and catch me in the Lie of the Day." And thus began our ten-week course.


## Mutual Information, and Density in Thingspace

Continuation of: Entropy, and Short Codes

Suppose you have a system X that can be in any of 8 states, which are all equally probable (relative to your current state of knowledge), and a system Y that can be in any of 4 states, all equally probable.

The entropy of X, as defined yesterday, is 3 bits; we’ll need to ask 3 yes-or-no questions to find out X’s exact state.  The entropy of Y, as defined yesterday, is 2 bits; we have to ask 2 yes-or-no questions to find out Y’s exact state.  This may seem obvious since 2³ = 8 and 2² = 4, so 3 questions can distinguish 8 possibilities and 2 questions can distinguish 4 possibilities; but remember that if the possibilities were not all equally likely, we could use a more clever code to discover Y’s state using e.g. 1.75 questions on average.  In this case, though, X’s probability mass is evenly distributed over all its possible states, and likewise Y, so we can’t use any clever codes.
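The 1.75-questions figure comes from a non-uniform distribution; for instance (an illustrative assumption, since the distribution isn't restated here) one with probabilities (1/2, 1/4, 1/8, 1/8). A minimal sketch:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: -sum of p*log2(p) over nonzero probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.125] * 8))                # 3.0 bits: uniform over 8 states
print(entropy([0.25] * 4))                 # 2.0 bits: uniform over 4 states
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits: a skewed 4-state system
```

For the skewed system, a code that asks about the half-probability state first needs only 1 question half the time, averaging 1.75 questions; for the uniform systems no such shortcut exists.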

What is the entropy of the combined system (X,Y)?

You might be tempted to answer, "It takes 3 questions to find out X, and then 2 questions to find out Y, so it takes 5 questions total to find out the state of X and Y."

But what if the two variables are entangled, so that learning the state of Y tells us something about the state of X?
