Followup to: Artificial Addition, The Outside View's Domain

Where did I acquire, in my childhood, the deep conviction that reasoning from surface similarity couldn't be trusted?

I don't know; I really don't.  Maybe it was from S. I. Hayakawa's Language in Thought and Action, or even Van Vogt's similarly inspired Null-A novels.  From there, perhaps, I began to mistrust reasoning that revolves around using the same word to label different things and then concluding that they must be similar.  Could that be the beginning of my great distrust of surface similarities?  Maybe.  Or maybe I tried to reverse stupidity of the sort found in Plato; that is where the young Eliezer got many of his principles.

And where did I get the other half of the principle, the drive to dig beneath the surface and find deep causal models?  The notion of asking, not "What other thing does it resemble?", but rather "How does it work inside?"  I don't know; I don't remember reading that anywhere.

But this principle was surely one of the deepest foundations of the 15-year-old Eliezer, long before the modern me.  "Simulation over similarity" I called the principle, in just those words.  Years before I first heard the phrase "heuristics and biases", let alone the notion of inside views and outside views.

The "Law of Similarity" is, I believe, the official name for the magical principle that similar things are connected; that you can make it rain by pouring water on the ground.

Like most forms of magic, you can ban the Law of Similarity in its most blatant form, but people will find ways to invoke it anyway; magic is too much fun for people to give it up just because it is rationally prohibited.

In the case of Artificial Intelligence, for example, reasoning by analogy is one of the chief generators of defective AI designs:

"My AI uses a highly parallel neural network, just like the human brain!"

First, the data elements you call "neurons" are nothing like biological neurons.  They resemble them the way that a ball bearing resembles a foot.

Second, earthworms have neurons too, you know; not everything with neurons in it is human-smart.

But most importantly, you can't build something that "resembles" the human brain in one surface facet and expect everything else to come out similar.  This is science by voodoo doll.  You might as well build your computer in the form of a little person and hope for it to rise up and walk, as build it in the form of a neural network and expect it to think.  Not unless the neural network is fully as similar to human brains as individual human brains are to each other.

So that is one example of a failed modern attempt to exploit a magical Law of Similarity and Contagion that does not, in fact, hold in our physical universe.  But magic has been very popular since ancient times, and every time you ban it, it just comes back under a different name.

When you build a computer chip, it does not perform addition because the little beads of solder resemble beads on an abacus, and therefore the computer chip should perform addition just like an abacus.

The computer chip does not perform addition because the transistors are "logical" and arithmetic is "logical" too, so that if they are both "logical" they ought to do the same sort of thing.

The computer chip performs addition because the maker understood addition well enough to prove that the transistors, if they work as elementarily specified, will carry out adding operations.  You can prove this without talking about abacuses.  The computer chip would work just as well even if no abacus had ever existed.  The computer chip has its own power and its own strength, it does not draw upon the abacus by a similarity-link.
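To make that concrete, here is a minimal sketch in Python (an illustration of the idea, not how any particular chip is actually laid out): an adder built from nothing but elementary boolean operations, whose correctness can be checked directly, with no reference to an abacus.

```python
# A ripple-carry adder built only from elementary boolean operations
# (the software analogue of specifying what each transistor gate does).
# Nothing here refers to an abacus; correctness follows from the gates alone.

def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out), using only AND/OR/XOR."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def add(x, y, width=8):
    """Add two unsigned integers by chaining full adders, least bit first."""
    result, carry = 0, 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        sum_bit, carry = full_adder(bit_x, bit_y, carry)
        result |= sum_bit << i
    return result

# Exhaustive check over the 8-bit range: the construction can be verified
# directly, with no appeal to anything the circuit "resembles".
assert all(add(x, y) == (x + y) % 256 for x in range(256) for y in range(256))
```

The construction can be verified, and debugged, entirely from the inside.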

Now can you tell me, without talking about how your neural network is "just like the human brain", how your neural algorithm is going to output "intelligence"?  Indeed, if you pretend I've never seen or heard of a human brain or anything like it, can you explain to me what you mean by "intelligence"?  This is not a challenge to be leveled at random bystanders, but no one would succeed in designing Artificial Intelligence unless they could answer it.

I can explain a computer chip to someone who's never seen an abacus or heard of an abacus and who doesn't even have the concept of an abacus, and if I could not do this, I could not design an artifact that performed addition.  I probably couldn't even make my own abacus, because I wouldn't understand which aspects of the beads were important.

I expect to return later to this point as it pertains to Artificial Intelligence particularly.

Reasoning by analogy is just as popular today as it was in Greek times, and for the same reason.  You've got no idea how something works, but you want to argue that it's going to work a particular way.  For example, you want to argue that your cute little sub-earthworm neural network is going to exhibit "intelligence".  Or you want to argue that your soul will survive its death.  So you find something else to which it bears one single surface resemblance, such as the human mind or a sleep cycle, and argue that since they resemble each other, they should have the same behavior.  Or better yet, just call them by the same name, like "neural" or "the generation of opposites".

But there is just no law which says that if X has property A and Y has property A then X and Y must share any other property.  "I built my network, and it's massively parallel and interconnected and complicated, just like the human brain from which intelligence emerges!  Behold, now intelligence shall emerge from this neural network as well!"  And nothing happens.  Why should it?

You come up with your argument from surface resemblances, and Nature comes back and says "So what?"  There just isn't a law that says it should work.

If you design a system of transistors to do addition, and it says 2 + 2 = 5, you can go back and debug it; you can find the place where you made an identifiable mistake.

But suppose you build a neural network that is massively parallel and interconnected and complicated, and it fails to be intelligent.  You can't even identify afterward what went wrong, because the wrong step was in thinking that the clever argument from similarity had any power over Reality to begin with.

In place of this reliance on surface analogies, I have had this notion and principle - from so long ago that I can hardly remember how or why I first came to hold it - that the key to understanding is to ask why things happen, and to be able to walk through the process of their insides.

Hidden or open, this principle is ubiquitously at work in all my writings.  For example, take my notion of what it looks like to "explain" "free will" by digging down into the causal cognitive sources of human judgments of freedom-ness and determination-ness.  Contrast this with any standard analysis that lists out surface judgments of freedom-ness and determination-ness without asking what cognitive algorithm generates these perceptions.

Of course, some things that resemble each other in some ways, resemble each other in other ways as well.  But in the modern world, at least, by the time we can rely on this resemblance, we generally have some idea of what is going on inside, and why the resemblance holds.

The distrust of surface analogies, and the drive to find deeper and causal models, has been with me my whole remembered span, and has been tremendously helpful to both the young me and the modern one.  The drive toward causality makes me keep asking "Why?" and looking toward the insides of things; and the distrust of surface analogies helps me avoid standard dead ends.  It has driven my whole life.

As for Inside View vs. Outside View, I think that the lesson of history is just that reasoning from surface resemblances starts to come apart at the seams when you try to stretch it over gaps larger than Christmas shopping - over gaps larger than different draws from the same causal-structural generator.  And reasoning by surface resemblance fails with especial reliability, in cases where there is the slightest motivation in the underconstrained choice of a reference class.

Comments (33)

The fact that you have felt something your whole life is not to the rest of us much of a reason to believe it. Nor do a few examples of analogy failures give much of a reason. Yes of course it is great to try to understand things as much as possible, but noticing and paying attention to similarities is a great tool for doing that. And what to you may seem "surface" similarities can to others be understood as causal structural similarities if they understand abstractions that you do not.

Robin Hanson: I don't think that's what he's getting at. Yes, surface similarities are correlated with structural similarities, or mathematical similarities (I know of a guy who found a couple big research papers towards his astrophysics PhD via a colleague's analogy between gravitational and electromagnetic waves), but they show up so often under other circumstances that it is meet to be suspicious of them. The outside view works really well for Christmas shopping, essay writing, program development, and the like because it is obvious that the structural similarities are present.

You can't reason only from superficial similarities, but the hypothesis that similarities come from similar causal structure often happens to be true, so it has a good chance a priori, unless you can also see indications that the causal structure is different, as is the case in your examples. The error is in attributing the fundamental role to similarities and forgetting about everything else; but similarities also have their place.

In my post History of Transition Inequality I merely listed the pattern in time I saw, but then in my post Outside View of Singularity I tried to interpret that pattern in terms of deeper causes. But Eliezer keeps saying it is all just surface similarity. This suggests to me that he just doesn't understand, or even appreciate the existence of, the kind of abstractions I was using to understand these events.

You can always criticize an outside view by saying that it is just a surface similarity, and that a proper inside view analysis will be superior because it takes more information into account. And yet we know that most inside view analyses are not in fact better than outside views (or at least, so claim the social scientists), because this methodology is such an invitation to bias. Now it may be the case that for Eliezer, inside view analysis works. Maybe he can overcome the biases which afflict most such efforts. My concern is that others here may be tempted to follow his example, unsuccessfully.

Now can you tell me, without talking about how your neural network is "just like the human brain", how your neural algorithm is going to output "intelligence"? Indeed, if you pretend I've never seen or heard of a human brain or anything like it, can you explain to me what you mean by "intelligence"? This is not a challenge to be leveled at random bystanders, but no one would succeed in designing Artificial Intelligence unless they could answer it.

Does anyone currently working on AI pass this test?

Do you have an estimate of how much work it would take after reaching that point, to build a human-level AI?

When did I start to distrust supposed knowledge from the perceptual to the conceptual? When I recognized that when I perceive a tree my head doesn't explode because it didn't duplicate it. Aha, if I'm to trust any knowledge, including provisional, I can trust abstractions. Indeed, that's what the tree is in my brain at a significant etiological terminus. If I'm to recognize other trees and to be able to talk about them, my perceptual apparatus had better be able to abstract relevant tree-properties from trees automatically. The principle: Perceptual 2-D maps are not intrinsically superior to conceptual n-D maps. Hanson makes this sort of point very clear.

For a more sophisticated theory of analogical reasoning, you should read Dedre Gentner's papers. A good starting point is The structure-mapping engine: Algorithm and examples. Gentner defines a hierarchy of attributes (properties of entities; in logic, predicates with single arguments, P(X)), first-order relations (relations between entities; in logic, predicates with two or more arguments, R(X,Y)), and higher-order relations (relations between relations). Her experiments with children show that they begin reasoning with attributional similarity (what you call "surface similarities"); as they mature, they make increasing use of first-order relational similarity (what you call "structural similarity"); finally, they begin using higher-order relations, especially causal relations. This fits perfectly with your description of your own childhood. See Language and the career of similarity.
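To illustrate the hierarchy with a toy encoding (my own sketch, not the actual structure-mapping engine), using the classic solar-system/atom analogy from the structure-mapping papers:

```python
# Illustrative encoding of Gentner's hierarchy (not the actual SME code):
# attributes are one-place predicates, first-order relations take entities,
# and higher-order relations (like CAUSE) take other relations as arguments.

from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Attribute:          # P(X), e.g. YELLOW(sun)
    name: str
    entity: str

@dataclass(frozen=True)
class Relation:           # R(X, Y), e.g. ATTRACTS(sun, planet)
    name: str
    args: Tuple[Union[str, "Relation"], ...]

# Surface (attributional) similarity: shared one-place predicates.
yellow_sun = Attribute("YELLOW", "sun")
yellow_ball = Attribute("YELLOW", "ball")

# Structural similarity: shared relational skeleton, entities swapped.
solar = Relation("CAUSE", (
    Relation("ATTRACTS", ("sun", "planet")),
    Relation("REVOLVES_AROUND", ("planet", "sun")),
))
atom = Relation("CAUSE", (
    Relation("ATTRACTS", ("nucleus", "electron")),
    Relation("REVOLVES_AROUND", ("electron", "nucleus")),
))

def skeleton(term):
    """Strip entity names, keeping only the nested relation structure."""
    if isinstance(term, Relation):
        return (term.name,) + tuple(skeleton(a) for a in term.args)
    return "_"  # entities collapse to a placeholder

assert skeleton(solar) == skeleton(atom)    # same deep structure
assert yellow_sun.name == yellow_ball.name  # mere surface overlap
```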

I can certainly agree that you rely on this sort of reasoning a lot. But I don't think what you do is much of an improvement over what you're criticizing. You just take words and make "surface analogies" with "cognitive algorithms." The useful thing about these "cognitive algorithms" is that, being descriptions of "deep causes" (whatever those are) rather than anything we know to actually exist in the world (like, say, neurons), you can make them do whatever you please with total disregard for reality.

Saying that a neural network never gets at "intelligence" is little different from saying the descriptions of biology in textbooks never capture "life."  Without a theory of "life," how will we ever know our biological descriptions are correct? The answer is as blatantly obvious as it is for neural networks: by comparing them to actual biological systems. We call this "science." You may have heard of it. Of course, you could say, "What if we didn't have biology to compare it to, how then would you know you have the correct description of life?" But... well, what to say about that? If there were no biology, nobody would talk about life. Likewise, if there were no brains, nobody would be talking about intelligence.

"I built my network, and it's massively parallel and interconnected and complicated, just like the human brain from which intelligence emerges! Behold, now intelligence shall emerge from this neural network as well!"

Who actually did this? I'm not aware of any such effort, much less it being a trend. Seems to me that the "AI" side of neural networks is almost universally interested in data processing properties of small networks. Larger more complex network experiments are part of neuroscience (naive in most cases but that's a different topic). I don't think anybody in AI or brain research ever thought their network was or would or could be "intelligent" in the broad sense you are implying.

The truth of the matter is that all comparisons are superficial. We might just as easily say that they're all deep - it makes no difference.

Our sensory apparatus produces streams of data that our nervous system finds patterns in, and then it finds correlations between the patterns. That is how we attempt to represent the world, and by determining how our model produces new states, we attempt to anticipate how the world will act.

At no point do we go beyond making comparisons based on some limited number of similarities. That is ALL we ever do. Eliezer has stated that he learned to be wary of using one word to refer to different things, then concluding that they are the same - when has he learned to be wary of calling one thing by different names, then concluding that they are different?

"But there is just no law which says that if X has property A and Y has property A then X and Y must share any other property."

"X & Y both have properties A & B" is logically simpler than "X & Y have property A, X has B, and Y does not have B"

So if X and Y share property A, and X has B, this is evidence, by Ockham's razor, that Y has property B.

Unknown, what is the function you are using that takes pairs of logical statements as arguments and outputs their simplicity ordering?

Yes, it is simpler for X and Y to be identical if they share one property; therefore, by Occam's razor, X and Y are identical.

I doubt it particularly matters which precise measure of simplicity I use, probably any reasonable measure will do. Consider the same with one hundred properties: X has properties 1 through 100. If Y has properties 12, 14, 15, 27, 28, 29, 43, 49, 62, 68, 96, and 100, but no others, then it will take more bits to say which properties X and Y have, than the number of bits it will take to specify that X and Y share all the same properties.

Of course, this seems to support Guest's argument; and yes, once we see that X and Y share a property, the simplest hypothesis is that they are the same. Of course this can be excluded by additional evidence.
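To make the bit-count comparison concrete, here is a rough sketch using a naive fixed-length code; the property names and costs are only an illustration, not a canonical MDL scheme:

```python
# A toy description-length comparison for the 100-property example above.
# Assumptions: properties are named by number (7 bits each), and "same as X"
# is encoded as a single reserved codeword. A rough stand-in for MDL only.

BITS_PER_PROPERTY_NAME = 7          # enough to name one of 100 properties

def bits_to_list(properties):
    """Cost of listing each property Y has, one name at a time."""
    return len(properties) * BITS_PER_PROPERTY_NAME

# Hypothesis A: Y shares all of X's properties -- one short codeword.
cost_same_as_X = BITS_PER_PROPERTY_NAME      # "copy X" as a single token

# Hypothesis B: Y has only the scattered subset from the comment above.
subset = [12, 14, 15, 27, 28, 29, 43, 49, 62, 68, 96, 100]
cost_subset = bits_to_list(subset)           # 12 * 7 = 84 bits

print(cost_same_as_X, cost_subset)           # 7 vs. 84
```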

this is evidence, by Ockham's razor

See Ockham's Razor is Dull.

The Razor suggests that the system with the fewest assumptions, that is also compatible with the known data, is best.

Saying that X is identical to Y is a very strong assertion - it implies a great deal about their full sets of properties. Saying that X is Y is not necessary, given that they are known to share one property. All we can say AND SUPPORT is that X might be Y. The two could easily be distinctly different and still share one property.

The Razor does not indicate that asserting X is Y is justified in this case.

This quote is in reference to a supposed split in the dorsal stream (http://en.wikipedia.org/wiki/Dorsal_stream). The first was a 'coordinate system' that preserved metric information: "Therefore, another type of spatial description of the object’s geometry seems more useful, one that preserves invariant spatial information by ignoring metric information. Indeed, many contemporary theories of object identification (e.g., Marr, 1982; Biederman, 1987) assume that objects are represented using such structural descriptions, which use abstract types of spatial representations to specify relations among parts (e.g., top-of, end-to-middle-connected, left-side-connected)."

And a quote from a book on MBTI, describing something called Introverted Thinking (http://greenlightwiki.com/lenore-exegesis/Introverted_Thinking): "When we use Thinking in an Introverted way, we get a mental image of the logical relationships in an entire system. For example, if we're crocheting an initial into a sweater, we're likely to draw a picture rather than work out the logical relationships analytically." MBTI isn't neuroscience, but it's managed to hit on important distinctions in how people think.

Anyways, a problem you'll see with students in math is that they don't realize that the basics of logic/mathematical reasoning are SPATIAL. They see math formulas and think that's all there is to it. In fact, this was my problem. I could do fine with philosophical verbal reasoning, but I would end up asking a bunch of questions and creating a bunch of possibilities. I never realized the importance of what Eliezer calls "Constraining the search-space". Turns out it's not the words that are important, it's the "spatial idea" it stands for. It took a LONG time for me to see that. I started to realize it from reading Eliezer's writings, specifically: "I keep emphasizing the idea that evidence slides probability because of research that shows people tend to use spatial intuitions to grasp numbers. In particular, there's interesting evidence that we have an innate sense of quantity that's localized to left inferior parietal cortex - patients with damage to this area can selectively lose their sense of whether 5 is less than 8, while retaining their ability to read, write, and so on. (Yes, really!) The parietal cortex processes our sense of where things are in space (roughly speaking), so an innate "number line", or rather "quantity line", may be responsible for the human sense of numbers. This is why I suggest visualizing Bayesian evidence as sliding the probability along the number line; my hope is that this will translate Bayesian reasoning into something that makes sense to innate human brainware. (That, really, is what an "intuitive explanation" is.) For more information, see Stanislas Dehaene's The Number Sense." Eliezer talks about this all the time, whether it's the importance of anticipating an experience, or replacing the symbol with the substance.

The other thing I want to draw attention to is Introverted Intuition (http://greenlightwiki.com/lenore-exegesis/IntrovertedIntuition?version=66). p. 225: "For INJs, patterns aren't 'out there' in the world, waiting to be discovered. They're part of us--the way we make sense of the riot of energy and information impinging on our systems. A disease syndrome is a useful construct, but that's all it is--an aggregate of observations attached to a label, telling us what to see and how to deal with it." p. 234: "For INJs, truth isn't about logic. Truth is a frame of reference, a way of organizing information, which serves one set of needs or another." When Robin Hanson talks about "meta" this or that, or talks about frameworks, it seems that he is thinking in an Introverted Intuition sort of way. It's much more verbal. Eliezer also talks about this in http://yudkowsky.net/bayes/truth.html, specifically: "Inspector Darwin looks at the two arguers, both apparently unwilling to give up their positions. "Listen," Darwin says, more kindly now, "I have a simple notion for resolving your dispute. You say," says Darwin, pointing to Mark, "that people's beliefs alter their personal realities. And you fervently believe," his finger swivels to point at Autrey, "that Mark's beliefs can't alter reality. So let Mark believe really hard that he can fly, and then step off a cliff. Mark shall see himself fly away like a bird, and Autrey shall see him plummet down and go splat, and you shall both be happy.""

Robin doesn't see the PRIMACY of spatial relationships to reasoning. So when Eliezer talks about surface analogies compared to deep causal STRUCTURE (spatial) Robin just sees it as a frame of reference, instead of something invariant across frames of reference. I hope some of this made sense.

I doubt it particularly matters which precise measure of simplicity I use, probably any reasonable measure will do.

You're too easy on yourself.

It's highly debatable whether it matters which measure of simplicity is in use, and it definitely matters that the one you use is precisely specified. Otherwise, I have nothing more to go on than, "Unknown said so" -- and I can't generalize to see if its use makes sense (satisfies some optimality criterion) beyond the limited case you're arguing for.

Yeesh, that last comment of mine was poorly written. Let me try again.

It's highly debatable whether it matters which measure of simplicity is in use. It definitely matters that the one you use is not precisely specified. As it is, I have nothing more to go on than, "Unknown said so" -- and I can't generalize to see if its use makes sense (satisfies some optimality criterion) beyond the limited case you're arguing for.

Cyan: "Minimum description length" works for English and probably most other languages as well, including abstract logical languages. Increase the number of properties enough, and it will definitely work for any language.

Caledonian: the Razor isn't intended to prove anything; it is intended to give an ordering of the probability of various accounts. Suppose we have 100 properties, numbered from one to a hundred. X has properties #1 through #100. Y has property #1. Which is more likely: Y has properties #1 through #100 as well, or Y has property #1, all prime-numbered properties except #17, and property #85? I think it is easy enough to see which of these is simpler and more likely to be true.

Peter Turney: the argument for the Razor is that on average, more complicated claims must be assigned a lower prior probability than simpler claims. If you assign prior probabilities at all, this is necessary on average, no matter how you define simplicity. The reason is that according to any definition of simplicity that corresponds even vaguely with the way we use the word, you can't get indefinitely simpler, but you can get indefinitely more complicated. So if all your probabilities are equal, or if more probable claims, on average, are more probable than simpler claims, your prior probabilities will not add to 1, but to infinity.
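Spelled out, under the simplifying assumption that complexity is measured as description length in a prefix-free binary code (a sketch of the standard argument, not a claim about any particular measure):

```latex
% Kraft's inequality for a prefix-free binary code, with \ell(H) the
% description length of hypothesis H in bits:
\[ \sum_{H} 2^{-\ell(H)} \;\le\; 1 , \]
% so a prior proportional to 2^{-\ell(H)} can be normalized even over
% infinitely many hypotheses, while a uniform prior cannot:
\[ P(H) = c > 0 \ \text{for all } H \quad\Longrightarrow\quad \sum_{H} P(H) = \infty \neq 1 . \]
```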

X has property #1 through #100. Y has property #1. Which is more likely: ...

As I understand it, this is an example of the fallacy of the excluded middle.

After all, I could make up my own comparison case (which is more likely: Y has properties #1 through #100 as well, or Y has just property #1?) and come to the opposite conclusion from the one you have drawn.

The point being that you have to compare the case "Y has properties 1 through 100" with all other possible values of Y, and there's no reason that the Y you happen to have in your hands is necessarily also an X.

Unknown, MDL works for me. (I think your 100-property example works better in terms of MDL than your 2-property example.)

It seems to me that the justification for taking an outside view is some kind of probabilistic exchangeability. In fact, exchangeability seems to me to be the key concept grounding an outside view, and I'm surprised no one else has brought that concept up.

When the events being predicted are rare and complicated, I have a hard time seeing an outside view as justified. I don't see any justification for an inside view, either.
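For reference, the standard definition being invoked here: a sequence X_1, ..., X_n is exchangeable when its joint distribution is invariant under every permutation of the indices,

```latex
\[ P(X_1 \le x_1, \dots, X_n \le x_n) \;=\; P(X_{\pi(1)} \le x_1, \dots, X_{\pi(n)} \le x_n)
   \quad \text{for every permutation } \pi . \]
```

De Finetti's theorem then says an infinite exchangeable sequence behaves as if drawn i.i.d. from some unknown common distribution, which is roughly the condition under which treating a new case as "another draw from the same generator" is licensed.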

Quite the strawman you're attacking here, Eliezer. Where are all these AI researchers who think just tossing a whole bunch of (badly simulated) neurons into a vat will produce human-like intelligence?

There are lots of people trying to figure out how to use simulated neurons as building blocks to solve various sorts of problems. Some of them use totally non-biological neuron models, some use more accurate models. In either case, what's wrong with saying: "The brain uses this sort of doohickey to do all sorts of really powerful computation. Let's play around with a few of them and see what sort of computational problems we can tackle."

Then, from the other end, there's the Blue Brain project, saying "let's build an accurate simulation of a brain, starting from a handful of neurons and working our way up, making sure at every step that our simulation responds to stimulation just like the real thing. Maybe then we can reverse-engineer how the brain is doing its thing." When their simulations deviate from the real thing, they run more tests on the real thing to figure out where they're going wrong. Will they succeed before someone else builds an AI and/or funding runs out? Maybe, maybe not; but they're making useful contributions already.

Eliezer: "the data elements you call 'neurons' are nothing like biological neurons. They resemble them the way that a ball bearing resembles a foot."

A model of a spiking neuron that keeps track of multiple input compartments on the dendrites and a handful of ion channels is accurate enough to duplicate the response of a real live neuron. That's basically the model that Blue Brain is using. (Or perhaps I misread your analogy, and you're just complaining about your terrible orthopedic problems?)
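For readers who have never seen a simulated neuron at all, here is a deliberately minimal leaky integrate-and-fire sketch in Python; it is far cruder than the multi-compartment, ion-channel models just described (all parameter values are arbitrary illustrations), and is only meant to show the basic shape of such simulations.

```python
# A deliberately minimal leaky integrate-and-fire neuron: membrane voltage
# leaks toward rest, is pushed by input current, and emits a spike (then
# resets) whenever it crosses threshold. Parameters are illustrative only.

def simulate_lif(input_current, dt=0.1, tau=10.0, v_rest=-65.0,
                 v_threshold=-50.0, v_reset=-70.0, resistance=10.0):
    """Return the voltage trace and spike times for a list of input currents."""
    v = v_rest
    voltages, spikes = [], []
    for step, current in enumerate(input_current):
        # Leak toward rest plus drive from injected current.
        v += (-(v - v_rest) + resistance * current) * (dt / tau)
        if v >= v_threshold:          # threshold crossed: emit a spike
            spikes.append(step * dt)
            v = v_reset               # and reset the membrane potential
        voltages.append(v)
    return voltages, spikes

# Constant drive strong enough to push the cell over threshold repeatedly.
trace, spike_times = simulate_lif([2.0] * 1000)
print(f"{len(spike_times)} spikes in 100 ms of simulated time")
```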

I'm not saying that neurons or brain simulation are the One True Way to AI; I agree that a more engineered solution is likely to work first, mostly because biological systems tend to have horrible interdependencies everywhere that make them ridiculously hard to reverse-engineer. But I don't think that's a reason to sling mud at the people who step up to do that reverse engineering anyway.

Eh, I guess this response belongs on some AI mailing list and not here. Oh well.

To those crying "Strawman":

I cite:

The "Artificial Development" AI project. This is how they sounded in 2003. They sound more biologically realistic now, and sounded even less biologically realistic when they first came out; note that they got millions of dollars in funding at that time.

To me even their latest version still sounds like neurovoodoo. What kind of work is this system going to do and why?

And neurovoodoo really was quite popular in the history of AI, once upon a time, especially the 80s, as best I understand that history.

Correction: in my last comment it should have been "if more complex claims, on average, are more probable than simpler claims," not "if more probable claims, on average, are more probable than simpler claims".

Caledonian: the Razor isn't intended to prove anything, it is intended to give an ordering of the probability of various accounts.
Yes, I know.

As has already been stated, you're using it improperly. The Razor does not lead to the conclusion that it is more probable that two things which share a property are identical than not. That is leaping to a conclusion not justified by the available data.

Weak assertions require little justification, strong assertions more. Given that two things share one property, there are many ways this could be the case without their being identical. The claim that they are identical is specific, precise, and excludes a vast amount of possibility space, and it needs strong support. That support is lacking.

Caledonian, I didn't say that the Razor leads to the conclusion that "it is more probable that two things which share a property are identical than not." The Razor leads to the conclusion that "the two things are identical" is more likely than some other specific hypothesis that they are not identical in some specific way.

There are of course an infinite number of ways in which two things can fail to be identical, so in order to compare the probability that the two are identical with the probability that they are not, we have to sum the probabilities for all the ways they could fail to be identical; and thus the conclusion will be that they are more likely not identical than identical, as you correctly stated.

If you look back, though, you will see that I never said anything opposed to this anyway.

Saying that two things are identical is saying that they are the same in every specific way. This is harder to demonstrate than showing that two things are dissimilar in one specific way.

You are using the Razor incorrectly. I don't care what you think you've said, I care what you've actually said - and what you've actually said is the following:

once we see that X and Y share a property, the simplest hypothesis is that they are the same
That is a false claim.

Eliezer: To those crying "Strawman" ... I cite the "Artificial Development" AI project. [also, neurovoodoo in the 80s]

Ok, that's fair. You're right, there are delusional people and snake oil salesmen out there, and in the 80s it seemed like that's all there was. I interpreted your post as a slam at everybody who was simulating neurons, so I was responding in defense of the better end of that spectrum.

Jeff, there are many fine AIfolk out there who understand that gradient descent is not magic and is good for some things but not others; and many others actively engaged in non-mysterious biological modeling with intent to understand neurons or solve specific local tasks. Both of these groups have nothing but respect from me.

+1 for "science by voodoo doll".

This has aged poorly:

you can't build something that "resembles" the human brain in one surface facet and expect everything else to come out similar.  This is science by voodoo doll.  You might as well build your computer in the form of a little person and hope for it to rise up and walk, as build it in the form of a neural network and expect it to think.  Not unless the neural network is fully as similar to human brains as individual human brains are to each other.

So that is one example of a failed modern attempt to exploit a magical Law of Similarity and Contagion that does not, in fact, hold in our physical universe.  But magic has been very popular since ancient times, and every time you ban it, it just comes back under a different name.

...

But there is just no law which says that if X has property A and Y has property A then X and Y must share any other property.  "I built my network, and it's massively parallel and interconnected and complicated, just like the human brain from which intelligence emerges!  Behold, now intelligence shall emerge from this neural network as well!"  And nothing happens.  Why should it?