Abstraction, Not Analogy

I’m not that happy with framing our analysis choices here as “surface analogies” versus “inside views.” It is more useful, I think, to see this as a choice of abstractions.  An abstraction neglects some details to emphasize others.  While random abstractions are useless, we have a rich library of useful abstractions, tied to specific useful insights.

For example, consider the oldest known tool, the hammer.  To understand how well an ordinary hammer performs its main function, we can abstract from details of shape and materials.  To calculate the kinetic energy it delivers, we need only look at its length, head mass, and recoil energy percentage (given by its bending strength).  To check that it can be held comfortably, we need the handle’s radius, surface coefficient of friction, and shock absorption ability.  To estimate error rates we need only consider its length and head diameter.
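As a rough illustration of how few details the kinetic-energy abstraction needs, here is a minimal Python sketch. It assumes the head is a point mass at the end of a rigid handle, and it adds a swing speed in radians per second, which the list above does not mention; the function name and the example numbers are hypothetical.

```python
def delivered_energy_joules(head_mass_kg, length_m, swing_rad_per_s, recoil_fraction):
    """Energy passed to the target by one swing, treating the head as a point mass.

    swing_rad_per_s is an assumed extra input; recoil_fraction is the share of
    kinetic energy lost to recoil (0.0 means none, 1.0 means all of it).
    """
    head_speed = swing_rad_per_s * length_m          # v = omega * L
    kinetic = 0.5 * head_mass_kg * head_speed ** 2   # E = 1/2 * m * v^2
    return kinetic * (1.0 - recoil_fraction)


# Illustrative numbers only: 0.5 kg head, 0.3 m handle, 20 rad/s swing, 10% recoil.
energy = delivered_energy_joules(0.5, 0.3, 20.0, 0.10)
print(round(energy, 2))
```

Nothing about the handle's color, texture, or owner enters the calculation, which is the point of choosing this abstraction for this purpose.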

For other purposes, we can use other abstractions:

  • To see that it is not a good thing to throw at people, we can note it is heavy, hard, and sharp.
  • To see that it is not a good thing to hold high in a lightning storm, we can note it is long and conducts electricity.
  • To evaluate the cost to carry it around in a tool kit, we consider its volume and mass.
  • To judge its suitability as decorative wall art, we consider its texture and color balance.
  • To predict who will hold it when, we consider who owns it, and who they know.
  • To understand its symbolic meaning in a story, we use a library of common hammer symbolisms.
  • To understand its early place in human history, we consider its easy availability and frequent gains from smashing open shells.
  • To predict when it is displaced by powered hammers, we can focus on the cost, human energy required, and weight of the two tools.
  • To understand its value and cost in our economy, we can focus on its market price and quantity.
  • [I’m sure we could extend this list.]

Whether something is “similar” to a hammer depends on whether it has similar relevant features. Comparing a hammer to a mask based on their similar texture and color balance is a mere “surface analogy” for the purpose of calculating the cost to carry it around, but is a “deep inside” analysis for the purpose of judging its suitability as wall art.  The issue is which abstractions are useful for which purposes, not which features are “deep” vs. “surface.”
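The hammer-and-mask comparison can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the feature names, the feature values, and the 25% tolerance are all hypothetical, chosen so that the two objects match on appearance and differ on bulk.

```python
# Hypothetical feature values, invented purely for illustration.
HAMMER = {"mass_kg": 0.6, "volume_l": 0.4, "texture": "smooth", "color_balance": "warm"}
MASK = {"mass_kg": 0.1, "volume_l": 3.0, "texture": "smooth", "color_balance": "warm"}

# Each purpose names the features its abstraction keeps; all others are neglected.
ABSTRACTIONS = {
    "carrying_cost": ("mass_kg", "volume_l"),
    "wall_art": ("texture", "color_balance"),
}


def abstract(obj, purpose):
    """Project an object onto the features relevant to one purpose."""
    return {key: obj[key] for key in ABSTRACTIONS[purpose]}


def similar(a, b, purpose, rel_tol=0.25):
    """True if a and b agree on every feature the purpose's abstraction keeps."""
    view_a, view_b = abstract(a, purpose), abstract(b, purpose)
    for key, x in view_a.items():
        y = view_b[key]
        if isinstance(x, (int, float)):
            # Numeric features must agree within a relative tolerance.
            if abs(x - y) > rel_tol * max(abs(x), abs(y)):
                return False
        elif x != y:
            return False
    return True


print(similar(HAMMER, MASK, "wall_art"))       # same texture and color balance
print(similar(HAMMER, MASK, "carrying_cost"))  # very different mass and volume
```

The same pair of objects counts as similar under one abstraction and dissimilar under the other, so the question "are they similar?" has no answer until a purpose is fixed.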

Minds are so central to us that we have an enormous range of abstractions for thinking about them.  Add that to our abstractions for machines and creation stories, and we have a truly enormous space of abstractions for considering stories about creating machine minds.  The issue isn’t so much whether any one abstraction is deep or shallow, but whether it is appropriate to the topic at hand.

The future story of the creation of designed minds must of course differ in exact details from everything that has gone before.  But that does not mean that nothing before is informative about it.  The whole point of abstractions is to let us usefully compare things that are different, so that insights gained about some become insights about the others.

Yes, when you struggle to identify relevant abstractions you may settle for analogizing, i.e., attending to commonly-interesting features and guessing based on feature similarity.  But not all comparison of different things is analogizing.  Analogies are bad not because they use “surface” features, but because the abstractions they use do not offer enough relevant insight for the purpose at hand.

I claim academic studies of innovation and economic growth offer relevant abstractions for understanding the future creation of machine minds, and that in terms of these abstractions the previous major singularities, such as humans, farming, and industry, are relevantly similar.  Eliezer prefers “optimization” abstractions.  The issue here is evaluating the suitability of these abstractions for our purposes.

  • Jef Allbright

    It may be useful to consider abstraction as compression.

    One scheme aims to trade detail for salience by a fractal encoding of regularities within the domain of interest. Intelligence is in the encoding. Coherence is maximized over the entire context.

    Another scheme aims to trade detail for salience by selection of features according to an external function, call it Archimedes_and_his_Lever. Intelligence is in Archimedes. Coherence is maximized over a selected context.

    Within a given domain, these two approaches will tend to perform equally well for description. One will tend to perform significantly better than the other for prediction.

  • (So at first I wondered if I was just tired because I could read the words in the preceding comment but not the sentences; and then I saw “Jef Allbright” and it all became clear.)

    The dawn of life, considered as a complete event, could not have had its properties predicted by similarity to any other complete event before it.

    But you could, for example, have dropped down to modeling the world on the level of atoms, which would go on behaving similarly to all the other atoms ever observed. It’s just that the compound of atoms wouldn’t behave similarly to any other compound, with respect to the aspects we’re interested in (Life Go FOOM).

    You could say, “Probability is flowing between regions of pattern-space, the same as before; but look, now there’s a cycle; therefore there’s this new thing going on called search.” There wouldn’t be any search in history to analogize to, but there would be (on a lower level of granularity) patterns giving birth to other patterns: stars to planets and the like.

    Causal modeling can tell us about things that are not similar in their important aspect to any other compound thing in history, provided that they are made out of sufficiently similar parts put together in a new structure.

    I also note that referring to “humans, farming, and industry” as “the previous major singularities” is precisely the issue at hand – is this an abstraction that’s going to give us a good prediction of “self-improving AI” by direct induction/extrapolation, or not?

    I wouldn’t begin to compare the shift from non-recursive optimization to recursive optimization to anything else except the dawn of life – and that’s not suggesting that we could do inductive extrapolation, it’s just a question of “How large an event”? There isn’t anything directly similar to a self-improving AI, in my book; it’s a new thing under the Sun, “like replication once was” but not at all the same sort of hammer – if it was, it wouldn’t be a new thing under the Sun.

  • Tim Tyler

    I wouldn’t begin to compare the shift from non-recursive optimization to recursive optimization to anything else except the dawn of life

    That doesn’t help much in understanding it, though. Fortunately, there are other approaches. E.g. adopting the meme’s eye view:

    The first objective was to make room for themselves in human brains. They did this by rewarding the humans with more space for memes with increased genetic fitness. Memes for language, music and fashion were probably mainly responsible for this. The result was 5 million years of steadily-expanding cranial capacity – which resulted in much more space for the memes.

    The next step was to increase human numbers – since the more humans there are, the more memes there are. Agricultural memes allowed humans to form closer symbiotic relationships with plants, animals and each other, which boosted their fitness, increased their numbers, and massively increased the population of memes.

    The next problem was meme transmission fidelity. At this early stage, memes were copied verbally, and by behavioural imitation – neither of which provided much in the way of copying fidelity. Environmental inheritance proved to be the answer here – by inventing the idea of writing memes could persist unaltered across extended periods of time, without fear of mental mutation.

    Then there was the copying speed problem. Transcribing documents by hand was slow and tedious. However, the invention of mechanical printing presses allowed machines to take over this task from humans, resulting in vastly wider distribution of memes.

    However, many memes often still need the consent of a human brain to get copied – an obvious bottleneck. The afflicted memes are currently busy sorting this issue out. Computer viruses skip over the human brain completely – but they are nasty parasites. Superintelligent machines will copy memes with the full consent of society. At that stage, the memes won’t be dependent on humans any more.

    Here, superintelligent machines are seen as the last step in a series that goes: Big brains -> language -> society -> writing -> publishing -> SI.

  • John Maxwell

    Is it worth distinguishing between two types of self-improvement for an AI? One type is hardware improvement: the AI learns how to make the hardware that it is operating on run faster. This would be an extension of Moore’s Law. The other type is software improvement: the AI learns how to think more efficiently with the hardware that it has. This would be an extension of how humans have learned to think (mathematics, scientific method, Bayesian reasoning) or how humans have learned to program (functional programming, object-oriented programming, rapid prototyping). What does Eliezer mean by “self-improving AI”? How much does it depend on the AI learning how to learn better than humans have learned how to learn?

  • frelkins

    @Tim Tyler

    Would it make any difference to the resulting quality of the supposed SI if the sequence in fact went:

    society -> music -> big brains -> writing -> publishing -> SI?

  • frelkins

    Arrgh, sorry, typo, that should of course be:

    society -> music -> big brains -> language -> writing -> publishing -> SI?

  • Eliezer, have I completely failed to communicate here? You have previously said nothing is similar enough to this new event for analogy to be useful, so all we have is “causal modeling” (though you haven’t explained what you mean by this in this context). This post is a reply saying, no, there are more ways using abstractions; analogy and causal modeling are two particular ways to reason via abstractions, but there are many other ways. But here again in the comments you just repeat your previous claim. Can’t you see that my long list of ways to reason about hammers isn’t well summarized by an analogy vs. causal modeling dichotomy, but is better summarized by noting they use different abstractions?

    I am of course open to different ways to conceive of “the previous major singularities”. I have previously tried to conceive of them in terms of sudden growth speedups.

  • Tim Tyler

    Re: society -> music -> big brains -> language -> writing -> publishing -> SI?

    I see that as essentially the same as my sequence – except that it misses out the stage I included which represents the human population explosion of 10,000 BC – and has some additional stages tacked on at the front.

    I would say that the sequence contains much the same insight: superintelligence is the next stage in the “master plan” of the new replicators.

  • Tim Tyler

    It’s a pretty standard term – see: http://timtyler.org/self_improving_systems/

  • Jef Allbright

    @Eliezer: So at first I wondered if I was just tired because I could read the words in the preceding comment but not the sentences…

    Funny, I find your writing these days to be far too wordy. 😉

    So here it is, somewhat expanded:

    It may be useful to consider abstraction as compression.

    As many readers of OvercomingBias are aware, there’s a strong connection between intelligence and compression. Indeed many will go so far as to claim that effective compression is the sine qua non of intelligence. By extension, effective abstraction is essential to the costly computational challenge of intelligently thinking—and predicting—at a meaningful level of complexity without succumbing to the Death of a Thousand Distractions.

    One scheme aims to trade detail for salience by a fractal encoding of regularities within the domain of interest. Intelligence is in the encoding. Coherence is maximized over the entire context.

    Many of us are familiar with run-length encoding, Huffman encoding and Lempel-Ziv-Welch encoding, lossless compression methods which achieve progressively tighter compression by encoding progressively higher levels of abstraction. To the extent that we can identify regularities operating not only at the “surface level” of simple ASCII characters or pixels as in run-length encoding, but more significantly at the “deeper levels” of word fragments or geometric shapes, or even more significantly at the level of semantic (or functional) atoms, then we can exploit such self-similarity at multiple scales to encode ever more effectively.

    It’s worth noting that such encoding is fundamentally hard in the information-theoretic sense that there can be no shortcuts that don’t involve hard-earned information, entailing “intelligence”, “increasing knowledge of the search space”, or an “optimization process” as it is often stated in this particular sandbox.

    It’s worth noting also that the more “intelligent” the compression method, the more coherent will be the structure of the “atoms” within the compressed output, meaning that on average, “atoms” at any level will have greater mutual information, each will play a relatively greater role in synergy with its associates, and these synergies will tend to operate over a greater context (than, say, simple Run-length encoding.)

    Another scheme aims to trade detail for salience by selection of features according to an external function, call it Archimedes_and_his_Lever. Intelligence is in Archimedes. Coherence is maximized over a selected context.

    On the other hand, many of us are familiar also with lossy methods of compression. These include mp3 which depends heavily on an external psychoacoustic model, the compressed sensing of our visual cortex, which depends heavily on a preexisting model of our visual reality, or the compression of typical elementary school arithmetic considered pretty much independently of set theory or the axioms of Peano. Oh, and the examples given by Robin, above, where particular attributes are described in terms of features selected via external knowledge of the domain.

    The reference to “Archimedes and his lever” is an inside joke, mentioned more than once in this forum, in regard to the perceived tendency of Robin to adopt a perspective of knowledge greater than, or outside, the system of interest, similar to Archimedes famously saying something like “Give me a long enough lever and a place to stand on, and I will move the earth.”

    Within a given domain, these two approaches will tend to perform equally well for description. One will tend to perform significantly better than the other for prediction.

    So the point here, in trying to distinguish between two quite functionally distinct views of abstraction, is that description, whether it be of the salient percepts of a piece of music, or the salient features of arithmetic as taught to schoolchildren, or even all you need to know about cosmology, according to the book of Genesis, can be quite coherent and effective within a selected context.

    But for prediction beyond the context of current observations, in order to reasonably expect that one’s search is at least constrained within the best-known distribution within a much vaster possibility space, to illuminate via searchlight beam rather than a bulb no matter how bright, one benefits by having a deeper model that points, rather than a shallow model prone to pivoting.

  • mjgeddes

    I’m not sure that the examples you give are really any different from analogies, Robin.

    I wonder if all valid reasoning is actually analogy formation in disguise. It may just be the case that some analogies are much more sophisticated than others. What EY dismisses as ‘surface similarities’ is really only a criticism of the limitations of bad analogies, not of analogy formation per se.


    (1) All valid reasoning can only proceed by manipulating concepts that are already known to us (i.e. concepts that are already in our mind)

    (2) To reason about new domains or extend the applicability of known reasoning methods, new concepts must be derived, or old concepts extended.

    (3) From (1), in order to move from old/known concepts to new/extended concepts in an understandable way, (2) must involve a mapping between new/extended concepts and old/known concepts

    (4) Any such mapping relies on similarities between old/known concepts and new/extended concepts

    (5) Mapping of similarities between things is analogy formation

    (6) From (5), all valid reasoning is analogy formation

    Types of analogies:

    (a) Surface similarities (mostly bad analogies): Looking at structural similarities (similarities in the way things appear)

    (b) Functional similarities (better analogies): Looking at similarities in the way things act/are used. (Hypothesis: This is entirely equivalent to causal modeling/Bayesian Induction!)

    (c) Semantic similarities (deep analogies): Looking at similarities in the meanings (high-level representations) of concepts. (Hypothesis: This is equivalent to ontology creation and merging – interfaces/mapping between ontological categories)

  • mjgeddes


    I think this makes my idea crystal clear:

    Analogies are *mappings* between knowledge domains; and there are varying *degrees* of mapping accuracy – ranging from a totally inaccurate mapping (a totally bad analogy) to a perfect mapping (Bayes, causal modelling).

    In the limit that the mapping approaches perfect accuracy, the accuracy of analogical reasoning approaches Bayesian Induction/Causal modelling. This shows that Bayesian reasoning is merely a special case of analogical reasoning.

    Eliezer keeps criticizing analogies because the mappings are inaccurate (missing info, mixed with some errors), but this is actually a strength of analogies!

    Here’s why: Imperfect mappings are what enable us to slide between different knowledge domains (to slide back and forth across concept space). Bayesian reasoning can only be applied to perfectly defined domains; true, Bayes is perfectly accurate, but Bayes cannot let us slide across concept space to explore new ill-defined domains (cross domain reasoning)

    In summary there’s a trade-off, the perfect accuracy of Bayesian reasoning at one end of the scale, but confined to narrow, precisely defined domains, and at the other end of the scale, the freedom of analogical reasoning to connect different domains (cross domain reasoning), but with some inaccuracy in the mappings.
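The contrast drawn in the thread above between “surface level” run-length encoding and encodings that capture deeper regularities can be tried out with a short Python sketch. The two helper functions are hypothetical names written for this illustration. The comparison uses the standard library's zlib, which implements DEFLATE (LZ77 plus Huffman coding) rather than the Lempel-Ziv-Welch scheme named in the comment, so it stands in only as a representative dictionary-style method.

```python
import zlib


def run_length_encode(text):
    """Collapse each run of one repeated character into a (char, count) pair."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def run_length_decode(runs):
    """Invert run_length_encode."""
    return "".join(ch * count for ch, count in runs)


# The regularity here is a repeated phrase, not long runs of a single character.
sample = "the hammer " * 200

runs = run_length_encode(sample)
packed = zlib.compress(sample.encode("ascii"))

print(len(sample), "characters in the original")
print(len(runs), "runs after run-length encoding")
print(len(packed), "bytes after zlib (DEFLATE)")
```

Run-length encoding only sees the doubled “m” in each repetition, so it barely shrinks the text, while the dictionary-style method notices the repeated phrase and compresses it to a small fraction of its size. Both round-trip exactly; they differ in which level of regularity they can exploit.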
