I’m not that happy with framing our analysis choices here as "surface analogies" versus "inside views." More useful, I think, to see this as a choice of abstractions. An abstraction neglects some details to emphasize others. While random abstractions are useless, we have a rich library of useful abstractions, tied to specific useful insights.
For example, consider the oldest known tool, the hammer. To understand how well an ordinary hammer performs its main function, we can abstract from details of shape and materials. To calculate the kinetic energy it delivers, we need only look at its length, head mass, and recoil energy percentage (given by its bending strength). To check that it can be held comfortably, we need the handle’s radius, surface coefficient of friction, and shock absorption ability. To estimate error rates, we need only consider its length and head diameter.
For other purposes, we can use other abstractions:
- To see that it is not a good thing to throw at people, we can note it is heavy, hard, and sharp.
- To see that it is not a good thing to hold high in a lightning storm, we can note it is long and conducts electricity.
- To evaluate the cost to carry it around in a tool kit, we consider its volume and mass.
- To judge its suitability as decorative wall art, we consider its texture and color balance.
- To predict who will hold it when, we consider who owns it, and who they know.
- To understand its symbolic meaning in a story, we use a library of common hammer symbolisms.
- To understand its early place in human history, we consider its easy availability and frequent gains from smashing open shells.
- To predict when it is displaced by powered hammers, we can focus on the cost, human energy required, and weight of the two tools.
- To understand its value and cost in our economy, we can focus on its market price and quantity.
- [I'm sure we could extend this list.]
Whether something is "similar" to a hammer depends on whether it has similar relevant features. Comparing a hammer to a mask based on their having similar texture and color balance is a mere "surface analogy" for the purpose of calculating the cost to carry it around, but is a "deep inside" analysis for the purpose of judging its suitability as wall art. The issue is which abstractions are how useful for which purposes, not which features are "deep" vs. "surface."
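To make the hammer-and-mask point concrete, here is a minimal sketch in Python. Everything in it is an assumption added for illustration: the feature values are invented numbers on a 0 to 1 scale, the wrench is a hypothetical third object, and the names `OBJECTS`, `ABSTRACTIONS`, `distance`, and `most_similar` are made up for this example. It only shows that "most similar" changes when the abstraction (the set of features kept) changes.

```python
from math import sqrt

# Hypothetical feature values on a 0..1 scale, invented purely for illustration.
OBJECTS = {
    "hammer": {"mass": 0.8, "volume": 0.5, "texture": 0.6, "color_balance": 0.7},
    "mask":   {"mass": 0.1, "volume": 0.4, "texture": 0.6, "color_balance": 0.7},
    "wrench": {"mass": 0.8, "volume": 0.5, "texture": 0.1, "color_balance": 0.2},
}

# Each abstraction keeps only the features relevant to one purpose
# and neglects all the others.
ABSTRACTIONS = {
    "carry_cost": ("mass", "volume"),
    "wall_art": ("texture", "color_balance"),
}


def distance(a, b, purpose):
    """Euclidean distance between two objects, using only the features
    that the abstraction for `purpose` retains."""
    features = ABSTRACTIONS[purpose]
    return sqrt(sum((OBJECTS[a][f] - OBJECTS[b][f]) ** 2 for f in features))


def most_similar(target, purpose):
    """Return the other object closest to `target` under the given abstraction."""
    others = [name for name in OBJECTS if name != target]
    return min(others, key=lambda name: distance(target, name, purpose))


if __name__ == "__main__":
    for purpose in ABSTRACTIONS:
        print(purpose, "->", most_similar("hammer", purpose))
```

With these made-up numbers, the mask is the hammer's nearest neighbor for the wall-art purpose and the wrench is nearest for the carrying-cost purpose; neither comparison is "deep" or "surface" in itself.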
Minds are so central to us that we have an enormous range of abstractions for thinking about them. Add that to our abstractions for machines and creation stories, and we have a truly enormous space of abstractions for considering stories about creating machine minds. The issue isn’t so much whether any one abstraction is deep or shallow, but whether it is appropriate to the topic at hand.
The future story of the creation of designed minds must of course differ in exact details from everything that has gone before. But that does not mean that nothing before is informative about it. The whole point of abstractions is to let us usefully compare things that are different, so that insights gained about some become insights about the others.
Yes, when you struggle to identify relevant abstractions, you may settle for analogizing, i.e., attending to commonly-interesting features and guessing based on feature similarity. But not all comparison of different things is analogizing. Analogies are bad not because they use "surface" features, but because the abstractions they use do not offer enough relevant insight for the purpose at hand.
I claim academic studies of innovation and economic growth offer relevant abstractions for understanding the future creation of machine minds, and that in terms of these abstractions the previous major singularities, such as humans, farming, and industry, are relevantly similar. Eliezer prefers "optimization" abstractions. The issue here is evaluating the suitability of these abstractions for our purposes.



It may be useful to consider abstraction as compression.
One scheme aims to trade detail for salience by a fractal encoding of regularities within the domain of interest. Intelligence is in the encoding. Coherence is maximized over the entire context.
Another scheme aims to trade detail for salience by selection of features according to an external function, call it Archimedes_and_his_Lever. Intelligence is in Archimedes. Coherence is maximized over a selected context.
Within a given domain, these two approaches will tend to perform equally well for description. One will tend to perform significantly better than the other for prediction.
(So at first I wondered if I was just tired because I could read the words in the preceding comment but not the sentences; and then I saw “Jef Allbright” and it all became clear.)
The dawn of life, considered as a complete event, could not have had its properties predicted by similarity to any other complete event before it.
But you could, for example, have dropped down to modeling the world on the level of atoms, which would go on behaving similarly to all the other atoms ever observed. It’s just that the compound of atoms wouldn’t behave similarly to any other compound, with respect to the aspects we’re interested in (Life Go FOOM).
You could say, “Probability is flowing between regions of pattern-space, the same as before; but look, now there’s a cycle; therefore there’s this new thing going on called search.” There wouldn’t be any search in history to analogize to, but there would be (on a lower level of granularity) patterns giving birth to other patterns: stars to planets and the like.
Causal modeling can tell us about things that are not similar in their important aspect to any other compound thing in history, provided that they are made out of sufficiently similar parts put together in a new structure.
I also note that referring to “humans, farming, and industry” as “the previous major singularities” is precisely the issue at hand – is this an abstraction that’s going to give us a good prediction of “self-improving AI” by direct induction/extrapolation, or not?
I wouldn’t begin to compare the shift from non-recursive optimization to recursive optimization to anything else except the dawn of life – and that’s not suggesting that we could do inductive extrapolation, it’s just a question of “How large an event”? There isn’t anything directly similar to a self-improving AI, in my book; it’s a new thing under the Sun, “like replication once was” but not at all the same sort of hammer – if it was, it wouldn’t be a new thing under the Sun.
That doesn’t help much in understanding it, though. Fortunately, there are other approaches. E.g. adopting the meme’s eye view:
Here, superintelligent machines are seen as the last step in a series that goes: Big brains -> language -> society -> writing -> publishing -> SI.
Is it worth distinguishing between two types of self-improvement for an AI? One type is hardware improvement: the AI learns how to make the hardware that it is operating on run faster. This would be an extension of Moore’s Law. The other type is software improvement: the AI learns how to think more efficiently with the hardware that it has. This would be an extension of how humans have learned to think (mathematics, scientific method, Bayesian reasoning) or how humans have learned to program (functional programming, object-oriented programming, rapid prototyping). What does Eliezer mean by “self-improving AI”? How much does it depend on the AI learning how to learn better than humans have learned how to learn?
@Tim Tyler
Would it make any difference to the resulting quality of the supposed SI if the sequence in fact went:
society -> music -> big brains -> writing -> publishing -> SI?
Arrgh, sorry, typo, that should of course be:
society -> music -> big brains -> language -> writing -> publishing -> SI?
Eliezer, have I completely failed to communicate here? You have previously said nothing is similar enough to this new event for analogy to be useful, so all we have is “causal modeling” (though you haven’t explained what you mean by this in this context). This post is a reply saying, no, there are more ways to reason using abstractions; analogy and causal modeling are two particular ways to reason via abstractions, but there are many other ways. But here again in the comments you just repeat your previous claim. Can’t you see that my long list of ways to reason about hammers isn’t well summarized by an analogy vs. causal modeling dichotomy, but is better summarized by noting they use different abstractions?
I am of course open to different ways to conceive of “the previous major singularities”. I have previously tried to conceive of them in terms of sudden growth speedups.
Re: society -> music -> big brains -> language -> writing -> publishing -> SI?
I see that as essentially the same as my sequence – except that it misses out the stage I included which represents the human population explosion of 10,000 BC – and has some additional stages tacked on at the front.
I would say that the sequence contains much the same insight: superintelligence is the next stage in the “master plan” of the new replicators.
It’s a pretty standard term – see: http://timtyler.org/self_improving_systems/
@Eliezer: So at first I wondered if I was just tired because I could read the words in the preceding comment but not the sentences…
Funny, I find your writing these days to be far too wordy.
So here it is, somewhat expanded:
It may be useful to consider abstraction as compression.
One scheme aims to trade detail for salience by a fractal encoding of regularities within the domain of interest. Intelligence is in the encoding. Coherence is maximized over the entire context.
Another scheme aims to trade detail for salience by selection of features according to an external function, call it Archimedes_and_his_Lever. Intelligence is in Archimedes. Coherence is maximized over a selected context.
Within a given domain, these two approaches will tend to perform equally well for description. One will tend to perform significantly better than the other for prediction.
I’m not sure that the examples you give are really any different from analogies, Robin.
I wonder if all valid reasoning is actually analogy formation in disguise. It may just be the case that some analogies are much more sophisticated than others. What EY dismisses as ‘surface similarities’ is really only a criticism of the limitations of bad analogies, not of analogy formation per se.
Lemma:
(1) All valid reasoning can only proceed by manipulating concepts that are already known to us (i.e. concepts that are already in our mind)
(2) To reason about new domains or extend the applicability of known reasoning methods, new concepts must be derived, or old concepts extended.
(3) From (1), in order to move from old/known concepts to new/extended concepts in an understandable way, (2) must involve a mapping between new/extended concepts and old/known concepts
(4) Any such mapping relies on similarities between old/known concepts and new/extended concepts
(5) Mapping of similarities between things is analogy formation
(6) From (5), all valid reasoning is analogy formation
Types of analogies:
(a) Surface similarities (mostly bad analogies): Looking at structural similarities (similarities in the way things appear)
(b) Functional similarities (better analogies): Looking at similarities in the way things act/are used. (Hypothesis: This is entirely equivalent to causal modeling/Bayesian Induction!)
(c) Semantic similarities (deep analogies): Looks at similarities in the meanings (high-level representations) of concepts (Hypothesis: This is equivalent to ontology creation and merging – interfaces/mapping between ontological categories)
Addendum:
I think this makes my idea crystal clear:
Analogies are *mappings* between knowledge domains; and there are varying *degrees* of mapping accuracy – ranging from a totally inaccurate mapping (a totally bad analogy) to a perfect mapping (Bayes, causal modelling).
In the limit that the mapping approaches perfect accuracy, the accuracy of analogical reasoning approaches Bayesian Induction/Causal modelling. This shows that Bayesian reasoning is merely a special case of analogical reasoning.
Eliezer keeps criticizing analogies because the mappings are inaccurate (missing info, mixed with some errors), but this is actually a strength of analogies!
Here’s why: Imperfect mappings are what enable us to slide between different knowledge domains (to slide back and forth across concept space). Bayesian reasoning can only be applied to perfectly defined domains; true, Bayes is perfectly accurate, but Bayes cannot let us slide across concept space to explore new ill-defined domains (cross-domain reasoning).
In summary, there’s a trade-off: at one end of the scale, the perfect accuracy of Bayesian reasoning, confined to narrow, precisely defined domains; at the other end, the freedom of analogical reasoning to connect different domains (cross-domain reasoning), but with some inaccuracy in the mappings.