Addendum:

I think this makes my idea crystal clear:

Analogies are *mappings* between knowledge domains; and there are varying *degrees* of mapping accuracy - ranging from a totally inaccurate mapping (a totally bad analogy) to a perfect mapping (Bayes, causal modelling).

In the limit that the mapping approaches perfect accuracy, the accuracy of analogical reasoning approaches Bayesian Induction/Causal modelling. This shows that Bayesian reasoning is merely a special case of analogical reasoning.

Eliezer keeps criticizing analogies because the mappings are inaccurate (missing info, mixed with some errors), but this is actually a strength of analogies!

Here's why: imperfect mappings are what enable us to slide between different knowledge domains (to slide back and forth across concept space). Bayesian reasoning can only be applied to perfectly defined domains; true, Bayes is perfectly accurate, but Bayes cannot let us slide across concept space to explore new, ill-defined domains (cross-domain reasoning).

In summary, there's a trade-off: at one end of the scale, the perfect accuracy of Bayesian reasoning, confined to narrow, precisely defined domains; at the other end, the freedom of analogical reasoning to connect different domains (cross-domain reasoning), but with some inaccuracy in the mappings.
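
To make the trade-off concrete, here is a minimal sketch (my own illustration, not part of the original claim) that treats each knowledge domain as a set of named relations and an analogy as a partial mapping between domains; the domains, relations and the mapping_accuracy function are all invented for the example:

```python
# Toy illustration: a knowledge domain is a dict of relation_name -> pairs,
# and an analogy is a partial mapping between the concepts of two domains.
# "Mapping accuracy" is the fraction of source relations that carry over
# correctly to the target domain.

def mapping_accuracy(source, target, mapping):
    carried, total = 0, 0
    for relation, pairs in source.items():
        for a, b in pairs:
            total += 1
            image = (mapping.get(a), mapping.get(b))
            if image in target.get(relation, frozenset()):
                carried += 1
    return carried / total if total else 0.0

# The classic Rutherford analogy: solar system -> atom.
solar = {"orbits":   frozenset({("planet", "sun")}),
         "attracts": frozenset({("sun", "planet")})}
atom  = {"orbits":   frozenset({("electron", "nucleus")}),
         "attracts": frozenset({("nucleus", "electron")})}
mapping = {"planet": "electron", "sun": "nucleus"}

print(mapping_accuracy(solar, atom, mapping))  # 1.0 on these two relations
# Adding more source relations (e.g. "radiates heat") would lower the score:
# the mapping is imperfect, which is exactly the trade-off described above.
```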

I'm not sure that the examples you give are really any different from analogies, Robin.

I wonder if all valid reasoning is actually analogy formation in disguise. It may just be the case that some analogies are much more sophisticated than others. What EY dismisses as 'surface similarities' is really only a criticism of the limitations of bad analogies, not of analogy formation per se.

Lemma:

(1) All valid reasoning can only proceed by manipulating concepts that are already known to us (i.e. concepts that are already in our mind)

(2) To reason about new domains or extend the applicability of known reasoning methods, new concepts must be derived, or old concepts extended.

(3) From (1), in order to move from old/known concepts to new/extended concepts in an understandable way, (2) must involve a mapping between new/extended concepts and old/known concepts

(4) Any such mapping relies on similarities between old/known concepts and new/extended concepts

(5) Mapping of similarities between things is analogy formation

(6) From (5), all valid reasoning is analogy formation

Types of analogies:

(a) Surface similarities (mostly bad analogies): Looking at structural similarities (similarities in the way things appear)

(b) Functional similarities (better analogies): Looking at similarities in the way things act/are used. (Hypothesis: This is entirely equivalent to causal modeling/Bayesian Induction!)

(c) Semantic similarities (deep analogies): Looking at similarities in the meanings (high-level representations) of concepts (Hypothesis: This is equivalent to ontology creation and merging - interfaces/mapping between ontological categories). A rough code sketch contrasting (a) and (b) follows below.
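
As a hedged illustration of the list above (the feature sets are made up for the example, not taken from the comment), surface similarity compares how things look while functional similarity compares what they do, and the same object can score high on one and low on the other:

```python
# Hypothetical feature sets: "surface" features describe how a thing appears,
# "functional" features describe what it does / how it is used.

def jaccard(a, b):
    """Similarity of two feature sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

whale = {"surface":    {"lives_in_water", "streamlined", "has_fins"},
         "functional": {"breathes_air", "nurses_young", "warm_blooded"}}
shark = {"surface":    {"lives_in_water", "streamlined", "has_fins"},
         "functional": {"breathes_water", "lays_eggs", "cold_blooded"}}
cow   = {"surface":    {"lives_on_land", "has_legs", "has_fur"},
         "functional": {"breathes_air", "nurses_young", "warm_blooded"}}

# A surface analogy groups the whale with the shark;
# a functional analogy groups it with the cow.
print(jaccard(whale["surface"], shark["surface"]))        # 1.0
print(jaccard(whale["functional"], cow["functional"]))    # 1.0
print(jaccard(whale["functional"], shark["functional"]))  # 0.0
```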

@Eliezer: So at first I wondered if I was just tired because I could read the words in the preceding comment but not the sentences...

Funny, I find your writing these days to be far too wordy. ;-)

So here it is, somewhat expanded:

It may be useful to consider abstraction as compression.

As many readers of OvercomingBias are aware, there's a strong connection between intelligence and compression. Indeed many will go so far as to claim that effective compression is the sine qua non of intelligence. By extension, effective abstraction is essential to the costly computational challenge of intelligently thinking—and predicting—at a meaningful level of complexity without succumbing to the Death of a Thousand Distractions.

One scheme aims to trade detail for salience by a fractal encoding of regularities within the domain of interest. Intelligence is in the encoding. Coherence is maximized over the entire context.

Many of us are familiar with Run-length encoding, Huffman encoding and Lempel-Ziv-Welch encoding: lossless compression methods that achieve progressively tighter compression by encoding progressively higher levels of abstraction. To the extent that we can identify regularities operating not only at the "surface level" of simple ASCII characters or pixels, as in Run-length encoding, but more significantly at the "deeper levels" of word fragments or geometric shapes, or even more significantly at the level of semantic (or functional) atoms, we can exploit such self-similarity at multiple scales to encode ever more effectively.

It's worth noting that such encoding is fundamentally hard in the information-theoretic sense: there can be no shortcuts that don't involve hard-earned information, entailing "intelligence", "increasing knowledge of the search space", or an "optimization process", as it is often stated in this particular sandbox.

It's worth noting also that the more "intelligent" the compression method, the more coherent will be the structure of the "atoms" within the compressed output, meaning that on average, "atoms" at any level will have greater mutual information, each will play a relatively greater role in synergy with its associates, and these synergies will tend to operate over a greater context (than, say, simple Run-length encoding).
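
As a small sketch of the "surface level" point (standard-library Python only, nothing specific to the comment's own scheme): run-length encoding sees only literal character repeats, so regularity that lives at a deeper level, such as repeated words, is invisible to it:

```python
# Toy run-length encoder: captures only surface-level regularity
# (runs of identical characters), not deeper regularities such as
# repeated words or semantic structure.
from itertools import groupby

def rle(s):
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(s))

print(rle("aaaaabbbb"))         # 'a5b4' -- surface repetition compresses well
print(rle("the cat the cat"))   # 't1h1e1 1c1a1t1 1t1h1e1 1c1a1t1' -- longer than the input!
# The second string is plainly regular, but only at the level of words;
# a dictionary scheme like LZW, or a semantic model, could exploit that,
# which is the "higher level of abstraction" referred to above.
```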

Another scheme aims to trade detail for salience by selection of features according to an external function, call it Archimedes_and_his_Lever. Intelligence is in Archimedes. Coherence is maximized over a selected context.

On the other hand, many of us are also familiar with lossy methods of compression. These include mp3, which depends heavily on an external psychoacoustic model; the compressed sensing of our visual cortex, which depends heavily on a preexisting model of our visual reality; or the compression of typical elementary-school arithmetic, considered pretty much independently of set theory or the axioms of Peano. Oh, and the examples given by Robin, above, where particular attributes are described in terms of features selected via external knowledge of the domain.

The reference to "Archimedes and his lever" is an inside joke, mentioned more than once in this forum, in regard to the perceived tendency of Robin to adopt a perspective of knowledge greater than, or outside, the system of interest, in the spirit of Archimedes' famous saying: "Give me a long enough lever and a place to stand on, and I will move the earth."

Within a given domain, these two approaches will tend to perform equally well for description. One will tend to perform significantly better than the other for prediction.

So the point here, in trying to distinguish between two quite functionally distinct views of abstraction, is that description, whether it be of the salient percepts of a piece of music, or the salient features of arithmetic as taught to schoolchildren, or even all you need to know about cosmology according to the book of Genesis, can be quite coherent and effective within a selected context.

But for prediction beyond the context of current observations, in order to reasonably expect that one's search is at least constrained within the best-known distribution of a much vaster possibility space, to illuminate via searchlight beam rather than a bulb, no matter how bright, one benefits by having a deeper model that points, rather than a shallow model prone to pivoting.
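
A minimal sketch of the description-versus-prediction contrast, with entirely made-up data and model names: both models "describe" the observed context equally well, but only the deeper one points usefully beyond it:

```python
# Hypothetical observations generated by the rule y = 2x + 1.
observed = {x: 2 * x + 1 for x in range(5)}

def shallow_model(x):
    """Memorize the observations: perfect within the observed context,
    pivots to an arbitrary default outside it."""
    return observed.get(x, 0)

def deeper_model(x):
    """Model the generating rule itself, then extrapolate."""
    return 2 * x + 1

for x in (3, 10):
    print(x, shallow_model(x), deeper_model(x))
# 3  -> 7, 7   : equally good description inside the observed domain
# 10 -> 0, 21  : only the deeper model points in the right direction outside it
```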

It's a pretty standard term - see: http://timtyler.org/self_improving_systems/

Re: society -> music -> big brains -> language -> writing -> publishing -> SI?

I see that as essentially the same as my sequence - except that it misses out the stage I included which represents the human population explosion of 10,000 BC - and has some additional stages tacked on at the front.

I would say that the sequence contains much the same insight: superintelligence is the next stage in the "master plan" of the new replicators.

Eliezer, have I completely failed to communicate here? You have previously said nothing is similar enough to this new event for analogy to be useful, so all we have is "causal modeling" (though you haven't explained what you mean by this in this context). This post is a reply saying, no, there are more ways of using abstractions; analogy and causal modeling are two particular ways to reason via abstractions, but there are many other ways. But here again in the comments you just repeat your previous claim. Can't you see that my long list of ways to reason about hammers isn't well summarized by an analogy vs. causal modeling dichotomy, but is better summarized by noting they use different abstractions?

I am of course open to different ways to conceive of "the previous major singularities". I have previously tried to conceive of them in terms of sudden growth speedups.

Arrgh, sorry, typo, that should of course be:

society -> music -> big brains -> language -> writing -> publishing -> SI?

@Tim Tyler

Would it make any difference to the resulting quality of the supposed SI if the sequence in fact went:

society -> music -> big brains -> writing -> publishing -> SI?

Is it worth distinguishing between two types of self-improvement for an AI? One type is hardware improvement: the AI learns how to make the hardware that it is operating on run faster. This would be an extension of Moore's Law. The other type is software improvement: the AI learns how to think more efficiently with the hardware that it has. This would be an extension of how humans have learned to think (mathematics, scientific method, Bayesian reasoning) or how humans have learned to program (functional programming, object-oriented programming, rapid prototyping). What does Eliezer mean by "self-improving AI"? How much does it depend on the AI learning how to learn better than humans have learned how to learn?
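
One way to make the distinction concrete is a deliberately crude toy model (my own construction, not anything Eliezer has specified): capability is the product of hardware speed and software efficiency, and "recursive" self-improvement means current capability feeds back into how fast the software side improves:

```python
# Crude toy model: hardware and software improvements compound separately;
# in the recursive variant, current capability accelerates software gains.

def simulate(years, hw_rate=0.5, sw_rate=0.5, recursive=False):
    hw, sw = 1.0, 1.0
    for _ in range(years):
        capability = hw * sw
        boost = capability if recursive else 1.0
        hw *= 1 + hw_rate                     # Moore's-Law-style hardware gains
        sw *= 1 + sw_rate * min(boost, 10.0)  # capped so the numbers stay readable
        yield hw * sw

print(list(simulate(5)))                   # steady compound growth
print(list(simulate(5, recursive=True)))   # the growth rate itself grows
```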

I wouldn't begin to compare the shift from non-recursive optimization to recursive optimization to anything else except the dawn of life

That doesn't help much in understanding it, though. Fortunately, there are other approaches. E.g. adopting the meme's eye view:

The first objective was to make room for themselves in human brains. They did this by rewarding the humans that had more space for memes with increased genetic fitness. Memes for language, music and fashion were probably mainly responsible for this. The result was 5 million years of steadily-expanding cranial capacity - which resulted in much more space for the memes.

The next step was to increase human numbers - since the more humans there are, the more memes there are. Agricultural memes allowed humans to form closer symbiotic relationships with plants, animals and each other, which boosted their fitness, increased their numbers, and massively increased the population of memes.

The next problem was meme transmission fidelity. At this early stage, memes were copied verbally and by behavioural imitation - neither of which provided much in the way of copying fidelity. Environmental inheritance proved to be the answer here - by inventing the idea of writing, memes could persist unaltered across extended periods of time, without fear of mental mutation.

Then there was the copying speed problem. Transcribing documents by hand was slow and tedious. However, the invention of mechanical printing presses allowed machines to take over this task from humans, resulting in vastly wider distribution of memes.

However, many memes often still need the consent of a human brain to get copied - an obvious bottleneck. The afflicted memes are currently busy sorting this issue out. Computer viruses skip over the human brain completely - but they are nasty parasites. Superintelligent machines will copy memes with the full consent of society. At that stage, the memes won't be dependent on humans any more.

Here, superintelligent machines are seen as the last step in a series that goes: Big brains -> language -> society -> writing -> publishing -> SI.

(So at first I wondered if I was just tired because I could read the words in the preceding comment but not the sentences; and then I saw "Jef Allbright" and it all became clear.)

The dawn of life, considered as a complete event, could not have had its properties predicted by similarity to any other complete event before it.

But you could, for example, have dropped down to modeling the world on the level of atoms, which would go on behaving similarly to all the other atoms ever observed. It's just that the compound of atoms wouldn't behave similarly to any other compound, with respect to the aspects we're interested in (Life Go FOOM).

You could say, "Probability is flowing between regions of pattern-space, the same as before; but look, now there's a cycle; therefore there's this new thing going on called search." There wouldn't be any search in history to analogize to, but there would be (on a lower level of granularity) patterns giving birth to other patterns: stars to planets and the like.

Causal modeling can tell us about things that are not similar in their important aspect to any other compound thing in history, provided that they are made out of sufficiently similar parts put together in a new structure.

I also note that referring to "humans, farming, and industry" as "the previous major singularities" is precisely the issue at hand - is this an abstraction that's going to give us a good prediction of "self-improving AI" by direct induction/extrapolation, or not?

I wouldn't begin to compare the shift from non-recursive optimization to recursive optimization to anything else except the dawn of life - and that's not suggesting that we could do inductive extrapolation, it's just a question of "How large an event?" There isn't anything directly similar to a self-improving AI, in my book; it's a new thing under the Sun, "like replication once was" but not at all the same sort of hammer - if it was, it wouldn't be a new thing under the Sun.
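
For what it's worth, here is a very rough sketch (illustrative only, not a model of what a self-improving AI would actually be) of the difference between an optimizer that only improves a solution and one that also improves its own optimization process:

```python
# Illustrative only: "non-recursive" vs (weakly) "recursive" optimization.
# The first hill-climber improves a solution with a fixed step size; the
# second also adjusts the step size, i.e. tunes the optimizer itself.
import random

def climb(score, steps=200, adapt=False):
    x, step = 0.0, 0.1
    for _ in range(steps):
        candidate = x + random.uniform(-step, step)
        if score(candidate) > score(x):
            x = candidate
            if adapt:
                step *= 1.5   # successful moves embolden the optimizer
        elif adapt:
            step *= 0.9       # failures make it more cautious
    return x

random.seed(0)
target = lambda x: -(x - 100) ** 2
print(climb(target))              # fixed step: crawls slowly toward 100
print(climb(target, adapt=True))  # self-tuning step: typically gets much closer
```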

It may be useful to consider abstraction as compression.

One scheme aims to trade detail for salience by a fractal encoding of regularities within the domain of interest. Intelligence is in the encoding. Coherence is maximized over the entire context.

Another scheme aims to trade detail for salience by selection of features according to an external function, call it Archimedes_and_his_Lever. Intelligence is in Archimedes. Coherence is maximized over a selected context.

Within a given domain, these two approaches will tend to perform equally well for description. One will tend to perform significantly better than the other for prediction.
