Irreducible Detail

Our best theories vary in generality. Some theories are very general, but most are more context specific. Putting all of our best theories together usually doesn’t let us make exact predictions on most variables of interest. We often express this fact formally in our models via “noise,” which represents other factors that we can’t yet predict.

For each of our theories there was a point in time when we didn’t have it yet. Thus we expect to continue to learn more theories, which will let us make more precise predictions. And so it might seem like we can’t constrain our eventual power of prediction; maybe we will have powerful enough theories to predict everything exactly.

But that doesn’t seem right either. Our best theories in many areas tell us about fundamental limits on our prediction abilities, and thus limits on how powerful future simple general theories could be. For example:

  • Thermodynamics – We can predict some gross features of future physical states, but the entropy of a system sets a very high (negentropy) cost to learn precise info about the state of that system. If thermodynamics is right, there will never be a general theory to let one predict future states more cheaply than this.
  • Finance – Finance theory has identified many relevant parameters to predict the overall distribution of future asset returns. However, finance theory strongly suggests that it is usually very hard to predict details of the specific future returns of specific assets. The ability to do so would be worth such a huge amount that there just can’t be many who often have such an ability. The cost to gain such an ability must usually be more than the gains from trading it.
  • Cryptography – A well devised code looks random to an untrained eye. As there are a great many possible codes, and a great many ways to find weaknesses in them, it doesn’t seem like there could be any general way to break all codes. Instead code breaking is a matter of knowing lots of specific things about codes and ways they might be broken. People use codes when they expect the cost of breaking them to be prohibitive, and such expectations are usually right.
  • Innovation – Economic theory can predict many features of economies, and of how economies change and grow. And innovation contributes greatly to growth. But economists also strongly expect that the details of particular future innovations cannot be predicted except at a prohibitive cost. Since knowing of innovations ahead of time can often be used for great private profit, and would speed up the introduction of those innovations, it seems that no cheap-to-apply simple general theories can exist which predict the details of most innovations well ahead of time.
  • Ecosystems – We understand some ways in which parameters of ecosystems correlate with their environments. Most of these make sense in terms of general theories of natural selection and genetics. However, most ecologists strongly suspect that the vast majority of the details of particular ecosystems and the species that inhabit them are not easily predictable by simple general theories. Evolution says that many details will be well matched to other details, but to predict them you must know much about the other details to which they match.
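The incompressibility theme running through these examples can be illustrated with a small sketch (a toy of my own, not from the post, using Python’s standard zlib): highly structured data compresses dramatically, while high-entropy data, such as the output of a well devised cipher, barely compresses at all.

```python
import random
import zlib

random.seed(0)

# Low-entropy data: one short pattern repeated many times.
structured = b"the quick brown fox " * 5000        # 100,000 bytes
# High-entropy data: a stand-in for good ciphertext or physical noise.
noisy = random.randbytes(100_000)

c_structured = zlib.compress(structured, level=9)
c_noisy = zlib.compress(noisy, level=9)

print(len(structured), "->", len(c_structured))    # shrinks dramatically
print(len(noisy), "->", len(c_noisy))              # stays near 100,000 bytes
```

No general-purpose compressor can do much better on the noisy input; its detail is irreducible in exactly the sense described above.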

In thermodynamics, finance, cryptography, innovations, and ecosystems, we have learned that while there are many useful generalities, the universe is also chock full of important irreducible incompressible detail. As this is true at many levels of abstraction, I would add this entry to the above list:

  • Intelligence – General theories tell us what intelligence means, and how it can generalize across tasks and contexts. But almost everything we’ve learned about intelligence suggests that the key to smarts is having many not-fully-general tools. Human brains are smart mainly by containing many powerful not-fully-general modules, and using many modules to do each task. These modules would not work well in all possible universes, but they often do in ours. Ordinary software also gets smart by containing many powerful modules. While the architecture that organizes those modules can make some difference, that difference is mostly small compared to having more, and better, modules. In a world of competing software firms, most ways to improve modules or find new ones cost more than the profits they’d induce.

If most value in intelligence comes from the accumulation of many expensive parts, there may well be no powerful general theories to be discovered to revolutionize future AI, and give an overwhelming advantage to the first project to discover them. Which is the main reason that I’m skeptical about AI foom, the scenario where an initially small project quickly grows to take over the world.

Added 7p: Peter McCluskey has thoughtful commentary here.

  • Dan Browne

    This is precisely why foom cannot happen as the result of an algorithm. Detail can only be compressed so far. (An example: try compressing a set of numbers down to a single bit. It’s problematic, to say the least, to extract more information from that single bit than was originally there.)
    The second part to it is that the entity is in effect a mathematical system whose axioms are the irreducible detail.
    Goedel’s theorem eliminates the possibility of the mathematical system being able to derive its axioms and this further eliminates the possibility of self improvement.
    In short, the system must be improved upon externally.
    i.e. it must be trained by data from the real world.
    And this is what will prevent foom from happening: the speed of R&D in the physical world.

    • IMASBA

      Nature and history can train a mind even without human R&D going on. Even merely having oversight of all of humanity’s existing knowledge would enable a mind to manipulate its way to the top and then take over (it could simultaneously win presidential elections in 20 countries through the use of figurehead avatars). Robin is too naive in assuming AI will play nice and stick to our rules of commerce, competition and contact, just as he is too naive in assuming extreme competition in an EM world won’t lead to desperate EMs killing, stealing and rebelling to get to live for another clockcycle.

      • Dan Browne

        While what you say is true, this != FOOM.

      • IMASBA

        Agreed, but it would definitely make subsequent FOOM a lot easier (the resource constraint and competition arguments Robin has thrown up would become moot).

      • …it could simultaneously win presidential elections in 20 countries through the use of figurehead avatars…

        So why hasn’t the CIA done this yet? Or is it too difficult for an organisation with lots of money, resources, and many thousands of drones which were optimized by evolution for such social games? If that is the case, then what exactly would the AI do better than them? Come up with some convincing arguments why people should vote for his puppets? Because elections are won by arguments?

        I think many of these “the AI will do X” scenarios are quite ridiculous. People don’t worry about how their particular scenario is supposed to work in practice, because they believe that since AI is conjectured to be much smarter than them, they are allowed to appeal to magic.

      • Peter David Jones

        You would need quite a lot of Foom to have happened already for that takeover scenario to happen.

  • Jack

    It reminds me of the book The Black Swan. By the way, what do you think about Nassim Taleb’s ideas?

    • Doug

      Going off on a tangent: as someone who works in quant finance, I think Taleb is ridiculously overrated. Most of his popular ideas, such as “anti-fragility”, are simply inferior and convoluted reconstructions of well-known statistics.

      Focusing on finance specifically, Taleb has two primary beliefs, often coated in vague hand-waving and intellectual hedging. The first is that asset returns follow a Cauchy distribution, a class of distributions whose tails are so extreme that there isn’t even a defined mean or variance. Nothing we’ve learned in the past fifty years of finance, since Mandelbrot first proposed this theory, supports this belief. Nor is there any plausible mechanism that would generate Cauchy asset returns. The idea only persists with a very small sliver of cranks because Cauchy distributions are by their very nature hard to falsify.

      The second belief is that tradeable volatility (as expressed in option prices) is systematically overvalued. Again this is a very hard idea to falsify, but almost all of our evidence indicates otherwise. Selling volatility regularly earns a positive return premium, and we have many reasons to believe that volatility is a natural risk factor where sellers are compensated. Taleb argues that selling vol actually has negative expected value in the long run, due to infrequent but massive dislocations. Consider that Black Swan was published in 2007, and the greatest financial crisis since 1929 occurred soon afterward. Yet since then the returns to selling volatility have been stunningly positive.
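The undefined-mean property of the Cauchy distribution is easy to see numerically. Here is a rough stdlib-Python sketch (the inverse-CDF construction of Cauchy draws is my own illustration): batch means of normal draws settle tightly near zero as samples accumulate, while batch means of Cauchy draws never settle, because the mean of n standard Cauchy samples is itself standard Cauchy for any n.

```python
import math
import random

random.seed(1)

def cauchy():
    # Inverse-CDF method: tan(pi * (U - 1/2)) is a standard Cauchy draw.
    return math.tan(math.pi * (random.random() - 0.5))

def batch_means(draw, n_batches=50, batch_size=10_000):
    return [sum(draw() for _ in range(batch_size)) / batch_size
            for _ in range(n_batches)]

def spread(xs):
    # Plain standard deviation of the batch means.
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

normal_means = batch_means(lambda: random.gauss(0.0, 1.0))
cauchy_means = batch_means(cauchy)

# Normal batch means cluster near 0 with spread around 1/sqrt(10000);
# Cauchy batch means remain wildly dispersed no matter the batch size.
print(spread(normal_means))
print(spread(cauchy_means))
```

This non-convergence is also why the hypothesis is so hard to reject from finite data, as the comment notes.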

  • Eliezer Yudkowsky

    I think we are making exciting progress toward turning your belief into a series of modular premises such that I think at least one of them is obviously false.

    I ask if you can restate your hypothesis in less computer-sciency terms and more in economic abstractions, a la “Intelligence Explosion Microeconomics”. All of your examples, with the possible and complicated exception of thermodynamics, are about bounded irreducibility rather than unbounded irreducibility; you are not trying to state that something cannot be reduced, you are trying to state that it has an unavoidable dependency on details and the details are expensive to predict.

    Let’s take finance as our example. A theory of finance which is efficient to a first approximation says that there are a lot of boundedly rational agents who are not very much dumber than you, on the lookout for ways to convert small amounts of resources into large amounts of resources with above-average real interest rates. Therefore, nobody should be able to find investments with exceptionally high returns. We then have to explain away lots of apparently exceptionally high returns, which luckily, we have surprisingly many ways of doing to second order. For example, if some countries have higher real interest rates than others, we say that there are hidden risks or barriers to entry. We say that lottery winners cannot repeat their wins consistently. When Larry Page and Sergei Brin founded Google, we could say that what we were actually looking at was a hidden expensive barrier to entry that prevented the average venture capitalist from inventing PageRank without a very big investment of time, or trying to hire lots and lots of expensive smart people to reliably have an idea that good, and that Larry and Sergei were collecting the equivalent of a winning lottery ticket on happening to be just two smart people with a good idea. (Plus, of course, all the other investments and luck factor required for them to end up as billionaires via Google.)

    It seems to me that there must be very many exceptions to your first order belief that intelligence is boundedly irreducible, that is, it has lots of expensive details bounding its return on investment, or at least that I would produce many such exceptions as soon as you stated your theory in more economic rather than computer-sciency terms. There is only one agency (natural selection) which produced human-level intelligence, there is only one example of it, and it seems to have achieved huge gains relative to competitors; intelligence does not look very much like an efficient market, nor like cryptography. But I turn it over to you; how would you state your theory in economic terms as an unreasonable cost of reducing the details, and what are some obvious examples of second-order exceptions that you think can be explained away without either breaking the theory or obviously permitting an intelligence explosion?

    • I agree that I am talking about a high cost to learn and use details, rather than saying details are impossible to use.

      Following your lead in focusing on the finance example, I would not claim that no one can ever make higher (risk etc. adjusted) returns than others, or even that no one could ever be in a position to rationally expect that they’d make higher returns. What I’d say is that situations like this must be very rare: where someone can rationally expect to make vastly higher returns, so much so that they soon afterward own far more than the rest of the world put together, even though they started with only a tiny fraction. The reason is that greater asset returns result from having a vast number of small pieces of info, and it is incredible to think one investor could acquire that many of them in a short time, because useful info is expensive to acquire: asset-relevant info has great irreducible detail.

      Similarly, consider a situation where there are many firms and projects writing software, trying to improve old software, trying to abstract software to make it easier to improve, and trying to get software itself to do the improving and abstracting. I wouldn’t say that no firms could ever do better than others at these tasks. But because doing better overall generally requires doing better at a great many particular (irreducible) details, this situation must be very rare: a single project that starts out with a tiny fraction of resources quickly produces a set of software that is far more valuable than all of the other software in the world put together.

      I accept that while water usually flows incrementally downhill, sometimes dams burst, giving a greatly disproportionate influence to local state near such an event. So I accept that there might be pivotal tipping points where small initial conditions end up having huge final effects. But such events and places must be rare. So it is up to those who think they know where such things lie to make a strong concrete case for their exceptionality.

      • Dan Browne

        They are not only rare but they are also HARD.
        It’s not just the intelligent software that has to be developed – an empty brain in a vat with no knowledge is useless. It’s the content. That’s what adds the value. And it’s adding the content (the irreducible detail) that’s hard.

      • Ronfar

        This is exactly why Douglas Lenat is working on that “dead end” “AI” project called Cyc – he decided that he needed that content, and he’s brute forcing it because he doesn’t know a better way to do it.

      • Eliezer Yudkowsky

        And Cyc is a non player in any AI market that I know about, except possibly the market for government grants. The winners believe in architecture. Maybe it wasn’t a silly thing for Lenat to believe in 1984, but you would have a hard time buying this after reading a good modern AI textbook in 2014.

      • Dan Browne

        Yeah, maybe so, but “Cyc is dependent on content” != “architecture is independent of content for utility”.
        I happen to believe that without the architecture, putting in the content is *too hard* whereas *with* the architecture it’s merely *hard*. But it’s important to realize that the content can’t be derived by itself even if the architecture is finally here.

      • Silent Cal

        But the possibility exists that the right architecture, with maybe a compact load of initial content, could just crawl the internet for the rest of the necessary content.

      • Dan Browne

        Yes it could, but the internet doesn’t *have* the rest of the necessary content. It only has what we know up till *now*. So if we trawl the scientific journals e.g. arxiv and pubmed (to name a couple) we won’t get all the answers, we’ll only get the publications about the research that’s already been done *as well as* a bunch of unanswered questions. EDIT: for a fooming AI much greater than human level. For a non-fooming AI yes I think you probably could use the internet to train basic human equivalent modules.

        We still need to ask the questions and then experiment against *the real world* to get the answers. Simulation can only take you so far because to simulate exactly the real world you need to *be* the real world. And that’s what this is all about.

        That said, because correlation can probably speed up the direction of research significantly we could get some kind of limited FOOM but there is an absolute limit on computability of the answers imposed by Goedel so we won’t get a FOOM to infinity.

      • Silent Cal

        Unanswered physics questions aren’t really the point. Reading the whole internet might get you a pretty complete picture of human psychology as well as Earth’s economic, political and social structures, which could be more than enough to take over the world.

      • Dan Browne

        I’m saying that unanswered physics questions and science questions in general ARE the point. Without being able to advance science by experimentation you are left with imagined hypotheses in a box which have been unverified by evidence. There is thus no way to gain a tractable advantage in the real world. Of course, an entity with all the knowledge of the world would be a very interesting (and perhaps charismatic) conversation partner, but it would hardly be able to take over the world without interacting with it and it certainly couldn’t FOOM without doing real science in the real world.

      • AnotherScaryRobot

        There would be limits to FOOM without new physics, but a sufficiently intelligent agent with access to all present human knowledge will very likely be able to derive new physics from existing experimental results.

        Even if it can’t, the limits a lack of new physics imposes on FOOM might be sufficiently distant that, from the perspective of our present human civilization, they might as well not exist. If the agent can FOOM to world domination without new physics, how much do we care that it might have to pause expansion to run some experiments some time later?

      • Eliezer Yudkowsky

        Okay. I propose that you and I have a dispute within a shared economic paradigm that we could call something like “efficient returns” that generalizes the efficient market hypothesis as a paradigm for explaining both first-order conformance and first-order exceptions. Efficient cognitive returns are a special case of efficient returns.

        E.g., to talk about why Larry and Sergei were able to earn an excess return on Google, we at some point have to ask why other venture capitalists didn’t think of it, and the answer is that Larry and Sergei won a cognitive lottery (lots of people investing brain time in thinking of ideas, some of that brain time wins a ticket) and that, after factoring in the other risks of Google or similar projects, it would not have been efficient for a venture capitalist to pay lots of people as smart as Larry and Sergei to try to come up with similar ideas and test all the ones that sounded as promising in advance as Google did (without benefit of hindsight) to that venture capitalist. To put it another way, we are trying to reduce the story of Google to a combination of predictable returns that are within market range, and lottery tickets; and then we want the complete story of what it would take to duplicate a Google, including the cost of purchasing all lottery tickets, to deliver a return that isn’t too high above market, relative to the number of actors who we think would actually be incentivized to try that, and have the resources to try that, if there was a predictable return above the other returns they could get.

        In other words, the complete story about efficient returns is going to include some steps of the story that are about efficient cognitive returns—yes, Larry and Sergei thought of PageRank and other search-engine companies didn’t, but we have to ask why venture capitalists can’t pay other people to think of things in order to get the complete second-order story about why Google had a seemingly above-market first-order return.

        I will first attempt to lay out the parts of “efficient (cognitive) returns” that I think we agree on.

        The fundamental premise of efficient returns is as follows: Many agents with goals want similar things (that is, there are some things such that many agents would find those things useful; a paradigmatic thing of this type is money). Then the difficulties that some agents face in obtaining these commonly desired things, is informative about the difficulties that we expect other agents to face in obtaining these things. When an agent has apparently obtained a lot of something easily, we must ask why other agents did not also obtain that thing.

        If an agent wins $100M via a lottery ticket that cost $1M, we need a story in which “buy a lottery ticket” is not a generally efficient strategy (unless the market’s overall real rate of return is consonant with those gains in that amount of time).

        When career bureaucrats in the KGB make billions of dollars by buying up state-owned enterprises from the Former Soviet Union, we need to explain why these gains were not also captured by recent Harvard graduates who also want lots of money. In this case we tell a story where the resource price of being a well-connected senior bureaucrat in the KGB exceeds what those Harvard graduates are able to pay. But there are agents better placed than Harvard graduates to try to capture these gains, like those already living in Russia, or hedge-fund managers in the US. The stories we tell here may be about career bureaucracy in the KGB nonetheless coming at a price above the reach of a starving factory worker, or the returns to being a career KGB agent not being cognitively predictable decades in advance (i.e. this is another kind of lottery ticket; the agents who also wanted the money would have needed to try many other things equally plausible to them to cover this case, relative to their cognitive ability to predict returns).

        So far I expect us to agree.

        I remark that the theory of efficient returns is useful not because it is exceptionless but because we think it is theoretically productive in predicting that the first-order exceptions will have second-order explanations, and (sometimes) in barring sufficiently large first-order exceptions after second-order explanations have been ruled out in advance.

        An overly strong version of the efficient market hypothesis would say that nobody ever gets rich at very much above the real rate of return. But in point of fact the world is littered with thousands of exceptions to this principle, many of them quite predictable as classes. Genuinely smart people can make bounded amounts of money by trading thin markets that don’t scale up to the level where big-name hedge funds find them interesting (though it’s hard to figure out how much of this is really a lottery ticket). Venture capital involves a number of thinly traded markets that only accredited investors are allowed to enter, kept forcefully illiquid by regulation. Many government officials are able to loot state coffers and state-owned enterprises. Corporations may have ethical qualms about collecting the seemingly above-market returns on lobbying, though this is still something of a puzzle. Prediction markets can offer predictable gains to whole corporations because we understand the incentives of individual bureaucrats within those corporations to oppose prediction markets.

        If we generalize efficient financial markets to a theory of efficient returns in general, we will be able to fit many of those cases into the larger paradigm: people can earn excess returns by being smart because it costs huge amounts per unit of predictable return to try to turn time into brainpower at a traditional university.

        But the point remains that efficient returns are not a strong theory where nobody gets rich. It’s a weak theory that forces us to come up with excuses for why lots of people are becoming rich. This is, from the standpoint of good scientific methodology, an inherently suspicious state of affairs. Yet in economics it has withstood the test of time. We regard the theory as productive and probably-correct because the second-order excuses *do* often pan out; we expect that arbitrary hedge-fund managers were not allowed to bid on Russian state-owned enterprises and whaddya know, they weren’t. Where we *can* sufficiently bar the second-order excuses, as in liquid well-traded markets where lots of smart people are visibly trying very hard, it really is the case that nobody can reliably double their money every year by buying and selling the S&P 500. Excess returns earned in one place are usually *not* repeatable and other agents *don’t* just run out and earn them too. The second-order excuses are coherent, we do find the phenomena that they tell us to look for, and where those exceptions are barred the first-order predictions do hold.

        The final reason we believe in efficient returns is because it sounds so plausible from first principles: we think that to first order the world is approximated by there being lots of agents who want similar things, and when the difficulty of most agents in obtaining a resource is suddenly uninformative about the difficulty faced by another agent in obtaining a resource, we want to know why, and why other agents aren’t obtaining the first one.

        Yet again, and I’m sorry if it seems like I’m hammering too hard on this point, this generalized theory of efficient returns doesn’t say that nobody gets rich quick. This happens *all the time* in real life. The theory is just very theoretically productive in forcing us to understand why the exceptions are occurring.

        Are we on the same page so far?

      • Yup, we are on the same page so far. In these terms, the key issue seems to be the “lumpiness” of the ex post excess returns sometimes achieved. The gains from PageRank seem near the upper bounds of what we observe in our world, so the hypothesis that there will be vastly larger future lumps to be found requires some explanation of what will have changed.

      • Eliezer Yudkowsky

        Okay. I spent some time trying to think of how to phrase the next step precisely but now consider myself to have run out of time, so I’ll sort of blurt it out. Within the context of this shared theory—I’ve been thinking that maybe it should be called “inexploitability”, a la inexploitable markets being a better name than efficient markets—I don’t understand where you get your beliefs about (a) many independent modules AND (b) no big architecture wins that can make modules much cheaper.

        * We know that humans get much more of what they want than chimpanzees do. The theory of evolutionary biology puts upper bounds on how much adaptation we should expect to separate us. It sure looks like there were one or more big wins.

        * If inexploitability were a strong theory like conservation of momentum, we could look at any purported or hypothetical violation and deduce back to an error of reasoning along the way. But inexploitability is a weak theory full of real-world exceptions and corresponding second-order excuses. So you can’t look at the result of a FOOM and deduce that there must be massive modularity in order to prevent the FOOM. You have to establish the massive modularity from existing evidence already known to you, because otherwise FOOM could just be one of the many predictable exceptions to inexploitability.

        * There are not lots of hedge-fund managers competing to buy the best revisions to human brain architecture. So you couldn’t be deducing from that, that excess gains from each purchased brain revision must be small. (Plus: chimps, humans.) That right now no agents are competing to purchase this good because it is not yet available, looks like exactly the kind of excuse that we would use to explain a first-order exception if inexploitability happened to fail in this case.

        * More generally: We shouldn’t be able to deduce that gains per unit of electricity are small by arguing that nobody in the sixteenth century was buying it. Now that electricity is generally validated and available, *excess* gains from marginal electricity purchases, *within* countries that have electricity, are relatively small. But they’re still very large benefits relative to the sixteenth century and providing electrical infrastructure to countries that don’t have it can be a legitimate priority, albeit these countries must have other barriers to investment.

        * Right now we are not in a space where anyone can buy an AGI, let alone where lots of smart actors are competing to produce improvements to the current best one. The only known AGI was produced by natural selection, in one species. That’s not exactly the highly competitive conditions where we expect inexploitability to hold.

        * The part of your thesis that involves “And then people must sell whatever architectural innovations they have, because they can make more money that way” seems to me to be produced by inappropriate backwards reasoning from thinking that a FOOM must be impossible, therefore this condition must support its impossibility. This is not appropriate for a weak theory like inexploitability / efficient returns. Many actors in the modern economy encounter conditions where they prefer to keep info rather than sell it, e.g., given asymmetric info and difficulties of contracting, they think their selling price would be insufficient. Given all other parts of the FOOM thesis, this condition would certainly hold for a team that otherwise expected their AI to go FOOM. So we can’t use this step of the reasoning to conclude that a FOOM is impossible, because this step of the reasoning fails if a FOOM is in fact possible. The theory of inexploitability includes the possibilities of barriers to investment and proprietary advantages as some of the points which explain the many observed first-order exceptions.

        The basic FOOM thesis is along the lines of “There are innovations that greatly decrease the cost of cognitive capacity, these returns can compound, there are barriers to them being as easily reinvested in humans, and finally, given that overall scenario it makes sense for a leading project not to sell their seed corn and core proprietary innovations (same as a hedge fund) because they can get greater gains by keeping them private and letting them compound.” You prohibit this by proposing that (1) there are no large absolute object-level cognitive gains and (2) there are no large absolute meta-level gains.

        But the inexploitability story is about relative returns, not absolute returns, and furthermore there are currently not lots of agents competing to purchase these goods. So where are you getting your info about their absolute magnitude?

        What info already in your possession causes you to conclude your massive modularity and no architecture thesis, i.e., that absolute gains from individual purchases of cognitive capacity are small and that there are no remaining innovations which can greatly decrease their cost? I just don’t see how this conclusion follows within efficient returns / inexploitability. It would follow if the theory were strong and a FOOM needed to be prohibited a priori, but the theory is weak and full of real-life exceptions. So you must need to establish this using other observational evidence. But I don’t see what observations imply that. The observations I can think of imply the opposite.

      • We know that humans get much more of what they want than chimpanzees do.

        Really? You know this?

You seem to have an antiquated conception of evolution as a goal-directed process (toward maximizing utility).

Your nemesis Stephen Jay Gould showed that evolution is generally punctuated: that doesn’t demonstrate a “big win.”

        [I suppose you agree with Chomsky and Fodor that language arose through a miraculous single mutation.]

      • Silent Cal

        Your position is that humans are not noticeably more successful than chimpanzees? I mean, you can make the case that humans don’t get much of ‘what they want’ because modern alienation and stuff, but this really isn’t the relevant measure for this debate.

        If I recall, Robin’s position is that humans don’t have an enormous intelligence advantage over other primates, they have an enormous social/coordination advantage. But clearly there’s some kind of enormous advantage at play (even if it’s not hedonically advantageous).

        I’m not sure why you believe humans having an advantage implies saltationism, but if the implication is true that’s an argument for saltationism, because the advantage is apparent.

      • I didn’t say humans had no distinct advantages over chimps, only that it isn’t “hedonically advantageous,” EY’s claim having been that humans are better at “getting utility.”

        Saltation is relevant to EY’s claim that humans experienced a couple of “big wins.” These being analogous to hard takeoff, EY took the human advantages to imply something like a single-gene breakthrough.

But (I thought it had become the conventional wisdom) there’s no direction to evolution, such that organisms get “better at obtaining utility” (or that one species is “more successful” overall).

      • Silent Cal

Let me back up and try to get a better understanding of what you’re saying. Is your position that producing an AI that is to humans as humans are to chimpanzees wouldn’t be that big a deal, since humans aren’t ‘better’ than chimpanzees?

      • EY believes something like a saltation (or two) separates us from chimps; consequently, we may have a saltational development of AI. I deny the premise.

        I probably wasn’t clear on the relevance of the issue about humans getting more utility. The relevance is only indirect. (I was struck by the claim because it seems absurd at several levels.) If EY (tacitly) sees evolution as a directed process (or at least that intellectual evolution is), then it becomes easy to see the development of AI as something like physics rather than like engineering (because ‘intelligence’ then has a nature).

      • Silent Cal

        The argument I’m seeing from EY goes like this:
        1. Premise: Humans are much more capable than chimpanzees in important ways.
        2. Premise: Evolutionary theory puts fairly low limits on how much adaptation separates humans from chimpanzees.
        3. Conclusion: A small amount of adaptation can, at least sometimes, cause a large increase in capability.

        There are vague terms in the above, but I hope it can at least be a starting point.

        You’ve taken issue with his imprecise phrasing of 1), but you seem to agree with my phrasing, which I think is the important one. But you also don’t seem to be disputing 2); if anything you’re arguing that less adaptation could have taken place.

        I’m not sure how the argument relies on evolution being directed; if it’s a random walk or whatever, doesn’t that make premise 2 stronger? Premise 1 isn’t drawn from evolutionary theory at all but from observation of humans and chimpanzees today.

        (Incidentally, Robin seems to accept the argument but dispute its relevance, attributing the outsized effect to a threshold relating to communication that has no analog in the AGI situation)

      • I dispute 2.

        [Moreover, if you’re right about RH’s view, I would also disagree, in that I think language resulted in a deep rewiring of the brain (but also that language evolved gradually).

        That communication has no analog in AGI seems a dubious argument, because it draws the boundary around suitable analogs arbitrarily narrowly.]

        Back to disputing 2–There were apparently intense selection pressures over a prolonged time for human intelligence to develop. (There are arguments that human intelligence must have developed suddenly. I’ve thought EY was endorsing such arguments.) I haven’t elsewhere seen the claim (or warrant for it) that there’s not enough evolutionary space between chimps and humans–in just that realm where humans are most different from chimpanzees.

      • Silent Cal

        Thanks, that makes sense.
        I don’t have anything like the expertise to quantify how much optimization separates humans from chimps in, say, bits, or equivalent-programmer-hours, or any other currency useful for present purposes, which puts me in the uncomfortable position of judging word against word based on authority. Maybe you’d like to get into the numbers?

      • Eliezer Yudkowsky

        For the record: I deny that I said anything about saltation, certainly not if that means only a few mutations. And I certainly know enough evolutionary biology with math to know better than to consider it as teleological, though we may look back at history and see trends reflecting sustained selection pressures. “Stephen Diamond” is strawmanning me, probably willfully so.

      • I didn’t say the project must not sell access to its innovations, I noted that selling was the usual behavior, highlighting it as another unusual element to be explained.

        It seems we agree that the key issue is the size/lumpiness of the mind architecture innovations, especially those that support learning. To me “AGI” just means software with high ability over a wide scope, and so I apply our long experience with software. In that experience lumpiness is sometimes high within narrow scopes, but aggregating across wide scopes lumpiness seems too low to support a foom scenario.

        The main kinds of lumpiness we have seen are based on compatibility to standards, rather than from superior software abilities. Intel, IBM, Microsoft, and Apple gained big by controlling widely used standards. Also, like the original ability of humans to share culture, we also saw big gains from interconnected computers. And had some consortium been able to keep exclusive access to that capacity, they would have realized huge gains relative to outsiders. But of course whatever gains were realized by early adopters of computer connectivity, and by firms controlling standards, they weren’t remotely enough to support anything like a foom scenario.

        Your response seems to be to say that our experience with software over the last seventy years just isn’t relevant, as AGI is an entirely different thing. I don’t see your basis for seeing it so different that we shouldn’t rely on it as the closest data we have.

      • Eliezer Yudkowsky

        It sounds like your disbelief in FOOM is based on a blanket disbelief in AGI and human general intelligence: in your worldview there just are no innovations, past or future, that significantly reduce the cost of cognition or speed up learning, and things like human/chimp differentials are to be accounted for only by humans being better communicators. Does that sound like an accurate representation of your views?

      • It sounds like your disbelief in FOOM is based on a blanket disbelief in AGI and human general intelligence: in your worldview there just are no innovations, past or future, that significantly reduce the cost of cognition or speed up learning, and things like human/chimp differentials are to be accounted for only by humans being better communicators.

        The one point that’s correct in this strawman is that the issue is general human intelligence. That is, you believe–contrary to the real possibilities provided by evolution–that human intelligence is unified rather than composite.

        But your position is contrary to reason and evidence, and if it is in fact adopted by many AI researchers, that only demonstrates their ignorance of psychology and biology.

        [In other words, get a grip: Gould was right on the main issue. There is no g factor.]

      • A sufficiently large and diverse set of tools can give you a general toolkit, but that is different from having a single general tool. I see humans and software as mostly becoming general by having large toolkits, and much less by having particular very general tools. Communication tools can create thresholds below which you can’t talk, and above which you can talk a lot. But I’m otherwise skeptical about there being critical architectures that make a huge difference in the value gained from a set of tools.

      • Eliezer Yudkowsky

        I attempt to interpret your position as saying that you:

        1) Agree that evolutionary biology puts sharp limits on how much marginal cognitive innovation can have been added to the human innate toolbox compared to the chimpanzee innate toolbox.

        2) Agree that humans seem to have produced a much greater volume of productive cognitive content than chimps.

        3) Disagree that this points to compact ‘architectural’ innovations that decrease the cost of cognitive content.

        4) Believe 1+2 is best explained by pointing to human communication and human population sizes only.

        5) See no implication that any similar cognitive productivity differentials could apply between an AI and a human as a result of the AI containing relatively few and compact cognitive innovations.

        Sound fair/accurate?

      • I’m with you up until #5. First, it is likely possible to make a human-equivalent mind far faster and cheaper than are humans. Second, it may well be possible to eventually have a thousand times as many useful modules as human minds contain, and to make each module a hundred times more effective. With more and better modules, a mind might be vastly better at creating cognitive content.

      • mjgeddes

        My 3-level model of cognition in terms of modelling capabilities explains past facts and predicts future FOOM:
        Cognitive capability:

        Models of the External World (Level 1)

        Models of the Self (Level 2)

        Models of Models (Level 3)

        The big leap between animals and humans (level 1 >> 2) was the ability to form self-models – that is what led to language and all the social coordination and communication involved in modern society.
        The leap to level 3 will be a general purpose ‘language of thought’ functioning as a universal ontology or standard (‘a theory of everything’), capable of integrating many separate cognitive modules into a single general purpose system. That’s a FOOM.

      • 1) Agree that evolutionary biology puts sharp limits on how much
        marginal cognitive innovation can have been added to the human innate
        toolbox compared to the chimpanzee innate toolbox.

        Ever heard of arms races and sexual selection?

      • He’s asking you for a reason for expecting an (extreme) second-order exception for AI.

        If the main answer is self-improvability, then you haven’t answered RH’s argument that self-improvement, too, will (prima facie) be gradual.

      • I should add that economists wouldn’t attribute Google profits mostly to PageRank. We’d instead talk about profits being due to scale and scope economies in production and consumption, and then secondarily about what temporary cost advantages allowed a particular firm to take over that niche early on, when there were many competitors for the niche.

      • Wei Dai

        It seems likely that an AGI firm will have similar economies of scale to Google’s, but much higher economies of scope. In other words, it will be comparatively much easier to train/specialize an AGI to work in a new field than for Google to develop a new product. This seems like a part of “some explanation of what will have changed.”

      • And what would be the source of that wider scope of scale economies?

      • Wei Dai

        For an AGI firm to enter 100 markets, it needs to hire one R&D team, develop one AGI, make 100 copies of it, and send them to college to learn 100 fields. The cost of this is much less than 100 times the cost of entering just one market. For a software firm to enter 100 markets, it needs to hire 100 software development teams to program 100 applications. The cost of this is close to 100 times the cost of entering one market.
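        As a toy version of this cost comparison (all numbers purely illustrative, not estimates):

```python
# Toy model of economies of scope (all numbers hypothetical).
# An AGI firm pays one large R&D cost, then a small per-market
# training cost for each copy; a conventional software firm pays
# nearly the full development cost for every market it enters.

def agi_firm_cost(n_markets, rnd_cost=100.0, train_cost=1.0):
    """One R&D effort, then cheap copies trained per market."""
    return rnd_cost + n_markets * train_cost

def software_firm_cost(n_markets, per_app_cost=100.0):
    """A separate development team per application."""
    return n_markets * per_app_cost

# Entering 100 markets costs the AGI firm about 2x its one-market
# cost, but costs the software firm exactly 100x.
print(agi_firm_cost(100) / agi_firm_cost(1))            # ~1.98
print(software_firm_cost(100) / software_firm_cost(1))  # 100.0
```

        The small per-market training cost standing in for “send a copy to college” is of course the contested assumption.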

      • You are assuming what I am questioning – why should there be such a small number of things that one team can learn that would let it make a box that could then learn enough to enter 100 markets?

      • Wei Dai

        It doesn’t matter if it’s one team or if the AGI firm needs to hire 100 R&D teams to develop 100 modules. After it does that, it can make 100 copies of the AGI to learn 100 different fields and enter 100 markets, at a total cost that’s much less than 100 times the cost of entering one market, which is what economy of scope means, right? The only way this wouldn’t be true is if you have to develop a new module to learn each new field, but humans learn new fields all the time without having new modules, so presumably that’s not needed for AGI either.

      • I would guess that humans feature more specializations than there are fields. Just look at the number of human ancestors. For example, lobe-finned fishes are smarter than flatworms, which are smarter than sponges. Each ancestor contributed its novel abilities.

        These combined abilities allow us to tackle problems that none of our ancestors faced. But this doesn’t imply that “general intelligence” isn’t a huge conglomerate of narrow abilities.

        If indeed it is the case that there is no other way to create an efficient general intelligence, then before anyone can combine any of these many narrow AIs into a single agent, the advantage such a general AI might have will be greatly diminished, since most markets will already be satisfied by ancestor systems that are not fully generally intelligent but highly specialized, and which therefore feature higher domain-specific intelligence.

  • lump1

    Thank you for this post, it clears up some things that confused me about the earlier foom post. I also really appreciate the effort to explicitly state your assumptions.

    In my own thinking about foom scenarios, I keep gravitating to “damburst” stories which aren’t so much about self-improving software, but are instead about self-replicating hardware – so not the classic singularitarianism. I claim that two things we will never have enough of are energy (in a readily usable form) and computing cycles. The damburst will be the creation of a machine that can take raw material, convert it into a means of generating energy, computing cycles and more copies of itself, all without human supervision.

    I still think that this is a legit foom scenario because the hardest part to make will be the AI system that coordinates all these activities. If that AI – appropriately “embodied” – is set loose in the asteroid belt or on Mercury, that could unleash something big enough to let ten guys go from nothing to owning more of the economy than everyone else put together.

    • Self-replicating hardware whose doubling time is longer than the doubling time of the economy loses that competition. First versions of self-replicators will be very slow, and then different projects will make incrementally faster versions. How much faster do you think the best project will be than the second best project? Isn’t it likely that different designs will work best in different environments, and so even if a design is best in some environment it won’t be best in all of them?
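      The doubling-time point can be put numerically; a minimal sketch, with hypothetical growth rates:

```python
# A replicator that doubles more slowly than the economy loses ground
# forever, however long it runs; one that doubles faster eventually
# dominates. All numbers below are hypothetical.

def growth_factor(years, doubling_time_years):
    """How many times over a quantity multiplies in `years`."""
    return 2.0 ** (years / doubling_time_years)

economy = growth_factor(100, 15)   # economy doubling every 15 years
slow_rep = growth_factor(100, 20)  # replicator doubling every 20 years

print(economy)    # ~102x over a century
print(slow_rep)   # 32x: its relative share shrinks about 3x
```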

      • IMASBA

        Since when does one party have to be significantly better than the others to become the biggest? In business one party will eventually become big pretty much through random chance alone. In our world antitrust regulations, backed up by governments with militaries, prevent such a party from achieving world domination, but who knows what would happen with an AI that can absorb/eliminate its closest competing AIs and governments.

      • Doug

        I think it’s pretty unlikely that, in the absence of antitrust law, a single corporation would conquer the whole world. Companies may dominate single industries, but even massive corporations frequently fail to gain toeholds when they invest large resources in diversifying into new markets.

        For example, Apple is one of the world’s largest corporations, but it seems very unlikely to achieve even a modicum of success if it tried to run an energy company. The same can be said of Exxon Mobil trying to become a software company. The vast majority of institutions have narrowly focused areas of expertise, which strongly augurs against global domination.

      • IMASBA

        And what if Apple could hire an army? I talked about antitrust laws AND the governments that back those up. AI asteroid enterprises would be a lot bigger than Apple is today; they would really only compete with each other in the absence of governments that are suspicious of them, and Earth would be a prize for the victor. It would be similar to the British and Dutch East India Companies, which conquered entire countries and monopolized important trade routes.

      • What makes you think Apple couldn’t hire an army today?

      • Ronfar

        1) There tend to be laws against that kind of thing, and they’re enforced by the people with the biggest armies around.

        2) Armies capable of fighting against other large armies cost billions of dollars a year to maintain – and who knows how much to build from scratch. Even Apple would have a hard time affording that.

      • lump1

        The first versions of self-replicators might be very slow, but the way I pictured it, they could be out of sight for a long time, doing their thing. Before they came close to hitting any resource ceilings (the way self-replicating rabbits sometimes do), they could produce a mighty industrial force that could catch everyone else by surprise.

        That’s because this could all happen while everybody else is stuck in a model where they think they should produce only things for which there are customers. We’re not that interested in automated hyperproduction because we can meet customer demands with more or less traditional means. And sure, machines are gradually replacing workers, and newer machines will replace the older machines, but all this is just to make stuff for someone to buy. It can all keep going without there being much effort put into Von Neumann machines in the asteroid belt. So whoever sends out the first one might have a long run of uncontested exponential growth, and it doesn’t matter much how slowly it starts. The growth line gets pretty steep soon enough.

  • roystgnr

    Ordinary software also gets smart by containing many powerful modules. While the architecture that organizes those modules can make some difference

    …the first architecture that can create arbitrary new useful modules without human intervention could make an unprecedented difference.

    • We already have software that can create new software – such tools just aren’t very good yet. They will get better, and thus more useful. But there is no clear threshold of “useful enough”; the better the software gets, the more useful it will be.

      • roystgnr

        Doesn’t the ability to recurse seem like an interesting threshold here? We already have software that can create new software; we don’t have any software that could create anything equivalent to itself; eventually we will. In other fields, having the output be more than 100% of the input is occasionally a *critical* difference…

      • Economic growth is the entire world creating the equivalent of itself. Small systems won’t be designing new systems that are their equivalent, though they may contribute to a world of design that produces a similar outcome.

  • Doug

    In machine learning this concept is usually formalized as the No Free Lunch theorem.

    • Arthur B.

      The No Free Lunch theorem is part of a framework which tries to analyze machine learning without a theory of epistemology. PAC learning, for instance, has to assume a finite set of configurations because it can’t (or doesn’t want to) deal with the induction problem.

      If you accept Solomonoff induction as a paradigm, the no free lunch theorem disappears.

      • If you accept Solomonoff induction as a paradigm, the no free lunch theorem disappears.

        Indeed, if Solomonoff induction is the basis for induction, then there is special reason to think AI will have a hard takeoff: there is an ultimate algorithm, which makes the details less significant. How much belief in Solomonoff induction should shift one’s priors about a hard takeoff may be uncertain, but a putative general induction algorithm seems the only plausible way to differentiate AI from other inventions (which don’t take off hard).

        Yet this key issue receives almost no discussion.

        [But, note, the modularity of the human intellect is an argument against Solomonoff induction as an epistemological solution. As is the collapse of all philosophical programs trying to formalize induction.]

      • Dan Browne

        The challenge in deciding how far Solomonoff induction can go (it’s ultimately uncomputable in the limit) is in how compressible the algorithm is. This is, to a large extent, due to constants which cannot themselves be compressed further, nor derived by the system itself, due to Gödel’s incompleteness theorem. That said, I myself suspect that there is in fact a universal “intelligence” algorithm, but it is a somewhat fuzzy recognition algorithm rather than an intelligence algorithm, and it has limited utility due to its very generality.
        I suspect that the hard effort is going to be generating the knowledge which will make the algorithms useful, which will itself limit the capacity of any system to FOOM.

      • What leads you to suspect there’s a universal intelligence algorithm?

        [If there were, why radical modularity?]

        I think it must be admitted that–to the extent that there is such an algorithm–FOOM becomes more credible. It provides a basis for distinguishing the prospect for progress in AI from other engineering fields (where no general algorithm exists). But, to repeat myself, why should we expect that a general algorithm underlies the modular mind?

      • Dan Browne

        General recognition algorithm.
        And FOOM is only credible to the extent that any given intelligent algorithm can generate useful answers by simulation without real world input.

      • I take it that “to the extent” is essential.

        Wouldn’t we expect, in the general case, that a fundamental intelligent algorithm would be capable of restricting the range of potential answers obtained by simulation? Specifically, wouldn’t a general recognition algorithm be likely to have this virtue?

      • Dan Browne

        If such a system (a general algorithm) were possible, it wouldn’t be possible to match inputs to the internal algorithms without an external entity passing it parameters, agreeing on terms (limiting the range, if you like) between the external entity and the computational entity. Turing formulated this beautifully when speaking of an Oracle machine.

        From that perspective a general recognition algorithm would also have the same virtue: i.e., in order to be trained (or programmed) to recognize something specific in the external world, it would need parameters passed in by the Oracle, but it would still be capable of recognizing anything at all in its untrained state.

      • mjgeddes

        A general method of Induction is not what distinguishes a hard-take-off AI. The distinguishing factor is the capability to form various types of models. There appear to be 3 general types of capabilities, and each step up in capability seems to be a qualitative leap.

        Models of the External World (Level 1)

        Models of the Self (Level 2)

        Models of Models (Level 3)

        Most animals don’t get past level 1. They are aware of the external world but have no self-model (possible exception: chimps). You see a big discontinuous jump in capabilities with humans that are capable of forming general models of the self (level 2).

        But alas, humans are still largely lacking level 3 capabilities! So we are still not fully general. From the perspective of agents stuck on level 2, it may appear that intelligence is not general, but that’s because level 2 agents fail to see how all abstract knowledge fits together into a coherent unified model.

        So what would a level 3 agent see? In short, level 3 awareness is ‘awareness of the theory of everything’, in the sense of an abstract model of models themselves…in short, a level 3 agent indeed sees how ‘everything fits together’; the ‘many separate irreducible details’ apparently seen by the level 2 agents turn out to be an illusion caused by their own cognitive limitations.

        Of course, it is trivial for an agent with even rudimentary awareness of level 3 to see that Induction is not a sufficiently general epistemological method. Bayesian Methods/Decision theory simply aren’t ‘meta enough’ 😉

      • Bayesian Methods/Decision theory isn’t a theory of induction (unless perhaps it’s combined with some Solomonoff hocus pocus).

        Induction is the process of building models based on evidence. (Some do distinguish between induction and abduction.)

        [This may be just terminological.]

  • mjgeddes

    With enough computational power, any missing details can be filled in simply by running brute-force simulations.

    • Few interesting problems have less than exponential complexity. You can never have enough computational power to brute-force those.
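      To make that point concrete, a back-of-the-envelope sketch (the hardware figures are just assumptions):

```python
# Brute-force search over n binary choices takes 2**n evaluations.
# Even at a trillion evaluations per second, modest n is hopeless.

EVALS_PER_SECOND = 1e12   # assumed hardware speed
SECONDS_PER_YEAR = 3.15e7

def years_to_brute_force(n_bits):
    return 2.0 ** n_bits / (EVALS_PER_SECOND * SECONDS_PER_YEAR)

print(years_to_brute_force(50))   # tiny: under an hour of compute
print(years_to_brute_force(100))  # ~4e10 years, about three times
                                  # the age of the universe
```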

  • Ari

    This is a great post. The biggest problem is usually getting married to simple models. The closer you look, the more detail you see, the more messy things get.

    A lot can be said for construal level theory though. In business or in personal life we make decisions with much less accurate models than we demand in other situations, for signalling reasons. Although, as signalling theory says, we’re just not very conscious of them unless shown explicitly.

    I wonder though, at least in social sciences, on the margin, do we need more info or is it mostly a question of coalition politics? Seems that a lot of efficiency gains could be made on local and global level but there are way too powerful coalitions to block them. Like you said, nerds coordinate rather well since they use less energy for coalition politics (at least in their own field). That is my personal experience as well.

    I suppose we can start in our private lives by calling out institutional failure when we see it, or rent-seeking (not that I’m innocent) when we see it, even if it means risking our reputation or job or whatever.

    Most of these points were made by Robin previously though, I’m just adding up here to see how the pieces fit together.

  • arch1

    Robin, I think your following statement misses the point: “…there may well be no powerful general theories to be discovered to revolutionize future AI, and give an overwhelming advantage to the first project to discover them.” Here’s why.

    1) The central notion behind foom is that among tech development activities, *GAI* development potentially leverages positive feedback to an unusual (perhaps unique) degree. This confers an unusually huge advantage (by greatly magnifying its initial lead) to a project which can a) keep its progress-feedback loop running, b) while sufficiently decoupling itself from things *outside* that loop (a process which has been crudely abstracted as “crossover”), c) sufficiently in advance of all other projects.

    2) The above scenario doesn’t depend on there being powerful general theories for revolutionizing AI. It works with *any* path to greatly enhanced AI capabilities that can be sufficiently decoupled from constraints/things (such as normal humans) outside the feedback loop, and along which one project may gain a moderate edge during the right (or wrong, depending on one’s perspective) project phase.
    (I didn’t read others’ comments so apologies for any redundancies, especially if mistaken)

    • I agree that a sufficiently decoupled sufficiently extra-strong feedback loop is sufficient. My question is why we should expect such a thing to exist.

      • arch1

        It seems to me that this would come for free given a scalable AGI above a certain capability threshold (perhaps roughly that of a typical human AI researcher); and that absent global decline it’s just a matter of time before *that* exists.
        (I hope I’m mistaken. The foom variant in which the lead project is secretive seems a stretch but alas not out of the question, and nightmarish in its likely consequences; but *any* plausible foom scenario seems to have great potential for good or evil).

      • arch1

        I think I should change “for free” to “cheap for some, relative to the value of the prize” :-). Also maybe a sketch will make this scenario easier to criticize and (hopefully) rip to shreds:
        2) at some point, scalable “AI-researcher-equivalent AGI” technology is attained
        3) one or more projects scale up such technology radically with the goal of reaching “crossover”
        4) some TBD number (millions? billions?) of researcher-year-equivalents later, crossover is achieved
        5) foom

  • Pingback: Overcoming Bias : Regulating Infinity

  • Philip Goetz

    The examples you gave are of two different types: Systems where there is a constant inherent limit on predictability, and self-adaptive “red queen” systems in which the development of a new theory or model changes the system so as to make it as unpredictable as before.

  • Pingback: Overcoming Bias : I Still Don’t Get Foom

  • Pingback: Thoughts on Robots, AI, and Intelligence Explosion – Foundational Research

  • Pingback: Four Background Claims - Machine Intelligence Research Institute

  • Pingback: Overcoming Bias : Tegmark’s Book of Foom