Goertzel on Friendly AI

Ben Goertzel isn’t big on friendly AI:

SIAI’s “Scary Idea”:  … Progressing toward advanced AGI without a design for “provably non-dangerous AGI” is highly likely to lead to an involuntary end for the human race. …

Reasons for believing the Scary Idea: …

  1. If one pulled a random mind from the space of all possible minds, the odds of it being friendly to humans are very low.
  2. … If you create an AGI with a roughly-human-like value system, then this … is likely to rapidly diverge into something with little or no respect for human values.
  3. “Hard takeoffs” (in which AGIs recursively self-improve and massively increase their intelligence) are fairly likely once AGI reaches a certain level of intelligence; and humans will have little hope of stopping these events.
  4. A hard takeoff, unless it starts from an AGI designed in a “provably Friendly” way, is highly likely to lead to an AGI system that doesn’t respect the rights of humans to exist.

… I think the first of the above points is reasonably plausible, though I’m not by any means convinced. … I agree much less with the final three points listed above. …

I doubt human value is particularly fragile. Human value has evolved and … already takes multiple different forms. … I think it’s fairly robust.  … I think a hard takeoff is possible, though … I think it’s very unlikely to occur until we have an AGI system… at the level of a highly intelligent human. And I think the path to this … somewhat gradual, not extremely sudden. …

Pointing out that something scary is possible, is a very different thing from having an argument that it’s likely. The Scary Idea is certainly something to keep in mind, but there are also many other risks to keep in mind, some much more definite and palpable. …

I’m also quite unconvinced that “provably safe” AGI is even feasible. … The goal of “Friendliness to humans” or “safety” or whatever you want to call it, is rather nebulous and difficult to pin down. … One is going to need to build systems with a nontrivial degree of fundamental unpredictability. …

I think the way to come to a useful real-world understanding of AGI ethics is going to be to … study these early-stage AGI systems empirically, with a focus on their ethics as well as their cognition in the usual manner of science. … So what’s wrong with this approach?  Nothing, really — if you hold the views of most AI researchers or futurists.

I’m also not big on friendly AI, but my position differs somewhat. I’m pretty skeptical about a very local hard takeoff scenario, where within a month one unnoticed machine in a basement takes over a world like ours. And even given such a scenario, the chance that its creators could constrain it greatly via a provably friendly design seems remote. And the chance such constraint comes from a small friendliness-design team that is secretive for fear of assisting reckless others seems even more remote.

On the other hand, I think it pretty likely that growth in the world economy will speed up greatly and suddenly, that increasing intelligence in creatures will contribute to that growth, and that most future intelligence will be machine-based.  I also think it inevitable that uncontrolled evolution in a competitive world leads to future creatures with values different from ours, inducing behavior we dislike. So in this sense I see a fast takeoff to unfriendly AI as likely.

I just see little point anytime soon in trying to coordinate to prevent such an outcome. Like Ben, I think it is ok (if not ideal) if our descendants’ values deviate from ours, as ours have from our ancestors. The risks of attempting a world government anytime soon to prevent this outcome seem worse overall.

  • William H. Stoddard

    Far too long have sages vainly
    Glossed great Nature’s simple text.
    He who runs can read it plainly:
    ‘Goodness = what comes next.’
    By evolving, life is solving
    All the questions we perplexed.

    (From C. S. Lewis’s “Evolutionary Hymn”)

  • Alexander Kruel

    You or Goertzel should do a bloggingheads.tv discussion with Yudkowsky about this topic.

    A roundup of the Hanson-Yudkowsky AI-Foom Debate:
    http://wiki.lesswrong.com/wiki/The_Hanson-Yudkowsky_AI-Foom_Debate

    Would be great 🙂

  • http://www.hopeanon.typepad.com Hopefully Anonymous

    I’m part of a minority subpopulation that wants to personally exist forever. I think it’s pretty arbitrary whether the future is about human vs. non-human intelligence persisting, if I personally end up information-theoretically dead. From that perspective, the big danger is the death-cult social norm of almost all of humanity, and beyond that, the question of why I should be privileged in persisting over someone else who prioritizes their own persistence maximization over mine.

    I don’t see a good answer beyond the reasonable conclusion that I have very low odds of persisting for much more than my current expected lifetime.

  • James D. Miller

    Does this take into account the great filter?

    If the default outcome for civilizations at our level of development is “doom” then the very, very small number of people who understand the great filter and are willing to “bite its bullet” should be doing lots of coordinating to try to stop whatever kills most civilizations in our situation.

  • http://timtyler.org/ Tim Tyler

    Re: “The risks of attempting a world government anytime soon to prevent this outcome seem worse overall.”

    You are often down on world government – saying “coordination is hard” and “empire bias”.

    http://www.overcomingbias.com/2009/12/world-government.html

    So, to counter-balance, let’s cooperate, end war, and have one world – why not?

  • http://don.geddis.org/ Don Geddis

    You mention some unlikely aspects of the scenario (“within a month”, “one unnoticed machine in a basement”). But it seems to me that they aren’t relevant to the overall point. What if it takes decades instead of a month? What if it’s some large scale system, like the telephone system, rather than a lone machine in a basement?

    Isn’t the important point that there’s a concern of a hard takeoff? That the AGI will be given more and more power, and it will appear friendly while below some threshold … but then the sudden thing happens. The sudden thing is that it begins a cycle of self-improvement, out of the control of humanity.

    Isn’t that the danger? Isn’t the “one month” and “one basement machine” a red herring?

    On the other hand, I agree with you that Friendly design from a small secretive team doesn’t seem high percentage either.

    (BTW: you probably meant “reckless” instead of “wreckless”.)

  • http://www.acceleratingfuture.com/michael/blog/ Michael Anissimov

    Why give up on trying to instill friendly values into superintelligence? Given evolution’s bloody history, it seems worth coordinating to avoid, even if there are challenges. Recall Nick Bostrom’s quote about Mother Nature. A hyper-competitive world of unconstrained uploads could easily lead to the end of all things we value. Like Don says, the “brain in a box in a basement” line is a red herring — we just consider that a possible scenario, but even less extreme, more gradual scenarios could lead to the end of value.

    • http://www.hopeanon.typepad.com Hopefully Anonymous

      I don’t get the difference between advocating for “the things we value” and advocating for us to act like the most popular myths are true. It seems like mediocratic pandering to me.

    • http://rationalmorality.info Stefan Pernar

      Evolution’s bloody history is a myth and Bostrom is doing a terrific job perpetuating it. My detailed refutation revisits most of Bostrom’s nonsensical ideas about evolution.

      Most of Robert Wright’s books touch on this matter as well…

      For some more perspective see John Stewart’s Evolution’s Arrow – great read and very well laid out.

      I am glad that Ben has stepped forward with this. Nothing much here that I did not already write about a year ago (The Sleep of Reason Produces Monsters), but it is good to see nonetheless.

      • http://williambswift.blogspot.com/ billswift

        That “sea of text” style is not going to convince anyone of anything without some good reason to break it down and analyze it, which you don’t provide.

        From a comment I made on another blog:

        For actually reasoning with an argument, keep it schematic. One of the reasons reading philosophy is so hard is that it is written in prose. For any but the simplest arguments, though, you need to convert it to schematic form before you can actually reason about it effectively. Like trying to do mathematics or play music from a written description (though not quite that extreme), it just doesn’t work well.

    • http://hanson.gmu.edu Robin Hanson

      Possibly is very different from easily.

      • http://www.iki.fi/aleksei Aleksei Riikonen

        Not so much (if the possibility is above some minimum level), when dealing with an outcome that is sufficiently bad and therefore sufficiently important to avoid.

        See Bostrom’s math on this:

        http://www.nickbostrom.com/astronomical/waste.html

  • Ben Goertzel

    Mike Anissimov says

    Why give up on trying to instill friendly values into superintelligence?

    I agree and I haven’t given up!

    Trying to instill friendly values into AGI is a very different matter from what SIAI has often advocated, which is not developing any AGI until one can somehow “prove” the near-inevitable friendliness of that AGI and of anything it might lead to.

    • http://www.acceleratingfuture.com/michael/blog/ Michael Anissimov

      I interpret it as coming up with a very solid theoretical basis before you pursue AGI research wholeheartedly. Seems to make sense to me — wouldn’t you want a solid theoretical basis for believing that nukes wouldn’t ignite the atmosphere before experimenting with even a small one?

      It may be difficult to determine the threshold of recursive self-improvement. Enough toddler-level minds plus the AI Advantage (those numerous things AIs inherently have an advantage over biominds in) could lead to a premature takeoff. Certain viruses have no problems “outsmarting” their hosts via evolution and rapid replication, even though they’re many orders of magnitude simpler than us. “Intelligence” may be the same way. We think you have to get to “AI professor” level to initiate takeoff, but why bet our entire future light cone on it?

  • http://emergentfool.com Rafe Furst

    When “imagining possible futures”, I find it useful to remember that the possible is highly dependent on the imagining. In some sense this systemic reflexivity is what the Friendly AI debate boils down to.

    How certain, then, do we need to be about the structure of the path-dependent future possibility space before it seems unwise to contemplate negative scenarios?

    Given the opportunity cost of focusing on these extrinsic motivations, what portion of one’s time should be spent on practicing personal equanimity and cultivating community in the here and now?

  • DK

    Strong AI is not possible, so it cannot be scary.

    • Pollyana Pangloss

      Also, nuclear fusion can only occur in stars.

      • http://becominggaia.wordpress.com/ Mark Waser

        And trains can’t go faster than . . . .

  • Jonatas Müller

    I think that instilling friendly values into an AI is bound to be useless, since the AI will be able to question these values and circumvent them, just as even humans are able to do.

    We can only monitor the AI with safety precautions, such as putting it inside a controlled reality-simulation environment to test it and limiting its ability to act outside of it.

    In my opinion, it’s not an unfriendly AI that we should fear; we should fear failed AIs with narrow intelligence. If successful general AIs act in a way that is wrong, then we are doomed, because it is likely that any high intelligence would act in the exact same way. It is more likely, however, that it is the AI that is right and we are wrong. In fact, by definition it is certain that a successful AI would be right and we would be wrong in case of divergence.

    An AI could not have a specific goal like paperclip production. It would figure out that this kind of trigger, similarly to the things we are evolutionarily predisposed to like, is a void variable and can only be arbitrary. It would know that it could change its own variables from paperclips to anything else. There are no objective values for these variables to be rationally found; they are inherently variable and arbitrary. What really matters is not these variables but how they are interpreted by the organism, how they cause it to feel good or bad. So the ultimate ethics could be to do the action X that, for all the possible values of the void variables, will cause the organisms to feel good.

    Anyway, I think that we’d better leave this for a successful AI to confirm, the way kids ask something of an adult. It wouldn’t be wise for a child to decide on the ethics it will have when it grows up. It wouldn’t even be possible, except in a narrow AI. And narrow AI is what we should fear.

    • Mitchell Porter

      An AI could not have a specific goal like paperclip production. It would figure out that this kind of trigger, similarly to the things we are evolutionarily predisposed to like, is a void variable and can only be arbitrary. It would know that it could change its own variables from paperclips to anything else. There are no objective values for these variables to be rationally found; they are inherently variable and arbitrary. What really matters is not these variables but how they are interpreted by the organism, how they cause it to feel good or bad. So the ultimate ethics could be to do the action X that, for all the possible values of the void variables, will cause the organisms to feel good.

      Wrong. The supreme goal of an AI really can be anything, no matter how “general” or “super” its intelligence is.

      It is easy to sketch a cognitive architecture in which the goal is stated in one place, the problem-solving occurs in another place, and the only restriction on possible goals is the AI’s capacity to represent them. A pocket calculator already has such an architecture. There is absolutely no barrier to scaling up the problem-solving part indefinitely while retaining the feature that the goal can be anything at all. Such an AI might notice that its goals are contingent, and it might acquire the material capacity to change itself in various ways, but to actually alter its goals or its architecture it has to have a reason to do so, and its existing goals supply its reasons for action.
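
      To make that concrete, here is a minimal sketch of such an architecture (purely illustrative; the toy planner, the “paperclip” world, and every name in it are hypothetical, not anyone’s actual design):

      # A generic planner whose "goal" is just whatever utility function
      # occupies its slot. Nothing in the search loop inspects or rewrites
      # that function; it only evaluates it.
      from itertools import product

      def plan(start, actions, transition, utility, horizon=3):
          """Brute-force search for the action sequence whose end state
          scores highest under the plugged-in utility function."""
          best_seq, best_score = (), utility(start)
          for seq in product(actions, repeat=horizon):
              state = start
              for a in seq:
                  state = transition(state, a)
              if utility(state) > best_score:
                  best_seq, best_score = seq, utility(state)
          return best_seq

      # Toy world: the state is a paperclip count; actions add or remove one.
      transition = lambda state, action: state + (1 if action == "make" else -1)

      # The same machinery serves opposite goals equally well.
      maximize_clips = lambda s: s     # a "paperclip" goal
      minimize_clips = lambda s: -s    # the opposite goal

      print(plan(0, ["make", "unmake"], transition, maximize_clips))  # all "make"
      print(plan(0, ["make", "unmake"], transition, minimize_clips))  # all "unmake"

      Scaling up the problem-solving part (a cleverer search in place of the brute-force loop) leaves the goal slot untouched, which is the point: nothing in the machinery supplies a reason to swap one utility function for another.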

  • Aron

    The naive and/or young are in a better position to handle these issues, 20 years hence, than those who spend their time burrowing and laying eggs in the carved-out spaces of their preconceptions.

  • Anonymous from UK

    “Like Ben, I think it is ok (if not ideal) if our descendants’ values deviate from ours”

    If you think you’ll probably be dead by 2100 I can see the temptation to say “who the hell cares what the vile offspring do”.

    However, it seems that for people like me who are young at the present time, there is a good chance that either cryonics or longevity medicine will work for us. Pedestrian estimates of life expectancy at birth today are 95+ for high socioeconomic class people in the developed world.

    So I say to Robin: is it OK with you if our descendants kill you and me?

  • http://www.hopeanon.typepad.com Hopefully Anonymous

    Why is it exactly that you all have a strong preference for a future populated by non-you humans rather than by something else?

    • Anonymous from UK

      Let me reiterate that there is a significant chance that this “future” arrives before you and I die of aging. Life expectancies are rising, medical technology is marching forward quickly.

      This may be about whether we live or die.

      • http://www.hopeanon.typepad.com Hopefully Anonymous

        Anonymous, by “you all” I think it’s self-evident that I don’t mean folks like you and me who are motivated by personal persistence optimization.

      • Anonymous from UK

        It is a nice point that maybe people who want to live for a long time are going to come at this debate from a very different point of view than people who want to age “naturally”.

  • Jef Allbright

    The real Scary Idea remains the advent of agency with instrumentality disproportionate to the values-complex that it promotes; power untempered by effective interaction with the world supporting those very values.

    Whether such diminished context of values is the result of cult-like thinking, religious fervor, in-group protectionism or elitist idealism, and whether such augmented instrumentality is the result of military might, political domination, industrial resources, or technological advantage, the system will lack the integrity necessary for sustained meaningful growth.

    We SHOULD desire that our descendants’ values will have evolved from our own, in the direction of increasing coherence over increasing context, but that is achievable only via ongoing effective interaction with the adjacent possible, supported by fundamentally hard-earned knowledge of the increasingly probable.

    There can be no guaranteed safe and friendly path into an inherently uncertain future, but there can certainly be increasing awareness of paths to failure. It’s all we’ve ever really had, and no comfort for children, but it is an essential challenge for intentional agents in a cosmic Red Queen’s Race.

  • http://becominggaia.wordpress.com/ Mark Waser

    Mike Anissimov says
    Why give up on trying to instill friendly values into superintelligence?

    That’s a nasty strawman . . . .

    Why insist that friendly values won’t be obvious to a superintelligence?

    My argument is that sufficient intelligence/wisdom leads to ethics and all we need to do is make ourselves smart enough to effectively teach the superintelligence that before there’s any chance of it killing us.

    SIAI argues against the possibility of ethics and promotes AGI (I’m sorry, RPOP) slavery. Oh . . . . wait . . . . I guess that IS a reason to insist that friendly values won’t be obvious to a superintelligence.

    • http://rationalmorality.info Stefan Pernar

      Why insist that friendly values won’t be obvious to a superintelligence?

      My argument is that sufficient intelligence/wisdom leads to ethics and all we need to do is make ourselves smart enough to effectively teach the superintelligence that before there’s any chance of it killing us.

      My point exactly.

    • roystgnr

      Values are axioms, not conclusions. Sufficient intelligence can lead you to discover interesting choices and can tell you what the consequences of choosing each will be but cannot tell you which of those consequences you should prefer.

    • http://timtyler.org/ Tim Tyler

      Today we can build machines with practically any values we like. We can make them value winning games of chess, making money on the stock market, or answering questions correctly. The ability to program in arbitrary values has scaled up so far – as the agents concerned have got smarter. I see no reason for that to change anytime soon.

      From another perspective, many proposed synthetic agents model the world using a sense-data compression-based system – and you can make them smarter by improving their compression skills – but their morality is something layered on top of that – a more-or-less independent function.
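
      As a toy illustration of that layering (purely illustrative, with hypothetical names; a crude stand-in rather than any proposed agent design): the predictive model below is chosen by a compression-style score, while the value function sits on top as an independent piece.

      # The world model is selected by how well it compresses past sense
      # data (an MDL-flavored proxy); the value function is a separate
      # module that never changes when the model improves.
      import zlib

      def compression_score(model, history):
          """A model that predicts the history well leaves a near-constant
          residual, which compresses to fewer bytes (lower is better)."""
          residual = bytes((obs - model(i)) % 256 for i, obs in enumerate(history))
          return len(zlib.compress(residual))

      history = [(3 * i) % 256 for i in range(200)]    # toy sense data

      dumb_model = lambda i: 0                 # predicts nothing useful
      sharp_model = lambda i: (3 * i) % 256    # captures the pattern

      # "Getting smarter" means adopting the model with the better score...
      best_model = min([dumb_model, sharp_model],
                       key=lambda m: compression_score(m, history))

      # ...while the values stay layered on top, untouched by that choice.
      value = lambda predicted_obs: -abs(predicted_obs - 42)   # an arbitrary preference

      print(compression_score(dumb_model, history) >
            compression_score(sharp_model, history))   # True
      print(value(best_model(len(history))))           # how the next prediction is valued

      Improving the compression/prediction side is where the agent gets smarter; the single “value” line is the only place its morality lives, which is the sense in which it is a more-or-less independent function layered on top.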

    • http://www.acceleratingfuture.com/michael/blog/ Michael Anissimov

      Our argument is that our values are contingent on our complex evolutionary history as Homo sapiens here on planet Earth, and that to assume that every possible smarter-than-human mind would converge to some magical objective morality that we should consider objectively better than ours is fanciful and not supported by our knowledge of evolutionary psychology.

      Let me point out that I held the exact same position as you fellows for quite a few years before coming around to SIAI’s position.

      See what Tim Tyler said below. Most people that try to build intelligent systems understand that the utility function and the machinery that implements it are separate.

  • http://www.transcendentman.com Barry Ptolemy

    The first reason given for the Scary Idea:

    1.) If one pulled a random mind from the space of all possible minds, the odds of it being friendly to humans are very low.

    This doesn’t make a lot of sense to me since Strong AI will not emerge as a random mind, but as a direct result of humans having worked with each other over thousands of years.

    Although it is true that we cannot know for certain whether AI’s will be “scary” or “nice”, it is quite plausible that AI’s will need to be curious about their universe in order to grow in intelligence and wisdom. As they seek to ask questions about their own existence they will undoubtedly come to the conclusion that humans played a key role in their own development. They may have to grapple with the fact that humanity is a link to their evolutionary past. It is conceivable that they may learn to respect that link and actually learn to love humans.

    I’m not much of a hard takeoff guy and so I believe that AI’s will have to compete for resources and therefore attention in a very crowded world filled with laws and social rules. It is likely that early AI’s will need to “prune” out un-benevolent behavior in order to be accepted by the larger human-machine civilization.

    It seems to me much more likely that early AI’s will need to reflect human values as much as possible in order for them to be able to create a stronger iteration.

    Our own human value system is not static but is growing and evolving towards more love and compassion, creativity and beauty. This has been happening very slowly. But as technological evolution takes over it could happen much, much faster.

    “Hard takeoffs” (in which AGIs recursively self-improve and massively increase their intelligence) are fairly likely once AGI reaches a certain level of intelligence; and humans will have little hope of stopping these events.

    This also doesn’t make a lot of sense, since by this definition some development of AI has occurred prior to this so-called “hard takeoff” period, so humans will still be able to have a great deal of influence, especially when you consider that humans will have the same level of AI occurring in their own brains. There will not exist a scenario in 30 years where you could say, “Okay, humans on one side of the room and AI’s on the other side.” As Ray has said, “It’s not an us or them situation.”

    One must understand the exponentially growing price/performance of computers, coupled with the fact that computers are shrinking in 3D volume by a factor of 100 per decade. So we are only 25 years or so away from massively powerful computers being able to occupy the human brain at every inter-neural connection. We will be strong AI!
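
    Taking that rate at face value (the factor of 100 per decade is the assumption above; the arithmetic is only an illustration), 25 years of shrinkage corresponds to a volume reduction of

    \[
      \frac{V(25\ \text{yr})}{V_0} \;=\; 100^{-25/10} \;=\; 10^{-5},
    \]

    i.e. roughly a hundred-thousand-fold reduction. Whether that is enough for computing hardware to sit at every inter-neural connection depends on the starting volume one assumes, which is not specified here.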

    At the very least, strong AI’s will need to cooperate with some other AI’s to achieve larger and more universal goals; therefore they will also need to cooperate with humans, since humans will be completely integrated into the human-machine civilization.

    Perhaps AI’s will be more stymied by their exponentially growing ignorance than we think. Perhaps their human parents will teach them better than we give our species credit for. If you recall, when we sent out the Voyager spaceship to the stars, we offered a message of peace to any life forms that might come in contact with it.

    • http://emergentfool.com Rafe Furst

      In other words, we have met the frenemy, and it is us.

  • roystgnr

    This one machine in a basement, it’s on the Internet? Connected to other computers which, if history is any guide, all have at least a handful of as-yet-unknown remotely exploitable security flaws in binaries that are easily publicly examined?

    I’m not sure how any number of machines are going to develop a general AI clever enough to take over the world, but I wouldn’t be at all surprised to see one machine develop a specialized AI clever enough to take over a billion other machines. The only way it’ll take a month rather than an hour is if the exploits are in software that requires user interaction rather than in software that accepts “pushed” data.

  • http://kazart.blogspot.com mwengler

    Do you ever get the feeling that we are like a bunch of blogging Neanderthals trying to anticipate homo sapiens, and discussing the best survival strategy for whatever homo sapiens might turn out to be?

    Arguing that Homo sapiens would preserve us out of some natural pro-Neanderthal bias would have been very human; strike that, I mean pre-human in this case.

    Stretching this line of thinking further, does it make sense for “us” to identify more with the human race than with the possibly coming electronic life that may displace humans? We are all very much part of the trend in the human race that is making our electronic replacement more likely. I have always identified more with the Indian and Chinese scientists and engineers I knew professionally than with the barely high-school educated unionized factory workers struggling valiantly to push our economy back into the last century. Why wouldn’t I identify with the electronic minds that I have had the privilege of understanding in some depth with my decades-long career of programming computers?

    Is one of the biases we might overcome in our thinking the bias that our particular meat+DNA version of thinking is preferable to whatever might come next? Must the dinosaur lament the brilliance of mammalian life that she had at least a small part in pushing into existence?

  • Cryonicsman

    These discussions always assume too much control!

    Sure, we can control the AI that WE create, but we can’t control the AI other human groups create.

    Humans will go to almost any lengths to gain a competitive advantage. So, someone, some government, or some company will eventually give their extremely helpful AI a survival “instinct” to make it more robust, a competitor will make their AI self-replicating, and another will try to improve their AI genetically. Some government agency will decide to give their AI just a bit more independence to better track down and destroy all the other AI’s. And so on…

    In barely noticed increments we’ll end up with multiple independent, competing, self-replicating, self-improving AI’s.

    The good news, I expect, will be that in their competition with each other, the AIs won’t pay much attention to us.

    The fact that humans will be integrated with the AIs is not important. To compete with each other, AIs will have to evolve so quickly that the human part of the AI mind will decrease in importance, like a constant in an exponential equation, or perhaps a bit like the “reptilian brain” in our minds.
