AI Risk Convo Synthesis

May 11, 2023

YouGov America released a survey of 20,810 American adults. … 46% say that they are "very concerned" or "somewhat concerned" about the possibility that AI will cause the end of the human race on Earth … There do not seem to be meaningful differences by region, gender, or political party. Younger people seem more concerned than older people.… Furthermore, 69% of Americans appear to support a six-month pause in "some kinds of AI development". (More)

Few researchers think that a threatening (or oblivious) superintelligence is close. Indeed, the AI researchers themselves may even be overstating the long-term risks. Ezra Karger of the Chicago Federal Reserve and Philip Tetlock of the University of Pennsylvania pitted AI experts against “superforecasters”, people who have strong track records in prediction and have been trained to avoid cognitive biases. In a study to be published this summer, they find that the median AI expert gave a 3.9% chance to an existential catastrophe (where fewer than 5,000 humans survive) owing to AI by 2100. The median superforecaster, by contrast, gave a chance of 0.38%. Why the difference? For one, AI experts may choose their field precisely because they believe it is important, a selection bias of sorts. (More)

Rumor tells me that Karger and Tetlock actually did this exercise for four different big risks (AI, nukes, bioterror, CO2), and that only on AI did extensive discussion fail to close the opinion gap between experts and superforecasters, or leave so large a gap. And my experience agrees: on this topic different sides tend to talk past each other. Which inspired me to invite many folks to do some conversations on AI Risk, most of which are now recorded and posted: David Hoffman & Ryan Adams, Roko, James Miller, Katja Grace, Scott Aaronson, William Eden, Zvi Mowshowitz, Ronny Fernandez, Jaan Tallinn, Roman Yampolskiy, David Orban, David Duvenaud & Agnes Callard, Tom Edwards, Daniel Faggella.

While I see a great diversity of opinion on all sides, my best one-factor model to explain opinion variance here is this: some of us “other” the AIs more. That is, we vary in terms of how “partial” we feel toward “our” end of an “us-them” axis that ranges from humans to artificial intelligence. And this axis induces a near peak of partiality; we are more inclined toward partiality re this axis than almost any other.

Those who feel more partial to their gender or race, or to natives relative to foreigners, generally hold more negative views about the abilities, motives, and inclinations of the “others”, and also more essentialist views on what “we” have in common, as well as on what “they” have in common. We also tend to see the others in a more far mode.

Similarly, those who feel more partial to humans relative to AI also tend to hold more positive views on humans, more negative views on AIs, and more essentialist views regarding what each side has in common, and on how much those matter. Yes, causality can go in both directions, both from seeing differences to othering as well as from othering to seeing differences.

Across history, our world has changed enormously, and so apparently have human beliefs, attitudes, values and mental styles. Such things have also varied greatly across space at a given time. Our capacities have increased, as have rates of change in all these things. Suggesting that even without AI, our descendants would eventually also have very different capacities, beliefs, attitudes, values, and mental styles, and that such changes might happen much faster in the future than they have in the past. In addition, even without AI, new techs such as mind-reading, social media, and genetic engineering offer new ways to change all these things.

Such differing descendants might even induce violent revolutions wherein they grab property and perhaps even life from prior generations. Prior to such revolts, they might act deceptively, misleading prior generations about their intentions. But even without such revolutions, we expect succeeding generations to become increasingly capable, to accumulate more wealth, and to displace prior generations in positions of power. Even if we get immortality. (This all holds even more strongly in Age of Em.)

Future AIs can also have different beliefs, attitudes, values, and mental styles. They might also become more capable, and eventually displace bio humans, either peacefully or via violent revolutions. Raising the question: why should we worry more about our AI descendants doing such things, relative to bio humans? The key pattern we see is: those more partial toward bio humans also more readily embrace reasons to be worried. Those most worried demand that AIs be far more locked down and thoroughly brainwashed to love us (i.e., “aligned”) than we accept from most humans and orgs in our world today.

For example, those more partial toward bio humans tend to see future human changes as slower, and as more likely to stay within bounds set by some human essence or core, a core that matters for more aspects of behavior. They also see human changes as being more driven by appropriate responses to changing conditions, and to “rational” arguments and evidence, and less by relatively random and impersonal social forces. Some accept that this was less true of the past, but still claim that it will be more true of our future. Those partial to humans also tend to attribute the peace and relative lack of predation in our world today, including re our “super-intelligent” firms and nations, to human goodwill toward each other, rather than to incentives set by law and competition.

While most everyone acknowledges great uncertainty regarding features of future AIs, those more partial to bio humans tend to see bigger chances of worse AI feature values. For example, they tend to see early AIs as differing more from humans in their values and styles of thinking, and also as more likely to change later further and faster in those features, as well as in capability. They tend to see AIs as less varied and more selfish, deceptive, and inclined to and able to coordinate to induce a violent revolution. They also tend to put a lower moral value on AI experiences and styles of thinking. Many even worry that AIs would have no experiences or feelings whatsoever.

Me, I see reason as having had only a modest influence on past changes in human values and styles, and expect that situation to continue into the future. Even without future AI, I expect change to accelerate, and human values and styles to eventually change greatly, without being much limited by some unmodifiable “human core”. And I see law and competition, not human partiality, as the main forces keeping peace among humans and super-intelligent orgs.

As early AIs would be built to fit in human slots and feel comfortable to humans, I expect them to share many of our values, habits, and styles of thought. While AIs remain a minority of the economy, I expect AI values and styles to drift along with humans’, but then to continue to drift after AIs come to dominate the economy, with a drift then less influenced by humans. I expect our AI descendants to be as different from alien AIs as we now are from those alien AIs’ bio ancestors. Our AIs will inherit many legacies from us. Better coordination tech will be developed, but it won’t greatly favor AIs over humans.

I frame our civilization developing AIs as it expanding out into mind-space, with issues similar to our expanding out into the physical space of our solar system, and then into the galaxy. Both kinds of expansion should add admirably to our total civilization capacity, as well as to rates of growth. While I expect our space descendants to differ somewhat from those on Earth both today and later, I don’t see good reasons to feel very partial toward Earth descendants relative to others. Similarly, while I expect our mind-space descendants to have somewhat different mental styles compared to their bio human ancestors, I don’t see strong reasons to feel very partial re that axis either.

Yes, many humans today clearly seem to feel strong intuitions inclining them to feel very partial re this axis. But, as I’ve explained, I see the best explanation of this pattern as our minds over-generalizing from their inclinations re specific fertile factions to a general rule to be partial toward our factions in proportion to how different they feel from rival factions. And to many, AI feels near maximally different. However, as evolutionary selection is the plausible cause of these specific fertile faction inclinations, a force that should not apply to expansions into new territory, it seems to me a mistake to embrace these anti-AI intuitions. Especially as one of the main lessons of moral philosophy is to trust your moral intuitions less:

Exposure to moral philosophy changes moral views. In line with intuitionist accounts, we find that the mechanism of change is reduced reliance on intuition, not increased reliance on deliberation. (More)

And this sort of moral intuition is widely seen as among our most questionable:

Two of the most commonly accepted signs that this intuition might nevertheless be in error seem to be that the intuition’s origin is excessively historically contingent and that the intuition reflects a hidden bias toward one’s self or one’s in-group. (More)

I should admit that, like Alan Turing, being someone on the autistic spectrum plausibly causes me to feel more tolerant of, and less horrified by, deviant styles of thought, compared to those who are more neurotypical. But this doesn’t obviously make me wrong about this.

Note that even if I felt much more partial regarding this human-AI axis, the current moment still seems to me way too early to be taking strong actions to regulate AI. The arguments for apocalypse soon require a coincidence of way too many unlikely factors for me to find them plausible. Though I’m fine with supporting cheap robust policies, such as foom liability, robots-took-most-jobs insurance, and requiring that AI training be done within secure operating systems.

Added noon: Above I describe many views that correlate with being more partial re this key human-AI axis. The fact that those views tend to correlate with each other is evidence that this partiality is an important social force driving these opinions. Otherwise those correlations become puzzling and hard to explain.
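To make that correlation argument concrete, here is a toy one-factor sketch. Everything in it is an illustrative assumption: the view names, loadings, and noise levels are made up, not estimated from any survey or from the conversations above. It assumes each view equals a loading times a shared unit-variance "partiality" factor plus independent noise, so that the implied correlation between views i and j is a_i * a_j / sqrt((a_i^2 + s_i^2) * (a_j^2 + s_j^2)).

```python
import math

# Hypothetical views, each with (loading on the shared partiality factor,
# standard deviation of its independent noise). All numbers are made up.
VIEWS = {
    "ai_values_differ": (0.8, 0.6),
    "ai_coordinates_revolt": (0.7, 0.7),
    "low_moral_value_of_ai": (0.6, 0.8),
}


def implied_correlation(a_i, s_i, a_j, s_j):
    """Correlation between two views under a one-factor model.

    Each view is loading * factor + noise, where the factor has unit
    variance and the noise terms are independent of it and of each other.
    """
    var_i = a_i ** 2 + s_i ** 2
    var_j = a_j ** 2 + s_j ** 2
    return (a_i * a_j) / math.sqrt(var_i * var_j)


def correlation_table(views):
    """Implied correlation for every distinct pair of views."""
    names = list(views)
    table = {}
    for x in range(len(names)):
        for y in range(x + 1, len(names)):
            a_i, s_i = views[names[x]]
            a_j, s_j = views[names[y]]
            table[(names[x], names[y])] = implied_correlation(a_i, s_i, a_j, s_j)
    return table


if __name__ == "__main__":
    for pair, corr in correlation_table(VIEWS).items():
        print(pair, round(corr, 3))
```

In this toy setup every pair of views is positively correlated only because each loads on the shared factor; set any loading to zero and that view's correlations with the others vanish. That is the sense in which correlated views point to a common driver, though the sketch says nothing about which real factor that driver is.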

48 Comments

Sergey Alexashenko

How the Hell

May 12, 2023

Very similar to my thoughts on the subject. I like to start the argument with Orson Scott Card's Hierarchy of Foreignness https://sergey.substack.com/p/what-is-to-be-done

Dave Orr

Mistake Theory

May 12, 2023

I wonder if there's a fundamental disagreement on what you are even talking about, relative to the people you've been discussing this with. (Kudos, btw, for public discussions on a contentious topic; very few try to drive forward shared understanding like you do!)

In particular, I think that your conception of AI and theirs is so different as to be talking about entirely different categories of things. For instance, you say "Future AIs can also have different beliefs, attitudes, values, and mental styles." But I think many of your interlocutors would deny that current AIs even have any of those things, that the analogues they have are so impoverished that there is nothing meaningfully there that could be considered a value. It's less that AIs will have different values, it's that they may not have values at all.

Would you say that grey goo[1] has values of growth and competition? I think Zvi would say it's mindlessly replicating, it doesn't have value. If the future was automated factories building self driving Teslas but no humans, would that be a future that seems good with entities that have very different values? Or would the loss of humanity be something to mourn?

I think you think that our AI descendants will have values and beliefs and thought styles, and there's no particular reason to think that ours are much more valuable than theirs, for the same reason that we don't think our distant ancestors' very different values are much better than ours. But I think the AI risk camp thinks that AI will likely not have values at all, just a thoughtless optimization function.

It's true that at many times in history humans have acted as if, and believed that, other tribes were not thinking beings or didn't have meaningful value, and were wrong. But I think there is a very large class of things that everyone agrees don't have values and we're all right about it, and I think you think that too.

So maybe the main difference is that when you and your discussants are drawing lines around things that think and have values versus don't, AI is on one side for you and the other for them. And then many further ideas cascade from there.

[1] https://en.wikipedia.org/wiki/Gray_goo

Overcoming Bias