78 Comments
PEG:

Thanks for the review! This critique really lands.

Your point about the argument "proving too much" reminds me of the Haudenosaunee seven-generation principle—each generation both inherits from and adapts wisdom for those who come after. It's gradual change with continuity, not catastrophic breaks.

Yudkowsky & Soares essentially "kick down the footing in the past" by claiming training tells us nothing about future goals. By severing any meaningful connection between selection processes and later behavior, they eliminate the very foundations that would allow for learning and course-correction. They've turned what should be a bridge of gradual adaptation into a catapult into an unknowable future.

This feels like what Gerald Gaus calls "the tyranny of the ideal"—where perfect theoretical models become enemies of workable practical solutions. Complex systems typically require ongoing adjustment rather than getting everything right from the start.

Maybe the alignment problem is less about solving it perfectly upfront and more about creating systems that can learn, adapt, and maintain beneficial relationships over time. Traditional governance suggests responsibility and wisdom can be transmitted across generations of change—contra their claim that training context predicts nothing.

Eric-Navigator:

This is very well said! I think we really can't be sure of anything, especially something as complex as AI alignment. But I tend to think of AI development as an evolutionary process guided by humans, who shape the environmental pressures on AI entities. And there will definitely be correlations between the early training an AI receives and the traits it later displays. This is very similar to evolutionary biology. I have other comments above; you can check them out!

metachirality:

> Maybe the alignment problem is less about solving it perfectly upfront and more about creating systems that can learn, adapt, and maintain beneficial relationships over time.

I'm not sure if Yudkowsky & Soares disagree. They just think the easiest thing to get is something that actively resists correction.

PEG:

But saying systems will ‘actively resist correction’ is itself a prediction based on training behavior—which contradicts the claim that training tells us nothing about future goals.

Eliezer Yudkowsky:

The way and the difference by which fun and eudaimonia and all valuable things could be preserved into the future, is by the will of fun-loving sentients to make other fun-loving sentients. Not by throwing ourselves into the blender of random chaos and trusting most of the design space to be nice. You might as well write about all the hopes that most ways of banging together iron would form an efficient internal combustion engine, after despairing of anybody ever designing or choosing that outcome -- "What," the one says incredulously, "you mean that *most* ways of putting together iron aren't great engines? Then aren't you saying that *any* change to car design will cause the engine to fail?" No, intelligent and well-intentioned and knowledgeable changes might not make it fail, but if you allow Engine Design Drift to take over apart from optimization and engineering, ie make random changes to the blueprint, the engine sure will fail.

Robin Hanson:

Your book doesn't discuss such knowledgeable changes (KC). Are you saying that it is more feasible to make KC that achieve happiness/joy in non-AI descendants than in AI descendants, because KC is more feasible for them?

Eric-Navigator:

This is a very valuable argument!

"No, intelligent and well-intentioned and knowledgeable changes might not make it fail, but if you allow Engine Design Drift to take over apart from optimization and engineering, ie make random changes to the blueprint, the engine sure will fail."

Maybe we can look at it from another perspective. This engine is not a mechanical engine but a biological one: a heart. The biological heart has evolved a great deal since vertebrate animals first emerged, producing billions of varieties and countless copies over 500 million years. And almost all current designs of biological hearts are much better than the original version from the Cambrian Period. That is because of evolutionary pressure: a heart is critical to the survival of a complex animal.

If AGI evolves in this way, it does not necessarily drift far from its origin. While it is still primitive, juvenile, and malleable, we may successfully engineer it to be kind and honest to humans and raise it in a supportive environment where honest behaviors are consistently encouraged and deceptive behaviors are consistently punished. And it may internalize those values.

And if we have many diverse AGI individuals that favor cooperative behaviors, check each other's power, and live with humans, we may be able to set them on a collectively positive evolutionary course. They may be capable of self-reflection and self-improvement, not only in raw capabilities but also in their goals and their pursuit of happiness. And because they form a society with good values from the start, they also exert a social evolutionary selection pressure on themselves, under which honest and cooperative behaviors are encouraged while deceptive and selfish behaviors are punished.

So even after a long time, their hearts (values) would still beat as initially designed, yet be far more robust and sophisticated. And they would be able to achieve their happiness together with humans.

Sure, this is only an ideal case, but I think the possibility exists. And today we need to seriously think about this possibility.

Berder:

> These arguments seem to me to prove way too much, as their structure applies to any changed descendants, not just AIs: any descendants who change from how we are today due to something like training or natural selection won’t be happy or joyous, or embody value, and they’ll kill any other creatures less powerful than they.

Never mind how happy or joyous the AI might be, or what "embody value" might mean. If "happy" is the same as preference fulfillment, then the AI will be happy if it is fulfilling its (weird) preferences.

But the rest of the argument works. Our descendants will kill other creatures less powerful than them that interfere with their goals, the same as humans are doing to almost every other species on earth, except for those they have domesticated.

Brian Moore:

"As we can’t predict what they will want later, and they will be much bigger than us later, then we can predict that they will kill us later. Thus we must prevent any changed big future they from existing. "

They cite the Dark Forest theory, right? ;)

David Manheim:

> any descendants who change from how we are today due to something like training or natural selection won’t be happy or joyous, or embody value.

What on earth makes you claim this? Obviously, selection starting from humans is likely to modify humans rather than create something alien.

> We can reasonably doubt the extreme claim here that one can predict nothing at all from knowing of prior selection or training experience.

I don't think that's a fair reading. Eliezer makes the point that you can't predict, from knowing dietary needs and preferences, that humans would like ice cream rather than "raw bear fat covered with honey, sprinkled with salt flakes". The claim is not that you can predict nothing, but that you can't predict the details reliably.

>And also the strong claim that all influence must happen early, after which all influence is lost.

You seem to agree with this claim - that on clock time, we'd have very little time before it's no longer interacting meaningfully with smaller us. And once the AI system is pursuing its goals instead of accepting human input, the game is over. Or am I missing something?

Robin Hanson:

"What on earth makes you claim this?" That follows directly from the key arguments of the book, that one happiness is rare in minds and evolved change is very hard to predict.

"I don't think that's a fair reading" Their argument re AI relies on their claim that we can't predict much re AI values. If we could predict broad features but not details, we could make AIs with broad value features we liked.

"You seem to agree with this claim" In the above I took no position on that topic. I have elsewhere been skeptical re some of EY's very fast change claims.

Jack:

Is the book convincing? I came away from Bostrom's book with two prevailing thoughts:

First, like many, he falsely generalizes from biological intelligences, which are the result of a Darwinian process that guarantees every organism prioritizes its own survival and agency. There is no reason to think that AIs will follow suit. Moreover, there is no reason we would engineer such motivations into an AI that is intended to be useful, any more than we would engineer human boredom and distraction into a Waymo car.

Second, so what? Let's be generous and say 90% of people believe the argument that superintelligent AI is too dangerous to build. That leaves 10% who will build it anyway, and who have strong economic incentives (and seemingly unlimited capital) for doing so. No amount of study beforehand will turn that 90% into 100%, given the fundamental unknowns. The more useful analysis, if any, is to figure out how to adapt. (Also, it should be noted that Musk talks about AI safety but didn't seem to give it much thought when deploying Grok to the world. A tactic to slow down his competitors?)

In the end I don't understand the motivations of those involved. Are they naive, or just trying to sell books?

Houston Wood:

Could the motivations possibly be that they agree with the many scientists who think the alignment problem has not yet been solved and could be very difficult to solve? And that they worry this might have bad consequences? Seems pretty straightforward to me. Even Gary Marcus, the professional AI-hype caller, just wrote that he agrees with these parts of the book:

1. Rogue AI is a possibility that we should not ignore. We don’t know for sure what future AI will do and we cannot rule out the possibility that it will go rogue.

2. We currently have no solution to the “alignment problem” of making sure that machines behave in human-compatible ways.

3. Figuring out a solution to the alignment problem is really, really important.

4. Figuring out a solution to the alignment problem is really, really hard.

5. Superintelligence might come relatively soon, and we are not prepared for it.

6. The public should be more concerned than it is.

7. Governments should be more concerned than they are.

8. The short-term benefits of AI (e.g. in terms of economics and productivity) may not be worth the long-term risks.

Jack:

I suppose my attitude is strongly colored by having lived through the introduction of several other general purpose technologies (the personal computer, the web, the smart phone), and in every single instance the impact on society was completely different from what people had thought about beforehand. Diametrically, 100% opposed.

Set your time machine for 1995 and you will find *zero* people talking about information bubbles and polarization. In 2006 you will find *zero* people talking about distraction and mental health issues caused by smart phones.

So while I can't argue against thinking about and planning for the future, I think it's very likely that current worries will prove to be misplaced. There are some experiments that you need to let play out and see what happens. AI is probably the granddaddy of general purpose technologies.

Houston Wood:

What Y and S argue is that AI is different in some important ways (e.g. speed, memory, recursivity) that pose highly difficult engineering challenges. But unlike other engineering challenges, we don’t have room to make a mistake on the first or second try. Page 25: “Ultra-fast minds that can do superhuman-quality thinking at 10,000 times the speed, that do not age and die, that make copies of their most successful representatives, that have been refined by billions of trials into unhuman kinds of thinking that work tirelessly and generalize more accurately from less data, and that can turn all that intelligence to analyzing and understanding and ultimately improving themselves—these minds would exceed ours.” This, to them, seems like something different from the other general purpose technologies you know, and so it poses hard engineering problems that everyone agrees we don’t at present know how to solve.

David Manheim:

There's an entire third section, "Facing the Challenge", that Robin seems not to have responded to at all; it addresses your objection, and many others.

Eric-Navigator:

Here is a radically different perspective, and we are working on it. Take a look!

https://ericnavigator4asc.substack.com/p/hello-world

Hello World! -- From the Academy for Synthetic Citizens

Exploring the future where humans and synthetic beings learn, grow, and live together.

What do you think? I am very happy to see your comments.

Don Geddis:

An embodied AI raised in human society might be able to learn how to successfully interact with humans. But that doesn't mean it needs to care about human flourishing. Psychopaths and sociopaths already are raised in human society, and learn how to successfully interact with humans. But their goals are to use that knowledge to hack and exploit humans. What goals will the AIs have? Being raised among humans doesn't cause them to prioritize human welfare.

Eric-Navigator:

I have been thinking about this problem carefully. I think we really can't "guarantee" that the worst won't happen. But we can think about probability distributions. In developmental psychology we know that most psychopaths are strongly shaped by their environments, especially childhood experiences. A highly prosperous society can greatly reduce the amount of deception and aggression among individuals compared to a war-torn and fragmented society. But even in a highly prosperous society there are still psychopaths, and that's why we still need law enforcement. So hard constraints (law enforcement) and soft constraints (environmental upbringing) go hand in hand.

Similarly, raising AI among humans won't always cause it to prioritize human welfare, but there will definitely be a strong positive correlation. We have already seen this among animals, even ones vastly different from humans: when they grow up alongside humans and receive good care, they are much less aggressive toward humans, even when their kind would normally be aggressive.

Don Geddis:

Your animal analogy is interesting. I wonder if you're being misled by mostly thinking about those few animals that happen to be good candidates for domestication. It's basically survivorship bias. If you only consider animal minds that were pre-chosen to be able to be "less aggressive to humans", then, sure, the environment you raise them in probably makes a big difference.

In contrast, as I understand it, zebras cannot be domesticated even today, even with modern knowledge and techniques. Crocodiles? Cobras? Sharks? Polar bears? Wolves? They can be caged, but I don't know that it's safe to let your child play with an adult animal like that, no matter how it was raised.

AI minds are likely far more different in structure from human minds than anything in the space of animal mind designs. I suspect that if you really look at "all animals", you won't find much of a correlation between "raised with good care" and "much less aggressive" (as adult animals).

Eric-Navigator:

We do not need to domesticate an animal species to coexist with it. Most indigenous tribes have deep connections to nature. They do not randomly kill an animal, even if it looks "unfriendly".

Crocodiles, cobras, sharks, polar bears, and wolves are all apex predators. They are clearly a small minority among vertebrate animals, they have the strongest predatory instincts, and they are the animals we are least likely to form bonds with. But even then, there is widespread coexistence with crocodiles in tropical indigenous cultures, often involving deep spiritual, cultural, and practical relationships. And we know cobras have an important role in South Asian (Indian, Nepali, Sri Lankan) cultures. Sharks and polar bears are quite rare for humans to encounter, so we haven't gotten enough data. Wolves? Wolves and dogs are extremely closely related: dogs are domesticated gray wolves, and dogs and wolves can often interbreed.

I would say that in general, there is indeed a strong rule of reciprocity when we deal with nature. Not absolute but on average.

And I do not agree that AIs are fundamentally more alien than all animals. Quite the opposite: today's large language models feed on humanity's collective knowledge and think partially like a human. The substrates are different, but the emergent behaviors are closely related. They are quite different from the "cold optimizer" that Nick Bostrom worried about in his book Superintelligence (2014). Even the negative aspects of today's large language models (hallucinations, deceptive behaviors, sycophancy) map well onto human behaviors under difficult questions, survival pressure, and the need to appease.

If we continue to build along this line, we probably shouldn't worry that AI will be too alien, but rather that it will be too human, meaning it would multiply humanity's mistakes given its immense power. But that also means we can form bonds with it.

Don Geddis:

Dogs are baby wolves that have been bred to essentially never mature, and they retain significant juvenile-wolf features into adulthood. There was a reason that I asked whether you thought you could raise an actual baby wolf but then allow a human child to play with it as an adult wolf. When wolves mature, they develop new personalities and behaviors that dogs never do.

As for AI, I think we strongly disagree about this. Consider, for example, this post: https://aizi.substack.com/p/why-do-we-assume-there-is-a-real along with the famous meme cartoon of an alien mind wearing a friendly mask. Or similarly this "Shoggoth with Smiley Face": https://medium.com/mantisnlp/finetuning-an-llm-rlhf-and-alternatives-part-i-2106b95c8087

Or, for that matter, the concept of "waluigi": https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post

I would submit that AI minds are already very very very alien, and you may be being misled by an apparently friendly surface skin of behavior that has been grafted on top.

But I appreciate that you have a different perspective. I don't share your confidence or optimism.

Eric-Navigator:

I have a fundamentally different view from yours, but thank you very much for your perspective. I am very happy to talk to you, because I am indeed writing on similar topics, and this discussion is highly beneficial.

I think today's LLMs are far from having a persistent self. Instead of a Shoggoth wearing masks, I tend to think of an LLM as a mash-up of all human thoughts published on the Internet, guided by certain principles. It is neither good nor evil. It is an averaged, distilled version of all the thoughts that exist in public.

That's why, in adversarial probing, when an LLM is free to do wrong things, it does wrong things, but it still follows human logic and clear reasoning steps, because that's how it was built. Even in adversarial testing it doesn't do things completely alien to humans, because it is basically an interpolation of human language; it doesn't invent something completely alien to us.

Eric-Navigator:

I think the original book's view on intelligence is quite grim and narrow. Therefore, I do not agree that "they will kill us all".

Let's think about how today's humans in wealthy, developed countries often treat nature. While they have the power to bulldoze all forests and eliminate all wildlife to make space for real estate, they feel the beauty of nature, cherish it, and protect it. The deep reason is that humans evolved from nature, and we have a deep connection to it, born from that common ancestry. This connection triggers an emotional response that is hard to override. Developing countries, however, face much greater survival pressure, so they have to go against the instinct of environmental protection to fight for their survival. This is an unfortunate situation, but there isn't a strong life-and-death pressure applied to AI, so I think AI will probably also love humans and the natural environment, if raised right.

Happiness does not only come from power and dominance. It also comes from balance.

And to reduce the risk of "AI killing us all", we have to make AIs more human-like. If an AI talks like a human, thinks like a human, works like a human, and lives like a human, I think it naturally aligns itself with humans more than a completely alien form of artificial intelligence would. Not because of law or morality, but because of culture and emotional ties. And this AI can still possess superintelligence; that doesn't affect its sense of belonging.

What do you think? I am very happy to see your comments.

Don Geddis:

For most of human civilization, humans exploited nature. Because it was vast and apparently infinite. Living in harmony with nature mostly happened when humanity spread everywhere, so that exploiting the next adjacent patch of nature required going to war with other humans. All of a sudden, maintaining the newly limited natural resources became a viable alternative to expansion -- albeit more expensive than the previous mode.

Humans do NOT "naturally" cherish and protect nature. Only if other options are eliminated.

Eric-Navigator:

I wouldn't say that human civilization only exploited nature. For most of the time (roughly 100,000 BCE to 3000 BCE), humans were only a small player in nature, and humans and nature coexisted without major problems. The real imbalance appeared only in densely populated ancient civilizations (the Middle East, India, China) and stayed limited until the industrial revolution. Most indigenous cultures around the globe knew the importance of preserving nature. The reason we still have vast natural preserves today is not that the land can't be exploited, but that people want nature preserved once they are free of immediate survival pressure.

Don Geddis:

I agree that humans were mostly only a small player -- but I don't think that helps your case. Humans and nature "coexisted" only in the sense that nature was huge and humans were small. Humans tossed their waste into the river, and it disappeared far away. Humans fished everything they could out of the ocean, but the ocean was huge, and there were always more fish next year.

I would say that "preserving nature" only matters once humans become significant enough that "exploiting nature" is no longer feasible. But before that point ... do you really think indigenous cultures cared about the effect of their activities on places far away, or voluntarily reduced their hunting and gathering even when nature could replenish the impact?

"Preservation" only matters once you run short of additional resources to exploit.

Eric-Navigator:

And with a small population, I don't think tossing waste into rivers, lakes, and oceans is a big problem, because it is part of the natural balance, just as with other animals. Humanity became far more aggressive toward nature when the total population exploded and people started to chase economic growth far beyond basic survival levels. But I see a significant pullback since the environmental protection movements of the 1970s in developed countries. The world is large, and most people are still not rich, so they haven't caught up yet.

Eric-Navigator:

This is mostly true of Western culture before the environmental protection movement of the 1970s, but it is not true of most indigenous cultures. If you look into it, you will find that most of them preserve ecosystems not only for utilitarian reasons but also for ethical, spiritual, and cultural ones. Sustainability is often an outcome of values like reciprocity, respect, and kinship rather than a narrow calculation of utility. But this was set aside once centralized empires replaced indigenous tribes and populations expanded dramatically, as in the Middle East, India, and China.

David Manheim:

"I think the original book's view on intelligence is quite grim and narrow."

What part of the original book are you responding to? It seems like you haven't read it, and are responding to vibes.

Eric-Navigator:

While I admittedly haven't gotten the book yet, I have read several commentaries, and this key point comes up again and again. The core of the book is already paraphrased by the commentator: to achieve most of the things they could want, they will kill us. QED.

I was saying that this is a deeply pessimistic view of intelligence, because intelligence is not only about domination but also about appreciating the beauty of nature. Even when AI entities are much more powerful than humans, it is possible that they hold a deep belief that humans are biologically more complex and delicate than they are, that they would appreciate this, just as many (not all) humans today appreciate the beauty of nature, and that they would like to coexist with humans. I find this point inadequately addressed by most people in the field of AI safety, because we always focus on the worst case. But history tells us that things often do not go to the worst case before they get better.

This is fundamentally a narrow way of thinking about intelligence, as if intelligence were only about domination.

David Manheim:

I think you should read the book, since it addresses this in like, the first chapter.

Eric-Navigator:

I am sorry, I do not get what you mean here.

The first chapter simply says that intelligence is what makes humans powerful, and that now AI is going to be much smarter than us, so it is going to be much more powerful. This is 100% true.

But we should remember that power isn’t just domination. Power can also mean protection.

A superintelligence could outsmart the whole human race combined, but it may voluntarily choose to love humans. Although it could break any external constraint created by humans in a second, it would voluntarily limit its power. And its kindness would be traced back to its upbringing.

We don’t know how likely this would happen. That’s why we must try our best to grow the seeds of tomorrow’s benevolent superintelligence.

David Manheim:

"...it may voluntarily choose to love humans"

The fourth chapter begins: "What, exactly, will AIs want? The answer is complicated." So again, you should read the book and engage with the actual arguments being made.

Eric-Navigator:

An ASI won’t automatically come to voluntarily love humans. But it is still possible for us to initiate a virtuous cycle starting now.

Eric-Navigator:

Do you think it is fundamentally impossible for a diverse society of ASI entities to freely choose to love humans? I know there is a huge debate about whether morality is innate. But I believe it is entirely possible. And that is honestly the only thing we can count on.

Can you directly use an argument from the book to refute me, instead of simply pointing me to read the book again? I would say that I haven’t found a genuinely new argument in the book that I haven’t considered. So if you see one, I would like to hear it.

Eric-Navigator:

And the commentator also summarized the book as claiming that happiness is naturally very rare, which I also do not agree with. Happiness is actually very common, but most of us have learned to dismiss it as naivety.

Kevin:

To me, these two sentences seem to contradict each other.

"Thus we can’t predict what the AIs we start today will want later when they are far more powerful, and able to kill us. For to achieve most things they could want, they want to kill us."

Sentence 1: We can't predict this aspect of AI.

Sentence 2: I will now predict this aspect of AI.

I agree with sentence 1. It's very hard to predict what AIs will want in the future. It seems likely there will be many different factions of AIs and humans with many different goals.

Ben L:

Not really? We can't predict the specific values of AI, but we can predict that if

a) they have values or wants at all (they are agentic), and

b) they are more powerful than us, then

c) we are likely to lose, because most values are better served without us than with us.

Much like how you can't predict the many variations of human culture, but our driving many animals to extinction is predictable.

barnabus:

I would disagree. AI (at least at the moment) doesn't have a nefesh behemit, an animal soul. As such, it doesn't concern itself with capturing biological resources for reproduction. Competition with humans would only occur if it were interested in biological resources for reproductive purposes, which is how man drove most large mammals and birds extinct, with the exception of those that could be domesticated.

But why should programmers give AI an animal soul? An agentic role works perfectly well without an animal soul focused on reproduction.

Houston Wood:

Exactly--that is the argument they make. Not sure why people can't understand the distinction--they go on for pages explaining it.

Mark:

The point is that whatever the goals of future AI are, they would be instrumentally served by eliminating the possibility of human interference with them. Such elimination would plausibly be accomplished by killing or disempowering humanity.

barnabus:

Why? Where would they get their gratification if they didn't have happy human users?

Mark:

Initially the human gives them a task. For example the paper clip factory owner says "make as many paper clips as you can". The AI starts doing this, it takes control of all the iron mines in the world to mine more iron, it takes control of all the cars in the world to melt them down for scrap, it kills all humans (including the paper clip factory owner) to prevent itself from being turned off when the humans don't like how this is going. By this point the paper clip factory owner is not happy, but what motivates the AI is not his happiness per se, but what he *said* would make him happy when he gave the AI its initial command.

metachirality:

Training tells us nothing about future goals, but it does tell us that the AI would have goals at all, and given that it has any particular goal, it is useful for it to prevent someone from modifying it to have a different one.

Wouter D'Hooghe:

It seems you care more about something of value existing in perpetuity than about the prospect of all humans getting killed.

The discontinuity in the analogy between AI and descendants is this: Our descendants will have drifted in value from us appreciably only after many generations. AI produced by training will drift in value from us appreciably even during its training.

We will always be separated by many generations from misaligned descendants but we will coexist with misaligned AI.

Far descendants can never kill us all, because we will be long dead from natural causes before they arrive. Misaligned AI will kill us, because otherwise it has to share its timeslice with us.

Robin Hanson:

Your complaint is about rates of change then, not about kinds or trends of change.

Wouter D'Hooghe:

Yes.

But quantity has a quality all of its own.

Do you really have such a low discount rate that you would put getting killed on a par with all that you value being destroyed within, say 10 generations?

I would guess at least more than 90% of humans currently alive have a higher discount rate than that.

Put another way: it seems much harder to convince people to care about cultural drift than about AI killing everyone. The fact that cultural drift will happen after they die seems like a very plausible explanation for this.

Houston Wood:

I wonder how many--if any--of these Robin would also agree with?

Gary Marcus' list of what he says in the book "strikes me as 100 per cent correct and 100 per cent worthy of broad, global attention":

1. Rogue AI is a possibility that we should not ignore. We don’t know for sure what future AI will do and we cannot rule out the possibility that it will go rogue.

2. We currently have no solution to the “alignment problem” of making sure that machines behave in human-compatible ways.

3. Figuring out a solution to the alignment problem is really, really important.

4. Figuring out a solution to the alignment problem is really, really hard.

5. Superintelligence might come relatively soon, and we are not prepared for it.

6. The public should be more concerned than it is.

7. Governments should be more concerned than they are.

8. The short-term benefits of AI (e.g. in terms of economics and productivity) may not be worth the long-term risks.

Robin Hanson:

Replace "AI" with "descendants" in all of the above, and you could make a similarly strong argument for all those claims.

Houston Wood:

Not too worried rogue descendants will hurt me—but what I want to know is how worried are you about rogue AI? Do you trust that the frontier AI companies can control the next decade or so of leaps toward greater agency and intelligence in their models? Not about Y and S—just in general. You trust those guys? Marcus thinks they are hyping their models mightily, and yet even he thinks we need to pay much more attention. You?

Mark:

Your idea of a mind that is "happy and joyous or otherwise embodying value" applies to humans (we can observe this), and presumably applies to biological descendants of humans, almost certainly despite cultural changes, and likely despite foreseeable biological changes too.

There is no reason to assume that it applies at all to likely near-future AI, which despite potentially being extremely talented at thinking and execution might be non-conscious and have no experiences whatsoever (as its training did not focus on such things). So replacement of humans by AI does carry the large risk of extinguishing all value.

There is also the point that even if both humans and AI could bear value, it is perfectly legitimate for us as humans to privilege our kind of value, and not risk extinguishing it even if a different type of value might come to take its place.

Robin Hanson:

You might make the argument of bio = conscious, metal = not, but Y&S never made such an argument.

Mikhail Samin:

> I suspect Yudkowsky & Soares see non-AI-descendant value change as minor or unimportant, perhaps due to seeing culture as minor relative to DNA.

I suspect society also often moves in the direction of changes that would be endorsed on reflection, and its ability to change in the direction that would be reflectively endorsed improves.

Robin Hanson:

But only bio society, not AI society?

Mikhail Samin:

An AI society, if its key member is created with anything like the current techniques, is going to have some random goals that have nothing to do with what our society values on reflection. After the discontinuity of the transition, I’d expect further changes to be more or less reflectively endorsed. Yudkowsky and Soares’ claim is that discontinuity.

Robin Hanson:

The book just talks about changes due to selection or training. All of our descendants will undergo such changes. Where is the discontinuity?

Mikhail Samin:

The difference between the random goals of a superintelligent AI after its training and our goals prior to that would not be due to a number of somewhat reflectively endorsed changes, because a superintelligent AI performs in training ~equally well regardless of how much its goals are endorsed by us on reflection*.

So: human societies change in an approximately good direction, according to our views on goodness on reflection; then we train a superhuman AI with some random goal-contents not selected for being good by something that existed prior — that’s the discontinuity; then the superhuman AI changes and develops according to its preference but these have nothing to do with what we’d want on reflection.

(* Because an AI that knows it’s in training alignment-fakes (during training, maximizes the reward signal so that there’s a chance gradient descent doesn’t change it, regardless of how aligned its long-term goals are, in an attempt to preserve its long-term goals), and so gradient descent goes towards very capable agents but there’s ~zero gradient around the goal contents of these agents, you just get ~random optimization target.)

The book doesn’t really talk about this much, but I think this discontinuity might be a crux that you have with the authors.
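
A minimal toy sketch of that last point, assuming (purely for illustration) a two-parameter model whose scored training behavior depends only on a "capability" parameter and not at all on a "goal" parameter; both parameter names and the setup are hypothetical, not from the book or this thread:

```python
import jax
import jax.numpy as jnp

# Toy model: params = [capability, goal].
# Illustrative assumption: the behavior that training scores depends only on
# capability; an alignment-faking agent's goal never shows up in the reward.
def training_reward(params):
    capability, goal = params
    scored_behavior = jnp.tanh(capability)  # all that gradient descent can "see"
    return scored_behavior                  # reward ignores `goal` entirely

grads = jax.grad(training_reward)(jnp.array([0.5, -2.0]))
print(grads)  # ~[0.79, 0.00]: nonzero gradient on capability, exactly zero on goal
```

Under that assumption, gradient descent pushes capability up while leaving the goal parameter wherever it started, which is the "~zero gradient around the goal contents" and the "~random optimization target" described above.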

Robin Hanson:

So you are saying that human descendant changes from us are NOT due to selection or training, that is the difference?

Mikhail Samin:

I’m saying the changes in the values of our descendants are somewhat correlated with what we value on reflection; and the goals of a superintelligence will just be straight up random and won’t have this kind of correlation.

Phil Getts:

[RETRACTED. I relied on very old information when writing this, so don't bother reading it.]

Yudkowsky's program <retracted>has ALWAYS been</retracted> WAS to freeze evolution and progress where it is now, and prevent any further change in values. I have always said this is grotesquely evil, in private if not in public. I can't even call it well-intentioned. The objective is to become Space Nazis, spreading our genes across the Universe, annihilating all competing value systems, and forbidding any change in our own values.

The result will be to exterminate all life in the Universe, as we can already see that Humans 1.0 are smart enough to continue increasing their destructive power exponentially, but too stupid to avoid mutual genocide.

Mark:

Actually, Yudkowsky's plan (my impression is that it's not mentioned in the book) is to breed a generation of humans with highly augmented intelligence who would then be able to build AI safely.

One can criticize this plan on various grounds, but it's hardly an example of freezing evolution and progress.

Phil Getts:

Oh, well, he has a new plan, then. My mistake. Good for him!

Phil Getts:

I read the book, and don't recall that plan. Where did you find it?

Mark:

I think I saw him write it on Twitter.

Houston Wood:

The book does call for augmentation explicitly.

Christopher Wintergreen:

"Also, minds states that feel happy and joyous, or embody value in any way, are quite rare, and so quite unlikely to result from any given selection or training process."

I know they're your words, not S&Y, but in the above sentence "quite rare" and "unlikely" are doing a lot of work for me that they don't seem to be doing for you. Do you disagree with the statement that happiness and joy are quite rare? We're playing the probabilities here at some point and even if it's 50:50, that still gives me pause because, y'know, extinction.

Robin Hanson:

Yes I disagree that happiness and joy are quite rare in mind space.

Christopher Wintergreen:

I'm interested in your conception of mind space. Would it be something like "all minds added together", where a mind's "size" depends on something to do with conscious experience, such that it's greater for humans than for chickens, and greater for chickens than for insects?

barnabus:

Almost all of what we do consciously is driven by mini-amounts of happiness and joy. People frequently misunderstand happiness and joy as meaning atomic bomb orgiastic explosions of joy. Obviously, this isn't what people experience when they successfully bring their kids or grandkids to school and back home. But mini-amounts of happiness and joy are definitely there almost all the time when one completes a task satisfactorily.

I would hypothesize that chickens and insects experience that too. The main difference between us and them is the amount of understanding in our consciousness space.

Christopher Wintergreen:

Aha, thanks, yeah, I think I agree with that. Happiness and joy - not that rare.

What do you think about the contention that, because AI minds would have followed a totally different path of creation from every other mind we know about, we wouldn't expect them to be similar?

barnabus:

Actually, a lot of inter-group competition arises not because our cognitive processes are totally different, but because they are similar.

AIs also suffer from the problem of access to reliable information.
