Discussion about this post

PEG

Thanks for the review! This critique really lands.

Your point about the argument "proving too much" reminds me of the Haudenosaunee seven-generation principle—each generation both inherits from and adapts wisdom for those who come after. It's gradual change with continuity, not catastrophic breaks.

Yudkowsky & Soares essentially "kick down the footing in the past" by claiming training tells us nothing about future goals. By severing any meaningful connection between selection processes and later behavior, they eliminate the very foundations that would allow for learning and course-correction. They've turned what should be a bridge of gradual adaptation into a catapult into an unknowable future.

This feels like what Gerald Gaus calls "the tyranny of the ideal"—where perfect theoretical models become enemies of workable practical solutions. Complex systems typically require ongoing adjustment rather than getting everything right from the start.

Maybe the alignment problem is less about solving it perfectly upfront and more about creating systems that can learn, adapt, and maintain beneficial relationships over time. Traditional governance suggests responsibility and wisdom can be transmitted across generations of change—contra their claim that training context predicts nothing.

Eliezer Yudkowsky

The way and the difference by which fun and eudaimonia and all valuable things could be preserved into the future is by the will of fun-loving sentients to make other fun-loving sentients. Not by throwing ourselves into the blender of random chaos and trusting most of the design space to be nice. You might as well write about all the hopes that most ways of banging together iron would form an efficient internal combustion engine, after despairing of anybody ever designing or choosing that outcome -- "What," the one says incredulously, "you mean that *most* ways of putting together iron aren't great engines? Then aren't you saying that *any* change to car design will cause the engine to fail?" No, intelligent and well-intentioned and knowledgeable changes might not make it fail, but if you allow Engine Design Drift to take over apart from optimization and engineering, i.e., make random changes to the blueprint, the engine sure will fail.
