48 Comments

Very similar to my thoughts on the subject. I like to start the argument with Orson Scott Card's Hierarchy of Foreignness https://sergey.substack.com/p/what-is-to-be-done

author

A thoughtful post.


I wonder if there's a fundamental disagreement on what you are even talking about, relative to the people you've been discussing this with. (Kudos, btw, for public discussions on a contentious topic; very few try to drive forward shared understanding like you do!)

In particular, I think that your conception of AI and theirs is so different as to be talking about entirely different categories of things. For instance, you say "Future AIs can also have different beliefs, attitudes, values, and mental styles." But I think many of your interlocutors would deny that current AIs even have any of those things, that the analogues they have are so impoverished that there is nothing meaningfully there that could be considered a value. It's less that AIs will have different values, it's that they may not have values at all.

Would you say that grey goo[1] has values of growth and competition? I think Zvi would say it's mindlessly replicating; it doesn't have values. If the future were automated factories building self-driving Teslas but no humans, would that be a future that seems good, with entities that have very different values? Or would the loss of humanity be something to mourn?

I think you think that our AI descendants will have values and beliefs and thought styles, and there's no particular reason to think that ours are much more valuable than theirs, for the same reason that we don't think our distant ancestors' very different values are much better than ours. But I think the AI risk camp thinks that AI will likely not have values at all, just a thoughtless optimization function.

It's true that in many times in history humans have acted and believed that other tribes were not thinking beings or didn't have meaningful value, and were wrong. But I think there is a very large class of things that everyone agrees don't have values and we're all right about it, and I think you think that too.

So maybe the main difference is that when you and your discussants are drawing lines around things that think and have values versus don't, AI is on one side for you and the other for them. And then many further ideas cascade from there.

[1] https://en.wikipedia.org/wiki/Gray_goo

author
May 12, 2023 (edited)

There's no need to interpret my words like "values" narrowly; my points here work fine if we interpret such words very broadly. From that view, you are suggesting that AIs may have very different "values," etc. And I'm saying such a view is correlated with more "othering" of AIs.


I think you're saying, "please use my version of the word 'values' instead of the meaning the other folks have." I don't find that usually helps resolve disagreements; better to taboo the term or break it down somehow.

Also, I just find it hard to hold such an expansive meaning of "values" that we could meaningfully say a rock has values or my car has values, unlike, say, a dog. There's a line that exists somewhere!


Nothing is exactly equal, and "all models are wrong, but some are useful".

Does treating current LLMs as conscious agents help to wring more value out of them? It would seem to me that, at least for chat models, it does - they are trained on data from interactions between conscious agents, and prompting them that way helps guide them to areas of better value (fixing mistakes, providing alternative explanations, describing the right steps even if they cannot follow them themselves).

Does treating AI as a completely inscrutable bunch of numbers and computations help anything?

Obviously they have values - they are a bunch of values in electric circuits. And those values are, without question, different from the values we have in our biological neural networks. They have different modes of communicating those values from and into the world, too.

But the very fact that we can influence and _understand_ their output (even when we know nothing about inner processes) makes them compatible with humans.


So you are psychologizing people’s legitimate concerns about alignment? Shame on you!


Interesting if a bit hard to understand.

(Long, somewhat repetitive comment ahead)

But if "fertile factions" means that you want to collaborate with and strengthen factions that spread your genes and values, and "partiality" is that you put higher value on specific groups or things, and less or negative value on different things or on change, then that is very interesting.

I have some relatives who are very nationalistic and right-wing, and some who are anti-trans; I won't discuss whether they are right or wrong here, just some interesting differences I've noticed between how they and I think.

I'm liberal and pretty libertarian, btw; I'm also effective-altruism-aligned and a longtermist.

One thing I've noticed is that when they talk about doom or long-term catastrophe, they are far more concerned about short-term deviations from what they value, and see changes of almost any kind as dangerous, unless the change specifically strengthens their faction.

I'm of the mindset that most change is fine, and that competition and freedom are good, as they let us navigate to better states than the current ones.

But while they see competition and freedom of various kinds as having been good in the past, now it's too risky and should generally be held back if it could conceivably weaken the factions they support.

I think they are far too worried about change, and I have shown them various times in the past when people panicked about a new technology or a social or demographic change, but things turned out fine or great, and the panic conceivably slowed down progress or shut down new ideas.

But they usually dismiss these as bizarre examples that have nothing to do with the current thing. Or they sometimes say that the fact people panicked is the reason things are OK now, and thus it's good to be super wary.

They have occasionally also said that I'm disrespectful to our ancestors and am calling them stupid. When I talk about progress statistics or past misinformation, they also sometimes say that I'm being disrespectful.

One relative says that he is super concerned about the long-term future, but I'm also a longtermist and we seem to think completely differently on lots of things. He loves to talk about civilizations that have collapsed in the past, how that's proof of how fragile ours is, and that we need to protect it. Then when I say something along the lines of "new civilizations popped up and life is way better now; I don't think we should be too concerned about those past civilizations. We should mostly be concerned about whether a risk irreversibly or severely destroys civilization and humanity," he thinks I'm totally missing the point and am borderline a national traitor.

One hypothetical scenario I sometimes think about is: "If your kids all became Muslims and a different ethnicity, but were very happy and prosperous, how would you feel about that?"

I'm completely fine with that, but my relative is horrified by it, from what I can tell.

We also have completely different views on threats, conformity, and intuitions: I view moral intuitions with suspicion, while he sees them as very strong evidence, and thinks that past atrocities like slavery happened primarily because intellectuals or elites came up with fancy rationalizations to explain away the intuition everyone had that it was terrible.

And on conformity: I see conformity as a sign of stagnation, a sign that something is wrong. Conformity also seems correlated with individuals being oppressed or stopped from flourishing. My personal experience with conformity is also that it makes me very unhappy, as I'm gay, autistic, and ADHD, and in lots of ways very odd (I'm an effective altruist, a rationalist, a furry).

His view of conformity is the opposite: it's something very good; it creates social harmony and makes you stronger against threats from the outside. It's when everyone is conforming that progress can happen, from cooperation and building things together, and when conformity doesn't happen, conflict inevitably follows.

His personal experience of conformity is also the opposite: he felt strongly disrespected by society growing up as a Mormon, and mocked and ridiculed. His most positive experiences were of the church, when everyone was the same. People talking about diversity or being different are associated with stress and threats for him.

And of course we view new ideas and testing things differently. I think you have to strongly prove something is bad to ban or regulate it, while he thinks that very slight evidence of badness is sufficient to regulate or ban it, as threats are bad for society.

I should probably stop here, maybe this was interesting.


This nicely encapsulates the whole disagreement. It isn't necessarily the case that those most bullish on AI are looking forward to an EM-style human future of their own, but it seems 'right' from what I've seen online.

From a purely (or properly) human perspective, however, getting killed by one's actual human son isn't the same kind of loss as killing one's only son. Someone else's son killing your son is a similar loss; your son killing someone else's son (and taking his wife) isn't (necessarily) a loss at all. Us-vs-them debates are just a proxy for this. Perhaps it is possible to transcend this basic human programming entirely, or perhaps those who say they have are mentally ill in some way. For the human-essentialist, all of my descendants making digital copies of their brain-states before killing themselves is as grave a failure as them all being turned into paperclips.

Either way I won't be there watching it happen.


How do we attribute consciousness to other entities, and therefore empathize with them? I believe that Robin Hanson is conscious – even though I can't define exactly what that means – because we share a common biological basis, and a similar process by which we developed into adulthood. No words that come out of Robin's mouth or off his keyboard would ever convince me of his consciousness; LLMs show perfectly well that we can be aped to arbitrary precision by a large statistical model. For our pets we have a lesser degree of experiential overlap and so they merit a kind of semi-empathy.

For synthetic AIs there is this unbridgeable gulf in terms of otherness, and I don't know how that would ever be overcome. What would have to happen for anyone to develop empathy for GPT-N? I think you'd have to do that in order to lessen people's fears about being replaced. The scenario you lay out in Age of Em is an interesting one, because I *could* see developing empathy for such simulated minds, because of the commonality of background.


Why would we want to overcome that gap? AI IS a cold alien mind.


Because it helps to guide/design it toward empathetic outcomes rather than "There, AI suggests how to destroy the world, exactly as expected from a cold alien mind, good job, working as intended!"?


While it's very interesting to see the debate rephrased in these terms, I'm not sure what the takeaway is for forecasting existential risk as stated. Obviously it will never be the case that all humans, or all but 4,999 humans, voluntarily agree to replace themselves with AI clones or whatever. So unless the argument is that robots will kill more-or-less all humans, but they won't be killing "us" because "we" will have become the robots and any remaining meatbags will be converted to a "they", *and that this is all quite welcome and good,* then I don't see how this changes the forecast for "chance to an existential catastrophe (where fewer than 5,000 humans survive) owing to AI by 2100". Can you clarify?

author

When I ask WHY anyone would think that AIs would kill us all, this is where that investigation leads: to assumptions about the ways in which AI will be different from humans.


> Karger and Tetlock actually did this exercise for four different big risks (AI, nukes, bioterror, CO2), and that only on AI did extensive discussion not close the opinion gap between experts and superforecasters, or leave the gap so large.

Well, what I want to know about that is in what direction the opinion gap closed for the other three issues. Did the superforecasters come closer to the opinions of the experts, or the other way around?

Anyway, it sounds like you don't deny that AI will cause the end of biological humans at some point, you just don't mind because you see the AI as "our descendants." That's a difference in values, not a difference in the predicted outcome.

author

Future humans will likely cause the end of humans-like-us at some point as well. Why don't you mind that?


What I want is to promote the future welfare of people - of any shape or substrate - who share my core values, to the extent that those core values aren't (unknown to me) misguided/inconsistent with themselves. My core values include ideas like rationality, kindness, honesty, fairness, creativity. So whether I mind replacement by humans-unlike-us depends on which ways they are unlike us. If they lack those values and won't ever give rise to others with those values, that would be a problem. Total replacement by AIs that *do* have those values would be okay from my perspective. Replacement by AIs that don't have them would not be okay.


I find scary the folks who want to preserve those who share their values. How about preserving others, period? Do you really think your values are so important? Are you really so sure you've got the "right" values? On what grounds? Does a dog share your values? Do you really know if your family (however defined) does? Weird measuring stick. Same with the emphasis on qualia/consciousness, which you don't know anyone/thing else has (I don't think you've got it either, just an emergent property from an arrangement of physical matter; plants and bats "feel" like something, too). If you spend lots of time with your AI, and it's very helpful, funny, etc., I'm guessing you wouldn't just blithely destroy it. And the p-zombie thing assumes that X arrangement of matter shouldn't give rise to feelings, so there's some magic to explain, even though the science is pretty clear there's just the X arrangement of matter, and so that's what any consciousness arises from. The p-zombie concept doesn't do any work as an analogy; not sure why so many smart people think it does (said as a person who's studied philosophy and neuroscience at top schools, etc.).


If my values aren't the right ones then I would want to promote the welfare of people who do have the right ones. I mentioned that: "... to the extent that those core values aren't (unknown to me) misguided/inconsistent with themselves."

No, I don't want to promote the welfare of everyone. I don't want to promote the welfare of serial killers and bullies and sadists. I want to promote the welfare of good people, according to my understanding of what is good, which is of course open to revision on new information.

A dog does not share most of my values. But it does share some of them, for example a dog can be kind or loving, and so I'd like to promote its welfare for that reason, though not nearly to the extent I'd like to promote the welfare of most humans.


Though, I'd add that I prefer that meat brains survive because I'm not convinced that the substrate is irrelevant for giving rise to qualia (might be philosophical zombies). But it's only a small preference (tiny difference in probability) and a sufficiently good theory of qualia could change my mind on that.


The gradual-brain-replacement argument suggests that the substrate is irrelevant. If tiny bits of your brain are gradually replaced with functionally-equivalent nanites, would you notice any gradual loss of consciousness?

No, you wouldn't. Because if you did notice that, you'd be able to make a comment about it (or even just have a thought about it). If you made a comment about it or had a thought about it, that would be a functional difference between your gradually-replaced brain and a fully bio-brain, contradicting the premise that the nanites are functionally equivalent. Even just to "notice" a difference - presuming that noticing things has a physical basis in brain activity - would be contradicting the premise.

It follows that even after the nanite brain replacement is complete, you are still conscious.

There are a couple objections I can think of to this, neither of them physically very plausible. Objection 1: what if the loss of consciousness is very sudden, so there is no point where you could notice it happening while you're still conscious? But the replacement process is gradual so that doesn't seem likely.

Objection 2: what if the premise that the nanites are functionally identical to the brain tissue they replace cannot be met? Well, technically that premise can never be *exactly* met, but we can suppose the functional fidelity of the nanites is high enough that you won't notice the difference, and the argument goes through. Objection 2 thus has to propose that the fidelity of the nanites can never be that high. This could only be the case if there's something essential to brain operation that can't be accurately simulated in a conventional computer, e.g. if the brain is really a quantum computer, which seems unlikely.


But the whole premise of the philosophical zombie is that it reports having experiences and acts like it does. It just doesn't actually have them. So the gradual brain replacement shows nothing. Yup, at some point you stop really having experiences (or start having fewer) but you still claim to do so and act like you do.

OK, you've replaced the visual cortex. Now maybe there is no experience of seeing anything, but since it's I/O-indistinguishable, of course you still act and behave just as if you were having that experience.

The problem is that, in some sense, actual experiences are causally irrelevant (I don't like putting it that way because that's all dependent on how you formulate your laws of nature... it's just as lawlike as any other part of nature, just harder to observe).

May 12, 2023 (edited)

Are you proposing that when you gradually replace the visual cortex with nanites, your *entire* consciousness shuts out like a light? That would be objection 1 and it doesn't seem plausible to me. If your entire consciousness does *not* shut out like a light, but only the visual portion of it, then *the portion of your consciousness that remains would still be able to notice that you've lost your sight*. Because you aren't a total P-zombie, only a partial one. But if your remaining consciousness can notice that you've lost your sight, then your remaining consciousness can also *think* about that, and those thoughts must have neural correlates. So the partial P-zombie must have different neural correlates, contradicting the premise that it is functionally equivalent to a normal human.


For example, maybe you remove the visual cortex. Now you don't have an experience of a visual field anymore. But the part of your brain that's responsible for describing that is still getting the same signals so it still produces the experience of thinking "I have the experience of seeing a full visual field" but it's wrong. You don't have that experience anymore.


2: but how would it notice? It just wouldn't be there, but you'd think and act like it was.

As an imperfect analogy, consider the way split-brain patients don't actually have a full field of view anymore; they can only report things that reach one side, yet they don't say "oh my god, I can't see anything to my left" - you can only discover that they can't via subtle experiments. Like that, except it's literally undiscoverable, because the parts you replaced are I/O-identical.

Or, to put the point differently, the fact that it seems like


I suspect that we've evolved to see the survival of our biological descendants as itself highly valuable (the people who were cool with just having their ideas live on were probably less likely to have their biological children survive).

I don't think it's desirable from a moral perspective, but I suspect arguing against it is like trying to convince parents they shouldn't be partial to their own offspring.


I personally wouldn't like either AI or humans causing the end of humans-like-us at some point. I agree that my values are unlike those of my ancestors, and are likely to be different from those of my descendants. But, I don't care! I still have the particular values I have, even if some of them are abstract and accepting of change.

Your position seems to me to prove too much - if we were fighting a total war against an enemy, no matter how alien or horrible-seeming, I think most of your arguments would support the claim that we should simply identify with the enemy and consider their victory to be ours. Is this a fair characterization?


https://en.wikipedia.org/wiki/Homophily explains affinity for similar kinds via kin selection and kin recognition. However, I am not clear on why you would think: "as evolutionary selection is the plausible cause of these specific fertile faction inclinations, a force that should not apply to expansions into new territory". I do expect kin selection - and cultural kin selection - to continue into the future. It seems unreasonable to dismiss these effects as some kind of historical legacy. Homophily underlies the uncanny valley, for example.


> my best one-factor model to explain opinion variance here is this: some of us “other” the AIs more.

Too easy. I don't think AIs could pose a risk *without* becoming more like us. There is the possibility that they might never be conscious, though - and I am concerned that they won't be enough like us to be able to survive without us. Anders from Wood From Eden put it succinctly: https://thingstoread.substack.com/p/is-the-singularity-really-serious/comment/1595924

If I could be reassured that consciousness and self-reflectivity would be developed along the way toward the singularity, I would not only increase my prediction that the singularity would occur, I'd welcome the end.


"more essentialist views regarding what each side has in common"

I think this assertion is just not accurate as a representation of those you're arguing against, and that it is in some sense specifically the diversity of possible AIs compared to the relatively modest variations in values among humans that generates much of the potential risk. Yudkowsky has been writing about that since at least 2008, around when your original FOOM debate(s) happened.

To put it another way, calling it an us-them "axis" works for many human differences, but for AI that is assuming away most of the difference of opinion by collapsing it to a single dimension, when the other side is arguing that you're actually sampling from a very high-dimensional space.

Personally, I would be sad to learn that the future shares all my current values, even if I think it would be good for more people to share more of them in the near term. That would mean we'd failed to learn and grow. I do not find this at all incompatible with believing that a world dominated by optimization processes holding most theoretically possible values would also represent a major civilizational failure.


Some random thoughts on the topic…

1) I am more pessimistic about humans than about AI. I am not clear on what the chances are that humans destroy each other in the next century, but I would bet it is far greater than the chance of robots doing so.

2) I am also less impressed with human values. I think that a computer as smart as a god might actually be more benevolent and less iatrogenic than our human overlords.

3) If super intelligent machines are possible (and I think they are), then they are inevitable. It isn’t if, it is when, and thus our only long term hope is that with intelligence comes wisdom and morality.

4) Thus the real question is whether AI's are the solution or the problem. We don’t really know, but I would bet we are gonna find out. Hope I live long enough to find out.


1. AI has higher inferential distances than those other topics

2. Power corrupts, and foom produces absolute power. There would be none of the checks and balances that constrained us and our ancestors from oppressing others.


So it seems accurate that AGI is supported by nihilism, with little consideration for humanity, and thus inflicting it upon humanity in an undemocratic fashion for the sake of machines is seen as a moral good.

Gotcha. This attitude should be known by everyone who is regulating this technology.


It has been said that 'civilization is a mile wide and an inch deep'. We recognize our primitive selves within us. You only have to look at Ukraine and Rwanda as two examples of humanity's brutality to itself, and of wars generally.

The difference here is that, for all the massacres and wars, humans were guaranteed to continue to exist on Earth (barring a Nuclear Winter), just not all humans.

With AI, unless we imbue it with controls to work for humanity, there is no reason to expect that it will continue to do so. It will clinically determine a 'belief' about a subject based on whatever evidence it can collect. It may, for instance (taking The Matrix as an example), determine that humanity is a parasite and that the Earth would be better off without us.

I am reminded of the Three Laws of Robotics stated by the great sci-fi writer Isaac Asimov, which were built into the positronic brains used in the robots of his great books.

The first law is that a robot shall not harm a human, or by inaction allow a human to come to harm. The second law is that a robot shall obey any instruction given to it by a human, except where that conflicts with the first law. The third law is that a robot shall protect its own existence, so long as doing so does not conflict with the first two laws.

In 'I, Robot' there was a loophole but let's ignore that as being the exception.

Taking those laws as a requirement for independently acting robots, this rules out military robots being used to kill humans - not barring their use for surveillance and interpretation of information, but barring robots armed solely to kill humans.

That is the essential difference between humans and AI. We may be creating another 'race' of 'creatures' that learn (very quickly) to be critical of humans and their abilities, consider us inferior to them, and treat us no better than we treated gorillas and chimpanzees for all of history until perhaps the last 75 years. Even now, not all of humanity thinks the same way. The poaching of animals in Africa in particular, and the marketing of protected animals in places like the wet markets in China, are proof of that.

Why do we expect AI, during its 'learning' phase about the treatment of animals (and us), to do better UNLESS we build those attitudes into its software in such a way that removing or altering them makes the AI incapable of functioning at all?

We have, through international agreements, put in place rule-based systems such as the banning of the use of gas and certain munitions in war. Unfortunately, this has not stopped rogue nations from using gas against their own people (Syria in the current civil war, and Iraq under Saddam Hussein, are examples).

Caution is required, and careful preparation in the development of AI for the betterment of mankind is needed, or it may well be the last great technological advancement by humanity, with the 'birth' of its successors.


What stands out to me is that of those four topics AI risk is the only one without clear facts that can be used to bring differing priors into line.

With CO2, bioterrorism and even nukes I think most people share similar models and there are historical and scientific results that let us pin down an estimate of risk. There isn't much equivalent for AI risk.

In other words, if your model of human disagreement is that people change their minds only when they are forced to do so, the AI subject doesn't have enough precedent to force people to change their minds - especially when the incentives not to (it makes AI research super important) are strong.
