
> You are suggesting that checks and balances, death threats, & police can't work for ems as they work for us.

1. I'm saying that their effectiveness with regard to humans has limited evidentiary value for the question of whether they will work with regard to ems. You said "As ems are basically human minds in robot brains and bodies, any control argument for rejecting them would also seem to be an argument for rejecting humans." I find you to have the burden of proof in this matter. It is not for me to prove that ems definitely will kill us all; it is for you to prove they won't. It is not enough to make an argument from ignorance of "We can't see any clear difference between ems and humans, so we should just assume that arguments regarding one apply equally to the other."

2. I disagree that I didn't give any reason why the checks and balances would not apply the same. I gave the lack of equivalence of imprisonment and the death penalty as examples.

3. I don't think much explanation is needed for the claim that humans would not be able to police ems the way they police humans. How would humans punish ems, if ems have advantages over humans? You'd want the em you punish to be the "same" as the one that committed the offense, but what does it mean to be "the same"? Do you punish just the instance that committed the offense, or all instances of "the same" em? On top of this, just because one population is drawn from another population doesn't mean that statements about one apply equally to the other. All sociopaths are humans, but that doesn't mean that arguments against giving power to sociopaths apply equally well to giving power to humans in general.

> A neg. weight on harming others is the same as a big pos. weight on helping them.

No, it's not. One can have a utility function that treats deviations from the status quo asymmetrically. Yes, it would require there being some concept of "status quo", but given that humans can understand the concept, it follows that a human-level AI can as well.
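To make the asymmetry concrete, here is a minimal sketch of such a utility term. The baseline, the weights, and the function name are illustrative assumptions of mine, not anything from the post:

```python
# Toy utility term that treats deviations from a status quo asymmetrically:
# losses relative to the baseline are penalized more heavily than equal-sized
# gains are rewarded. All numbers here are arbitrary illustrations.
def asymmetric_value(outcome, status_quo, harm_weight=5.0, gain_weight=1.0):
    delta = outcome - status_quo
    return gain_weight * delta if delta >= 0 else harm_weight * delta

print(asymmetric_value(110, 100))  # +10 above the baseline -> +10.0
print(asymmetric_value(90, 100))   # -10 below the baseline -> -50.0
```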

PS: Is the set of tag generators at the bottom of the comment box determined by Disqus or the site owner? I'm not seeing the blockquote option.

PPS: You use the word "enormity", apparently referring to the task of working towards AI safety. That word, at least from a prescriptivist point of view, refers to a large *bad* thing.


You are suggesting that checks and balances, death threats, & police can't work for ems as they work for us. But you don't say why.

A neg. weight on harming others is the same as a big pos. weight on helping them.


Yah, isn't the cost for an EM of reprogramming a part of itself to augment some faculty incredibly cheap? So cheap you'd expect the benefits to have to decrease to near 0 before you reach equilibrium?


Yah, I agree the line-cutting argument isn't very good. Either we can't make Robbie reliably work on Harriet's behalf in the first place or we should be able to make it only do so when it doesn't harm overall well-being.

I suppose the counter would be that we do in fact allow people to do slightly selfish things, like trying to get tickets to their favorite concert even though they know others would enjoy it more, with only a fuzzy line about what's too much that would be hard to communicate. But I'm not very convinced.


> How dare we continue to allow humans to be created to serve each other, when their objectives are also so opaque and out of control?

I find that argument rather fallacious. It's a mixture of "We had this problem in the past, so why should we give any consideration to solving this problem in designing future systems" and "Arguments that apply to one system apply without any modification to fundamentally different systems".

Just a few of the counters to this position:

We have extensive checks and balances to human malfeasance.

Humans can be threatened with imprisonment or death if they misbehave (an AI isn't necessarily going to fear death, except insofar as it reduces its opportunities to further its utility function).

Humans can police each other, but they can't police a sufficiently advanced AI.

We have shown that policing humans is a manageable problem, but that doesn't mean that policing any entity in general is.

If humans run amok, we have a world with amok humans. If AI runs amok, we have a world with no humans at all.

> Russell doesn't seem to notice that any more weight put on Harriet relative to others could induce Robbie to steal, or cut lines, on her behalf.

There are a variety of ways of addressing this. Robbie could have a negative weight on harming others, such that stealing would have to help Harriet more than the weighted harm it does to others. That wouldn't completely eliminate theft, but it would eliminate the most destructive types. And from a consequentialist point of view, those are the most important types to eliminate.
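As a toy illustration of that weighting (the 10x harm multiplier and the function are made-up numbers of mine, not anything Russell or the review proposes):

```python
# Robbie scores an action by Harriet's gain minus a heavy penalty on harm
# done to others. Harms must be outweighed many times over before an action
# is worth taking.
HARM_WEIGHT = 10.0

def robbie_score(harriet_gain, harm_to_others):
    return harriet_gain - HARM_WEIGHT * harm_to_others

print(robbie_score(5, 5))    # petty theft: -45.0, rejected
print(robbie_score(100, 1))  # big gain at trivial cost to others: +90.0, allowed
```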

But presumably the reason society supports the building of AI is that it creates economic surplus. If Robbie is creating surplus, we could allow Robbie to weight allocation of this surplus to Harriet more highly than allocation to others. Then Robbie would still be giving preference to Harriet, but would not steal from others.

I suppose there would still be issues, such as Robbie doing something that results in Alice realizing $10 less profit, while Bob gets $15 more and Harriet gets $1 more. For a deontologist, at least, there would be cases where this sort of thing would not be acceptable.
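Working through that case with simple arithmetic (the "no uncompensated losers" rule is just one possible deontological reading, not something from the review):

```python
# Net surplus vs. individual losers in the Alice / Bob / Harriet example.
changes = {"Alice": -10, "Bob": +15, "Harriet": +1}

print(sum(changes.values()))                 # +6: a pure surplus-maximizer approves
print(any(d < 0 for d in changes.values()))  # True: someone is made worse off,
                                             # which a side-constraint view may reject
```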


No, what I'm arguing is more basic: that it's incorrect to even assume that the AIs we develop will be well understood in terms of things like goals, beliefs, or a utility function. (Yes, in some sense you can describe anything in terms of a utility function, but the danger lies in having a utility function that globally favors optimizing some simple goal.)

For instance, even humans don't fully act like they have a single set of beliefs. Depending on context and salience, they will act as if they have different beliefs. There is no reason at all to suppose that an AI trained to, say, minimize the number of times people complain that a translation was incorrect has anything like a global belief that would result in it acting to achieve that minimum by destroying all people.
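A rough sketch of the structural difference being pointed at (both functions are toy illustrations of mine, not models of any real system):

```python
# Myopic objective: a function only of this one prediction and its reference.
# Nothing about the wider world appears in it, so "reduce complaints by acting
# on the world" isn't even expressible in this objective.
def per_example_loss(predicted_translation, reference_translation):
    return 0.0 if predicted_translation == reference_translation else 1.0

# Global objective: a function of a whole world state. Only an objective of
# this shape can, even in principle, reward changing the world rather than
# improving individual translations.
def global_objective(world_state):
    return world_state["total_complaints"]
```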

Evolution creates a certain pressure to have global beliefs, since (more or less) it favors globally optimizing the number of descendants you produce, but the fact that even humans don't truly have global beliefs suggests that it takes additional effort to achieve the feared kind of global optimization.

It doesn't make AGI a total non-risk. I can see some reasons people might try to instill simple global goals in an AI (e.g. military purposes), but I don't see a reason to assume it will happen automatically rather than requiring deliberate intent.

Yes, I've read the arguments in Bostrom's book for having global beliefs or a utility function … I don't find them very convincing.


Note: I am not affiliated with CHAI and do not have a detailed model of their work. But my mental model seems fairly different to what you outlined, which is why I'm commenting.

I was wrong about there being no penalty to that method. They've managed to make the linked technique relatively efficient for small action spaces and large state spaces. Yet the complexity increases exponentially with the size of the action space.

But that's just one exploration of Russell's principles. Other papers seem to produce more or less efficient solutions, depending on the problem. They don't require every piece of data to be fed into every AI for every problem, because that's intractable. Or even for most problems.

Rather, he wants an AI that seeks out info in order to learn what the human wants, whilst not manipulating the human into wanting something easier to maximise. If that means it needs data on every aspect of humanity, then fine. If it turns out to be simpler, then that's OK too. At least, as long as the AI is willing to be corrected.
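A cartoon of that "ask rather than act when unsure, and accept the correction" behaviour (entirely my own toy, not CHAI's algorithm; the hypotheses and action values are made up):

```python
# If the machine's candidate reward hypotheses disagree badly about an action,
# it asks the human and drops the action if vetoed, instead of acting unilaterally.
VALUES = {("irreversible", "A"): +10, ("irreversible", "B"): -100,
          ("cautious",     "A"): +1,  ("cautious",     "B"): +1}

def choose(actions, hypotheses, ask_human):
    for action in actions:
        vals = [VALUES[(action, h)] for h in hypotheses]
        if min(vals) < 0 and not ask_human(action):  # risky under some hypothesis: defer
            continue                                 # human vetoed; accept the correction
        return action
    return "do nothing"

print(choose(["irreversible", "cautious"], ["A", "B"], ask_human=lambda a: False))
# -> "cautious"
```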

Speaking of current techniques, they're used on relatively simple problems. The ones for which we can ignore human irrationality, hypocrisy and wider context when making decent proxies. The sort of problems where Stuart expects the current approaches to fail.

By "augmentations" I meant the ones you think would occur after the early EM era (ch. 4 "Complexity"). You argue there will be slight modifications with diminishing returns in the early era. But diminishing in what way, over what time scales, and what sorts of returns and what trade-offs all seem like key questions.

Which is why I was surprised when you said the book has an 80% chance of being a good guide to the future unconditionally.


The form of your argument I've seen most often takes selfishness as the human trait that AIs won't necessarily share. In other words, it's chauvinistic of us to assume AIs would have such an all-too-human issue as selfishness. I hope this is close enough to what you're saying to be relevant to it.

The way a lot of us (including economists, I haven't read Bostrom's argument) think of selfishness is to just assume there are agents with arbitrary, but different, sets of goals. At that point they're in a situation where interactions with other agents could go well or less well for those goals, and the adaptive thing in a strategic situation is to strategize. Visible phenomena like "selfishness" (in the more familiar meaning), greed, and competitiveness would then show up as the game-theoretic consequences. As soon as you have things with both cross purposes and adaptability (it doesn't even require human-like adaptability) in a setting where the participants can see that they can affect each other, selfishness shows up.

Often what people mean by selfishness is going outside ethical rules or standards in distasteful ways. But going outside human rules is not a special human trait, it's the default: why would we assume AIs would stay within our boundaries of politeness? Or, on the other hand, how sure are we that they would invent and come to mostly abide by rules that were compatible with ours?

If I'm right that the patterns that we see as ethical issues are really just game theoretical phenomena, then one would expect that the most mathematically basic phenomena would be the ones we're most familiar with in humans, and would be the most visible to us looking at AI behavior. So, selfishness, greed, competitiveness, lying, violence, careful mutual tiptoe-then-run, cheating, sneakiness, double-crossing... it's almost as inevitable as ticking off the prime numbers.
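A minimal illustration of that claim, using nothing but agents with their own arbitrary payoffs (the payoff numbers are the standard toy ones, chosen by me for illustration):

```python
# Two agents, each caring only about its own payoff. "Selfish"-looking behavior
# (defection) falls out of best-response reasoning alone; no human-like
# psychology is assumed anywhere.
PAYOFFS = {  # (row_action, col_action) -> (row_payoff, col_payoff)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(opponent_action):
    # The row player's best action given what the opponent does.
    return max(("cooperate", "defect"),
               key=lambda a: PAYOFFS[(a, opponent_action)][0])

for opp in ("cooperate", "defect"):
    print(opp, "->", best_response(opp))   # defect is the best response either way
```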


I don't ignore augmentation; I discuss it. I just don't think it makes much of a difference to the many topics I discuss.

Russell discusses many kinds of human data and says he wants each one included.

Surely this method has cost penalties of some sort, else people would have been doing it long ago.


Robin, there's a question I've been meaning to ask you about EMs. Why do you think Age of Em will be a good guide to the future when you ignore EMs being massively augmented? A slight edge in some aspect of intellect could lead to huge gains in market share. Because copying and deleting yourself is a minor issue, I'd expect economic forces to make this inevitable.

Perhaps it is hard to reach super-human intelligence, and it takes a few centuries of EM time. That's still months of real time, and all of a sudden you have these superintelligences running around who were shaped by economic pressures into something alien. It is unclear to me what would occur then, or whether it would be a future our descendants would be happy with.
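The rough arithmetic behind "centuries of EM time is months of real time" (the 1,000x speedup is an illustrative assumption on my part, not a figure taken from the book):

```python
# Subjective em time vs. wall-clock time at an assumed speedup.
speedup = 1_000          # em subjective seconds per real second (assumption)
subjective_years = 300   # "a few centuries" of em experience
real_years = subjective_years / speedup
print(f"{subjective_years} em-years = {real_years} real years "
      f"= {real_years * 12:.1f} months")   # 300 em-years = 0.3 real years = 3.6 months
```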

RE the review, Russell is not advocating that every AI be trained on all human knowledge. No idea where you got that from, to be honest. Just that the current formalism is inadequate for cases where we know it is hard to give a good reward function/training set.

Instead, he argues for a formalism where the AI takes a co-operative approach with a human in learning the relevant reward function. That's a bit fuzzy, but you can look up CHAI's research agenda for details. At present, their methods don't seem to suffer performance penalties. Maybe this paper would be a good place to start: https://arxiv.org/pdf/1606....
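As a cartoon of the "learn the reward from the human rather than being handed a fixed objective" idea (a toy of my own, not the formalism in the linked paper; the hypotheses and likelihoods are made up):

```python
# The machine keeps a belief over what the human's reward function is and
# updates it from the human's observed choices via a standard Bayes rule.
PRIOR = {"human values A": 0.5, "human values B": 0.5}
LIKELIHOOD = {  # P(observed choice | hypothesis), made-up numbers
    ("human values A", "picked A"): 0.9, ("human values A", "picked B"): 0.1,
    ("human values B", "picked A"): 0.2, ("human values B", "picked B"): 0.8,
}

def update(belief, observation):
    posterior = {h: p * LIKELIHOOD[(h, observation)] for h, p in belief.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

print(update(PRIOR, "picked A"))
# {'human values A': ~0.82, 'human values B': ~0.18}
```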

As to why worry about this now: well, I plan to stick around for a few centuries. Heck, I plan to convince my family and friends to do so too. Hostile superintelligences would probably be an issue. Since I'm unsure if we can even work around that problem, I assign high value to resolving it whilst I'm alive.


Ok. From the summary, I agree that a sufficient case for changing the course of AI has not been made in the book.

Specifically the point that "we must stop optimizing using simple fixed objectives". Doing away with fixed-objective AI systems would leave us with very few options, as such systems are one of our best chances to produce anything useful with AI.


That's a possible argument; I'm responding to the argument Russell made.


There is a possibility that incidents like these might actually succeed in preventing AI deployment at the scale we need.

I don't know if this comparison is fair, but: nuclear energy is clearly a very reliable energy source, yet it has been stalled for a long time, largely due to misguided opposition.

Similarly, if AI is not human-compatible to begin with, anything that can be held against it will be held against it.


The algorithm reveals aspects of reality that the official ideology finds unacceptable; therefore, the algorithm must be scrapped. But, in the long term, those regimes that have no ideological objection to using the algorithm will defeat those that have scruples.


Relevant to the scrapping of the UK Home Office algorithm: https://www.theguardian.com...


The problem, I think, is that there is already a lot of opposition to deploying AI that is even remotely biased, so we would have to do all the human-compatibility work anyway. Wouldn't it be less expensive and time-consuming to do it from the start?

For example: the UK Home Office scrapped its visa application algorithm and also its A-level grading algorithm. Other predictive systems are being protested as well: https://www.wired.com/story...
