Russell’s Human Compatible

Aug 26, 2020

My school turned its mail system back on as we start a new semester, and a few days ago out popped Stuart Russell’s book Human Compatible (published last Oct.), with a note inside dated March 31. Here’s my review, a bit late as a result.

Let me focus first on what I see as its core thesis, and then discuss less central claims.

Russell seems to say that we still have a lot of time, and that he’s only asking for a few people to look into the problem:

The arrival of superintelligent AI is inherently unpredictable. … My timeline of, say, eighty years is considerably more conservative than that of the typical AI researcher. … If just one conceptual breakthrough were needed, …superintelligent AI in some form could arrive quite suddenly. The chances are that we would be unprepared: if we built superintelligent machines with any degree of autonomy, we would soon find ourselves unable to control them. I’m, however, fairly confident that we have some breathing space because there are several major breakthroughs needed between here and superintelligence, not just one. (pp.77-78)
Scott Alexander … summed it up brilliantly: … The skeptic’s position seems to be that, although we should probably get a couple of bright people to start working on preliminary aspects of the problem, we shouldn’t panic or start trying to ban AI research. The “believers,” meanwhile [take exactly the same position.] (pp.169-170)

Yet his ask is actually much larger: unless we all want to die, AI and related disciplines must soon adopt a huge and expensive change to their standard approach: we must stop optimizing using simple fixed objectives, like the way a GPS tries to minimize travel time, or a trading program tries to maximize profits. Instead we must make systems that attempt to look at all the data on what all humans have ever done to infer a complex continually-updated integrated representation of all human preferences (and meta-preferences) over everything, and use that complex representation to make all automated decisions. Modularity be damned:

Let’s remind ourselves of the task at hand: to design machines with a high degree of intelligence … while ensuring that those machines never behave in ways that make us seriously unhappy. … Like many other fields, AI has adopted the standard model: we build optimizing machines, we feed objectives into them, and off they go. That worked well when the machines were stupid and had a limited scope of action. … As machines designed according to the standard model become more intelligent, however, and as their scope of action becomes more global, the approach becomes untenable. Such machines will pursue their objective, no matter how wrong it is; they will resist attempts to switch them off; and they will acquire any and all resources that contribute to achieving the objective. (pp.171-172)
In a nutshell, I am suggesting that we need to steer AI in a radically new direction if we want to retain control over increasingly intelligent machines. We need to move away from one of the driving ideas of twentieth-century technology: machines that optimize a given objective. I’m often asked why I think this is even remotely feasible, given the huge momentum behind the standard model in AI and related disciplines. (p.179)

It is the enormity of this task that pushes Russell to insist that we start now, even eighty years in advance. Though he doesn’t mind suggesting that we may have much less time:

If an intelligence explosion does occur, and if we have not already solved the problem of controlling machines with only slightly superhuman intelligence … then we would have no time left to solve the control problem and the game would be over. (p.144)
It’s common to see sober-minded people … pointing out that because human-level AI is not likely to arrive for several decades, there is nothing to worry about. … This argument fails … long-term risks can still be cause for immediate concern. … depends not just on when the problem will occur but also on how long it will take to prepare and implement a solution. For example, if we were to detect a large asteroid on course to collide with Earth in 2069, would we say it’s too soon to worry? (pp.150-151)

In my view, Russell has not made a remotely sufficient case for such a huge ask. We are very far from knowing even the rough outlines of how such distant future systems and their human associates will be organized. For example, they might be broken into many small Comprehensive AI services, as suggested by Drexler (whom Russell didn’t cite), none of which have a very global scope of action. We seem now quite badly placed to think well about what sort of problems will be the most serious and the best ways to deal with them. In the past, we have almost always waited until we had pretty concrete ideas of system design and placement before we worked out how to mitigate specific failure modes; why can’t we do that again here?

Instead of being warned that an asteroid will collide with Earth in 2069, what we are being told here is more like this: some sort of “thing” will “interact” in some way with us at some future time, but we will slowly learn more over time and know a lot more a decade beforehand. Or this is like being warned about the abstract possibility of nuclear weapons in the year 1500 – way too early to do much that is useful. In these sorts of cases, usually the best thing to do is collect general resources and wait until one learns much more.

Russell’s obsession with absolute control leads him to reject ems entirely:

Let’s remind ourselves of the task at hand: to design machines with a high degree of intelligence … while ensuring that those machines never behave in ways that make us seriously unhappy. … Our chances of controlling a superintelligent entity from outer space are roughly zero. Similar arguments apply to methods of creating AI systems that guarantee we won’t understand how they work; these methods include whole-brain emulation – creating souped-up electronic copies of human brains – as well as methods based on simulated evolution of programs. I won’t say more about these proposals because they are so obviously a bad idea. (p.171)

I wrote a whole book on the topic of ems not being obviously a bad idea, but I guess he finds that so obviously wrong as to not be worth a counter-argument. As ems are basically human minds in robot brains and bodies, any control argument for rejecting them would also seem to be an argument for rejecting humans. How dare we continue to allow humans to be created to serve each other, when their objectives are also so opaque and out of control?

I also find fault with Russell’s insistence that we simply can’t allow loyal AIs, who work only to please their owners.

Loyal AI – Let’s begin with a very simple proposal. If Harriet owns Robbie, then Robbie should pay attention only to Harriet’s preferences. … [But] if Harriet doesn’t give a fig for the preferences of others … might [Robbie] not spend his time pilfering money from online bank accounts to swell indifferent Harriet’s coffers, or worse? … Strict liability doesn’t work: it simply ensures that Robbie will act undetectably when he delays planes and steals money on Harriet’s behalf. … Even if we can somehow prevent the outright crimes, a loyal Robbie … will … cut in line at the checkout whenever possible. … In summary, he will find innumerable ways to benefit Harriet at the expense of others – ways that are strictly legal but become intolerable when carried out on a large scale. … Humans tend not to take advantage of these loopholes, either because they have a general understanding of the underlying moral principles or because they lack the ingenuity required to find the loopholes in the first place … It seems difficult, then, to make the idea of a loyal AI work, unless the idea is extended to include consideration of the preferences of other humans, in addition to the preferences of the owner. (pp.215-217)
No one would buy such a [uniformly] altruistic robot, so no such robots would be built and there would be no benefit to humanity. … Robbie will need to have some amount of loyalty to Harriet in particular – perhaps an amount related to the amount Harriet paid for Robbie. Possibly, if society wants Robbie to help people besides Harriet, society will need to compensate Harriet for its claim on Robbie’s services. … Perhaps some completely new kinds of economic relationships will emerge to handle the (certainly unprecedented) presence of billions of purely altruistic agents in the world. (pp.226-227)

Russell doesn’t seem to notice that any more weight put on Harriet relative to others could induce Robbie to steal, or cut lines, on her behalf. And he seems to imagine a situation where only Harriet has an advanced robot. If, in contrast, most everyone has such robots, then Robbie will not find it easy to steal, cut lines, or find loopholes in social rules and norms. The other robots will watch for violations, and coordinate to discourage them. In that scenario there isn’t more of a social-cheating problem from having smart robot servants than we see from having smart human servants.
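
The point about extra weight on Harriet can be put as a toy calculation. The sketch below is only an illustration under assumptions that are not in Russell's book: Robbie maximizes a weighted sum of welfare changes, everyone other than Harriet gets weight 1, and theft is lossy (Harriet gains less than the victim loses). The function names and numbers are hypothetical.

```python
def objective_change(owner_weight: float, owner_gain: float, victim_loss: float) -> float:
    """Change in Robbie's weighted-sum objective from one transfer.

    Assumption: the victim has weight 1 and the owner has weight owner_weight.
    """
    return owner_weight * owner_gain - victim_loss


def would_steal(owner_weight: float, owner_gain: float, victim_loss: float) -> bool:
    """Robbie takes the action only if it raises his objective."""
    return objective_change(owner_weight, owner_gain, victim_loss) > 0


# Hypothetical lossy theft: the victim loses 100, Harriet gains only 90.
for weight in (1.0, 1.1, 1.2):
    print(weight, would_steal(weight, owner_gain=90.0, victim_loss=100.0))
```

In this toy model a uniformly altruistic Robbie (weight 1) never takes a lossy transfer, but once the weight on Harriet exceeds the ratio of victim loss to owner gain (here 100/90), the theft raises his objective.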

Let me finish with a few minor disagreements.

Everyone should also have the right to mental security – the right to live in a largely true information environment. (p.107)

While we have long tried to make laws against lies and fraud, there seem to be enough subtle border cases to make that pretty hard. I don’t see it getting much easier with AI. And we often prefer to use other social mechanisms than law to deal with lies.

Building machines that can decide to kill humans is a bad idea. (p.112)

Always a bad idea? This isn’t at all obvious to me. Why can’t there be cases where machine advantages allow a lower error rate, to better let us approach the ideal of only the right people getting killed?

Although, as it will soon become evident, I am by no means qualified to opine on what is essentially a matter for economists, I suspect that the issue is too important to leave entirely to them. (p.113)

One should uphold a higher standard. It is not enough that the topic be important; you also need a reason to think you have some insight and expertise that others may not contribute.

As AI progresses, it is certainly possible – perhaps even likely – that within the next few decades essentially all routine physical and mental labor will be done more cheaply by machines. … One rapidly emerging picture is that of an economy where far fewer people work because work is unnecessary. (pp.119-120)

Seems he’s drunk the Kool-Aid. I’ve said a lot about why I’m skeptical.

UWIR

May 15, 2023

"You are suggesting that checks and balances, death threats, & police can't work for ems as they work for us."

1. I'm saying that their effectiveness with regard to humans has limited evidentiary value on the question of whether they will work with regard to ems. You said "As ems are basically human minds in robot brains and bodies, any control argument for rejecting them would also seem to be an argument for rejecting humans." I find you to have the burden of proof in this matter. It is not for me to prove that ems definitely will kill us all, it is for you to prove they won't. It is not enough to make an argument from ignorance of "We can't see any clear difference between ems and humans, so we should just assume that arguments regarding one apply equally to the other."

2. I disagree that I didn't give any reason why the checks and balances would not apply the same. I gave the lack of equivalence of imprisonment and death penalty as examples.

3. I don't think much explanation is needed for the claim that humans would not be able to police ems the same as humans. How would humans punish ems, if ems have advantages over humans? You'd want the em you punish to be the "same" as the one that committed the offense, but what does it mean to be "the same"? Do you punish just the instance that committed the offense, or all instances of "the same" em?

On top of this, just because one population is drawn from another population doesn't mean that statements about one apply equally to the other. All sociopaths are humans, but that doesn't mean that any arguments against giving power to sociopaths apply equally well to giving power to humans in general.

"A neg. weight on harming others is the same as a big pos. weight on helping them."

No, it's not. One can have a utility function that treats deviations from the status quo asymmetrically. Yes, it would require there being some concept of "status quo", but given that humans can understand the concept, it follows that a human-level AI can as well.
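
A minimal sketch of the distinction, assuming welfare changes are measured as a single number relative to some agreed status quo. The function names and weights below are hypothetical and chosen only for illustration.

```python
def symmetric_utility(delta: float, weight: float = 10.0) -> float:
    """Linear weight on another person's welfare change.

    A large penalty for harm is automatically an equally large reward for help.
    """
    return weight * delta


def asymmetric_utility(delta: float, help_weight: float = 0.1, harm_weight: float = 10.0) -> float:
    """Piecewise-linear weight around the status quo (delta == 0).

    Harms (delta < 0) are penalized heavily; benefits (delta > 0) count only a little.
    """
    return help_weight * delta if delta >= 0 else harm_weight * delta


for delta in (-1.0, 1.0):
    print(delta, symmetric_utility(delta), asymmetric_utility(delta))
```

Under the symmetric form the two cases differ only in sign, which is the equivalence being disputed; the asymmetric form keeps the strong aversion to harm without creating a comparably strong pull toward helping.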

PS: Is the set of tag generators at the bottom of the comment box determined by Disqus or the site owner? I'm not seeing the blockquote option.

PPS: You use the word enormity apparently referring to the task of working towards AI safety. This word, at least from a prescriptivist point of view, refers to a large *bad* thing.

You are suggesting that checks and balances, death threats, & police can't work for ems as they work for us. But you don't say why.

A neg. weight on harming others is the same as a big pos. weight on helping them.


Overcoming Bias
