33 Comments

I may be mistaken, but I think the drift here means ‘relative to what we believed them to be’.

There isn’t much to distinguish something that is outright random from something whose randomness is bounded but whose bound is unknown.


"In the context of foom, the usual AI concern is a total loss of control of the one super AI, whose goals quickly drift to a random point in the space of possible goals."

That seems very wrong to me. The concern is not about goals drifting; it is about them being relentlessly pursued. What am I missing?

"From this view, those tempted to spend resources now on studying AI control should consider two reasonable alternatives. The first alternative is to just save more now to grow resources to be used later, when we understand more. The second alternative is to work to innovate with our general control institutions, to make them more robust, and thus better able to handle larger coordination scales, and whatever other problems the future may hold. (E.g., futarchy.)"

These seem like reasonable options to consider in any case. I tend to think that solving global coordination is essential. I think most AI-safety people are not focused on that because it seems intractable to them, so even if the chances of technical success seem low, technical work still seems to them like a better bet.


I assume "mostly peaceful" here mainly to separate scenarios according to factors that are most relevant to each one. We'd focus on different factors in a war scenario.


Robin, how well do you think the "mostly peaceful" assumption holds up here? I ask because in your em scenario, you give a sizeable chance of human extermination per objective year of the em era (30%, IIRC?). Won't this be equally true in a multipolar non-em AI future? Perhaps more so, if the AIs are better able to develop new, separate institutions because they are less tied to a human past?

If so, then under the further assumption of complex slowly-accumulating AI that takes centuries to reach human level (rather than Christiano's prosaic AGI model), perhaps our best bet is to hurry ems, so that this transition to an AI future happens when we're running at the same speed as the AIs and thus can better integrate into their institutions.

(Note that this assumes a framework in which all that matters is our own personal survival and happiness, which is probably not right but is nonetheless useful as a concrete model that is at least partially correct.)


Again, that's just a generic concern; it has nothing to do with AI in particular.


Hey Robin, I'm interested in a system that changes the method of control of programs inside a computer system.

Currently you can think of your management of the resources inside a computer as akin to a command economy. You install new programs, which take up as much hard disk space (a resource) as they want, and you have to delete the files they create. They also take up as much memory as they want, unless you put specific bounds on them.

This puts a strain on you, the computer user: it limits the number of programs you can manage, and also how quickly they can be updated, since you need to learn how to use the new features.

So I am interested in creating systems that use an economy, based on feedback from the user, to manage normal computer programs. I think this could radically change the speed of program change, especially when novelty is added to the system (previously there was no way of telling good novelty from bad, but the economy decides).
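For concreteness, here is a minimal sketch of what such a user-feedback economy might look like. The names (Program, FEEDBACK_REWARD) and the wealth-proportional allocation rule are illustrative assumptions, not a specification of the system described above:

```python
from dataclasses import dataclass

FEEDBACK_REWARD = 10.0   # credits earned per unit of positive user feedback (assumed)
TOTAL_MEMORY_MB = 1024   # the scarce resource being shared out (assumed)

@dataclass
class Program:
    name: str
    credits: float = 100.0   # starting endowment of the internal currency
    memory_mb: float = 0.0   # allocation decided below

def record_feedback(program, rating):
    """User feedback (say -1.0 to +1.0) becomes income, so useful programs grow richer."""
    program.credits += FEEDBACK_REWARD * rating

def allocate_memory(programs):
    """Split memory in proportion to each program's credits (its earned 'wealth')."""
    total = sum(p.credits for p in programs)
    for p in programs:
        p.memory_mb = TOTAL_MEMORY_MB * p.credits / total if total > 0 else 0.0

programs = [Program("editor"), Program("novelty_widget")]
record_feedback(programs[0], +1.0)   # the user liked the editor
record_feedback(programs[1], -0.5)   # the user disliked the new widget
allocate_memory(programs)
for p in programs:
    print(f"{p.name}: {p.memory_mb:.0f} MB, {p.credits:.0f} credits")
```

The point of the sketch is only that user feedback, converted into a currency, can decide resource questions (including whether a novel program was good or bad novelty) without the user managing each program by hand.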

I'm not sure this will work, but I think a change in the management of computational resources would have a large impact on how efficiently software is managed, which would impact the world a lot.

This sort of thing seems required for true general AI, and it also needs attention to make sure it works well.

I have a website about my approach and what I hope to do with it, if you are interested.


But capitalism thus far has been done in such a way that, while maybe not everyone benefits (externalities or imbalances of information or whatever), at least some people benefit. An AI-driven economy could turn into something that literally benefits no one, especially if the AIs do not have moral value of their own. Over the long run the entire AI economy could turn into the equivalent of an AI paperclipper, using up all available resources while producing nothing of true value.


That's just a generic concern about all competition.


One possible concern in this kind of situation is what Scott Alexander described as an "ascended economy". That would be an economy where several corporations owned and run by AIs, with fully automated workforces, start to trade with each other directly in ways that don't involve humans at all. In that case they could just continue to expand economically and consume more and more resources without there being any humans capable of putting any control mechanisms on them.

http://slatestarcodex.com/2...


I responded to that paragraph because it seemed like the main substantive argument against working on control now. I agree with most of the rest of the claims in the post, but don't see you as making much of an argument against working on control.

I don't see why this post shows that law enforcement and government will clearly be able to utilize AI without control problems. You say things like:

> Such broad mechanisms are effective, entrenched and robust enough that future advanced software systems and organizations will almost surely continue to use variations within these broad categories keep control over each other. So humans can reasonably hope to be at least modestly protected in the short run if they can share the use of such systems with advanced software. For example, if law protects software from stealing from software, it may also protect humans from such theft.

Which does not really speak to most AI risk advocates' concerns (we are concerned with whether humans realize much of the value in the universe over the long term).

You say:

> In peacetime, control failures mainly threatened those who initiated and participated in such organizations, not the world as a whole.

and

> Outside of a foom scenario, control failures threaten to cause local, not global losses (though on increasingly larger scales).

But don't provide much argument---if government and law enforcement cannot effectively apply AI to protect human interests because of technological limitations, then that "local" problem will arise in *every* locality, i.e. it will be a global problem.

You say:

> such initiators and participants thus have incentives to figure out how to avoid such control losses, and this has long been a big focus of organization innovation efforts

With which I agree; future people will have incentives to deal with these problems. But the argument for working on control now is that if there is a problem with small investment today and large investment in the future, additional investment today can be much more highly leveraged than additional investment in the future (owing to serial dependencies and communication latency).

I guess you give the relevant example:

> And citizens who let AI tell their police who to put in jail may suffer in jail, or from undiscouraged crime. But such risks are mostly local, not global risks.

But you seem to have basically agreed that police need to use AI to enforce the law. So citizens everywhere will face a decision between allowing AI to be involved in law enforcement, and tolerating ineffective law enforcement.

You point to an analogy with other control tasks:

> The future problem of keeping control of advanced software is similar to the past problems of keeping control both of physical devices, and of social organizations.

That's an analogy I've considered at some length, and it doesn't make me feel much more optimistic. You don't say much about why it should make us feel more optimistic. It's not convincing to just say that two things look similar, and one isn't catastrophic. One data point does not make a strong inductive trend (and the interpretation is not even clear: agency costs today are quite large, and are mainly acceptable collectively because they are paid to other humans and are easily dominated by wages anyway, before we even get into the large differences in cognitive ability that are one of the main sources of concern in the AI case).


Law, property rights, command-and-control mechanisms, hierarchy, and democracy all converge on degenerate states of social control where they become self-evident lost purposes, eventually leading to the society being overthrown from outside by less civilized humans. Movement towards unipolarity makes the cycle take longer, as there are fewer people around to overthrow it.

It's probably possible to do better if we are ever able to construct a system on a decent foundation of knowledge of human behavior, but I think it's more likely to happen by constructing desired genomes, since those are probably more linear and thus more understandable.


Not sure what you mean by “social control mechanisms usually go irrevocably out of control,” but I’m curious. The specific examples of social control mechanisms Robin gives are “law, property rights, constitutions, command hierarchies, and even democracy.” In what sense have these or other social control mechanisms gone out of control, and why irrevocably? Are you talking about the sort of thing people mean when they complain about there being too many lawyers, regulators, and/or bureaucrats gumming up the works? (I can think of more sinister examples, like totalitarian regimes, but those don’t seem so irrevocable.)

Also, regarding the trend towards unipolarity, are you proposing out of control social control mechanisms as a manifestation/mechanism of that? Are there any plausible alternative paths humanity could go down?


This post is not about the time estimate; is there nothing else in it you find relevant? I don't know of any posted criticisms of my time estimates to engage. (In general, if you want me to engage something, TELL ME IT EXISTS.) Also, I don't know what you are talking about re 400 years for visual processing.


> Today is a very early date to be working on AI risk; I’ve estimated that without ems it is several centuries away.

If I thought AI were very likely to be > a century away, then I wouldn't work on AI risk. I expect most people interested in safety don't believe your estimates.

I see good arguments that ML-similar-to-today has a >20% chance of scaling up to human-level performance, and if it does then work done in the context of existing ML has a good chance of being applicable to the earliest powerful AI systems. If this is the case, then it's a good deal, since it seems quite easy to make progress on such problems relative to their importance. In the future spending will be radically larger, and it will be much harder to have a similar impact.

(You lean a lot on your timeline estimation methodology, but haven't really made the argument precise or engaged with some typical criticisms, and I don't currently consider it nearly as convincing as other lines of argument. The predictions also already look like they are in bad empirical shape---400 years for early visual processing in humans?)


Effective altruists have talked about "narrow" vs "broad" interventions to reduce existential risk and shape the long-term future, where narrow interventions target specific risks and include, e.g., technical AI safety work, and where broad interventions target multiple risks and include, e.g., institutional reform.

You here suggest some broad interventions:

"From this view, those tempted to spend resources now on studying AI control should consider two reasonable alternatives. The first alternative is to just save more now to grow resources to be used later, when we understand more. The second alternative is to work to innovate with our general control institutions, to make them more robust, and thus better able to handle larger coordination scales, and whatever other problems the future may hold. (E.g., futarchy.)"

These are interesting ideas. I'd be very interested if you had further thoughts on what broad interventions to pursue.


My underlying thought here is taken from the common criticism of group selection by evolutionary theorists: groupish behavior is constantly undermined by cheaters within the group. So human cooperation is ultimately built on social-cooperation instincts (gossip, policing, and punishment of norm violations). And going down a level, our bodies are constantly fighting off cancers (defectors) that evolve.

That is to say, I don't think a society of psychopaths/purely calculating actors can exist. They only exist in human society in small numbers, and above a certain threshold society would end. So our groupish instincts seek to find them and cast them out. A type of immune system.

The more competent the psychopaths, the faster their society would end, with all defecting on all in the prisoner's dilemma. Tit-for-tat is not enough for the center to hold. I'm not sure of your view, but it sounds like you believe a society can exist whose members are rationally very competent in social skills, but who do not have an innate baseline program ("instinct") to abide by certain group-beneficial norms. I'm skeptical of that being possible. Sorry if I'm confused about your position.
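For concreteness, here is a minimal iterated prisoner's dilemma sketch of the payoff structure being appealed to here. The payoff matrix and round count are standard textbook assumptions, and it only illustrates why an all-defector society ends up poor, not whether tit-for-tat plus instincts would be enough:

```python
# Standard payoffs: mutual cooperation 3, mutual defection 1, sucker 0, temptation 5.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):    # cooperate first, then mirror the opponent's last move
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):  # the purely calculating defector
    return "D"

def play(strategy_a, strategy_b, rounds=100):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print("TFT vs TFT:", play(tit_for_tat, tit_for_tat))      # (300, 300): stable cooperation
print("TFT vs DEF:", play(tit_for_tat, always_defect))    # (99, 104): the defector gains little
print("DEF vs DEF:", play(always_defect, always_defect))  # (100, 100): everyone stays poor
```

An all-defector world scores a third of what mutual cooperators do, which is the sense in which such a society "would end"; whether instincts are needed on top of tit-for-tat is the open question above.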

Unfortunately, examples in the animal world aren't too helpful, because of course all cooperative animal groups use instinct for this, not rational calculation. But to the extent the analogy applies, I'd say it suggests that instinct for group behavior is hard to evolve or retain. Humans arguably bootstrapped it from the unusual selective pressure that came from social learning (per Henrich), where the groupish traits got pulled along by selective pressure for social learning. We are constantly under that Darwinian/Malthusian pressure, but in special circumstances that pressure operates at multiple levels at once: cell, organism, tribe. Getting to socially cooperating groups (tribes) is highly unlikely, and potentially unstable.
