Reply to Christiano on AI Risk

Paul Christiano was one of those who encouraged me to respond to non-foom AI risk concerns. Here I respond to two of his posts, which he directed me to. The first one says we should worry about the following scenario:

Imagine using [reinforcement learning] to implement a decentralized autonomous organization (DAO) which maximizes its profit. .. to outcompete human organizations at a wide range of tasks — producing and selling cheaper widgets, but also influencing government policy, extorting/manipulating other actors, and so on.

The shareholders of such a DAO may be able to capture the value it creates as long as they are able to retain effective control over its computing hardware / reward signal. Similarly, as long as such DAOs are weak enough to be effectively governed by existing laws and institutions, they are likely to benefit humanity even if they reinvest all of their profits.

But as AI improves, these DAOs would become much more powerful than their human owners or law enforcement. And we have no ready way to use a prosaic AGI to actually represent the shareholder’s interests, or to govern a world dominated by superhuman DAOs. In general, we have no way to use RL to actually interpret and implement human wishes, rather than to optimize some concrete and easily-calculated reward signal. I feel pessimistic about human prospects in such a world. (more)

In a typical non-foom world, if one DAO has advanced abilities, then most other organizations, including government and the law, have similar abilities. So such DAOs shouldn’t find it much easier to evade contracts or regulation than do organizations today. Thus humans can be okay if law and government still respect human property rights or political representation. Sure, it might be hard to trust such a DAO to manage your charity, if you don’t trust it to judge who is in most need. But you might well trust it to give you returns on your financial investments in it.

Paul Christiano’s second post suggests that the arrival of AI will forever lock in the distribution of patient values at that time:

The distribution of wealth in the world 1000 years ago appears to have had a relatively small effect—or more precisely an unpredictable effect, whose expected value was small ex ante—on the world of today. I think there is a good chance that AI will fundamentally change this dynamic, and that the distribution of resources shortly after the arrival of human-level AI may have very long-lasting consequences. ..

Whichever values were most influential at one time would remain most influential (in expectation) across all future times. .. The great majority of resources are held by extremely patient values. .. The development of machine intelligence may move the world much closer to this naïve model. .. [Because] the values of machine intelligences can (probably, eventually) be directly determined by their owners or predecessors. .. it may simply be possible to design a machine intelligence who exactly shares their predecessor’s values and who can serve as a manager. .. the arrival of machine intelligence may lead to a substantial crystallization of influence .. an event with long-lasting consequences. (more)

That is, Christiano says future AI won’t have problems preserving its values over time, nor need it pay agency costs to manage subsystems. Relatedly, Christiano elsewhere claims that future AI systems won’t have problems with design entrenchment:

[Total output] over the next 100 years greatly exceeds total output over all of history. I agree that coordination is hard, but even spending a small fraction of current effort on exploring novel redesigns would be enough to quickly catch up with stuff designed in the past.

A related claim, which Christiano supports to some degree, is that future AIs will be smart enough to avoid suffering from coordination failures. They may even use “acausal trade” to coordinate when physical interaction of any sort is impossible!

In our world, more competent social and technical systems tend to be larger and more complex, and such systems tend to suffer more (in % cost terms) from issues of design entrenchment, coordination failures, agency costs, and preserving values over time. In larger complex systems, it becomes harder to isolate small parts that encode “values”; a great many diverse parts end up influencing what such systems do in any given situation.

Yet Christiano expects the opposite for future AI; why? I fear his expectations result more from far view idealizations than from observed trends in real systems. In general, we see things far away in less detail, and draw inferences about them more from top level features and analogies than from internal detail. Yet even though we know less about such things, we are more confident in our inferences! The claims above seem to follow from the simple abstract description that future AI is “very smart”, and thus better in every imaginable way. This is reminiscent of medieval analysis that drew so many conclusions about God (including his existence) from the “fact” that he is “perfect.”

But even if values will lock in when AI arrives, and then stay locked, that still doesn’t justify great efforts to study AI control today, at least relative to the other options of improving our control mechanisms in general, or saving resources now to spend later, either on studying AI control problems when we know more about AI, or on just buying influence over the future when that comes up for sale.

  • Vladimir Nesov

    > Thus humans can be okay if law and government still respect human property rights or political representation.

    Rather, if law and government DAOs respect human property rights or political representation. Some semblance of law might persist, but unless most DAOs become corrigible (including by government intervention), at some point this won’t have anything to do with humans.

  • Paul Christiano

    > most other organizations, including government and the law, have similar abilities […] Thus humans can be okay if law and government still respect human property rights or political representation.

    This is the whole point. We don’t know how to give law enforcement or government access to the same abilities without *also* making law enforcement or government unaligned with humans. I do say (as you quote): “we have no ready way to use a prosaic AGI […] to govern a world dominated by superhuman DAOs.”

    > In our world, more competent social and technical systems tend to be larger and more complex, and such systems tend to suffer more (in % cost terms) from issues of design entrenchment, coordination failures, agency costs, and preserving values over time. In larger complex systems, it becomes harder to isolate small parts that encode “values”; a great many diverse parts end up influencing what such systems do in any given situation.

    AI doesn’t avoid agency costs because it’s more complex. It avoids agency costs because it was designed by the principal to specification. This is a massive difference, indeed it is *the* important difference w.r.t. agency costs, so ignoring it is crazy. Today agency costs mostly accrue in systems that involve humans, who are *not* designed by the principal. We should be arguing about whether those agency costs are an artifact of having to “work with what biology gives us.” I think that’s most likely, and don’t feel like you have argued against it.

    > that still doesn’t justify great efforts to study AI control today, at least relative to the other options of improving our control mechanisms in general, or saving resources now to spend later, either on studying AI control problems when we know more about AI, or just to buy influence over the future when that comes up for sale

    Yes, this is a response to one particular objection you have to AI control research and not a complete cost-benefit analysis. If you want to offer a particular concrete alternative that you think is better, I might engage with that particular alternative. I think that by working on AI control for a year I can increase human values’ influence over the future by well over 1/million, which seems very attractive compared to the alternatives I’ve considered.

    • http://overcomingbias.com RobinHanson

      My whole long last post was all about how we could reasonably expect most control problems to be solved in typical non-foom scenarios, and all you can do is just repeat your “no way” claim? I need a lot more detail than that to convince me there’s a big AI control problem, and that’s what I’ve been asking for when I ask for writing that explains why there’s a big control problem in a non-foom scenario.

      And you can’t just declare agency problems to be gone because that’s in your specification for an AI! A world of advanced software is mainly defined by the fact that such software has high competence, so that it can displace human workers. All other properties must be argued for.

      • arch1

        Robin, like Vladimir I’d like to see your response to Paul’s first point above.

      • Paul Christiano

        It seems like there is some miscommunication here. Your first paragraph says we can reasonably expect control problems to be solved before any damage is done. Your second paragraph says that agency problems with AI will probably remain a problem for a very long time.

        From my perspective, those claims contradict one another. If we continue to pay large agency costs for AI systems, then we have a control problem.

        Do you see solving control problems as qualitatively different from reducing agency costs?
        Is the issue here a quantitative one? When you say that you “expect control problems to be solved,” how low do you expect agency costs to be, quantitatively? When you say that we will continue to pay agency costs, does that mean something other than gradually ceding control of the world to AI systems? I don’t understand your picture here, and this comment really highlights my confusion.

        To be clear, my perspective is that there is likely to be a transient period with large agency costs; those agency costs are likely to fall over time as our AI design ability improves. This is why I am worried about control for early AI systems, yet think that eventually the distribution of values will stabilize as control mechanisms improve. I don’t see how you can have the opposite pair of beliefs.

        > all you can do is just repeat your “no way” claim

        You asked for arguments that are relevant to non-foom scenarios. The normal arguments for the difficulty of control have little to do with foom; instead, they are based on looking at particular approaches to designing AI and observing that we don’t see a good path to reducing agency costs. I thought that you understood those arguments but had objections related to the foom assumption.

      • http://overcomingbias.com RobinHanson

        In the context of foom, the usual AI concern is a total loss of control of the one super AI, whose goals quickly drift to a random point in the space of possible goals. Humans are then robustly exterminated. As the AI is so smart and inscrutable, any small loss of control is said to open the door to such extreme failure. So when you tell me that I shouldn’t focus so much on foom, as many are similarly concerned about non-foom scenarios, I presume that the focus remains on this sort of failure.

        Today most social systems suffer from agency costs, and larger costs (in % terms) in larger systems. But these mostly take the form of modestly increasing costs. It isn’t that you can’t reliably use these systems to do the things that you want. You just have to pay more. That extra cost mostly isn’t a transfer accumulating in someone’s account. Instead there is just waste that goes to no one, and there are more cushy jobs and roles where people can comfortably sit as parasites. Over time, even though agency costs take a bigger cut, total costs get lower and humans get more of what they want.

        When I say that in my prototypical non-foom AI scenario, AI will still pay agency costs but the AI control problem is mostly manageable, I mean that very competent future systems will suffer from waste and parasites as do current systems, but that humans can still reliably use such systems to get what they want. Not only are humans not exterminated, they get more than before of what they want.

      • Vladimir Nesov

        This is helpful. So this future is built mostly from unaligned AIs which mostly do their own thing rather than anything of value, but the dynamic by which the future is built ensures that taken together, the future is still valuable and probably doesn’t exterminate humans. Lack of foom is important in this dynamic, in that any AI disasters are not global catastrophes and get paved over by the rest of the world, just as we do today with rogue institutions and countries. Large agency costs mean that AIs remain unaligned, even though the world of AIs robustly maintains a sliver of alignment. Not being under humans’ control, the world as a whole doesn’t converge on being efficiently aligned with humans, instead humans get a small fraction of global value and unaligned AIs get most of it. Still, human values prosper relative to pre-AI world.

        In these terms, one worry is that the opportunity cost is astronomical compared to a world of aligned AIs that doesn’t have significant agency costs. So current work on AI alignment is valuable for reducing this opportunity cost, by making the AIs more aligned, but that work will be motivated by agency costs anyway, throughout the process of losing control over the world as a whole. Unfortunately, the process seems irreversible: most of the whole future is given over to agency costs and can’t be recaptured even when we eventually figure out alignment.

        Another problem is that this world is a lot different from today, and humans are not crucial to its maintenance, so there is a risk that one of the details that eventually changes is that somehow all human value/influence is gone, despite presently available convincing arguments that this won’t be the case.

      • http://overcomingbias.com RobinHanson

        Not sure I agree with your “mostly” and “sliver” claims.

      • Peter McCluskey

        I’m puzzled by this claim:

        >It isn’t that you can’t reliably use these systems to do the things that you want. You just have to pay more.

        I thought that half the point of your paper “He Who Pays The Piper Must Know The Tune” was that people fail to get what they say they want from doctors, newspapers, professors, etc., and that simply paying more won’t get them what they asked for (health, knowledge, etc.). And that one solution to that problem is value alignment (“experts caring directly about being honest or about client topics”).

        I understand how you can believe humans will get more of what they want for a while in a world dominated by unaligned AI. But I don’t see how long-term human prospects in such a world are any better than the prospects of modern hunter-gatherers.

      • http://overcomingbias.com RobinHanson

        I’m not saying we can all pay to get everything we want. But there is a list of things that we can pay to get today, and I’m saying we should still be able to pay to get those things in a world full of AI.

        For the long run, there’s a big difference between fearing that your wealth, or your wealth fraction, will decline because you are not a competitive producer, and fearing extermination.

      • Peter McCluskey

        I’m having trouble seeing that difference. It seems like the default assumption ought to be that humans face a nontrivial risk of extinction in this scenario. It’s unclear why you seem unconcerned by that risk.

      • http://overcomingbias.com RobinHanson

        The issue is to categorize the types of risk so as to prioritize efforts. The optimal types of efforts for the scenarios I’m describing seem different than for foom.

    • arch1

      “AI doesn’t avoid agency costs because it’s more complex. It avoids agency costs because it was designed by the principal to specification.”

      Paul, you seem to believe that an AI whose design precludes agency costs between its top level and its subsystems would necessarily out-compete an AI that is not subject to that constraint. Why?

    • Joe

      Paul, what’s the mechanism by which you expect problems of the kind Robin mentions – agency costs, design entrenchment, etc. – to not be an issue with systems that don’t involve humans? Is it that you think there are coordination-enabling features that can be implemented to solve these problems even in big complex systems, and biological evolution just didn’t happen to stumble across them? Or is it that you think these issues are inherent and unavoidable as systems grow larger and more detailed (as Robin claims), but you just think human-level AI will be pretty simple in design, not large and complex enough to suffer from these issues? Or is it something else?

      Robin, would you say this is an accurate characterisation of your position? That is, you expect advancing AI to accrue problems like agency costs precisely because you expect that for it to reach human-equivalent capability it will need layers upon layers of intricate detail; and so if it turned out that human-level AI would be barely more complicated than today’s AI software, it would then be more plausible to talk about controlling it and its descendants by directly programming it to have the right values?

      Perhaps we could test these competing models by looking at the extent to which systems with varying levels of human involvement suffer from the problems mentioned, after controlling for system complexity. Viable?

      • http://overcomingbias.com RobinHanson

        I’m comfortable claiming that future competent software will be complex in design.


  • Silent Cal

    It looks to me like you’re declaring the control problem solved on the basis that law enforcement will have AI, while never quite addressing the control problem with respect to law enforcement’s AI. (In the last post, also.) I think this is what everyone else here is saying, too.

    • http://overcomingbias.com RobinHanson

      How do you think we address the control problem of law enforcement today?

      • Silent Cal

        The developed world has found a political equilibrium where democracy can mostly control law enforcement, but there’s no robust understanding of what makes this possible. Attempts to export this equilibrium tend to fail, and kleptocracy is the historical norm.

      • http://overcomingbias.com RobinHanson

        If democracy can work today to control law enforcement, why can’t it work for AI based law enforcement?

  • Yosarian2

    Even if AIs feel bound to follow our legal and economic system (which itself would require a great deal of AI control theory and FAI research to get right, but let’s assume that it does), that’s still not necessarily a good scenario for us. If AIs become better at working the system than we are, if they become better at finding loopholes in the law, better at investing, better at amassing wealth, better at figuring out how to manipulate the political system, etc., then humans could basically lose control so long as humans are also bound by those same rules. We could be talking about general AIs that are generally smarter than us, or about more narrow AIs that are individually better than us in one specific field (law, investing, banking, lobbying, etc.); either way, we would be out-competed, and increasingly powerless in terms of working through the system.

  • Kevin S Van Horn

    How confident are you that something like the “foom” scenario for superintelligent AI won’t occur? 90%? 99%? When the stakes are very high, even a small probability is worth worrying about.

    • http://overcomingbias.com RobinHanson

      I’ve never claimed that no one should worry about the foom scenario.