Reply to Christiano on AI Risk

Paul Christiano was one of those who encouraged me to respond to non-foom AI risk concerns. Here I respond to two of his posts he directed me to. The first one says we should worry about the following scenario:

Imagine using [reinforcement learning] to implement a decentralized autonomous organization (DAO) which maximizes its profit. .. to outcompete human organizations at a wide range of tasks — producing and selling cheaper widgets, but also influencing government policy, extorting/manipulating other actors, and so on.

The shareholders of such a DAO may be able to capture the value it creates as long as they are able to retain effective control over its computing hardware / reward signal. Similarly, as long as such DAOs are weak enough to be effectively governed by existing laws and institutions, they are likely to benefit humanity even if they reinvest all of their profits.

But as AI improves, these DAOs would become much more powerful than their human owners or law enforcement. And we have no ready way to use a prosaic AGI to actually represent the shareholder’s interests, or to govern a world dominated by superhuman DAOs. In general, we have no way to use RL to actually interpret and implement human wishes, rather than to optimize some concrete and easily-calculated reward signal. I feel pessimistic about human prospects in such a world. (more)

In a typical non-foom world, if one DAO has advanced abilities, then most other organizations, including government and the law, have similar abilities. So such DAOs shouldn’t find it much easier to evade contracts or regulation than do organizations today. Thus humans can be okay if law and government still respect human property rights or political representation. Sure it might be hard to trust such a DAO to manage your charity, if you don’t trust it to judge who is in most need. But you might trust it much to give you financial returns on your financial investments in it.

Paul Christiano’s second post suggests that the arrival of AI arrives will forever lock in the distribution of patient values at that time:

The distribution of wealth in the world 1000 years ago appears to have had a relatively small effect—or more precisely an unpredictable effect, whose expected value was small ex ante—on the world of today. I think there is a good chance that AI will fundamentally change this dynamic, and that the distribution of resources shortly after the arrival of human-level AI may have very long-lasting consequences. ..

Whichever values were most influential at one time would remain most influential (in expectation) across all future times. .. The great majority of resources are held by extremely patient values. .. The development of machine intelligence may move the world much closer to this naïve model. .. [Because] the values of machine intelligences can (probably, eventually) be directly determined by their owners or predecessors. .. it may simply be possible to design a machine intelligence who exactly shares their predecessor’s values and who can serve as a manager. .. the arrival of machine intelligence may lead to a substantial crystallization of influence .. an event with long-lasting consequences. (more)

That is, Christiano says future AI won’t have problems preserving its values over time, nor need it pay agency costs to manage subsystems. Relatedly, Christiano elsewhere claims that future AI systems won’t have problems with design entrenchment:

Over the next 100 years greatly exceeds total output over all of history. I agree that coordination is hard, but even spending a small fraction of current effort on exploring novel redesigns would be enough to quickly catch up with stuff designed in the past.

A related claim, that Christiano supports to some degree, is that future AI are smart enough to avoid suffers from coordination failures. They may even use “acasual trade” to coordinate when physical interaction of any sort is impossible!

In our world, more competent social and technical systems tend to be larger and more complex, and such systems tend to suffer more (in % cost terms) from issues of design entrenchment, coordination failures, agency costs, and preserving values over time. In larger complex systems, it becomes harder to isolate small parts that encode “values”; a great many diverse parts end up influencing what such systems do in any given situation.

Yet Christiano expects the opposite for future AI; why? I fear his expectations result more from far view idealizations than from observed trends in real systems. In general, we see things far away in less detail, and draw inferences about them more from top level features and analogies than from internal detail. Yet even though we know less about such things, we are more confident in our inferences! The claims above seem to follow from the simple abstract description that future AI is “very smart”, and thus better in every imaginable way. This is reminiscent of medieval analysis that drew so many conclusions about God (including his existence) from the “fact” that he is “perfect.”

But even if values will lock in when AI arrives, and then stay locked, that still doesn’t justify great efforts to study AI control today, at least relative to the other options of improving our control mechanisms in general, or saving resources now to spend later, either on studying AI control problems when we know more about AI, or just to buy influence over the future when that comes up for sale.

GD Star Rating
Tagged as: , , ,
Trackback URL: