AI Risk, Again
Large language models like ChatGPT have recently spooked a great many, and my Twitter feed is full of worriers saying how irresponsible orgs have been to make and release such models. Because, they say, such a system might have killed us all. And, as some researchers say that they are working on how to better control such things, worriers say we must regulate to slow/stop AI progress until such researchers achieve their goals. While I’ve written on this many times before, it seems time to restate my position.
First, if past trends continue, then sometime in the next few centuries the world economy is likely to enter a transition that lasts roughly a decade, after which it may double every few months or faster, in contrast to our current fifteen-year doubling time. (Doubling times have been relatively steady as innovations are typically tiny compared to the world economy.)
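To make the contrast concrete, here is a minimal sketch (my own illustration, not from the post) converting a doubling time into the implied annual growth multiple; the fifteen-year figure is from the post, while the three-month post-transition figure is one hedged reading of "every few months":

```python
def annual_growth_factor(doubling_time_years: float) -> float:
    """Growth multiple per year implied by a given doubling time.

    If output doubles every T years, one year multiplies output by 2**(1/T).
    """
    return 2 ** (1 / doubling_time_years)

# Current world economy: doubles roughly every 15 years.
print(round(annual_growth_factor(15), 3))    # ~1.047, i.e. ~4.7% per year

# Illustrative post-transition pace: doubling every 3 months (0.25 years).
print(round(annual_growth_factor(0.25), 1))  # 16.0, i.e. ~16x per year
```

So the hypothesized transition is a jump from a few percent of growth per year to an economy that is an order of magnitude larger each year.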
The most likely cause for such a transition seems to be a transition to an economy dominated by artificial intelligence (AI). (Perhaps in the form of brain emulations, but perhaps also in more alien forms.) Especially as the doubling time of a fully-automated factory today is a few months, and computer algorithm gains have been close to hardware gains. And within a year or two from then, another transition to an even faster mode might plausibly occur.
Second, coordination and control are hard. Today, org leaders often gain rents from their positions, rents which come at the expense of org owners, suppliers, and customers. This happens more so at non-profits and publicly-held for-profits, compared to privately held for-profits. Political and military leaders also gain rents, and sometimes take over control of nations via coups. While leader rents are not the only control problem, the level of such rents is a rough indication of the magnitude of our control problems. Those who are culturally more distant from leaders, such as the poor and third world residents, typically pay higher rents.
Today such rents are non-trivial, but even so competition between orgs keeps them tolerable. That is, we mostly keep our orgs under control. Even though, compared to individual humans, large orgs are in effect “super-intelligences”.
Third, there may be extra obstacles to slow bio humans controlling future org ventures. Bio humans would be more culturally distant, slower, and less competent than em AIs. (Though the principal-agent lit doesn’t yet show smarts differences to be an issue.) And non-em AIs could be even more culturally distant. However, even an increase of a factor of two or four in control rents for AIs seems tolerable, offering such bio humans a rich and growing future. Yes, periodically some ventures would suffer the equivalent of a coup. But if, like today, each venture were only a small part of this future world, bio humans as a whole would do fine. Ems, if they exist, could do even better.
Of course the owners of such future ventures, be they bio humans, ems, or other, are well advised to consider how best to control such ventures, to cut leader rents and other related costs of imperfect control. But such efforts seem most effective when based on actual experience with concrete fielded systems. For example, there was little folks could do in the year 1500 to figure out how to control 20th century orgs, weapons, or other tech. Thus as we now know very little about the details of future AI-based ventures, leaders, or systems, we should today mostly either save resources to devote to future efforts, or focus our innovation efforts on improving control of existing ventures. Such as via decision markets.
Most of the worriers mentioned above, however, reject the above analysis, based as it is on expecting a continuation of historical patterns, wherein ventures and innovations have been consistently small compared to the world economy. They instead say that it is possible that a single small AI venture might stumble across a single extremely potent innovation, which enables it to suddenly “foom”, i.e., explode in power from tiny compared to the world economy, to more powerful than the entire rest of the world put together. (Including all other AIs.)
This scenario requires that this venture prevent other ventures from using its key innovation during this explosive period. It also requires that this new more powerful system not only be far smarter in most all important areas, but also be extremely capable at managing its now-enormous internal coordination problems. And it requires that this system be not a mere tool, but a full “agent” with its own plans, goals, and actions.
Furthermore it is possible that even though this system was, before this explosion, and like most all computer systems today, very well tested to assure that its behavior was aligned well with its owners’ goals across its domains of usage, its behavior after the explosion would be nearly maximally non-aligned. (That is, orthogonal in a high dim space.) Perhaps resulting in human extinction. The usual testing and monitoring processes would fail either to notice this problem or to call a halt upon noticing it, either because this explosion happened too fast, or because this system created and hid divergent intentions from its owners prior to the explosion.
While I agree that this is a logically possible scenario, not excluded by what we know, I am disappointed to see so many giving it such a high credence, given how crazy far it seems from our prior experience. Yes, there is a sense in which the human, farming, and industry revolutions were each likely the result of a single underlying innovation. But those were the three biggest innovations in all of human history. And large parts of the relevant prior world exploded together in those cases, not one tiny part suddenly exterminating all the rest.
In addition, the roughly decade duration predicted from prior trends for the length of the next transition period seems plenty of time for today’s standard big computer system testing practices to notice alignment issues. And note that the impressive recent AI chatbots are especially unlike the systems of concern here: self-improving very-broadly-able full-agents with hidden intentions. Making this an especially odd time to complain that new AI systems might have killed us all.
You might think that folks would take a lesson from our history of prior bursts of anxiety and concern about automation, bursts which have appeared roughly every three decades since at least the 1930s. Each time, new impressive demos revealed unprecedented capabilities, inducing a burst of activity and discussion, with many then expressing fears that a rapid explosion might soon commence, automating all human labor. They were, of course, very wrong.
Worriers often invoke a Pascal’s wager sort of calculus, wherein any tiny risk of this nightmare scenario could justify large cuts in AI progress. But that seems to assume that it is relatively easy to assure the same total future progress, just spread out over a longer time period. I instead fear that overall economic growth and technical progress are more fragile than this assumes. Consider how regulations inspired by nuclear power nightmare scenarios have for seventy years prevented most of its potential from being realized. I have also seen progress on many other promising techs mostly stopped, not merely slowed, via regulation inspired by vague fears. In fact, progress seems to me to be slowing down worldwide due to excess fear-induced regulation.
Over the last few centuries the world did relatively little to envision problems with future techs, and to prepare for those problems far in advance of seeing concrete versions. And I just do not believe that the world would have been better off if we had instead greatly slowed tech progress in order to attempt such preparations. Especially considering the degree of centralized controls that might have been required to implement such a slowdown policy.
As I discussed above, it just looks way too early to learn much about how to control future AI systems, about which we know so few details. Thus when facing the risk of our fear essentially halting progress here, I’d rather continue down our current path, and work harder on controls when we better see concrete serious control problems to manage.
Added March 7: Some say that, given enough data and hardware, predict-the-next-token models like ChatGPT will have human or better performance. Using action tokens, that would include many kinds of behavior. But this isn’t sufficient for such a system to rapidly “foom”. To even try, it needs high competence in designing and testing alternative system architectures, and there’s no guarantee even with that.
Would you consider inviting Yud for another debate? Your last one became a classic. :)
I'd love it if those who aren't worried tackled the AI doom arguments directly. Please acknowledge things like instrumental convergence, the orthogonality thesis, mesa-optimizers, the issues of not knowing how to mathematically formalize human values so as not to be Goodharted by proxies.
Otherwise all I'm left with is: Robin Hanson is a pretty smart guy and isn't worried. That makes me update somewhat towards being less worried. On the other hand Eliezer is also pretty smart and has pages upon pages of technical arguments for why we should be worried. And so far I haven't seen any critics pointing out flaws in his reasoning. I'm not sure if I'm making a mistake but from where I stand I can't help but be very worried.