To extend our reach, we humans have built tools, machines, firms, and nations. And as these are powerful, we try to maintain control of them. But as efforts to control them usually depend on their details, we have usually waited to think about how to control them until we had concrete examples in front of us. In the year 1000, for example, there wasn’t much we could do to usefully think about how to control most things that have only appeared in the last two centuries, such as cars or international courts.
Someday we will have far more powerful computer tools, including “advanced artificial general intelligence” (AAGI), i.e., with capabilities even higher and broader than those of individual human brains today. And some people today spend substantial efforts today worrying about how we will control these future tools. Their most common argument for this unusual strategy is “foom”.
That is, they postulate a single future computer system, initially quite weak and fully controlled by its human sponsors, but capable of action in the world and with general values to drive such action. Then over a short time (days to weeks) this system dramatically improves (i.e., “fooms”) to become an AAGI far more capable even than the sum total of all then-current humans and computer systems. This happens via a process of self-reflection and self-modification, and this self-modification also produces large and unpredictable changes to its effective values. They seek to delay this event until they can find a way to prevent such dangerous “value drift”, and to persuade those who might initiate such an event to use that method.
I’ve argued at length (1 2 3 4 5 6 7) against the plausibility of this scenario. Its not that its impossible, or that no one should work on it, but that far too many take it as a default future scenario. But I haven’t written on it for many years now, so perhaps it is time for an update. Recently we have seen noteworthy progress in AI system demos (if not yet commercial application), and some have urged me to update my views as a result.
The recent systems have used relative simple architectures and basic algorithms to produce models with enormous numbers of parameters from very large datasets. Compared to prior systems, these systems have produced impressive performance on an impressively wide range of tasks. Even though they are still quite far from displacing humans in any substantial fraction of their current tasks.
For the purpose of reconsidering foom, however, the key things to notice are: (1) these systems have kept their values quite simple and very separate from the rest of the system, and (2) they have done basically zero self-reflection or self-improvement. As I see AAGI as still a long way off, the features of these recent systems can only offer weak evidence regarding the features of AAGI.
Even so, recent developments offer little support for the hypothesis that AAGI will be created soon via the process of self-reflection and self-improvement, or for the hypothesis that such a process risks large “value drifts”. These current ways that we are now moving toward AAGI just don’t look much like the foom scenario. And I don’t see them as saying much about whether ems or AAGI will appear first.
Again, I’m not saying foom is impossible, just that it looks unlikely, and that recent events haven’t made it seem moreso.
These new systems do suggest a substantial influence of architecture on system performance, though not obviously at a level out of line with that in most prior AI systems. And note that the abilities of the very best systems here are not that much better than that of the 2nd and 3rd best systems, arguing weakly against AAGI scenarios where the best system is vastly better.