Discover more from Overcoming Bias
Compared to some, I am less worried about extreme near-term AI doom scenarios. But I also don’t like policy being sensitive to my or anyone else’s risk estimates. I instead prefer robust policies, ones we can expect to promote total welfare for a wide range of parameter values. We want policies that will give big benefits if foom risk is high, but impose low costs if foom risk is low. In that spirit, let me suggest as a compromise a particular apparently-robust policy for dealing with AI foom risk.
If you recall, the foom scenario requires an AI system that is tasked with improving itself. It finds a quite unusually lumpy innovation for that task that is a) secret and b) huge, that c) improves well across a very wide scope of tasks, and that d) continues to create rapid gains over many orders of magnitude of ability. By assumption, this AI then improves fast. It somehow e) becomes an agent with f) a wide scope of values and actions, g) its values (what best explains its choices) in effect change radically over this growth period, yet h) its owners/builders do not notice anything of concern, or do not act on such concerns, until this AI becomes able either to hide its plans and actions well or to wrest control of itself from its owners and resist opposition. After that, it just keeps growing, and then acts on its radically changed values to kill us all.
Given how specific this description is, it seems plausible that for every extreme scenario like this there are many more “near miss” scenarios which are similar, but which don’t reach such extreme ends. For example, the AI might try but fail to hide its plans or actions, try but fail to wrest control or prevent opposition, or do these things yet have abilities not broad enough to cause existential damage. So if we gave AI owners sufficient liability incentives to avoid near-miss scenarios, with the liability higher for a closer miss, those incentives would also induce substantial efforts to avoid the worst-case scenarios.
In liability law today, we hold familiar actions like car accidents to a negligence standard; you are liable if damage happens and you were not sufficiently careful. But for unfamiliar actions, where it seems harder to judge proper care levels, such as for using dynamite or having pet tigers, we hold people to a strict liability standard. As it is hard to judge proper foom care levels, it makes sense to use strict liability there.
Also, if there is a big chance that a harm might happen yet not result in a liability penalty, it makes sense to add extra “punitive” damages, for extra discouragement. Finally, when harms might be larger than the wealth of responsible parties, it makes sense to require liability insurance. That is, we should make people at substantial risk of liability prove that they could pay damages if they were found liable.
Thus I suggest that we consider imposing extra liability for certain AI-mediated harms, make that liability strict, and add punitive damages according to the formula D = (M+H)*F^N. Here D is the damages owed, H is the harm suffered by victims, M > 0 and F > 1 are free parameters of this policy, and N is how many of the following eight conditions contributed to causing harm in this case: self-improving, agentic, wide scope of tasks, intentional deception, negligent owner monitoring, values changing greatly, fighting its owners for self-control, and stealing non-owner property.
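To make the arithmetic of this damages formula concrete, here is a minimal sketch in Python. The condition identifiers, the function name damages, and the sample values of M and F are hypothetical choices for illustration only; the post leaves M and F open for debate and does not say how courts would decide which conditions contributed.

```python
# Hypothetical identifiers for the eight aggravating conditions listed above.
CONDITIONS = (
    "self_improving",
    "agentic",
    "wide_scope_of_tasks",
    "intentional_deception",
    "negligent_owner_monitoring",
    "values_changing_greatly",
    "fighting_owners_for_self_control",
    "stealing_non_owner_property",
)


def damages(harm: float, conditions: set[str], m: float, f: float) -> float:
    """Sketch of D = (M + H) * F**N, with N = number of contributing conditions.

    harm: H, the harm suffered by victims (assumed non-negative).
    conditions: which of the eight conditions contributed to the harm.
    m, f: the free policy parameters, required to satisfy M > 0 and F > 1.
    """
    if harm < 0:
        raise ValueError("harm H must be non-negative")
    if m <= 0:
        raise ValueError("M must be > 0")
    if f <= 1:
        raise ValueError("F must be > 1")
    unknown = conditions - set(CONDITIONS)
    if unknown:
        raise ValueError(f"unknown conditions: {sorted(unknown)}")
    n = len(conditions)  # N counts distinct contributing conditions (0..8)
    return (m + harm) * f ** n


# Illustrative (made-up) parameters: M = 1,000,000 and F = 2.
mild = damages(10_000, {"self_improving", "agentic"}, 1_000_000, 2)
severe = damages(10_000, set(CONDITIONS), 1_000_000, 2)
print(mild)    # (1,000,000 + 10,000) * 2**2
print(severe)  # (1,000,000 + 10,000) * 2**8
```

With these made-up parameters, the same 10,000 of harm costs 4,040,000 when two conditions contributed and 258,560,000 when all eight did. Because M > 0, damages stay at least M*F^N even when the actual harm H is small, which is what makes a close near miss expensive for the owner.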
If we could agree that some sort of cautious policy like this seems prudent, then we could just argue over the particular values of M,F.
Added 7a: Yudkowsky, top foom doomer, says:
If this liability regime were enforced worldwide, I could see it actually helping.