21 Comments

> particular values of M,X

That should be 'F' instead of 'X'.


I feel like between this and the robots-took-most-of-the-jobs policy Robin talked about earlier, there's a generalisable policy argument to be made for weird-futures insurance. The challenge would be having high confidence in the solvency and liquidity of markets in the event that weird futures manifest - similar to the problems of resolving prediction markets (both play and real money versions) in the event of, e.g., gigadeath-scale negative outcomes.


Also, what incentives does this create for the AI firm that catches its AI (as a result of pretty much all your D factors) trying to escape its safety protocols so it can hack NORAD to kill us all?

If the foom/x-risk view is right, presumably that's going to be the likely form of failure (caught early, not where it takes over the internet only to be stopped at the last second), but liability in that case strongly encourages not telling anyone. And who are you even liable to?


how can M be anything other than 0? if it’s positive, then you’re saying everyone at all times owes a positive sum of damages, before they even do anything.

the exponential is tricky too. when incidents violate multiple conditions, one party will want to present the violation(s) as N different instances which just “happened” to co-occur; the other party will argue that the damages were all coupled, so as to extract more damages. consider: market chaos in Los Angeles allows an agent to acquire property there rapidly, and either simultaneously or after a period of time water contamination in Seattle causes residents to flee for other cities. these events (which may have been “caused” by the same agent, or even distinct agents acting in a mutually beneficial manner) may involve different condition violations: how do we decide to try them separately or together?

arguably the exponential encourages sophisticated agents to learn how to better coordinate with each other, for the purpose of distributing the violations across “separate” agents. we end up with some hydra that’s more difficult to shut off than a more concentrated agent.
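A rough numerical sketch of the splitting incentive described above, assuming (purely for illustration) that the rule takes the form extra damages = M + H·F^N, where H is the harm and N is the number of violated conditions in a single tried case; the post's exact formula may differ, and the helper name, numbers, and parameters here are hypothetical:

```python
# Illustrative only: assumes extra damages = M + H * F**N per tried case,
# where N counts the violated foom-risk conditions in that case.
def extra_damages(harm, n_conditions, F=10.0, M=0.0):
    return M + harm * F ** n_conditions

H = 1_000_000  # total harm in dollars (made-up number)

# One case: all 4 condition violations tried together.
together = extra_damages(H, 4)

# Two "independent" cases: 2 violations each, with the harm split between them.
split = extra_damages(H / 2, 2) + extra_damages(H / 2, 2)

print(f"tried together:  ${together:,.0f}")  # $10,000,000,000
print(f"tried separately: ${split:,.0f}")    # $100,000,000
```

Because the exponent applies per tried case, the same harm and the same violations carry orders of magnitude less liability when carved into smaller cases, which is exactly the hydra incentive the comment worries about.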


Why the assumption that the "lumpy" improvement will be secret? Is it difficult to imagine some people might WANT to assist an AI towards FOOM?

The popularity of Auto-GPT and the explicit intentions of its contributors to create a recursively self-improving, autonomous, perpetual AGI outside of human control suggest this is already a present desire among a not-insignificant number of people.


I don't see the foom camp seeing this as helpful.

The foom camp sees AI as so risky precisely because they don't see those near misses as plausible. The reason they think AI is so dangerous is basically that they assume a sufficiently smart AI can play us like a fiddle. If it turned out that, when you multiply out all the pieces of the plan the AI needs to succeed, you get a moderate probability, then you could presumably make it relatively safe by doubling the number of barriers/safeties it needs to overcome (more kill switches, more indirection/isolation, etc.). Also, on their view, any AI breaching the FOOM limit is certainly going to be smart enough to realize it can foom and that it's more likely to succeed if it waits just a bit longer (and since they see this as a very fast-growing function, there would presumably be only a very short period of time between reaching the level where it has a 50% chance and the one where it has a 99.99% chance).

I agree with you that their arguments here aren't very compelling and that near misses would be likely to happen before any AI-caused catastrophe. But if they saw AI as that limited (hey, good chance it will do its best and fail) I doubt they would have the view they do about foom and AI risk.
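A toy calculation of the barrier-multiplication point in the comment above, assuming (only for illustration) independent barriers with an identical per-barrier success probability; the function name and numbers are made up, not estimates:

```python
# Toy model: if escape requires independently beating k barriers, each with
# success probability p, the overall escape probability is p**k, so adding
# barriers compounds quickly. Illustrative numbers only.
def escape_probability(p_per_barrier: float, n_barriers: int) -> float:
    return p_per_barrier ** n_barriers

p = 0.7  # assumed chance of beating any single kill switch / isolation layer

for k in (3, 6, 12):
    print(f"{k:2d} barriers -> P(escape) = {escape_probability(p, k):.4f}")
# 3 barriers -> P(escape) = 0.3430
# 6 barriers -> P(escape) = 0.1176
# 12 barriers -> P(escape) = 0.0138
```

The foom view, as the comment notes, rejects the independence assumption behind p^k: a sufficiently smart AI is taken to correlate its way past the barriers, which is why adding more of them is not expected to help much.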


The law doesn't hold people to strict liability for unfamiliar actions but for actions deemed inherently dangerous. I agree that familiarity bias sometimes blurs the distinction, but that's the legal rule, and when the law does treat unfamiliar risks worse, that's downstream of a judgement that they are particularly dangerous.

That's important, since the rule you are talking about would essentially be a giant brake on progress: it would amount to a tax on technical innovation (first building built using prestressed concrete - strict liability). And people rarely like the idea of making penalties less certain for big corporations that hurt them, so I worry the strict liability would stick around.

And this raises a deeper point. You aren't insulating the treatment of AI from any one risk assessment; you are essentially penalizing it, relative to the usual way we treat new software, algorithms, etc., because of some people's risk assessments.


But what would a "near miss" look like, specifically, so that it's an actionable threshold for liability? I like the liability idea as a flexible and general response, but I'd like to know more about implementation.
