> particular values of M,X
That should be 'F' instead of 'X'.
Fixed; thanks.
I feel like between this and the robots-took-most-of-the-jobs policy Robin talked about earlier, there's a generalisable policy argument to be made for weird-futures insurance - the challenge would be having high confidence in the solvency and liquidity of markets in the event that weird futures manifest. Similar to the problems with resolving prediction markets (both play- and real-money versions) in the event of, e.g., gigadeath-scale negative outcomes.
Also, what incentives does this create for the AI firm that catches their AI (as a result of pretty much all your D factors) trying to escape its safety protocols so it can hack NORAD to kill us all?
If the foom/x-risk view is right, presumably that's going to be the likely form of failure (caught early, not the case where it takes over the internet only to be stopped at the last second), but liability in that case strongly encourages not telling anyone. And who are you even liable to?
Firms want to prevent undetected near misses exactly because doing so also prevents detected near misses, which may be heavily penalized.
To answer the rhetorical question: the incentive for the firm is to hide it.
Which is part of why Robin's fine-insured bounties in https://www.overcomingbias.com/p/privately-enforced-punished-crime.html make much more sense to me as a mechanism:
> Non-crime law deals mostly with accidents and mild sloppy selfishness among parties who are close to each other in a network of productive relations. [...] This approach, however, can fail when “criminals” make elaborate plans to grab gains from others in ways that make them, their assets, and evidence of their guilt hard to find.
Wouldn't there be a countervailing incentive for someone inside the firm to take the equivalent of a short position, then leak the information?
Yes, and that's good!
Agreed - I'm all for rewarding whistleblowers / leakers of public-good info as a default policy behaviour.
I like the idea of having entities feel the financial risk of AI misalignment, and D = (M+H)*F^N goes a long way if it can be agreed to. Other concepts I think would be valuable in a robust risk-monetization system could be included in the equation. For example, the penalty could be reduced by half if a third-party safety audit had been done (similar to GDPR liability rules, and evoking comparisons to Underwriters Laboratories). Also, perhaps a company at fault must always pay 20% of D, but could insure the other 80% of D through an insurance company (similar to malpractice insurance); this could encourage a robust, competitive insurance market, which itself would be a third-party review of safety, as the insurance companies would want to reduce their exposure.
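A minimal sketch of how those two tweaks could sit on top of the formula, assuming an illustrative 50% audit discount and a 20/80 retention split; the function names and all the specific numbers here are hypothetical, not part of the original proposal:

```python
# Hypothetical sketch only: the audit discount and 20/80 retention split are the
# modifiers proposed above, with purely illustrative numbers.

def damages(M: float, H: float, F: float, N: int, audited: bool = False) -> float:
    """Base liability D = (M + H) * F**N, halved if a third-party safety audit was done."""
    D = (M + H) * F ** N
    return D / 2 if audited else D

def split_payment(D: float, retained_share: float = 0.2) -> tuple[float, float]:
    """Split D into the share the firm must pay itself and the insurable remainder."""
    retained = retained_share * D
    return retained, D - retained

# Example: $1M harm, M = 0, factor F = 10, two conditions violated, audit done.
D = damages(M=0, H=1_000_000, F=10, N=2, audited=True)
retained, insured = split_payment(D)
print(f"D = ${D:,.0f}; firm pays ${retained:,.0f}, insurer covers ${insured:,.0f}")
```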
how can M be anything other than 0? if it’s positive, then you’re saying everyone at all times owes a positive sum of damages, before they even do anything.
the exponential is tricky too. when incidents violate multiple conditions, one party will want to present the violation(s) as N different instances which just “happened” to co-occur; the other party will argue that the damages were all coupled, so as to extract more damages. consider: market chaos in Los Angeles allows an agent to acquire property there rapidly, and either simultaneously or after a period of time water contamination in Seattle causes residents to flee for other cities. these events (which may have been “caused” by the same agent, or even distinct agents acting in a mutually beneficial manner) may involve different condition violations: how do we decide to try them separately or together?
arguably the exponential encourages sophisticated agents to learn how to better coordinate with each other, for the purpose of distributing the violations across “separate” agents. we end up with some hydra that’s more difficult to shut off than a more concentrated agent.
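a quick numeric illustration of that splitting-vs-coupling incentive, with made-up numbers (M = 0, $1M of harm per event, F = 10):

```python
# Made-up numbers, purely to illustrate the splitting-vs-coupling incentive above.
M, H, F = 0, 1_000_000, 10   # M = minimum, H = harm per event, F = multiplier base

coupled = (M + 2 * H) * F ** 2        # one trial: both harms, N = 2 violations
separate = 2 * ((M + H) * F ** 1)     # two trials: one harm and one violation each

print(f"coupled:  ${coupled:,.0f}")   # $200,000,000
print(f"separate: ${separate:,.0f}")  # $20,000,000
```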
Why the assumption that the "lumpy" improvement will be secret? Is it difficult to imagine some people might WANT to assist an AI towards FOOM?
The popularity of Auto-GPT and the explicit intentions of its contributors to create a recursively self-improving, autonomous, perpetual AGI outside of human control suggest this is already a present desire among a not-insignificant number of people.
If the trick is not secret, many projects can use it at the same time, and then we don't get one project vastly more capable than the rest of the world put together.
Then it just needs to be a secret held by one group that succeeds at it. A group developing successful recursively self-improving AI in secret seems far more likely than an AI spontaneously developing these capabilities without its creators knowing. The former seems highly likely, while the latter seems almost implausible.
I don't think the foom camp would see this as helpful.
The foom camp sees AI as so risky precisely because they don't see those near misses as plausible. The reason they think AI is so dangerous is basically that they assume a sufficiently smart AI can play us like a fiddle. If it turned out that when you multiply out all the pieces of the plan the AI needs to succeed you get a moderate probability, then you could presumably make it relatively safe by doubling the number of barriers/safeties it needs to overcome (more kill switches, more indirection/isolation, etc.). Also, on their view, any AI breaching the FOOM limit is certainly going to be smart enough to realize it can foom and that it's more likely to succeed if it waits just a bit longer (and since they see this as a very fast-growing function, there would presumably be only a very short period of time between reaching the level where it has a 50% chance and the one where it has a 99.99% chance).
I agree with you that their arguments here aren't very compelling and that near misses would be likely to happen before any AI-caused catastrophe. But if they saw AI as that limited (hey, good chance it will do its best and fail) I doubt they would have the view they do about foom and AI risk.
But in fact: https://twitter.com/ESYudkowsky/status/1649253100048494595
The law doesn't hold people to strict liability for unfamiliar actions but for actions deemed inherently dangerous. I agree that familiarity bias sometimes blurs the distinction, but that's the legal rule, and when it does treat unfamiliar risks worse, that's downstream of a judgement that they are particularly dangerous.
That's important, since the rule you are talking about would essentially be a giant brake on progress: it would basically be a tax on technical innovation (first building built using prestressed concrete - strict liability). And people rarely like the idea of making penalties less certain for big corporations that hurt them, so I worry the strict liability would stick around.
And this raises a deeper point. You aren't just insulating the treatment of AI from one risk assessment. You are essentially penalizing it relative to the usual way we treat new software, algorithms, etc., because of some people's risk assessments.
If near misses as I've defined them are rare, then this new kind of liability suit will also be rare, and thus have only a modest effect on this industry.
Besides, the only way this makes any difference is if you define near miss in a contentious way. If a near miss means something that the AI builder would agree was strong evidence that AI development is likely to kill us all, then it accomplishes nothing, since at the point they are convinced AI is very likely to kill us all, they are also convinced the cost of further AI research is far worse than any liability.
So this only really changes incentives if it imposes liability on events that some people see as not being evidence of AI x-risk (e.g., maybe they don't think self-improvement is a serious danger sign).
It almost feels like what you want is more along the lines of a policy bet. The high-risk and low-risk groups get together now and agree to certain triggers for more restrictions that the low-risk group thinks won't happen but the high-risk group does.
I assumed this was a different proposal on top of the near misses (or a near miss could be a non-rare warning sign), i.e., we also impose strict liability on AI when it hurts people in mundane failures (the AI wrongly decides you are a terrorist threat, or the AI driver runs you over while the owner fights it for control trying to go the other way, and plenty of other mundane ways to push up D).
If not, I retract this objection, but in that case the liability will be irrelevant, since every company knows it has bankruptcy as a floor, and if you are just talking about near misses of world-destroying events, they are likely to miss because they are stopped before someone is hurt or because the bioweapon only killed half the population. Either case seems unlikely to result in nonzero but non-bankrupting damages.
But what would a "near miss" look like, specifically, so that it's an actionable threshold for liability? I like the liability idea as a flexible and general response, but I'd like to know more about implementation.
Any event where A causes a hurt to B that A had a duty to avoid, the hurt is mediated by an AI, and one of those eight factors I list was present.
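A minimal sketch of that trigger as a predicate, just to make the conjunctive structure explicit; the parameter names are placeholders, and the factor list stands in for the eight factors listed in the post:

```python
# Sketch only: the three conjuncts come from the definition above; the factor
# names passed in would be drawn from the eight factors listed in the post.

def is_liable_near_miss(a_caused_hurt_to_b: bool,
                        a_had_duty_to_avoid: bool,
                        hurt_mediated_by_ai: bool,
                        factors_present: set[str]) -> bool:
    """Liability triggers when all three conditions hold and at least one
    of the listed aggravating factors was present."""
    return (a_caused_hurt_to_b
            and a_had_duty_to_avoid
            and hurt_mediated_by_ai
            and len(factors_present) > 0)
```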