This Friendly AI discussion has taken more time than I planned or have. So let me start to wrap up.
On small scales we humans evolved to cooperate via various pair and group bonding mechanisms. But these mechanisms aren’t of much use on today’s evolutionarily unprecedented large scales. Yet we do in fact cooperate on the largest scales. We do this because we are risk averse, because our values mainly conflict over the use of resources, which conflict itself destroys, and because we have the intelligence and institutions to enforce win-win deals via property rights, etc.
I raise my kids because they share my values. I teach other kids because I’m paid to. Folks raise horses because others pay them for horses, expecting horses to cooperate as slaves. You might expect your pit bulls to cooperate, but we should only let you raise pit bulls if you can pay enough damages if they hurt your neighbors.
In my preferred em (whole brain emulation) scenario, people would only authorize making em copies using borrowed or rented brains/bodies when they expected those copies to have lives worth living. With property rights enforced, both sides would expect to benefit more when copying was allowed. Ems would not exterminate humans mainly because that would threaten the institutions ems use to keep peace with each other.
Similarly, we expect AI developers to plan to benefit from AI cooperation, via either direct control, indirect control such as via property rights institutions, or such creatures having cooperative values. As with pit bulls, developers should have to show an ability, perhaps via insurance, to pay plausible hurt amounts if their creations hurt others. To the extent they or their insurers fear such hurt, they would test for various hurt scenarios, slowing development as needed in support. To the extent they feared inequality from some developers succeeding first, they could exchange shares, or share certain kinds of info. Naturally-occurring info-leaks, and shared sources, both encouraged by shared standards, would limit this inequality.
In this context, I read Eliezer as fearing that developers, insurers, regulators, and judges will vastly underestimate how dangerous newly developed AIs are. Eliezer guesses that within a few weeks a single AI could grow via largely internal means from weak and unnoticed to so strong it takes over the world, with no weak-but-visible moment in between when others might just nuke it. Since its growth needs little from the rest of the world, and since its resulting power is so vast, only its values would make it treat others as much more than raw materials. But its values as seen when weak say little about its values when strong. Thus Eliezer sees little choice but to try to design a theoretically clean AI architecture allowing near-provably predictable values when strong, to design in addition a set of robust good values, and then to get AI developers to adopt this architecture/values combination.
This is not a choice to make lightly; declaring your plan to build an AI to take over the world would surely be seen as an act of war by most who thought you could succeed, no matter how benevolent you said its values would be. (But yes, if Eliezer were sure, he should push ahead anyway.) And note that most of the urgency of Eliezer’s claim comes from the fact that most of the world, including most AI researchers, disagrees with Eliezer; if they agreed, AI development would likely be severely regulated, like nukes today.
On the margin this scenario seems less a concern when manufacturing is less local, when tech surveillance is stronger, and when intelligence is multi-dimensional. It also seems less of a concern with ems, as AIs would have less of a hardware advantage over ems, and modeling AI architectures on em architectures would allow more reliable value matches.
While historical trends do suggest we watch for a several-year-long transition sometime in the next century to a global growth rate two or three orders of magnitude faster, Eliezer’s postulated local growth rate seems much faster. I also find Eliezer’s growth math unpersuasive. Usually dozens of relevant factors are co-evolving, with several loops of the form “all else equal, X growth speeds Y growth, which in turn speeds something else,” etc. Yet usually it all adds up to exponential growth, with rare jumps to faster growth rates. Sure, if you pick two things that plausibly speed each other and leave everything else out, including diminishing returns, your math can suggest accelerating growth to infinity, but for a real foom that loop needs to be real strong, much stronger than contrary muting effects.
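To make the point about the growth math concrete, here is a toy sketch of my own, not Eliezer’s model. It assumes two factors x and y that each grow at a rate (x*y)**(returns/2); the function name, the `returns` parameter, the step size, and the cap are all hypothetical choices for illustration. With `returns=2` and nothing else included, the loop reduces to dx/dt = x**2, which runs away in finite time. With `returns=1`, a crude stand-in for diminishing returns and other muting effects, the same loop gives only ordinary exponential growth.

```python
def simulate_loop(returns: float, x0: float = 1.0, y0: float = 1.0,
                  dt: float = 0.001, t_max: float = 5.0, cap: float = 1e12):
    """Euler-integrate a toy two-factor feedback loop (illustrative assumption).

    Both factors grow at rate (x * y) ** (returns / 2).
    Returns (time, x) at the moment x passes `cap`, or at t_max if it never does.
    """
    x, y, t = x0, y0, 0.0
    for _ in range(round(t_max / dt)):
        if x >= cap:
            break  # treat passing the cap as "growth to infinity"
        rate = (x * y) ** (returns / 2)
        x += dt * rate
        y += dt * rate
        t += dt
    return t, x


# Unmuted loop: each factor speeds the other with no diminishing returns.
t_foom, x_foom = simulate_loop(returns=2.0)
# Muted loop: diminishing returns bring it back to plain exponential growth.
t_exp, x_exp = simulate_loop(returns=1.0)

print(f"returns=2: x passed the cap at t = {t_foom:.2f}")
print(f"returns=1: x = {x_exp:.1f} at t = {t_exp:.2f}")
```

The only thing that differs between the two runs is the strength of the loop relative to what mutes it, which is the quantity the disagreement is about.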
But the real sticking point seems to be locality. The “content” of a system is its small modular features while its “architecture” is its most important least modular features. Imagine a large community of AI developers, with real customers, mostly adhering to common architectural standards and sharing common content; imagine developers trying to gain more market share, and that AIs mostly got better by accumulating more and better content, and that this rate of accumulation mostly depended on previous content; imagine architecture is a minor influence. In this case the whole AI sector of the economy might grow very quickly, but it gets pretty hard to imagine one AI project zooming vastly ahead of others.
So I suspect this all comes down to how powerful architecture is in AI, and how many architectural insights can be found how quickly. If there were, say, a series of twenty deep powerful insights, each of which made a system twice as effective, just enough extra oomph to let the project and system find the next insight, it would add up to a factor of a million. That would still be nowhere near enough, so imagine a lot more of them, or ones lots more powerful.
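The factor of a million is just compounding: twenty doublings multiply out to 2**20. A minimal sketch of that arithmetic, where the function name is my own and the only inputs are the twenty insights and the factor of two from the paragraph above:

```python
def compounded_gain(num_insights: int, gain_per_insight: float) -> float:
    """Total effectiveness multiplier if each insight multiplies it by a fixed factor."""
    return gain_per_insight ** num_insights


# Twenty insights, each making the system twice as effective.
total = compounded_gain(20, 2.0)
print(total)  # 1048576.0, i.e. roughly a factor of a million
```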
This scenario seems quite flattering to Einstein-wannabes, making deep-insight-producing Einsteins vastly more valuable than they have ever been, even in percentage terms. But when I’ve looked at AI research I just haven’t seen it. I’ve seen innumerable permutations on a few recycled architectural concepts, and way too much energy wasted on architectures in systems starved for content, content that academic researchers have little incentive to pursue. So we have come to: What evidence is there for a dense sequence of powerful architectural AI insights? Is there any evidence that natural selection stumbled across such things?
And if Eliezer is the outlier he seems on the priority of friendly AI, what does Eliezer know that the rest of us don’t? If he has such revolutionary clues, why can’t he tell us? What else could explain his confidence and passion here if not such clues?