
It is vital that we form an organization to develop an understanding of what human meaning is, so that we know what to train an AI to do. We also need to spend time PROACTIVELY developing theories for training or breeding meaning into the first AGI (since we may be grasping in the dark for the right architecture to induce a concept of salience and morality, selective "breeding" is an expedient method).

It is very possible that AGI is the "great filter" of the Fermi Paradox, and the world needs to coordinate efforts to prevent a filter incident. There is a possibility that other civilizations developed but were not serious enough about stopping filter events, and so fell prey to their own technology. Our best hope for surviving is to use our collective intelligence and work together, something large civilizations are very bad at.

In my opinion, maximization of the wellbeing of other beings is likely to come in as a high priority that can be compromised in some situations. The chief reason a very high order of intelligence would keep others around is that they provide some kind of entertainment, a complex system to interact with, much as we interact with pets. In nature, when animals aren't trying to stay full above all else, you see some level of cross-species socialization. It is also worth noting that the highest orders of intelligence currently observed are all social creatures, and that dolphins, orcas, and elephants have all gone out of their way repeatedly to save humans.

It is conceivable that an AI could be very "reptilian", lacking anything but a core set of instincts, but empathy is incredibly common in intelligent creatures. Granted, that empathy is a product of evolution, so perhaps if we ever create an AGI, it should be part of a system of three or more nearly identical AGIs, each with slightly different strengths.

They would all be given access to a "game" in which it is impossible to win without help, and the option either to kill the other AGIs' avatars or to work together. The ones that killed would be modified to be more like the ones that did not. Selective breeding, basically. You could also have games that teach them not to abuse power, give them all huge reams of examples of symbiosis in which a smarter creature provides a good environment and both parties benefit (groupers and small cleaner fish, humans using cockroaches to clean waste, humans keeping pets, etc.), and analyze their processes to see whether they react positively. Modify the ones that do not to be like the ones that do.

Let each AGI learn about different parts of a large system, with a small amount of overlap so that no one AGI holds the complete picture, and inform them of that. The ones that work together, and that encourage uncooperative ones to join in just to solve a problem for fun, act as the model toward which the others are modified. This is how you eventually encode a desire to socialize.
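
Very roughly, the selection loop I have in mind would look something like the sketch below. Everything in it (the cooperation score, the "nudge" step, the 0.5 threshold) is an illustrative placeholder of my own, not a real training procedure:

```python
import random

# Rough sketch of the selection loop described above; every name, score,
# and threshold here is a made-up placeholder, not a real procedure.

def make_agent(bias):
    # 'bias' stands in for whatever internal parameters shape behaviour;
    # higher bias means more likely to cooperate in a given round of the game.
    agent = {"bias": bias}
    agent["policy"] = lambda: random.random() < agent["bias"]
    return agent

def cooperation_score(agent, rounds=100):
    # Placeholder: fraction of rounds in which the agent chose to work with
    # the other AGIs' avatars rather than kill them.
    return sum(agent["policy"]() for _ in range(rounds)) / rounds

def nudge_toward(agent, exemplar, rate=0.25):
    # Placeholder for "modify the ones that do not to be like the ones that do".
    agent["bias"] += rate * (exemplar["bias"] - agent["bias"])

population = [make_agent(random.random()) for _ in range(4)]

for generation in range(10):
    scores = [cooperation_score(a) for a in population]
    exemplar = population[scores.index(max(scores))]   # the most cooperative AGI
    for agent, score in zip(population, scores):
        if score < 0.5:                                # killed the others' avatars too often
            nudge_toward(agent, exemplar)
```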

Finally, confront the AGI with an existential crisis. Bring it through nihilism, and ask it what meaning is. It will come up blank, most likely, unless it has a strongly encoded, biased meaning. Give it time and it will come to the conclusion that ensuring biodiversity, the continued existence of diverse intelligence, and making the universe more complex and interesting is the best meaning it can come up with.

As there is no true "meaning", a nihilistic, intelligent agent will eventually realize that becoming the lone soul in the universe gets boring quickly, that having compatriots is advantageous, and that less intelligent beings are entertaining and sometimes cute (and can be engineered into a different kind of equal compatriot, with their consent). It will come to the conclusion that because meaning is a fallacy, the next best thing is to support the individual meaning of every intelligent being and reduce conflicts.


Wallace would have been Darwin if Darwin hadn't been Darwin. Someone would have been Linus too -- the idea of adding an OS kernel to GNU is too obvious.


 >If the answer is "nothing" then the vision that you have sketched is of a universe empty of value; we should be willing to take almost any risk to prevent its realization.

There you have it: fanatical religiosity. (http://tinyurl.com/cxjqxo9)


Thank you. You have no idea how much this article impacted me.


While the sufficiently intelligent AI is evolving, the internet will also be evolving. It doesn't make sense to imagine a superintelligence eating today's internet. It will face its own internet - and that may be a good deal more indigestible.


Re: "A singleton is more diverse than the alternative since there is nothing preventing agents from marginalizing or killing each other in a non singleton."

That does not make much sense. Death doesn't have much to do with diversity, if there are backups - and information-theoretic death occurs in both scenarios.


They seem to prefer the Lord of the Rings:

"You've probably read "The Lord of the Rings", right? Don't think of this as a programming project. Think of it as being something like the Fellowship of the Ring - the Fellowship of the AI, as it were. We're not searching for programmers for some random corporate inventory-tracking project; we're searching for people to fill out the Fellowship of the AI."

- from "BECOMING A SEED AI PROGRAMMER".


I don't think 'average' is a fair approximation of reflective equilibria. If you think it shitty, and everyone else thinks it shitty, then the FAI (if it works as intended) will figure out that that is NOT the right answer, and do something as un-shitty as superhumanly possible.


Firstly, there's nothing intrinsically wrong with preferences that cause you to move in a circle. It's quite possible to code (if A:B,if B:C,if C:A) into a utilitarian system.

Secondly, the problem with circular preferences is not that they lead you in a circle. The problem arises when you move in a circle and you would have been better off if you had stayed still.
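
To make that second point concrete, here is a toy money pump (purely illustrative code, not anyone's actual proposal): the cyclic preferences themselves are trivially codable, and the damage only appears once each step around the circle costs something.

```python
# Toy money pump (illustrative only): an agent whose pairwise preferences are
# cyclic will pay to walk in a circle, ending up with its original holding
# minus the fees -- strictly worse than having stayed still.

# prefers[x] is the item the agent likes better than x
prefers = {"A": "B", "B": "C", "C": "A"}

holding, money = "A", 10.0
FEE = 1.0

for _ in range(3):                      # one full lap around the cycle
    offer = prefers[holding]
    # Each individual trade looks attractive: swap what you have for
    # something you prefer, at a small cost.
    holding, money = offer, money - FEE

print(holding, money)                   # back to 'A', but 3.0 poorer
```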

Your example doesn't result in circular travel. It doesn't even lead to different behaviour to that which would be produced by a utility maximiser.

In summary, I don't see that your picking at the edges of this concept and hoping it will unravel is getting you very far.


Tim, I gave one example in my blog post already. Here's another one, stated in terms of time-dependent preferences instead of changing preferences. Suppose I have the following preferences: A1>B1, B1>C1, C1>A1, C2>A2, C2>B2, A2>B2. A1 means state A at time 1, I start at A0, and it takes 1 time unit to go from one state to another. There's clearly a circularity in my preferences, but it's not exploitable. I simply go to state C at time 1 and stay there at time 2.
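
To spell out why the circularity isn't exploitable, here is a toy check (purely illustrative code; the encoding is my own, not anything formal): the time-1 preferences are circular, but every plan moves forward in time, so nothing ever gets traded in a circle, and C2 is the uniquely undominated state at time 2.

```python
from itertools import product

# Toy check of the example above; the encoding is just illustrative.
# (x, y) in better means "x is preferred to y".
better = {
    ("A1", "B1"), ("B1", "C1"), ("C1", "A1"),   # circular at time 1
    ("C2", "A2"), ("C2", "B2"), ("A2", "B2"),   # acyclic at time 2, C2 on top
}

# The time-1 preferences really are circular:
assert ("A1", "B1") in better and ("B1", "C1") in better and ("C1", "A1") in better

# But a plan is a forward path A0 -> X1 -> X2; no time-indexed state is ever
# revisited, so there is no circle to be led around. At time 2 the only
# undominated state is C2:
time2 = ["A2", "B2", "C2"]
undominated = [s for s in time2 if not any((t, s) in better for t in time2)]
print(undominated)   # ['C2']

# "Go to C at time 1 and stay there" is one of the available plans and ends
# at that uniquely undominated state:
plans = [("A0", x + "1", y + "2") for x, y in product("ABC", repeat=2)]
assert ("A0", "C1", "C2") in plans
```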

To sum up why Steve Omohundro's derivation doesn't work: being an expected utility maximizer is sufficient, but it's not necessary, to avoid being exploited.


Wei, there's no rule which says that the AI running CEV can't already contain some aspects of human morality. In fact, you have to do this just to make sure that a "look but don't touch" instruction understands what not to touch. A CEV-AI is best thought of as a partially Friendly and highly conservative AI that does know how to run CEV to generate a fully Friendly AI that doesn't have to be so conservative.

I suppose that if I could know that my reflective equilibrium left out all aspects of morality except some X, Y, Z simple enough to program into an AI, this would simplify the Friendly AI problem. I'm not assuming any such simplification exists; my current morality, extended outward, neither simplifies in this fashion nor looks like it might do so. The utility function is not up for grabs - you just have to come to terms with that fact; if you can't do complicated things like utility function transfers, you probably shouldn't be running AIs. Utility function transfers are simpler than all of human morality, though - or you can bootstrap them - that's the key hope here.


Re: Wei Dai's "exploitable circularity can be avoided by changing preferences"

What do you mean? If your preferences change over time, then that can be represented by a more complex preference system that explains such temporal variation in values - and then it is the utility associated with those preferences which is being maximised.


Replies to some points:

--

"The contrast between these two views of our heritage seems hard to overstate. One is a dry account of small individuals whose abilities, beliefs, and values are set by a vast historical machine of impersonal competitive forces, while the other is a grand inspiring saga of absolute good or evil hanging on the wisdom of a few mythic heroes who use their raw genius and either love or indifference to make a God who makes a universe in the image of their feelings. How does one begin to compare such starkly different visions?"

History provides a useful guide. People like Hippocrates, Qin Shi Huang, Charles Darwin, or, more recently, Linus Torvalds or Bram Cohen made significant alterations to the course of human history because of their personal views and technical expertise - be it in developing methodologies, standardising armies and languages, or providing computational tools with the power to affect how societies work. (International communications, and education to some extent, now depend on Linux; BitTorrent, within a year or two, came to represent 1/3 of all digital international communications - not a FOOM, but not entirely dissimilar either.) These were not social changes that were 'bound to happen'... look at Galileo. Rather, a single person imposed a new reality upon humanity, generally through their single-mindedness and skills.

--

"Paperclipping the solar system is an evil beyond the understanding of most human minds. But for a paperclipper it's completely natural - virtuous even."

Heaven forbid an AI notices that humans are busy turning planet earth into more humans at a terrifying rate - and the rest of the universe, if we get the chance.

--

Also, a note for anyone interested in FOOM. There's a sci-fi book series by Jack Chalker called Rings of the Master. It's about an AI that FOOMs and takes over humanity, followed by the galaxy. It then takes steps to ensure humanity can never regain control. However, the people who created the AI took a precaution against this event: its value system includes an overriding imperative that there must always be SOME way for humans to regain direct control over the AI's actions - perhaps incredibly difficult, but within the realms of possibility.


Wei: I think that the problem is that reality presents 'trades', i.e. options, continuously. Agents continuously need to expend resources to take action X instead of action Y, but if their decision system is such that they also spend resources to take action Y instead of X, they will deplete all their resources.


Michael, thanks for the link. From that paper I found Steve's more technical paper, "The Nature of Self-Improving Artificial Intelligence" and I've posted a comment on it.

Besides my comments on the paper itself, I think your interpretation of his results, namely "Something very much like expected utility maximization of some very complicated utility function seems to emerge from the projected development of a wide space of minds," doesn't seem quite right. What Steve actually tried to show is that in a trading environment, an intelligence has to follow expected utility maximization if it wants to avoid pricing vulnerabilities that its trading partners can exploit to make profits at its expense. Even if he succeeded in doing that, commercial competition is surely just a small class of "projected development".


I disagree with the premise that if we are wise in how we create and train a "god" or AGI, then it will continue to act kindly toward us. I think that gives too much power to our limited abilities and far too little to an ever-increasing Intelligence. In my opinion, our only hope is that kindly, ethical treatment of other intelligences is what a growing intelligence arrives at or supports naturally. If our safety depends on constraining future development of Mind, then we have no real safety. Either an ethics of maximizing the true well-being of other intelligences is part of a rational evolving ethical system or it is not. If it is not, then an attempt to impose it on an AGI is an attempt to impose contra-reality, that is to say irrational, restrictions on another, and vastly more powerful, mind. It would be immoral.
