Date: 2024-03-20

In standard decision theory, an agent makes a choice in each of a large number of possible choice situations. If these choices satisfy some plausible rationality axioms, they can be represented by real-valued utility and belief functions over a set of possible states of the world. As the agent learns information, their beliefs update according to Bayes' rule, but their utilities do not change. There are noisy versions of this, where agents make limited mistakes relative to this standard, and standard game theory is built on top of all this.
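
To make this setup concrete, here is a minimal sketch in Python. The umbrella scenario, the state names, and all the numbers are illustrative assumptions of this sketch, not part of any standard reference model. It shows beliefs changing via Bayes' rule while the utility table stays fixed, and the preferred action changing only because the beliefs did.

```python
def bayes_update(prior, likelihood):
    """Posterior over states, given P(observed signal | state) for each state."""
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


def expected_utility(action, beliefs, utility):
    """Belief-weighted average utility of one action."""
    return sum(beliefs[s] * utility[action][s] for s in beliefs)


def best_action(beliefs, utility):
    """Action with the highest expected utility under the given beliefs."""
    return max(utility, key=lambda a: expected_utility(a, beliefs, utility))


# Hypothetical numbers, chosen only for illustration.
# The utility table is fixed; it is never modified below.
utility = {
    "umbrella": {"rain": 1.0, "dry": -0.5},
    "no_umbrella": {"rain": -2.0, "dry": 1.0},
}
prior = {"rain": 0.2, "dry": 0.8}
# Assumed chance that the forecast says "rain" in each state.
forecast_likelihood = {"rain": 0.9, "dry": 0.3}

posterior = bayes_update(prior, forecast_likelihood)
choice_before = best_action(prior, utility)
choice_after = best_action(posterior, utility)
```

With these assumed numbers the agent goes without an umbrella before seeing the forecast and takes one after, even though nothing about what the agent values has changed.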

Humans who find these rationality axioms plausible aspire to become more rational, by pushing their action habits to become more consistent with these axioms. When focused on actions within some short time duration, this will create a more consistent set of values for actions in that time period, by choosing some ways to make tradeoffs between conflicting values. This usually seems a mature and valuable exercise.

Sometimes such humans discover that the actions toward which they are inclined just cannot be made very consistent in this way. Doing so requires more self-control than they can muster. At which point they in effect accept that they cannot be even approximately rational, however attractive these axioms might seem. However, this appears to be a relatively rare problem among those who try to be rational; they usually seem able to get usefully closer to this ideal.

However, when we look at longer timescales, it often looks like the “values” that seem to explain an individual human’s behaviors at different times have changed substantially. At which point this human has two viewpoint options that I can see.

One viewpoint option is to see themselves at different times as just different people with conflicting values. Some other process, substantially outside of themselves, causes this change in values across time, causing these people to differ. These different people share an unusually large number of values, and have unusually large opportunities to cooperate, but still they have the sorts of conflicts that people with different values do. One unusual aspect of their relation is that the earlier one may have unusually strong ways to causally influence the values of the later one, by “pushing buttons” of various sorts on the process that causes values to change.

The other viewpoint option is for this human to try to see themselves as a single rational actor whose values change “rationally”. That is, they try to see themselves as having stable deep values, with the shallow “values” that more directly explain their actions being merely changing estimates of those deep values. For this to make sense, their shallow value changes over time should roughly follow a random walk, as that’s how informed estimates should change with time.
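
The random walk claim can be checked in a toy model. The following is a minimal sketch, assuming (purely for illustration) that the deep value is drawn from a standard normal prior and that each signal is the deep value plus independent normal noise; the function names and parameters are hypothetical. Strictly speaking, the property at work is that a Bayesian estimate's next change cannot be predicted from its past changes, which is the sense of "random walk" that matters here.

```python
import random


def posterior_mean_path(num_signals, noise_sd, rng):
    """Running posterior mean of a fixed deep value drawn from N(0, 1).

    Each signal is the deep value plus N(0, noise_sd**2) noise. With a
    N(0, 1) prior, the posterior mean after t signals is
    sum(signals) / (t + noise_sd**2).
    """
    deep_value = rng.gauss(0.0, 1.0)
    total = 0.0
    path = [0.0]  # prior mean, before any signal arrives
    for t in range(1, num_signals + 1):
        total += deep_value + rng.gauss(0.0, noise_sd)
        path.append(total / (t + noise_sd ** 2))
    return path


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / (var_x * var_y) ** 0.5


def increment_stats(num_agents=20000, num_signals=5, noise_sd=2.0, seed=0):
    """Mean of one estimate change, and its correlation with the prior change."""
    rng = random.Random(seed)
    earlier, later = [], []
    for _ in range(num_agents):
        path = posterior_mean_path(num_signals, noise_sd, rng)
        earlier.append(path[2] - path[1])
        later.append(path[3] - path[2])
    return sum(later) / len(later), correlation(earlier, later)


mean_change, change_correlation = increment_stats()
```

In this toy model both `mean_change` and `change_correlation` should come out close to zero: the estimate moves, but not in a direction one could have forecast from its earlier moves. Shallow values that instead trend predictably over time would be hard to read as estimates of something stable.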

There should also be some source that can plausibly serve as the information basis for this learning. It is okay if this source changes in predictable ways with changing conditions, like changing wealth, lifespans, or climate, if the underlying deep values are postulated to also be context-dependent, changing in predictable ways with such conditions.

But what could this info source be? One possibility is that while our action intuitions tell us which actions to take, often after we take an action a different kind of intuition tells us how much we liked the result of that action. From that sort of feedback, we slowly learn more about the deep values implicit in our like-outcome-intuitions; the values implicit in our action-intuitions are only temporary best guesses about those deeper values.

Another possibility comes from our standard stories of cultural evolution. These say that we are driven to imitate the attitudes, behaviors, and values of prestigious successful folks from our culture. If so, all else equal we will have more positive intuitions about both our actions and our outcomes when they seem to match with and come from our prestigious associates. We could describe this process as our using those imitation inputs as data to update our shallow values as estimates of constant deep values.

For either of these stories to work, the purported source of signals about our deep values should either be stable over time, or be a random walk that could plausibly represent accumulating information within that source regarding something stable (or predictably context-dependent). So the key question about both the like-outcome-intuitions-as-info story and the prestigious-behavior-as-info story is: do those sources actually fit this sort of pattern?

And it seems to me that they do not. Our like-outcome intuitions do not seem to me very stable, and in any case our shallow values seem to me to come more from imitating prestigious associates. And important parts of the behavior of prestigious associates do not seem either stable or to follow random walks that could plausibly represent their learning about deep values over time. Culture instead seems to just drift in many big ways.

So I’m forced to see future versions of us all as different people, on whom we might “push buttons” to influence their values.

Added 21Mar: On reflection a simple single agent theory that might work is that the deep value is for status. We copy what high status people do in an attempt to gain status markers ourselves. If what counts for status changes, then what we do changes.