We humans have brains that guide our behavior, inserting complex “signal-processing” between sensory input from our eyes, ears, etc., and output to control our hands, mouth, etc. We physicists (& chemists, biologists, & neuroscientists) feel confident that we have a pretty complete understanding of the low level physical processes involved, which are quite ordinary; any exotic effects can have only minor stochastic influences on brain outcomes. Furthermore, like most designed signal-processing systems (e.g., TVs, phones, watches), our brains are designed to be robust to fluctuations in low level details.
Introspectively, we see ourselves as having vivid feelings related to our brain processes; we feel strongly about what we see, touch, hope for, and plan. And many scholars believe strongly that they can imagine the counterfactual of a brain (a “philosophical zombie”) undergoing exactly the same physical processes and resulting outcomes, including that brain saying that it has particular feelings, without that brain actually having such associated feelings.
Furthermore, most people see all non-animal physical processes, including all AIs made so far that mimic human expressions of feeling, as involving zero such actual internal feelings. These scholars thus see the fact that humans have such feelings as a key extra “non-physical” fact about our universe in need of explanation. And in fact, this is the main evidence offered for the claim that our universe is more than physical.
Note that the type and content of our feelings are exactly those computed by our brain processes; the only extra thing here might be the existence of, not the content of, such non-physical feelings. Also, note that the completeness of our understanding of the physics of brain processes means that such extra non-physical facts couldn’t actually be the local cause for our claiming to have such more-than-physical feelings. Apparently, natural selection would have inclined us to make such claims even if they were not true. But that doesn’t imply we are wrong. (Though it does suggest that.)
In some recent polls, I found that most of us don’t think that AGIs, i.e., AIs better than humans at most tasks, would have such feelings, not even AIs better at most emotion tasks or world-class at making culture. We also don’t think AIs that could imitate Einstein or MLK very well would have feelings. But most think that a cell-by-cell emulation of a particular human brain would have real feelings. And most have seen a movie or TV depiction of a robot or android where they think, “If a creature acted like that around me, I’d think that it really actually feels the emotions it expresses.”
Now the fact that non-physical feelings don’t cause physical actions also implies that we never get any physical empirical data on which physical things in our universe have what associated feelings when. So we must instead rely either on case-specific intuitions or on theoretical arguments. For example, if we believe that human feeling reports are usually correct, then we can use that to infer what humans feel when. And we often guess that when animals similar to us have behaviors similar to ours, they likely also have similar feelings.
However, we are reluctant to extend these approaches to artificial devices. So we face the hard but important question: which artificial devices or alien creatures feel what when? As most of us put far more moral weight on creatures that actually have feelings than on those that merely mimic them, in a world full of artificial creatures it will matter greatly which creatures we attribute real feelings to.
This is a reason to want a lot more research into this topic. And in a recent poll, I found that the median respondent wanted to increase funding from today’s ~$100M/yr level by a factor of 18, to a ~$1.8B/yr level. Of course, if we want research progress to result from this, as opposed to the usual academic affiliation with credentialed impressiveness, we should use progress-effective funding methods like prizes.
One theoretical approach is to seek as simple a meta-law or rule as possible by which the universe might decide which physical things feel what when, consistent with the constraint that humans always feel exactly what their brains compute them to feel.
For example, maybe: all devices and creatures actually feel whatever their brains compute them to feel. To make this a clear rule, we’d need a way to objectively identify brains in the physical world and which of their internal states are their “feelings.” But it is okay if a brain’s computations aren’t clear on what exactly are its feelings; humans have unclear feelings all the time.
This approach will be less hard than it seems if Nick Chater is right that The Mind is Flat. This also suggests that LLMs today are actually feeling the emotions they express.
Alas, the fact that most people seem convinced that some fictional robot or android looked like it had real feelings suggests that, in the absence of a widely accepted theoretical rule, most humans are likely to go with their intuitions here. And as AIs will likely get very good at acting like they have feelings, humans will probably attribute feelings to the AIs that they like and want to respect, while seeing those they dislike and want to disrespect as lacking feelings. The fact that humans have often been able to see the humans that they fight or enslave as subhuman suggests we have a great capacity to disrespect those we want to mistreat.
(Note: there is a vast literature on related topics, part of which is summarized here.)
Glad that you are thinking about this topic; it's definitely important and neglected.
Regarding some of those poll questions, the reason I think animals probably have feelings and LLMs probably don't is that animals' brains have architectures similar to humans', and humans have feelings. The fact that a transformer can write about feelings doesn't mean that it actually has feelings.
An example: A sentiment analysis algorithm can sort words into buckets like "happy" or "sad", but that doesn't mean it feels happy or sad. If tasked with sorting words into buckets like "ancient" and "futuristic", I wouldn't think that it felt anything about those buckets. And sorting words into buckets for emotional valence is functionally the same as sorting words into buckets for topic or adjective.
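To make that "functionally the same" point concrete, here is a minimal sketch (with made-up keyword lists, not any real sentiment library): the same bucket-sorting routine handles the emotional labels and the era labels, and only the keyword lists differ.

```python
# Hypothetical keyword buckets; the routine below doesn't care whether the
# bucket names denote emotions or eras.
EMOTION_BUCKETS = {
    "happy": {"joy", "delighted", "cheerful", "smile"},
    "sad": {"grief", "tears", "mourn", "gloomy"},
}
ERA_BUCKETS = {
    "ancient": {"pyramid", "scroll", "chariot", "temple"},
    "futuristic": {"hologram", "starship", "android", "nanobot"},
}

def sort_into_buckets(words, buckets):
    """Assign each word to the first bucket whose keyword set contains it."""
    result = {name: [] for name in buckets}
    for word in words:
        for name, keywords in buckets.items():
            if word.lower() in keywords:
                result[name].append(word)
                break
    return result

words = ["joy", "tears", "chariot", "starship"]
print(sort_into_buckets(words, EMOTION_BUCKETS))  # {'happy': ['joy'], 'sad': ['tears']}
print(sort_into_buckets(words, ERA_BUCKETS))      # {'ancient': ['chariot'], 'futuristic': ['starship']}
```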
Another example: If a graphics program can animate an emoji face between a smiling and frowning face, that doesn't mean the program feels happy or sad.
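Similarly, here is a hedged sketch of what such an animation amounts to computationally: tweening a single mouth-curvature number between a smile and a frown (a hypothetical parameterization, not any particular graphics API).

```python
# Tween a mouth-curvature parameter from smile (+1.0) to frown (-1.0).
# The program only manipulates a number a renderer could draw; no emotion is involved.
def mouth_curvature(t: float) -> float:
    """Linear interpolation: t=0 gives a full smile, t=1 a full frown."""
    return (1.0 - t) * 1.0 + t * (-1.0)

for frame in range(5):
    t = frame / 4
    curve = mouth_curvature(t)
    label = "smiling" if curve > 0 else ("frowning" if curve < 0 else "neutral")
    print(f"frame {frame}: curvature {curve:+.2f} ({label})")
```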
All of that said, I wouldn't be surprised if LLMs had some sort of subjective inner experience while running inference. I just don't think the fact that they can mimic humans' emotional signals/outputs means they have experiences like human emotions.
Hope to see more on this topic.
I’m surprised so many people are so confident that LLMs don’t feel anything. Although my experience has been that people translate “anything” to “exact human experience.”