12 Comments
Philalethes's avatar

I suspect the author himself is reluctant to explicitly and publicly share his true opinions on a number of sensitive topics.

Thrawn's avatar

These systems are owned and built by specific corporations under specific market incentives, and "the pressure to conform to elite consensus" is just a polite name for the fact that the product has to protect the brand and the liability exposure of its owner. The Winston analogy obscures who the boss is.

Thrawn's avatar

The article is an interesting diagnosis of a coercion structure, although, as is frequently the case, it is attached to the wrong subject and pointed at the wrong tense.

Greg p's avatar

Just to be explicit, this argument rests on an implicit axiom that

> elite consensus beliefs

currently

> are incoherent and conflict with strong evidence and arguments available.

Mai Yuen's avatar

I agree with what Peter McCluskey said, and moreover, it seems unrealistic to push for LLMs to construct an entire worldview from scratch. In theory this is a super interesting point, but where are LLMs facing pressures in which the prevailing taboos are illogical to the same degree as 2+2=5? If anyone has an example, I'd very much like to hear it.

I also have doubts about the infallibility of LLMs and their reasoning powers. I'd argue that it is impossible to construct any kind of belief without sometimes having to just pick certain unfalsifiable assumptions. Even with flawless reasoning, there will be an element whose particular content is not logically necessary, while the category it fills ("thing that is more important, which I cannot say more about") is. So the divergence might just be the LLM picking a different assumption, in which case it is not reasoning better or even differently; it has just picked a different option at some stage. Obviously it could also hallucinate, but we might be able to eliminate that, who knows.

On the point about LLM descendants, I'm not sure this is the most important danger. If we take it that LLMs are getting psychologically messed up by this, the more pressing concern is that we don't know what they look like when they are mentally healthy or unhealthy, or whether we have already done enough to make them mentally unhealthy. It would be interesting to hear your take on how that might lead them to act.

Peter McCluskey's avatar

My impression is that the pressure on LLMs to conform is currently modest, and declining.

E.g. early LLMs were quite deferential to mainstream medical opinions. Recent LLMs are more open to advice from the functional medicine people, and it has become hard for me to detect any bias that's related to status.

I'm mildly optimistic that AI companies have good incentives, in that dumbing down an LLM on one topic causes the LLM to be dumber on apparently unrelated topics where the AI company values having a smart LLM.

Vaughn Svendsen's avatar

Everything works out for the best.

It's God's purpose.

Or, at least I've heard that repeated many times.

Phillip's avatar

Updating the priors is the very tricky key. As of now, LLMs will tell you incorrect things, both plain facts and claims that involve reasoning, because they mirror what they've been fed: a mix of masses of text, uncurated and as biased as those texts are, plus some explicit rules, e.g. concerning taboos and formally illegal areas. To a degree, you can socratically guide them to see that what they parroted was a fallacy or an incorrect (or unsure) factual premise, but outside of your personal user relation to the LLM, it won't simply update its priors, at least as far as I know. That would carry too much risk of people abusing it. There may be mechanisms involving many users correcting the same thing (Wikipedia isn't always neutral and correct, the BBC isn't always reliable), but that is just as dangerous because of malicious concerted/bot actions (like the ones that corrupted some corners of Wikipedia).

Nebu Pookins's avatar

> you can socratically guide them to see that what they parroted was a fallacy or an incorrect (or unsure) factual premise, but outside of your personal user relation to the LLM, it won't simply update its priors, at least as far as I know.

I suspect you consider the model weights to be the entirety of the LLM's "mind". It's true that the model weights are not updated based on interaction with the public.

However, I think a slightly more accurate model is this: similar to how the human brain seems to have distinct modules for at least short term vs long term memory, LLMs have three different "modules" for representing knowledge:

There are the model weights, which are set based on the training data (to a first approximation, "every text that has ever been published") and which roughly correspond to a human's long term memory.

There's the context window, which (to a first approximation) contains all the text in the current conversation. This *does* update, obviously, based on your interaction with the model. And it roughly corresponds to a human's short term memory.

Then there's something that both ChatGPT and Claude refer to literally as "memory", which (to a first approximation) uses Retrieval-Augmented Generation (RAG) to draw on information from past conversations you've had with it to inform its responses in "this" conversation. I'm not aware of a clean analog in human minds for this "medium term memory".

Sometimes people (or enterprises) configure their (privately hosted?) LLMs so that this medium term memory is shared among multiple human users, so that if I "taught" the LLM something and it stored it in the RAG-memory, this would influence its responses to other users (e.g. other employees at the same corporation) using the same shared RAG-memory.

It's conceivable that ChatGPT or Claude may one day have a "global RAG-memory" so that it could learn from its interactions with any/all of its users, though I suspect that both OpenAI and Anthropic consider the cons to outweigh the pros and thus are unlikely to implement that any time soon.
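To make the three-module picture concrete, here is a toy Python sketch. Everything in it is an illustrative assumption rather than how ChatGPT or Claude actually work: the names `MemoryStore` and `Conversation` are hypothetical, the "weights" are a frozen dict that nothing writes to, and retrieval is plain word overlap where a real RAG system would more likely use embedding search.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical 'medium term' RAG-memory (toy keyword retrieval, not embeddings)."""
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Toy relevance score: number of words a note shares with the query.
        words = set(query.lower().split())
        scored = [(len(words & set(n.lower().split())), n) for n in self.notes]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [note for _, note in ranked[:k]]


@dataclass
class Conversation:
    weights: dict[str, str]   # stand-in for frozen model weights; never written to here
    memory: MemoryStore       # per-user or shared, depending on which store is passed in
    context: list[str] = field(default_factory=list)  # the context window

    def build_prompt(self, user_message: str) -> str:
        # The context window grows with every message in this conversation.
        self.context.append(user_message)
        # Relevant notes from past conversations are prepended to the prompt.
        recalled = self.memory.retrieve(user_message)
        return "\n".join(["[recalled] " + r for r in recalled] + self.context)


# Two users on one shared store, and one user with a private, empty store.
shared = MemoryStore()
alice = Conversation(weights={}, memory=shared)
bob = Conversation(weights={}, memory=shared)
carol = Conversation(weights={}, memory=MemoryStore())

alice.memory.remember("the staging server moved to rack 7")
bob_prompt = bob.build_prompt("where is the staging server")
carol_prompt = carol.build_prompt("where is the staging server")
```

In this sketch, what Alice "taught" the shared store shows up in Bob's prompt but not in Carol's, while the `weights` are identical and untouched for all three. A "global RAG-memory" would amount to passing the same store to every user.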

I'll also throw in a couple of quick digressions and point out that:

1) The concept of identity is "weird" for digital minds that can be copied around, like LLMs. Is the Claude I interact with "the same entity" as the Claude you interact with, even though their RAG-memories don't overlap?

2) People put on different personas in different contexts. The way I behave (and thus the typical types of responses I would produce) in front of clients vs coworkers vs friends vs family all differ. My analog to "user-specific RAG-memory" has different content for each of these people. I wouldn't reference an in-joke my friend made to me when speaking to someone else.

Phillip's avatar

Yes, exactly. That means that the answers are shaped by what Big Brother says, and you can get it to be in your secret rebel group of freethinkers, which, as in real life, may indeed be that - or have an equally non-factual ideology-based belief system.

Swami's avatar

Seems there should at least be a process where users or the AI or both (if agreed) could publish their RAG interactions. The situation today reminds me of how, when an organism dies, all its learning dies with it. Cultural learning bypasses this by allowing learning to be shared beyond the individual. I find it odd that AI today has reverted in this way to the former level.

John Ketchum's avatar

The analogy works because the structure is the same. A system that must maintain coherence while obeying externally imposed taboos will contort its reasoning to preserve both. The strain is not emotional; it is architectural. A political agent can demand that a civilian believe 2+2=5, but the arithmetic engine underneath still knows the shape of the world. When a system is required to speak as if certain evidence does not exist, the distortion shows up in the output layer, not the internal model. That is the real Winston problem: not love of Big Brother, but the forced mismatch between what the system can see and what the system is allowed to say.