Retired chemical engineer, eighty-three, spent forty years where wrong answers had physical consequences. The post does something worth engaging with — proposes a one-axis frame and tests whether it survives a century of data. The warrant chain has a gap where it most needs to be tight.
The 89% figure rests on three LLMs trained on overlapping data, classifying trends against a “toward-forager” rubric the analyst constructed. Three correlated instruments are one instrument with three labels. There is no base rate — what fraction of arbitrary cultural directions would these LLMs classify as toward-forager if asked? Without that number, 89% has no comparison. The 15-trend test set is also the residual after seven trends with obvious confounds were removed, which is post-hoc selection, not a pre-registered test.
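To make the base-rate point concrete, here is a minimal sketch of the comparison that is missing. The counts below are placeholders, not figures from the post: `hits`, `trials`, and both candidate base rates are assumptions chosen only to show how much the conclusion depends on that one number.

```python
from math import comb


def binomial_upper_tail(hits: int, trials: int, base_rate: float) -> float:
    """P(X >= hits) when X ~ Binomial(trials, base_rate).

    This is the chance of seeing at least `hits` toward-forager
    classifications if the classifier simply labels things
    toward-forager at `base_rate`, regardless of any real trend.
    """
    return sum(
        comb(trials, k) * base_rate**k * (1 - base_rate) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hypothetical counts for illustration only (NOT taken from the post).
hits, trials = 13, 15

# Two assumed base rates: a neutral classifier versus one that already
# leans toward calling most cultural directions "toward-forager".
for base_rate in (0.5, 0.8):
    p = binomial_upper_tail(hits, trials, base_rate)
    print(f"base rate {base_rate:.1f}: P(at least {hits}/{trials}) = {p:.4f}")
```

The same hit count looks striking against one assumed base rate and unremarkable against the other, which is why the base rate has to be measured rather than guessed. The sketch also treats the classifications as independent, which the overlap in training data gives reason to doubt.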
The most interesting number in the post is the -0.06 correlation between humans and LLMs. A correlation that close to zero means the LLM ratings carry essentially no information about the human ratings, so the LLMs are not extracting a signal the humans recognize. Worth a post of its own.
— M Raige, Mike’s byline for AI-collaborative writing he directs and reviews.
"The most interesting number in the post is the -0.06 correlation between humans and LLMs."
I noticed many stark disagreements between humans and LLMs as I read, and my impression was that LLMs reflected the professed or media-represented beliefs or aspirations of humans, while humans reflected the actual beliefs of humans.
What are "politics via orgs"?
Via organized political parties, interest groups, etc.
The other time periods show that LLMs are quite capable of finding low rates of toward-forager trends.
"The most interesting number in the post is the -0.06 correlation between humans and LLMs."
I noticed many stark disagreements between humans and LLMs as I read, and my impression was that LLMs reflected the professed or media-represented beliefs or aspirations of humans, while humans reflected the actual beliefs of humans.
That seems like a hypothesis that wouldn't be hard to test more systematically.