I asked a set of polls, and 3 LLMs (ChatGPT, Claude, Gemini), to rate the relative explanatory power of the following 16 causes of cultural change for the two periods 1700-1900 and 1900-2025.
The problem with history is: can we really define "explanatory power" when N=1? People weave compelling stories about why lead plumbing caused the downfall of Rome, etc., but they always feel just-so to me. I wonder if such stories say more about the biases of the author (and their audience) than they do about the Romans.
The randomness of the LLM results may just be the models' peculiar way of saying, "I don't know." RLHF strongly penalizes the models for not giving answers.
I think 1700-1900 is too big a slice of time. At least divide it into Enlightenment and Industrial Revolution. 1789 would make a tolerable dividing line. I would more generally take some degrees of freedom out of the number of causes, and put them into finer time resolution.
It would be interesting to know what prompt you used, and whether you (implicitly or explicitly) gave guidance as to which geographical, socioeconomic, or ethnic group to focus on. I also wonder whether the results would be different if the question were asked in French or Arabic, for example. I don’t know if this says more about the biases in the question or in the models.
This would be easier to parse if the LLM scores and the poll results were on normalized scales.
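To make the suggestion concrete, here is a minimal sketch of one way to put both kinds of result on a common 0-to-1 scale. It assumes min-max rescaling is acceptable; the function name, the cause labels, and all the numbers are hypothetical and not taken from the post.

```python
def normalize_to_unit_interval(scores):
    """Min-max rescale a dict of cause -> score onto [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so there is no spread to rescale.
        return {cause: 0.0 for cause in scores}
    return {cause: (value - lo) / (hi - lo) for cause, value in scores.items()}


# Hypothetical numbers, for illustration only (not from the post).
poll_shares = {"cause_a": 40.0, "cause_b": 25.0, "cause_c": 10.0}  # percent of votes
llm_ratings = {"cause_a": 8.0, "cause_b": 6.0, "cause_c": 3.0}     # rating out of 10

poll_norm = normalize_to_unit_interval(poll_shares)
llm_norm = normalize_to_unit_interval(llm_ratings)

for cause in poll_shares:
    print(f"{cause}: poll={poll_norm[cause]:.2f}  llm={llm_norm[cause]:.2f}")
```

Min-max rescaling only preserves the ordering and relative spacing within each source; ranks or z-scores would be reasonable alternatives if the two scales are distributed very differently.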
I've forgotten what "I asked a set of polls" means. Are you buying online polls? How do you do that?
I shared links to the LLM sessions in the post.