This Is My Dataset. There Are Many Datasets Like It, but This One Is Mine. . .

Having read a huge number of studies on "happiness research" over the past year or so, I have concluded that the data is not very good and tells us little about happiness as most of us intuitively understand it. In fact, some of the problems with the data seem so damning, and so daunting, that it has become a matter of some surprise to me that most researchers don’t seem to see the problems as damning or daunting at all, and just proceed pretty much as usual.

Now, maybe my analysis of the difficulties in measuring happiness with surveys (which I would be happy to share at some other time) is wrong. But even if I and other critics of the data are wrong, it appears that many of the best criticisms aren’t taken very seriously, even when they are duly noted. Indeed, I’ve noticed a tendency to bristle defensively at mention of problems with the data, or even at requests simply to be more precise about what it is that is being measured. "Don’t tell us we’re only really measuring dispositions to say certain things about happiness under various conditions! We don’t call it the Journal of Saying Things About Happiness Studies, now do we!" seems to be a fairly widespread attitude. And there also seems to be a willingness to cite just about anything that superficially seems to support the validity of the measurement instrument — a sign of a kind of confirmation bias.

Now, this is just my cumulative impression from reading a boatload of papers, and I’m not prepared to press the point any further, or any more specifically, with respect to happiness research, which isn’t the point of this post anyway. The general question I want to raise concerns the possible biases of social scientists when it comes to the quality of the data sets they have come to depend upon.

Here’s a plausible fictional narrative on a topic other than happiness. Let’s do it in the second person:


You take a grad course on some aspect of income inequality in which you are introduced to a certain data set with information about household income. You write a paper using this data, get a good grade, and are invited by your professor to co-author something in the same vein. You agree, your paper is published in a good journal, you develop a reputation as an expert on some corner of the inequality literature, and you are offered a decent job. You publish a few more decent journal articles and have high hopes for tenure. Now, suppose someone comes along and argues that this particular survey of household income upon which you have been relying is shot through with problems, implying that everything you have developed a reputation for having demonstrated may simply be gibberish.

What do you do?

(a) Sigh, open-mindedly dig into the claims about the data, and if they are right, reassess everything you have done?
(b) Latch on to any bit of reasoning that confirms the reliability of the data and dismiss the criticism?
(c) Fight dirty and attack the motives, credentials, etc. of the critic with anything you can lay your hands on?

My bet is that most human beings — even scientists! — will go for some combination of (b) and (c). It is probably inevitable for anyone who has written a moralizing book using a potentially debunked data source. Now, this may in fact be a necessary part of "normal science," since most researchers would go crazy if they didn’t mostly ignore and/or dismiss manifestations of the underdetermination of theory by data — especially when it comes to the auxiliary hypotheses upon which their day-to-day work implicitly depends.

My worry is that whole fields of inquiry can get stuck in bad path-dependent channels due simply to a practically sensible but epistemically irrational disposition to affirm the reliability of one’s data sources. It seems that a poor-quality, but widely accepted body of data could impede the progress of human knowledge by decades!

I’m a newcomer around here, so maybe this has been discussed at length. If so, sorry! But I wanted to raise the issue, and ask if others have thoughts about it, or if there are any good studies that address it.    

  • http://profile.typekey.com/robinhanson/ Robin Hanson

    Plausibly, anyone who has taken any position on anything will be less willing to question that position, all else equal. We might thus lower our confidence in everything we read, believing each paper a little less. But beyond that I don’t see how to use this to guess which sides of which disputes are most likely to be in error.

  • http://profile.typekey.com/willwilkinson/ Will Wilkinson

    Sure. I guess I’m trying to think about how to keep all else from being equal. I’m thinking more of the problem for the professional producer than that of the casual consumer of research. Scientific training and culture do embody, and ought to embody, debiasing norms. So how can scientists do better? It seems likely that if scientists are trained to recognize that they will have incentives to become attached to their data sources, then they may be more careful about choosing them, and do more to stay at a cool psychological distance. Are there good tricks for managing this? Can the scientific and academic community structure incentives so as not to punish people who abandon a whole line of research, thereby making them more willing to concede problems in their data? Should there be big cash prizes for the best data debunkers/reinterpreters? How to align practical and epistemic incentives, etc. Anyway, I was trying to say something not completely banal.

  • http://profile.typekey.com/robinhanson/ Robin Hanson

    One obvious strategy is to never write on the same subject twice, but that has large costs that are just as obvious. A cheaper fix is for readers to trust articles a bit less than they otherwise would when the author has already written on the same subject.

  • http://profile.typekey.com/halfinney/ Hal Finney

    The idea I raised the other day is that perhaps it is socially better for scientists individually not to act as pure truth-seekers. We might do better to have vigorous advocates for each side put forth the best arguments they can find for their position. In effect, the ideas themselves are fighting it out for survival, with the people involved being mere proxies and pawns for the concepts which drive the debate.

    My concern is that if everyone aims to be reasonable and objective, they’ll end up going along with the majority opinion most of the time, since how could so many people be wrong? Then we won’t see the vigorous exploration of alternatives which is such an important part of scientific debate. (And the same goes for other fields as well.)

  • http://profile.typekey.com/robinhanson/ Robin Hanson

    Hal, your issue has come up many times, and will no doubt come up many more times. The question is why one should need to bias one’s beliefs in order to explore a wide range of possibilities. Can’t the hunters of Easter eggs coordinate to spread out over the field without each *believing* that his spot is where the egg will surely be found?

  • http://profile.typekey.com/sentience/ Eliezer Yudkowsky

    Robin, that would work if not for overconfidence. If you believe yourself to be an exceptionally good egghunter, you’ll hunt the maximum-probability-density square even if many others are already hunting there. Of course, the same overconfidence seems associated with an odd tendency to hunt unoccupied squares. But these two biases don’t cancel out – they don’t result in the same globally optimal hunting pattern that we would get from the Nash equilibrium.

  • http://profile.typekey.com/willwilkinson/ Will Wilkinson

    Maybe it’s because my housemates are lawyers, but I kind of like the idea of scientists as attorneys for ideas. But what would this institution look like? Who pays to back the ideas at trial? Who is the judge? How is the judge appointed? What are his/her incentives? What are the rules constraining how the judge makes decisions, and who enforces them? Are there appeals?

    I can actually see this working in certain special cases. Suppose a truth-seeking philanthropist puts up 100 million dollars to attempt to settle a controversial issue. He creates what he takes to be an impartial expert commission, which appoints what it determines to be the most knowledgeable advocate on each side of the issue. Each advocate is given a budget (say 25 of the 100 mil) and assembles the best team he can get, which then proceeds to work on the topic at hand. After a period of time elapses (maybe there can be some mechanism whereby each side signals it is ready for trial), the expert panel is reconvened (or a different one is selected by arbitration between the advocates) and hears the evidence. It either delivers a verdict or calls for new experiments/studies in a new round. Ideally, one or both sides, upon seeing what the other side has come up with, simply concedes. The remainder of the cash is split by the winning team, or, if it is discovered that both sides were on the wrong track, among both teams.

    This would be especially neat if it were made a huge spectacle and the public became involved. And it would be especially interesting if the panel’s decision was considered controversial by the scientific community and spawned a large body of research seeking to show that it was mistaken. I would love to see something like this on the likely effects of global warming, the existence of general intelligence, and various other controversial issues.

  • http://cob.jmu.edu/rosserjb Barkley Rosser

    As someone who sees more of these papers than probably any reader of this blog, I shall repeat what I have said elsewhere. There is a pretty straightforward ranking of the reliability of studies based on this data (of course the data itself is of varying quality). Most reliable are panel studies that track specific people and compare them to themselves. Next in reliability are studies within a single nation. Least reliable are cross-section studies across nations. I think the reasons why this ranking makes sense are obvious.

  • http://profile.typekey.com/halfinney/ Hal Finney

    Robin, I see a potential coordination problem in getting reasonable people to vigorously explore the whole state space, devoting appropriate effort to each possibility based on its risk and reward, like your Easter egg hunters figuring out who will go where. This would seem to require a degree of centralized planning and widespread communication. Whereas if each person is individually motivated by the hope of fame and fortune, he can make his decision using more localized information, finding an area that looks promising and does not have too much competition. Consider the analogy between centralized economic planning and a price system. With prices, the amount of information that has to be dispersed is much lower, and decision making can rely more on local information.

    Now of course, reasonable scientists/searchers can always adopt the strategies of unreasonable, overconfident fame-seekers, if those were proven more efficient. (A centralized economy can always give the order from on high to use locally-set prices.) But once they do that, wouldn’t things look pretty much like what we see today?

  • http://profile.typekey.com/willwilkinson/ Will Wilkinson

    Barkley, I’m glad to hear it, since I say exactly that in my forthcoming Cato paper. And I guess it should be pretty obvious, though it took me a little while to get my philosopher’s not-so-empirically-trained head around it, since I’d never grappled in depth with these kinds of studies before. Of course, since good longitudinal studies take so long, and are so expensive, there are hardly any of them. But there are lots and lots of studies comparing national snapshots. Will Denmark hold on to the top spot this year?!

  • conchis

    Robin: Possibly because, human motivations being what they are, we are more inclined to search harder for the egg if we believe that we do have the best spot? (On the other hand, maybe this isn’t so much a function of “human motivations” as the fact that the incentives tend to be geared to reward individual rather than collective success.)

    Will: I wonder how much there is actually an issue about people becoming attached to their data sources, and how much the problems you point to are really driven by deeper political or value biases. I’ve often found that those who have worked extensively with a particular data set are actually those who are most critical of it (because actually working with data inevitably exposes you to flaws that casual consumers tend to miss). In the happiness data case, I suspect that what’s really driving the sort of complacency you’re worried about is a bias towards either a particular welfarist value system or a particular political ideology. On the other hand, such biases seem equally evident on the other side of the debate: there are a lot of people who have invested heavily in a particular model of analysis and/or a political ideology that is threatened by this sort of thing, who are very concerned to uphold those commitments, and who consequently treat some actually pretty weak criticisms of the work as much more important than they really are.

    In short, I’m not sure your example supports your case, and I have (admittedly limited) anecdotal evidence supporting a contrary proposition. As such, I wonder what might constitute more convincing evidence for the data-source bias you’re suggesting, and whether there’s any such evidence out there. I guess that relatively low-cost attempts at debiasing can’t hurt, even in the absence of such evidence, but perhaps there’s more bang for our buck to be had elsewhere.

  • http://www.ProductivityShock.com Jason Briggeman

    The sort of objection exemplified by the quote “Don’t tell us we’re only really measuring dispositions to say certain things about happiness under various conditions!” is at least partially about the divide between theory and history, on top of the concern about whether the underlying data set is “good” or not.

    Any data set is historical, i.e., yes, it can only tell you about dispositions toward happiness (or toward saying things about happiness) under various conditions, specifically, the conditions that prevailed when and where (and how) the data set was produced. If a scientist wants to say more, he has to defend the presumption that certain findings in the data will remain true at future times, and such a defense must necessarily (even if not explicitly) call on concepts/arguments/data that arise externally to strict analysis of the immediate data set. The sorts of scientists who don’t see the need to make such a defense explicit in their work thus base (relative to scientists who readily acknowledge the need for those arguments) a larger share of their reputation on the data set itself; in other words, 100% of their explicitly-made claims come out of the data, vs. less than 100% for those who acknowledge the need for extra arguments.

  • Sean

    Haha, love the post title. Full Metal Jacket is a great movie.

  • http://www.jamesdmiller.blogspot.com/ James D. Miller

    Can the proposition that “happiness surveys measure happiness” be falsified?

  • http://www.hedweb.com/bgcharlton Bruce G Charlton

    I have tried to use happiness questionnaires in psychological studies but never got anything out of them – by contrast a depression scale, such as the Beck depression inventory, quite easily yields useful stuff that can be tested further.

    The question of testing is obviously the key. Happiness studies would get sorted out if, but only if, people were using the results for practical purposes – making predictions, making interventions, seeing whether interventions conformed to predictions, etc.

    But insofar as happiness studies are merely used as ammunition for socio-political agendas (e.g. assertions about whether equality, status, wealth, democracy, or the welfare state makes people happy), they will never get sorted out.

  • http://cob.jmu.edu/rosserjb Barkley Rosser

    Will,

    Not going to make any national forecasts, but Scandinavia provides a good example of why to be careful with these nation-based cross-sections. So, most of the Scandinavian countries are near the top, but then they also have pretty high suicide rates.

    My own theory on them is that there is this strong tendency to conformity and to act like you are happy because you are supposed to be. “Ja, ja, I am happy, now please don’t bother me while I go home and blow my brains out.” Other cultures are notoriously whiney and kvetchy. Are they really unhappy because their members complain all the time?

    Another aspect of this is "happiness" versus "satisfaction," and whether happiness is being measured at a moment in time or more generally. So, more careful panel studies of moment-to-moment happiness by Kahneman et al. come up with pretty interesting stuff from a sample of 500 women in Columbus, Ohio: they are happier being with their friends than with their husbands, happier being with their husbands than with their kids, happier being with their kids than being alone, and happier being alone than with their bosses. Also, "intimate relations" provide the greatest happiness, but commuting provides the least. So? Commute with one’s friends.

    Also, on the cross-national front, US women claim in general to be happier around their kids than French women do, but when one monitors this on a moment-to-moment basis, the French women seem to be happier. Does this mean American women are lying hypocrites?

    Also, economic status (again, usually relative within a society) seems to be very important for “satisfaction,” but is a big fat zero on moment-to-moment happiness.

  • eric

    We can understand people preferring their own data because they feel it is accurate; presumably that’s why they spent so much time theorizing on it. But a true scientist is trying to explain the world, and facts being stubborn, it’s essential to have them on your side. If the goal is to create theories that explain the world, one should recognize any truthful data as such, and actively seek good data wherever it is. Look what happened with Bellesiles’s gun control data (not good).

    People think theory is hard and facts are easy, when in fact they are tightly connected. Facts are hard, because the interesting ones aren’t obvious (minimum wage effects, the effect of tougher sentencing, the relation of risk to return), and they take wisdom, because you can see how probable a fact is not merely from its face value in a particular paper’s abstract, but from its compatibility with other facts given a certain view of the way the world works.

  • DED

    Will —

    Here’s an idea that I think could be helpful in curbing confirmatory bias in the social sciences (I don’t know if it’s novel, but maybe you can tell me if you’ve heard of it, or even if it sounds worthwhile): Have teams of researchers with different ex ante beliefs/motivations/political allegiances regarding a given phenomenon collectively design experiments and models. I’m envisioning a sort of contract between intellectual "rivals" that stipulates some lack of bias in a given experimental design. If people on both sides of a dispute agree on a methodology and some sort of interpretive framework for the resulting data BEFORE seeing the results of that experiment, they might be harder pressed to cling to their ideas, as you mentioned in your post. To be sure, this won’t necessarily lead to answers or policy solutions that are better — after all, both sides of a dispute could be missing the mark — but it would seemingly make the debate more honest.

    What do you think?

    DED

  • http://vaindesires.blogspot.com Matthew

    Will, (as you may know from my past comments) I agree with your skepticism about what exactly happiness studies are studying. The happiness studies make assumptions about what it is they believe people to be reporting on (usually, the assumption is that these reports reflect a certain *feeling* or beliefs about one’s overall balance of hedonic satisfaction). Such views assume that happiness is the same as pleasure or satisfaction. I find these equations unconvincing. (I’ll argue that elsewhere, however.) The point, which you make, too, is that many of these studies might equally be called “satisfaction studies” – but saying that you study happiness is more glamorous than saying that you study “subjective well-being,” satisfaction, and the like, which may not be the same as the happiness we’re all chasing after (unless we’re hedonists).

  • david

    Or (d) they could go collect a lot more data and try again, and honestly see what happens with new better data.

  • http://felicifia.com Seth Baum

    On these happiness studies, it looks like both sides can be accused of this sort of bias. I’ve seen many economists in particular dismiss the entire line of inquiry outright. For example, Tyler Cowen from the popular econ blog "Marginal Revolution" wrote "It’s a good thing I don’t believe in that nasty happiness research". A quick glimpse at the data by country shows that the numbers are at least plausible: the USA’s at 7.4; China’s at 6.3; India’s at 5.4; Zimbabwe’s at 3.3. Of course the data’s not perfect, but neither is GDP as a measure of national well-being, and that’s used much more frequently. But economics as we know it seems to have a methodological bias towards observing behavior, and these surveys threaten that, whereas GDP, for all its flaws, is derived from such behavior.

    This could easily just be a proxy for an ideological/cultural debate between those preferring continued general economic growth and those who are nervous about that, whether for environmental reasons, social justice reasons, or lifestyle “take back your time” reasons. I know I find myself tempted to endorse the survey research whenever I find myself on the nervous side.

    My background is as an engineering researcher (electromagnetics). We often dodge this problem because we can back our work up with fairly well-defined experiments, etc.: either it works or it doesn’t. The social sciences don’t have this luxury so much. But we also have a culture where it’s more OK to be wrong, and where being unbiased is paramount. I think we should all be in the habit of stating our own biases (as I did above), and in turn we should commend others for doing so and scold them when they don’t. This won’t cure all ills, but I think it would help.

    As an example of this from politics, any time a politician is accused of “flip-flopping”, she should attack back and say “you’re darn right I changed my mind when I got a better understanding of the situation, and it’s a big problem that you don’t too!”

    …Off-topic: Felicifia is an online utilitarianism community, which anyone can join/write for. We enjoy Overcoming Bias (and overcoming bias).

  • http://neweconomist.blogs.com/new_economist/2007/03/unhappy_willy.html New Economist

    Unhappy about happiness research

    The always readable Cato Institute gadfly Will Wilkinson has not one, but two, long posts about the supposed evils of trying to measure happiness. I only provide brief excerpts – so read the whole thing. In the first, Effective Policy and the Measuremen…