The Data We Need

Almost all research into human behavior focuses on particular behaviors. (Yes, not extremely particular, but also not extremely general.) For example, an academic journal article might focus on professional licensing of dentists, incentive contracts for teachers, how Walmart changes small towns, whether diabetes patients take their medicine, how much we spend on Christmas presents, or whether there are fewer modern wars between democracies. Academics become experts in such particular areas.

After people have read many articles on many particular kinds of human behavior, they often express opinions about larger aggregates of human behavior. They say that government policy tends to favor the rich, that people would be happier with less government, that the young don’t listen enough to the old, that supply and demand is a good first approximation, that people are more selfish than they claim, or that most people do most things with an eye to signaling. Yes, people often express opinions on these broader subjects before they read many articles, and their opinions change suspiciously little as a result of reading many articles. But even so, if asked to justify their more general views, academics usually point to a sampling of particular articles.

Much of my intellectual life in the last decade has been spent collecting many specific results and trying to fit them into larger, simpler pictures of human behavior. So both I and the academics I’m describing above in essence present ourselves as using the many results in academic papers about particular human behaviors as data to support our broader inferences about human behavior. But we do almost all of this informally, via our vague impressionistic memories of the gist of the many articles we’ve read, and via our intuitions about how consistent various more general claims seem with those particulars.

Of course there is nothing especially wrong with intuitively matching data and theory; it is what we humans evolved to do, and we wouldn’t be such a successful species if we couldn’t at least do it tolerably well sometimes. It takes time and effort to turn complex experiences into precise sharable data sets, and to turn our theoretical intuitions into precise testable formal theories. Such efforts aren’t always worth the bother.

But most of these academic papers on particular human behaviors do in fact go to the bother of substantially formalizing their data, their theories, or both. And if it is worth the bother to do this for all of these particular behaviors, it is hard to see why it wouldn’t be worth the bother for the broader generalizations we make from them. Thus I propose: let’s create formal data sets where the data points are particular categories of human behavior.

To make my proposal clearer, let’s for now restrict attention to explaining government regulatory policies. We could create a data set where the data points are particular kinds of products and services that governments now provide, subsidize, tax, advise, restrict, etc. For each such data point we could start to collect its features into a formal data set. Such features could say how long that sort of thing has been going on, how widely it is practiced around the world, how variable that practice has been over space and time, how familiar ordinary people today are with its details, what sorts of justifications people offer for it, what emotional associations people have with it, how much we spend on it, and so on. We might also include anything we know about how such things correlate with age, gender, wealth, latitude, etc.
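To make the regulatory example a bit more concrete, here is a minimal sketch of what one row of such a data set might look like, assuming each kind of product or service becomes one record. The class name `PolicyItem`, its field names, the 0-to-1 scales, and the five-word treatment vocabulary are all hypothetical choices made for illustration; they simply mirror the features listed above. It needs Python 3.9 or later.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical treatment vocabulary, taken from the text's list
# ("provide, subsidize, tax, advise, restrict, etc."); extend as needed.
TREATMENTS = frozenset({"provide", "subsidize", "tax", "advise", "restrict"})

# Hypothetical features scored on a 0-to-1 scale.
UNIT_SCALE_FEATURES = ("share_of_countries", "variability", "public_familiarity")


@dataclass
class PolicyItem:
    """One data point: a kind of product or service and how governments treat it."""
    name: str
    treatments: frozenset[str]
    years_practiced: Optional[float] = None      # how long this has been going on
    share_of_countries: Optional[float] = None   # how widely it is practiced worldwide
    variability: Optional[float] = None          # how much practice varied over space and time
    public_familiarity: Optional[float] = None   # how familiar ordinary people are with details
    annual_spending: Optional[float] = None      # how much is spent on it
    justifications: list[str] = field(default_factory=list)
    emotional_associations: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Accept any iterable of treatments, then reject words outside the vocabulary.
        self.treatments = frozenset(self.treatments)
        unknown = self.treatments - TREATMENTS
        if unknown:
            raise ValueError(f"unknown treatments: {sorted(unknown)}")
        for feature in UNIT_SCALE_FEATURES:
            value = getattr(self, feature)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{feature} must lie in [0, 1], got {value}")

    def to_row(self) -> dict:
        """Flatten into one table row: a 0/1 column per treatment plus the numeric features."""
        row = {"name": self.name}
        for treatment in sorted(TREATMENTS):
            row[f"treat_{treatment}"] = int(treatment in self.treatments)
        for feature in ("years_practiced", *UNIT_SCALE_FEATURES, "annual_spending"):
            row[feature] = getattr(self, feature)  # None marks "not yet coded"
        row["n_justifications"] = len(self.justifications)
        row["n_emotional_associations"] = len(self.emotional_associations)
        return row
```

For example, `PolicyItem("dental licensing", {"restrict"}).to_row()` gives a row with `treat_restrict` set to 1, the other treatment columns set to 0, and every uncoded feature left as `None`. Leaving features empty rather than guessing keeps "not yet coded" distinct from a real low score.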

Generalizing to human behavior more broadly, we could collect a data set of particular behaviors, many of which seem puzzling at least to someone. I often post on this blog about puzzling behaviors. Each such category of behaviors could be one or more data points in this data set. And the relevant features to code about those behaviors could be drawn from the features we tend to invoke when we try to explain those behaviors: how common the behavior is, how much repeated experience people have with it, how much they get to see of others’ behavior, how strong its emotional associations are, how bad it would make people look to admit to particular motives, and so on.

Now all this is of course much easier said than done. It is a lot of work to look up various papers and summarize their key results as entries in this data set, or even just to look at real world behaviors and put them into simple categories. It is also work to think carefully about how to usefully divide up the space of actions and features. First efforts will no doubt get it wrong in part, and have to be partially redone. But this is the sort of work that usually goes into all the academic papers on particular behaviors. Yes, it is work, but if those particular efforts are worth the bother, then this should be as well.

As a first cut, I’d suggest just picking some more limited category, such as perhaps government regulations, collecting some plausible data points, making some guesses about what useful features might be, and then just doing a quick survey of some social scientists where they each fill in the data table with their best guesses for data point features. If you ask enough people, you can average out a lot of individual noise, and at least have a data set about what social scientists think are features of items in this area. With this you could start to do some exploratory data analysis, and start to think about what theories might well account for the patterns you see.
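The survey step might be sketched as follows, assuming each social scientist’s guesses arrive as `(rater, item, feature, value)` tuples on a numeric scale. The function names and that input format are hypothetical. Averaging the guesses for each item-feature cell is the noise-reduction step described above; the spread and rater count are kept so that contested cells stand out. The correlation helper is just one simple example of exploratory analysis (a plain Pearson correlation across items), not a recommended method, and the result still describes what the surveyed social scientists think, not ground truth.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def aggregate(responses):
    """Average raters' guesses per (item, feature) cell.

    responses: iterable of (rater, item, feature, value) tuples (hypothetical format).
    Returns {(item, feature): (mean, spread, n_raters)}, where spread is the
    sample standard deviation (0.0 when only one rater answered).
    """
    cells = defaultdict(list)
    # The rater id is ignored here; it is kept in the input so per-rater bias
    # could be checked later.
    for _rater, item, feature, value in responses:
        cells[(item, feature)].append(float(value))
    table = {}
    for key, values in cells.items():
        spread = stdev(values) if len(values) > 1 else 0.0
        table[key] = (mean(values), spread, len(values))
    return table


def pearson(xs, ys):
    """Plain Pearson correlation; returns None if either input has no variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def feature_correlation(table, f1, f2, min_items=3):
    """Correlate two averaged features across the items coded for both."""
    items = sorted({item for item, _feature in table})
    pairs = [(table[(i, f1)][0], table[(i, f2)][0])
             for i in items if (i, f1) in table and (i, f2) in table]
    if len(pairs) < min_items:
        return None  # too few shared items to say anything
    xs, ys = zip(*pairs)
    return pearson(xs, ys)
```

Returning `None` instead of a number when there are too few items, or no variation, is a deliberate choice: an early data set will be sparse, and a spurious coefficient from two or three points is easy to over-read.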

Now one obvious problem with my proposal is that while it looks time consuming and tedious, it isn’t obviously impressive. Researchers who specialize in particular areas will complain about your data entries related to their areas, and you won’t be able to satisfy them all. So you will end up with a chorus of critics saying your data is all wrong, and your efforts will look too low brow to cow them with impressive technique. So I can see why this hasn’t been done much. Even so, I think this is the data set we need.

  • Grant

    I’m no social scientist, but in some cases couldn’t you design experiment templates which can be repeated in a variety of circumstances? For example, to test marginalism you might lower the price of many goods and services in many different markets and record the results.

    To test near-far theory, you might ask participants to plan a task X time in advance, requiring they perform the task as planned at the scheduled time. The experiment could be repeated for many different tasks, and record the success rates and plan divergence for different values of X.


  • Stephen Diamond

    “But most of these academic papers on particular human behaviors do in fact pay the bother to substantially formalize their data, their theories, or both. And if it is worth the bother to do this for all of these particular behaviors, it is hard to see why it isn’t be worth the bother for the broader generalizations we make from them.”

    You’ve argued formalization in the social sciences often isn’t worth the bother…

  • David Condon

    “Almost all research into human behavior focuses on particular behaviors.”

    I take it you’re on the “psychology is the study of behavior; not the study of the mind” side of the fence. 🙂

    I’ve been thinking a lot about this sort of problem. We have this massive amount of data that’s being produced and stored on a seemingly endless variety of topics from academic publishing on the web, and we’re unable to translate it into something that’s easily conveyed and understood by the layman. And even experts will struggle to understand problems even slightly outside their field. The number of terms for nearly identical concepts is quite frustrating, for instance.

    I originally got to thinking about this when reading an article about the Stanford Encyclopedia of Philosophy, and thinking how impressive their accomplishment was. I also read up on the failures of Citizendium. I think, had I been in Larry Sanger’s shoes, I would have made the same mistakes.

    Organizing a data set isn’t a direction I had thought of, however. I was thinking in terms of relying on existing data sets and simply determining the best ones, as well as the best explanations of their results.

    I agree about the necessity of having multiple layers of review going beyond the existing ones. I also think there is an upper limit on how much content can be created while still being reasonably well organized. So it’s important to think very carefully about a meta-strategy for determining what sort of content does and does not need to be added.

    Another argument out there is Nick Bostrom’s: that a superintelligence might be created simply by finding a way to organize the web to make better use of the collective intelligence of humanity. Such a superintelligence would, I think, be more likely to improve slowly rather than quickly, which would reduce the risk of a doomsday scenario.

    The big trillion-dollar question to me: is there a better method to find information on the web than a search engine? How could such a method be created?