Publication Bias and the Death Penalty

The front page of Sunday’s New York Times contained an interesting article reviewing research linking  the death penalty to homicide trends.  Adam Liptak attempts to provide a balanced account of the debate, noting first one set of findings:

According to roughly a dozen recent studies, executions save lives. For each inmate put to death, the studies say, 3 to 18 murders are prevented.

And then my own research:

The death penalty “is applied so rarely that the number of homicides it can plausibly have caused or deterred cannot reliably be disentangled from the large year-to-year changes in the homicide rate caused by other factors,” John J. Donohue III, a law professor at Yale with a doctorate in economics, and Justin Wolfers, an economist at the University of Pennsylvania, wrote in the Stanford Law Review in 2005. “The existing evidence for deterrence,” they concluded, “is surprisingly fragile.”

Surely a dozen studies is itself evidence of robustness.  Why, then, do we find that these results are fragile?  Two words: Publication bias (also known as the file drawer problem).  Our research revealed that alternative approaches to testing the execution-homicide link can yield a huge array of possible results (positive and negative).  But if only strong pro-deterrent results are reported (and the others remain in the file drawer), this could look misleadingly like there is a pro-deterrent consensus.

It turns out that there are some rather simple tests for publication bias.  Our friends in medicine provide a useful intuition.  Imagine that there are many separate drug trials being considered – some with large samples, some with small samples.  If all results are being reported, then smaller samples should, on average, yield similar estimates to larger samples, albeit with a bit more noise (in both directions).  So the standard error of an estimate should be uncorrelated with the coefficient.  But if researchers only report statistically significant estimates, then they will only report results with t-statistics>2, yielding a strong correlation between standard errors and coefficient estimates.
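
To make the intuition concrete, here is a minimal simulation sketch. It is not the analysis from our paper and uses none of the death penalty data: the number of studies, the range of standard errors, the true effect of zero, and the one-sided "t > 2" reporting rule are all assumptions chosen purely for illustration.

```python
import numpy as np


def simulate_studies(n_studies=5000, true_effect=0.0, seed=0):
    """Simulate hypothetical studies that all estimate the same true effect.

    Each study gets its own standard error (small-sample studies have large
    ones), and its estimate is the true effect plus noise of that size.
    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    se = rng.uniform(0.5, 5.0, size=n_studies)
    estimate = true_effect + se * rng.standard_normal(n_studies)
    return estimate, se


def corr(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


estimate, se = simulate_studies()

# Case 1: every result is reported, so noise goes in both directions.
corr_all = corr(estimate, se)

# Case 2: only results with t-statistic > 2 leave the file drawer.
reported = (estimate / se) > 2.0
corr_reported = corr(estimate[reported], se[reported])

print(f"studies simulated:         {estimate.size}")
print(f"studies 'published':       {int(reported.sum())}")
print(f"corr(coef, se), all:       {corr_all:.2f}")
print(f"corr(coef, se), published: {corr_reported:.2f}")
```

With full reporting the correlation should sit close to zero. Under the selective rule, a noisier study can only clear the t > 2 bar by reporting a larger coefficient, so the published coefficients rise almost in lockstep with their standard errors.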

You can probably guess what we find.

Looking across the key estimates from the most-cited studies, we find:
But perhaps more telling is the same assessment of the various estimates reported as “robustness checks” within each of these studies:
Remember: The data should look like a sideways “V”.  Yet there is only one paper that does not suggest a statistically significant correlation between the standard error and the reported coefficient (Katz, Levitt and Shustorovich), and incidentally, that is the only paper without a strong pro-deterrent finding.

Given that it appears that few of the insignificant estimates were reported, it probably isn’t that surprising that running a few more regressions reveals many of the unreported insignificant (and even opposite-signed) results.

Still need convincing?  Download my death penalty data, and run your own regressions.  You will find all sorts of different results.

  • Douglas Knight

    How often do people do this kind of meta-analysis? I’ve only ever heard of Card-Krueger doing it, but that paper is invisible because of the better known Card-Krueger paper. Does the word “intuition” signal that people in medicine are aware of the problem, but incapable of doing anything about it?

  • So why doesn’t this sort of check become a standard feature of such publications? Seems easy enough to do.

  • Still need convincing? Download my death penalty data, and run your own regressions.

    Now that is a strong signal.

  • Nice technique! But as a purely editorial note, everything under “You can probably guess what we find” should go under the fold (the “Post Continuation” rather than “Post Introduction”).

  • g

    Department of Stating the Obvious:

    I’d naively have thought that, scientifically or politically speaking, a “negative” result on this question (say, a coefficient between -2 and +2 with 95% confidence) should be exactly as interesting as a “positive” one (say, a coefficient between 8 and 12 with 95% confidence). So, assuming that there isn’t a consistent political bias in favour of the death penalty among researchers and publishers in this field (which there might be, I guess, but it’s not obvious why there should be), it’s clearly the magical words “statistically significant” that are biasing the results.

    Advice to researchers in the field: Exploit prior publication bias! If you get a “negative” result, write it up as “Our results differ significantly (p<0.05 in each case) from those of prior publications such as those of Dezhbakhsh and Shepherd [1], Dezhbakhsh, Rubin and Shepherd [2], and Mocan and Gittings [3].”

    (But alas, alas, for the Cult Of Statistical Significance. How much better the world would be if conclusions were expressed as "Our symmetrical 95% confidence interval for the coefficient is ..." or "The likelihood curve for the parameter is shown in Fig. 1". There'd still be publication bias of a sort, in favour of research yielding narrow intervals and sharply peaked curves -- and, I guess, in favour of research where those intervals and peaks are in unexpected places. But it would be much less serious, and somewhat self-correcting.)

  • Liptak’s NY Times article contains the following quote:

    “The economics studies are, moreover, typically published in peer-reviewed journals, while critiques tend to appear in law reviews edited by students.”

    Without any slight to your and Professor Donohue’s article, I wonder if this characterization isn’t a part of the critique of the evidence against the death penalty?

  • ScentOfViolets

    I’ll counter your two words with one of my own: p-values. It’s not a tongue-in-cheek argument to say that what seems to be the default value of 0.05 is chosen precisely so that papers can be published in the softer sciences. But by having a minimal discussion of statistical technique, a researcher can submit a fifteen-page article to a peer-reviewed outlet and appear to be generating significant results.

    What really needs to happen is that every paper should have a discussion about why a particular p-value is chosen (perhaps even a discussion of the counternull value), and more fundamentally, there should be a much more rigorous education in statistics for research. Way too many people interpret ‘statistically significant at p=0.05’ as ‘there is a 95% chance the hypothesis has been confirmed’.

  • Just to add another signal: I liked Justin’s death penalty paper a lot; here.

  • Anonymous

    Are you planning on publishing a response to Dezhbakhsh and Rubin’s response to your paper?