Peer Review Is Random

Which academic articles get published in the more prestigious journals is a pretty random process. When referees review an academic paper, less than 20% of the variability in referee ratings is explained by agreement among the referees:

This paper presents the first meta-analysis for the inter-rater reliability (IRR) of journal peer reviews [using] … 70 reliability coefficients … from 48 studies. … [covering] 19,443 manuscripts; on average, each study had a sample size of 311 manuscripts (minimum: 28, maximum: 1983). … The more manuscripts that a study is based on, the smaller the reported IRR coefficients are. … If the information of the rating system for reviewers was reported in a study, then this was associated with a smaller IRR coefficient. … An ICC of .23 indicates that only 23% of the variability in the reviewers’ rating of a manuscript could be explained by the agreement of reviewers. (more: HT Tyler)


The above is from their key figure, which shows reliability estimates and confidence intervals for the studies, ordered by estimated reliability. The most accurate studies found the lowest reliabilities, which is clear evidence of a bias toward publishing studies that find high reliability. I recommend trusting only the most solid studies, which give the most pessimistic (<20%) estimates.
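To make the ICC figure concrete, here is a minimal sketch (the signal-plus-noise rating model and all numbers are my assumptions, not from the paper) showing that when only 23% of rating variance comes from the manuscript itself, two referees' scores for the same manuscripts correlate at only about 0.23:

```python
import numpy as np

rng = np.random.default_rng(1)
n_papers, icc = 50_000, 0.23

quality = rng.standard_normal(n_papers)          # true manuscript quality, variance 1
noise_sd = np.sqrt((1 - icc) / icc)              # chosen so Var(q)/(Var(q)+Var(e)) = icc
r1 = quality + noise_sd * rng.standard_normal(n_papers)   # referee 1's ratings
r2 = quality + noise_sd * rng.standard_normal(n_papers)   # referee 2's ratings

print(np.corrcoef(r1, r2)[0, 1])                 # close to 0.23
```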

Seems a model would be useful here. Model the optimal number of referees per paper, given referee reliability, the value of identifying the best papers, and the relative cost of writing vs. refereeing a paper. Such a model could estimate the losses from having many journals with separate referees evaluate each article, vs. an integrated system.
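As a starting point, here is a rough sketch of such a model, assuming normally distributed paper quality, a single-rater reliability (ICC) of 0.2, a fixed acceptance rate, and a hypothetical per-review cost; every parameter value is illustrative, not estimated:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_quality(n_refs, icc=0.2, n_papers=100_000, accept_frac=0.15):
    """Mean true quality of accepted papers when each paper gets
    n_refs noisy ratings with single-rater reliability `icc`."""
    quality = rng.standard_normal(n_papers)      # true quality, variance 1
    noise_sd = np.sqrt((1 - icc) / icc)          # so Var(q)/(Var(q)+Var(e)) = icc
    ratings = quality[:, None] + noise_sd * rng.standard_normal((n_papers, n_refs))
    score = ratings.mean(axis=1)                 # journal ranks papers by mean rating
    cutoff = np.quantile(score, 1 - accept_frac)
    return quality[score >= cutoff].mean()

# Net value per paper: quality gain from selection minus refereeing cost,
# with the cost of one review (assumed) expressed in the same quality units.
ref_cost = 0.02
for n in range(1, 9):
    q = expected_quality(n)
    print(n, round(q, 3), round(q - ref_cost * n, 3))
```

Averaging more referees raises the reliability of the composite score (the Spearman-Brown effect), but with diminishing returns, so the net value eventually peaks and then falls as refereeing costs accumulate.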

  • DK

    No surprise at all. Remember that guy who twenty years ago hit the nail on the head on this subject: “Peer review is just another popularity contest, inducing familiar political games; savvy players criticize outsiders, praise insiders, follow the fashions insiders indicate, and avoid subjects between or outside the familiar subjects. It can take surprisingly long for outright lying by insiders to be exposed. There are too few incentives to correct for cognitive and social biases, such as wishful thinking, overconfidence, anchoring, and preferring people with a background similar to your own.” :-))

    Well said.

  • RJB

    As the end of the article points out, there isn’t a clear reason to prefer high or low inter-rater agreement. High agreement might be evidence of high-quality reviewing, or of a lack of diversity in selected reviewers.

    Most journals I work with use two reviewers, each of whom provides a recommendation (reject, minor revision, major revision, accept). Since about 85% of first-round submissions are rejected, any meaningful measure of inter-rater agreement would need to consider an asymmetric loss function. Only papers with two positive reviews are likely to make it to the next round.

    Another observation seems missing from the analysis: authors self-select the journals they submit to. If they do so efficiently, targeting the highest-quality journal at which they have a sufficiently good chance of getting published, wouldn't you expect low agreement between reviewers? After all, most of the papers a journal receives would be about as plausibly accepted as rejected.
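This self-selection point is essentially a restriction-of-range effect, and a small sketch shows how strong it can be. Assuming a simple signal-plus-noise rating model with true single-rater reliability 0.5, and (as a stand-in for efficient self-selection) a hypothetical journal that only sees papers in a narrow quality band near its bar, within-journal referee agreement collapses even though the referees are quite informative about the full pool:

```python
import numpy as np

rng = np.random.default_rng(2)
icc, n = 0.5, 200_000

quality = rng.standard_normal(n)                 # true quality of all papers
noise_sd = np.sqrt((1 - icc) / icc)              # so single-rater reliability = icc
r1 = quality + noise_sd * rng.standard_normal(n)
r2 = quality + noise_sd * rng.standard_normal(n)

def agreement(mask):
    """Correlation between two referees' ratings on a subset of papers."""
    return np.corrcoef(r1[mask], r2[mask])[0, 1]

print(agreement(np.ones(n, bool)))               # full pool: near 0.5
band = (quality > 0.5) & (quality < 1.0)         # papers near one journal's bar
print(agreement(band))                           # restricted pool: far lower
```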

  • Tony

    There may be many criteria on which peer reviewers do agree, but which don’t show up in this study because authors already know those criteria and have satisfied them before the paper is even submitted.

    For example, most reviewers agree that a p-value greater than 0.05 is not acceptable, so papers that don't meet that standard don't get written in the first place. This actually indicates that peer review works very well: it exerts its influence through the foreknowledge of review, not the review itself.

    Maybe it’s sort of like predicting stock prices – if most investors agree that a stock is underpriced, the price goes up immediately, erasing their agreement. All that remains is the residual disagreement, making it appear that they can’t agree on anything. Maybe this study points to a kind of EMH for scientific publication.

  • RJB

    @Tony, I also couldn’t help but think of the random walk in understanding the role of reviews.