An April 2006 JAMA article reported large effects of switching to anonymous peer review for presentations at the AHA annual meeting. Science News reported: The survey focused on some 67,000 research abstracts submitted to the American Heart Association (AHA) between 2000 and 2004. … Beginning in 2002, AHA changed its review process so that authors’ names and affiliations were stripped from abstracts before they were sent out for peer review. … For instance, during 2000 and 2001, abstracts from U.S. authors were 80 percent more likely to be accepted … After blinding, the U.S.-based papers were only 41 percent more likely to be accepted, … Similarly, the share of abstracts from faculty at highly regarded U.S. research universities dropped by about 20 percent, after blinding. For authors in government agencies, the acceptance rate fell by 30 percent.
Norman, I think the main point is that the paper doesn't offer much evidence one way or the other on the main claim of interest. The likelihood of the data observed seems about the same whether author info biases evaluations or improves them. So updating our priors on this evidence just gives us our priors back.
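To make the Bayesian point concrete, here is a toy sketch (all the probabilities below are made up purely for illustration): when the observed data are about equally likely under both hypotheses, Bayes' rule hands the prior back unchanged.

```python
def posterior(prior_h, likelihood_h, likelihood_not_h):
    """P(H | data) via Bayes' rule for a binary hypothesis H vs. not-H."""
    numerator = prior_h * likelihood_h
    return numerator / (numerator + (1 - prior_h) * likelihood_not_h)

prior = 0.6  # assumed prior that author info biases evaluations

# Suppose the observed acceptance-rate shift after blinding is roughly
# equally likely whether author info biases or improves evaluations:
p_data_if_bias = 0.5
p_data_if_improve = 0.5

print(posterior(prior, p_data_if_bias, p_data_if_improve))  # 0.6, the prior back
```

Only when the two likelihoods differ does the evidence move the posterior away from the prior, which is the sense in which the survey alone is uninformative on this question.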
Let's get away from what the paper itself said, and whether Robin was or was not criticizing, and ask the underlying question. Robin has proposed a definition of paper quality as resource compensated posterity review. I accept this definition. If our goal is to maximize paper quality, so defined, should referees use the authors' names? I take it that we all accept that the JAMA paper shows that using the authors' names affects acceptance, and the dispute is whether the effect increases or decreases quality.
The main objection raised by Robin and Douglas is that the JAMA paper hasn't shown that the effect decreases quality. I don't see why they should have to. As I understand it, there is currently no data which establishes paper quality as defined by the posterity review standard. This means that one's conclusion will turn on one's prior / null hypothesis / presumption. If we start with the presumption that using the author's name does NOT decrease quality, then it is true, as Robin and Douglas point out, that the JAMA study does not prove the contrary. If we start with the presumption that using the author's name DOES decrease quality, then the JAMA study establishes that there is an effect, and combining this with our presumption that the effect is negative, we can conclude that authors' names should not be used.
Why should our presumption be that using the authors' names reduces quality? I have given theoretical reasons to think that (1) using the authors' names may reduce quality, and that (2) blind reviewing is likely to be well correlated with quality as defined by posterity review. No one has explained why my theoretical argument is flawed on either point. Unless someone can undermine my argument, or at least propose a counter-argument as to why using the authors' names is likely to improve quality, I think it follows that the correct null hypothesis is that the effect of using the authors' names, if any, is to reduce quality. The JAMA study shows that it does have an effect, thereby eliminating the need for the 'if any' caveat. I think my null hypothesis is also the common-sense null hypothesis, which may be why the JAMA authors didn't feel the need to make it explicit.
If Robin Hanson won't criticize the survey, I will. The survey, in and of itself, is fine. The problem is that the paper asserts the speculative claim that blinding reduces bias. (It would be silly to assume that they mean the same thing by "bias" as I do, but their first sentence is that referees *should* not use the authors' names. This is an assumption of the paper, made without reference to the data.)
Norman, I was not criticizing the JAMA survey. I was just pointing out that it does not by itself offer much evidential support for the view that non-anonymous reviews are biased by author and affiliation. Yes, affiliation could not be biasing reviews if it were not influencing them, but few had much doubt that affiliations influence reviews. Studies of citation counts could correct for the resource of the journal venue, so posterity review isn't the only imaginable alternative.
I agree that resource compensated posterity review would be a good indicator / definition of quality. Has any such posterity review actually been carried out? If not, it strikes me as unfair to criticize the JAMA survey for not having checked their conclusions against it.
I would think a resource compensated posterity review would be extremely difficult to carry out because it requires assessment of a counter-factual, namely, what would have happened had this article been published elsewhere. (A posterity review which assessed actual impact would not be nearly as difficult, though it would still be difficult.) I suppose one could try to assign a weighting value to compensate for publication venue, but it seems to me that would require constructing the hypothetical impact for at least some articles, which gets us back to the counter-factual problem.
I think editors of a journal are likely to be attempting to maximize actual impact, rather than resource compensated impact. But blind reviewers could reasonably be modeled as attempting to maximize quality as measured by a future resource compensated posterity review. Even though the editor might like the referee to maximize actual impact, a blind referee doesn't have enough information to do so; as a second best, they might try to select the articles that would have maximum impact if they were published in the journal in question -- which is to say, resource compensated posterity review.
So, unless an actual resource compensated posterity review has been carried out that has adequately addressed the problem of estimating counter-factual impact, I think that blind reviews are our best present estimate of future resource compensated posterity review.
Norman, yes an article in Nature would be more influential, but such a venue would be a "resource" given to the paper that posterity review should take into account when evaluating quality. As I said: This future evaluation would estimate the relative accuracy and valued added of each contribution relative to resources used, carefully tracing out where these insights came from and where they led.
I'm afraid I can't go for posterity review of the kind described in the blog post as a better measure of quality. It really measures impact, which is a function of both quality and publication venue. Suppose two papers are published simultaneously with essentially the same idea (independently developed), one in Nature and one in a much lesser known journal. The article in Nature will be much more influential and score much higher in a posterity review, even though the two may be of equal quality, in the sense that if the publication venues had been reversed, the posterity review scores would also have been exactly reversed. I expect that non-blind refereeing would correlate more highly with posterity review than would blind refereeing, since a big name who publishes an idea will cause a bigger impact than an unknown who publishes the same idea.
Norman, great articulation of my exact concern. It's weird to me that Robin seemingly couldn't get there until your articulation. It seemed to me that my poorer articulation of the same concern was invisible to him, as if he were vested in an argumentative position rather than engaged in an empirical inquiry into the best descriptive model of reality.
Norman, a more ideal measure of quality would be posterity review, or a current betting market estimate of such review. I agree that it is possible that current practice uses affiliations and names to reduce the quality of estimates.
I still want to know what other method of checking quality you have in mind. I'm not saying blind review is perfect, just that it's better than any other method I can think of. You can't criticize them for not having checked their results against another method unless you can define what you mean by quality and provide some alternative method of measuring it.
I take it that Michael is not arguing that affiliation or prior publication record is part of the definition of quality, but rather that it adds information about quality where quality is defined by an independent measure.
I don't agree that we can assume that adding affiliation improves the accuracy of quality judgments simply because it adds information. I think this kind of information is very susceptible to 'lock-in' / path-dependence bias. Suppose 100 unpublished scholars submit papers to Nature that are equally good, by some objective criterion. Nature only has room for one. Five more get published in journals with impact factor 3, twenty more in journals with IF = 1, and so on. If they continue to publish equally good work for the rest of their careers, and we take into account the author's prior publication record as a tie-breaker, this pattern of publication will be locked in. If we don't take prior publication record into account, different individuals will get the Nature publication in each round, and the distribution of publications will approach the objectively accurate, equal distribution.
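The lock-in dynamic can be simulated with a deliberately stylized model (the scholar count, round count, and tie-breaking rule are assumptions of mine, not anything from the JAMA study): 100 equally good scholars compete for one top-venue slot per round, once using prior record as the tie-breaker and once ignoring it.

```python
import random

random.seed(0)

N_SCHOLARS = 100
ROUNDS = 50

def run(use_record):
    # record[i] counts prior top-venue publications for scholar i;
    # all papers are assumed equally good, so the record is the only tie-breaker.
    record = [0] * N_SCHOLARS
    for _ in range(ROUNDS):
        if use_record:
            best = max(record)
            pool = [i for i, r in enumerate(record) if r == best]
        else:
            pool = list(range(N_SCHOLARS))  # blind: every equal paper ties
        winner = random.choice(pool)  # an editor picks at random among the pool
        record[winner] += 1
    return record

locked = run(use_record=True)
blind = run(use_record=False)
print(max(locked), max(blind))  # 50 (one scholar wins every round) vs. a small number
```

With record-based tie-breaking, whoever wins the first round wins every round thereafter, even though everyone's work is equally good by construction: the model's version of the circularity complained of above.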
I don't agree that accuracy is improved if the referees take a second look at papers with big-name authors. Yes, it improves the quality of that review, but the question is whether it improves aggregate quality. The flip side is that the referees don't take a second look at the papers of unknown authors. This means that the papers of unknown authors systematically get judged more harshly, because there is a selection bias against a second look. In other words, Michael's mechanism is a source of error in the assessment process, not a correction.
You could argue that giving a second look to known authors is warranted, despite this bias, because their papers are more likely to warrant a second look, so a disproportionate number of errors are corrected by this mechanism and it outweighs the systematic bias against unknown authors. There are two problems with this argument. First, how do you know where the balance lies? That's an empirical question, and I see no particular a priori reason to believe that the balance will improve rather than reduce accuracy. More fundamentally, the argument assumes that known authors are better. But how do we know that? Because they have more publications. But that gets us right back to the problem of lock-in bias. The argument is circular.
I also like Michael's comment, although I'm not so sure about the proposed compromise solution.
Robin: "Hopefully, by definition the reviewers were assigned to evaluate quality, so anything they used is therefore, according to them, a quality clue."
Not necessarily, in the world I live in. Perhaps anything the reviewers used was, according to them, a quality clue. Perhaps the reviewers did things other than what they were assigned to do. I don't follow your claim, expressed with 100% certainty, that that isn't possible.
Michael, you said what I should have said.
I don't see this happening in the harder sciences. First, for each article there are only a limited number of people in the world who are competent to read and understand it. Second, this restricted niche knows full well what other people in the field are doing, in what style, etc. Third, people post preprints on home pages and online archives, plus they distribute the preprint among friends before submitting.
Robin's point is surely that even if blind review is a more accurate measure of quality than affiliation, as it very probably is, then assuming that both contain some information, some method of aggregating the two is probably more accurate than either alone.
To clarify via exaggerated example, if when the author's name is withheld a review committee concludes that a paper by Feynman is of poor quality we are at least seriously inclined to consider the possibility that they have made the mistake, not him.
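The aggregation claim is a standard statistical point, and a small simulation can illustrate it (the noise levels here are invented for the sketch: blind review is assumed to be the less noisy signal of true quality, affiliation the noisier but still informative one, with independent errors):

```python
import random

random.seed(1)

TRUE_QUALITY = 10.0
BLIND_SD = 1.0        # assumed noise of the blind-review signal
AFFILIATION_SD = 2.0  # assumed noise of the affiliation signal

def mse(estimates):
    """Mean squared error of a list of estimates against the true quality."""
    return sum((e - TRUE_QUALITY) ** 2 for e in estimates) / len(estimates)

trials = 10_000
blind = [random.gauss(TRUE_QUALITY, BLIND_SD) for _ in range(trials)]
affil = [random.gauss(TRUE_QUALITY, AFFILIATION_SD) for _ in range(trials)]

# Inverse-variance weighting: the optimal linear combination of two
# independent unbiased signals weights each by 1/sd^2.
w_b = 1 / BLIND_SD**2
w_a = 1 / AFFILIATION_SD**2
combined = [(w_b * b + w_a * a) / (w_b + w_a) for b, a in zip(blind, affil)]

print(mse(blind), mse(affil), mse(combined))
# combined MSE lands near 0.8, below blind's ~1.0 and affiliation's ~4.0
```

The catch, as the lock-in objection above notes, is that this only helps when the second signal's errors are independent and unbiased; a systematically skewed affiliation signal can make the combination worse, not better.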
Hopefully, by definition the reviewers were assigned to evaluate quality, so anything they used is therefore, according to them, a quality clue.
Norman and Sunil, you seem to be presuming that an anonymous evaluation has a higher correlation with "true quality" than an evaluation where the author is known. My point is that this assumption is not obviously true.
Norman, no quality measure will ever be perfectly reliable, true, but that doesn't stop us from checking some measures against others.
What "other indicators of paper quality" would be more reliable than blind review? In principle I suppose the fundamental measure of quality is the long-term influence of a paper. But no post-publication metric, e.g. number of times cited, is reliable because these are all clearly affected by author name and publication venue.