Pier and colleagues recently published a study purporting to address the reliability of the NIH peer review process. From the summary:
We replicated the NIH peer-review process to examine the qualitative and quantitative judgments of different reviewers examining the same grant application. We found no agreement among reviewers in evaluating the same application. These findings highlight the subjectivity in reviewers’ evaluations of grant applications and underscore the difficulty in comparing the evaluations of different applications from different reviewers—which is how peer review actually unfolds.
This thing is a crock and yet it has been bandied about on the Twitts as if it is the most awesome thing ever. "Aha!" cry the disgruntled applicants, "This proves that NIH peer review is horrible, terrible, no good, very bad and needs to be torn down entirely. Oh, and it also proves that it is a super criminal crime that some of my applications have gone unfunded, wah."
A smaller set of voices expressed puzzled confusion. "Weird," we say, "because our strongest impression from serving on panels is that there is a great deal of agreement in review, when you consider the process as a whole."
So, why is the study irretrievably flawed? In broad strokes it is quite simple.
Restriction of range. Take a look at the first figure. Does it show any correlation between the scores? Any fair viewer would say no. Aha! Whatever the x-axis represents about these points does not predict anything about what the y-axis represents.
This is the mistake being made by Pier and colleagues. They constructed four peer-review panels and had them review the same population of 25 grant applications. The trick is that, of these, 16 were already funded by the NCI and the remaining 9 were earlier, unfunded versions of grants that were eventually funded by the NCI.
In short, the study selects proposals from a very limited range of the applications being reviewed by the NIH. This figure shows the rest of the data from the above example. When you look at it like this, any fair eye concludes that whatever the x value represents about these points predicts something about the y value. Anyone with the barest understanding of distributions and correlations gets this, and grasps that the relationship does not have to be a perfect correspondence for one variable to predict the other.
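To make the restriction-of-range point concrete, here is a minimal simulation sketch. This is my own illustration, not data or code from the paper: the bivariate model, the 0.7 noise level, and the top-12% cutoff are all assumptions, the last chosen only to roughly mirror an NCI-like payline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a population of applications: a latent "true merit" and two
# noisy reviewer scores that each track merit imperfectly.
n = 10_000
merit = rng.normal(size=n)
noise_sd = 0.7  # illustrative reviewer noise, not an estimate from the study
score_a = merit + rng.normal(scale=noise_sd, size=n)
score_b = merit + rng.normal(scale=noise_sd, size=n)

def pearson(x, y):
    """Pearson correlation between two score vectors."""
    return np.corrcoef(x, y)[0, 1]

# Agreement across the full range of applications.
print("full range r =", round(pearson(score_a, score_b), 2))

# Now restrict to roughly the top 12% on one score -- analogous to
# looking only at applications that already cleared the payline.
cutoff = np.quantile(score_a, 0.88)
top = score_a >= cutoff
print("funded-slice r =", round(pearson(score_a[top], score_b[top]), 2))
```

Run something like this and the full-range correlation comes out substantial, while the correlation within the "funded" slice collapses toward zero. Same reviewers, same noise, same underlying relationship; the only thing that changed is that you threw away most of the range.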
So. The authors' claims are bogus. Ridiculously so. They did not "replicate" the peer review process because they did not include a full range of scores/outcomes; instead they picked the narrowest slice of the funded awards. I don't have time to dig up historical data, but the current funding plan for NCI calls for a 10%ile payline. You can amuse yourself with the NIH success rate data here; the very first spreadsheet I clicked on gave a success rate of 12.5% for NCI R01s.
No "agreement". "Subjectivity". Well of course not. We expect there to be variation in the subjective evaluation of grants. Oh yes, "subjective". Anyone that pretends this process is "objective" is an idiot. Underinformed. Willfully in denial. Review by human is a "subjective" process by its very definition. That is what it means.
The only debate here is how much variability we expect there to be. How much precision do we expect in the process.
The most fervent defenders of the general reliability of the NIH grant peer review process almost invariably will acknowledge that the precision of the system is not high. That the "top-[insert favored value of 2-3 times the current paylines]" scoring grants are all worthy of funding and have very little objective space between them.
Yet we still see this disgruntled-applicant phenotype, responding with raucous applause to a crock-of-crap conclusion like that of Pier and colleagues, who seem to feel that it should somehow be possible to have a grant evaluation system that is perfect. One that returns the exact same score for a given proposal each and every time*. I just don't understand these people.
Elizabeth L. Pier, Markus Brauer, Amarette Filut, Anna Kaatz, Joshua Raclaw, Mitchell J. Nathan, Cecilia E. Ford, and Molly Carnes (2018). Low agreement among reviewers evaluating the same NIH grant applications. PNAS, published ahead of print March 5, 2018. https://doi.org/10.1073/pnas.1714379115
*And we're not even getting into the fact that science moves forward and that what is cool today is not necessarily anywhere near as cool tomorrow.