Now, a new computer simulation explores just how sensitive the process might be to bias and randomness. Its answer: very. Small biases can have big consequences, concludes Eugene Day, a health care systems engineer at the Children's Hospital of Philadelphia, in Research Policy. He found that bias that skews scores by just 3% can result in noticeable disparities in funding rates.
T. E. Day, The big consequences of small biases: A simulation of peer review, 2015, Research Policy [epub ahead of print 28 Jan] [Publisher Site]
From the paper's abstract:
When total review bias exceeds 1.9% of grant score, statistically significant variation in scores between PC and NPC investigators is discernable in a pool of 2000 grant applications. When total review bias exceeds 2.8% of total grant score, statistically significant discrepancies in funding rates between PC and NPC investigators are detectable in a simulation of grant review.
Day generated a Preferred Class of applications and a NonPreferred Class of applications and ran a bunch of 3-reviewer scenarios with and without reviewer bias against the NPC applications. As far as I can tell the takeaway conclusion about funding here refers to a situation in which the effective payline is 10%. You will immediately grasp that NIH grant review was a strong contributor to the model parameters.
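To make the setup concrete, here is a minimal sketch of this kind of simulation. To be clear, this is not Day's actual model. The quality distribution (mean 5.0, SD 1.5), the higher-is-better scoring, the lack of clipping to a fixed scale and the function name `simulate` are all my own assumptions for illustration. Only the 2000 applications, the three reviewers, the 10% payline and the reviewer standard deviation (roughly 1.69 points, backed out of the figures quoted from the Discussion) come from the paper as described in this post.

```python
import random


def simulate(bias_per_reviewer, n_apps=2000, n_reviewers=3,
             payline=0.10, reviewer_sd=1.69, seed=0):
    """Toy grant-review simulation (illustrative only, not Day's model).

    Half the applications are Preferred Class (PC), half NonPreferred (NPC).
    Each reviewer's score is the application's true quality plus random
    noise, and each reviewer docks NPC applications by bias_per_reviewer.
    Returns (PC funding rate, NPC funding rate).
    """
    rng = random.Random(seed)
    apps = []
    for i in range(n_apps):
        preferred = i < n_apps // 2
        # Assumption: true quality is normally distributed (hypothetical values).
        quality = rng.gauss(5.0, 1.5)
        total = 0.0
        for _ in range(n_reviewers):
            score = quality + rng.gauss(0.0, reviewer_sd)
            if not preferred:
                score -= bias_per_reviewer
            total += score
        apps.append((total, preferred))

    # Fund the top `payline` fraction by summed score (higher is better here).
    apps.sort(key=lambda a: a[0], reverse=True)
    n_funded = int(round(payline * n_apps))
    pc_funded = sum(1 for _, is_pc in apps[:n_funded] if is_pc)
    npc_funded = n_funded - pc_funded

    n_pc = n_apps // 2
    n_npc = n_apps - n_pc
    return pc_funded / n_pc, npc_funded / n_npc


if __name__ == "__main__":
    for bias in (0.0, 0.25, 1.0):
        pc_rate, npc_rate = simulate(bias)
        print(f"bias/reviewer={bias:.2f}  PC funded={pc_rate:.3f}  NPC funded={npc_rate:.3f}")
```

Any single run like this is noisy, so a small bias may or may not show up in one pass. The paper's claim is about what becomes statistically detectable, which would mean repeating the run over many seeds and testing the difference in funding rates.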
I will admit I am only able to grasp the main points here and I am in no way able to evaluate the nitty gritty.
But it appears to have a very strong message. Namely, that our reflexive assumption that "well, if there is bias it is very tiny so we don't have to be worried about it" needs to change.
There is something even scarier in this paper. From the Discussion:
The threshold level of bias in this environment seems to be 2.8% of the total possible score of the grant; this is the level at which the 95% CI of the odds ratio “kisses” 1.00. This represents a single reviewer with a bias of 0.75 points (or three reviewers each with biases of 0.25 points), which is less than half (44.4%) of the standard deviation in a single reviewer’s score. What this suggests is that levels of bias which are sub-noise – that is, that are dramatically less detectable than normal variation in reviewer scores – are sufficient to substantially bias the number of funded applications in favor of preferred investigators.
RIGHT???? The bias can be of smaller effect size than many "normal" sources of variability in scoring that we accept as the resolution of the system. And it still leads to a statistically significant bias in funding outcome.
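For those who want to check the numbers in that quote, here is a quick back-of-the-envelope sketch. The one thing I am assuming is that the "total possible score" is three reviewers on an NIH-style 9-point scale, i.e. 27 points. The quoted passage does not spell that out, but the arithmetic lands on 2.8% if so.

```python
# Assumption: three reviewers, each on a 9-point scale (not stated in the quote).
n_reviewers = 3
max_score_per_reviewer = 9
total_possible = n_reviewers * max_score_per_reviewer  # 27 points

# From the quote: one reviewer biased by 0.75 points,
# or three reviewers biased by 0.25 points each.
total_bias = 0.75
assert abs(n_reviewers * 0.25 - total_bias) < 1e-12

# Bias as a fraction of the total possible score.
bias_fraction = total_bias / total_possible
print(f"bias as share of total score: {100 * bias_fraction:.1f}%")

# The quote says 0.75 points is 44.4% of a single reviewer's standard
# deviation, which implies an SD of roughly 1.69 points.
implied_sd = total_bias / 0.444
print(f"implied single-reviewer SD: {implied_sd:.2f} points")
```

In other words, a quarter-point nudge per reviewer, against reviewer-to-reviewer noise of well over a point and a half.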
We have been talking in recent days about bias in favor of highly established, older scientists. It has been a bit longer since we discussed it, but the Ginther report indicating a disparity in grant review outcomes for African-American PIs is clearly relevant here.
What this simulation cannot do, of course, is model the cumulative, iterative effects of review bias. Namely, the way that selecting PC applications for funding tends to increase the bias in the reviewer pool, since those beneficiaries become the next reviewers. Also, the way that over the long haul, a disparity in the first award can lead to actual quality differences in subsequent applications, because PI #1 had the money to pursue her science and PI #2 did not have as easy a time generating data, publishing papers and recruiting postdoc talent.