A tiny bias goes a long way when it comes to grant review

Feb 11 2015 Published under Fixing the NIH, NIH

From ScienceInsider:

Now, a new computer simulation explores just how sensitive the process might be to bias and randomness. Its answer: very. Small biases can have big consequences, concludes Eugene Day, a health care systems engineer at the Children's Hospital of Philadelphia, in Research Policy. He found that bias that skews scores by just 3% can result in noticeable disparities in funding rates.

T. E. Day, The big consequences of small biases: A simulation of peer review, Research Policy, 2015 [epub ahead of print 28 Jan]

From the paper's abstract:

When total review bias exceeds 1.9% of grant score, statistically significant variation in scores between PC and NPC investigators is discernable in a pool of 2000 grant applications. When total review bias exceeds 2.8% of total grant score, statistically significant discrepancies in funding rates between PC and NPC investigators are detectable in a simulation of grant review.

Day generated a Preferred Class (PC) of applications and a NonPreferred Class (NPC) of applications and ran a bunch of 3-reviewer scenarios with and without reviewer bias against the NPC applications. As far as I can tell, the takeaway conclusion about funding here refers to a situation in which the effective payline is 10%. You will immediately grasp that NIH grant review was a strong contributor to the model parameters.
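To make the setup concrete, here is a rough sketch of this kind of simulation. It is not Day's actual model. The 50/50 split between PC and NPC applications, the normally distributed "true quality" and the lack of clamping to the 1-9 scale are all my assumptions; the per-reviewer standard deviation of about 1.69 is back-calculated from the paper's statement that 0.75 points is 44.4% of a single reviewer's SD. Lower scores are better, as at NIH.

```python
import random

def simulate_funding(n_apps=2000, bias_per_reviewer=0.25, payline=0.10,
                     n_reviewers=3, reviewer_sd=1.69, seed=0):
    """Toy grant-review simulation (hypothetical parameters, not the paper's model).

    Returns (pc_funding_rate, npc_funding_rate). Lower total score is better.
    """
    rng = random.Random(seed)
    apps = []
    for i in range(n_apps):
        preferred = (i % 2 == 0)          # assumption: half PC, half NPC
        quality = rng.gauss(5.0, 1.0)     # assumption: true merit on a 1-9-like scale
        total = 0.0
        for _ in range(n_reviewers):
            score = quality + rng.gauss(0.0, reviewer_sd)  # ordinary reviewer noise
            if not preferred:
                score += bias_per_reviewer  # bias makes NPC scores slightly worse
            total += score
        apps.append((total, preferred))

    apps.sort(key=lambda a: a[0])         # best (lowest) totals first
    n_funded = round(payline * n_apps)    # fund everything inside the payline
    funded = apps[:n_funded]

    pc_funded = sum(1 for _, is_pc in funded if is_pc)
    npc_funded = n_funded - pc_funded
    n_pc = (n_apps + 1) // 2
    n_npc = n_apps // 2
    return pc_funded / n_pc, npc_funded / n_npc

# Compare an unbiased panel with one where each reviewer docks NPC by 0.25 points.
unbiased = simulate_funding(bias_per_reviewer=0.0)
biased = simulate_funding(bias_per_reviewer=0.25)
```

Because the same seed produces the same random draws, the two runs differ only in the bias term, so any gap between the PC and NPC funding rates in the second run beyond what the first shows is attributable to the bias alone. A single run of 2000 applications is noisy; you would want many seeds before reading much into the exact numbers.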

I will admit I am only able to grasp the main points here and I am in no way able to evaluate the nitty gritty.

But it appears to have a very strong message. Namely, that our introspection that "well, if there is bias it is very tiny so we don't have to be worried about it" needs to change.

There is something even scarier in this paper. From the Discussion:

The threshold level of bias in this environment seems to be 2.8% of the total possible score of the grant; this is the level at which the 95% CI of the odds ratio “kisses” 1.00. This represents a single reviewer with a bias of 0.75 points (or three reviewers each with biases of 0.25 points), which is less than half (44.4%) of the standard deviation in a single reviewer’s score. What this suggests is that levels of bias which are sub-noise – that is, that are dramatically less detectable than normal variation in reviewer scores – are sufficient to substantially bias the number of funded applications in favor of preferred investigators.

RIGHT???? The bias can be of smaller effect size than many "normal" sources of variability in scoring that we accept as the resolution of the system. And it still leads to a statistically significant bias in funding outcome.
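Why can a sub-noise shift matter so much? A back-of-the-envelope way to see it, assuming a single normally distributed score and a cutoff at the 10th percentile (my simplification, not the paper's three-reviewer model), is to ask what fraction of a group still lands under the payline after its scores are nudged upward by some fraction of a standard deviation. The function name and the payline default below are mine.

```python
from statistics import NormalDist

def tail_fraction_after_shift(shift_in_sd, payline=0.10):
    """Fraction of a shifted group that still scores under the payline cutoff.

    Assumes scores are standard normal, lower is better, and the cutoff is
    set at the `payline` quantile of the unshifted (unbiased) distribution.
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(payline)      # score needed to be funded
    # Shifting a group's scores up by d is the same as moving the cutoff down by d.
    return std_normal.cdf(cutoff - shift_in_sd)

unshifted = tail_fraction_after_shift(0.0)    # the payline itself, 10%
shifted = tail_fraction_after_shift(0.444)    # 0.444 SD, the figure quoted above
```

In this idealized case a shift of 0.444 SD drops the fraction under the cutoff from 10% to a bit over 4%. The real effect in the paper is smaller, since variation in true application quality and the pooling of three reviewers dilute the shift, but the direction is the same: small moves in the mean are magnified out at the tail where funding decisions get made.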

We have been talking in recent days about bias in favor of highly established, older scientists. It has been longer since we discussed it, but the Ginther report indicating disparity of grant review outcome for African-American PIs is clearly relevant here.

What this simulation cannot do, of course, is model the cumulative, iterative effects of review bias. Namely, the way that selecting PC applications for funding tends to increase the bias in the reviewer pool, since those beneficiaries become the next reviewers. Nor can it model the way that, over the long haul, disparity in the first award can lead to actual quality differences in subsequent applications, because PI #1 had the money to pursue her science and PI #2 did not have as easy a time generating data, publishing papers and recruiting postdoc talent.

5 responses so far

  • toto says:

    Small differences in the means = large differences at the tails. A known property of the normal distribution.

    The silver lining in this is that even the slightest individual biases would be easy to detect by simply looking at the outcomes in a controlled situation.

    Didn't you make a post sometime last year about the NIH running an experiment to measure the amount of race/gender bias in grant review? I can't find it. 🙁

  • drugmonkey says:

    Maybe. I can't remember. It is going to be difficult to measure racial biases, gender biases, etc. What I really want them to do is measure the sort of "normal" variance that is mentioned in this paper by running parallel study sections on the same collection of applications. Calculate inter-rater reliability, etc.

    That would be informative and can absolutely be done.

  • Busy says:

    >> That would be informative and can absolutely be done.

    Not quite for NIH grant panels, but here's a study with parallel review committees for a peer-reviewed conference in Computer Science.

    The NIPS Experiment


  • toto says:

    I have found the post I was referring to:


    "One basic issue that the NIH will address is whether grant reviewers are thinking about an applicant’s race at all, even unconsciously. A team will strip names, racial identification and other identifying information from some proposals before reviewers see them, and look at what happens to grant scores."

    If anyone has any news about this, I'd really like to know more.

