Impact Score Versus Percentile Scatter Plot

Aug 09 2010 Published by under NIH, NIH Careerism, NIH funding, Peer Review

Jeremy Berg--Director of the National Institute of General Medical Sciences--just posted this scatterplot of impact score versus percentile for hundreds of R01s assigned to NIGMS at his blog:

Overall Impact Score is the average of the final scores, from 1-9 (lower is better), given to the grant application by all the members of the study section, multiplied by 10 and rounded (up?) to the nearest whole number. The percentile is the percent of applications reviewed by that same study section over the current round and the previous two that had better impact scores, rounded up to the nearest whole number.
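To make that arithmetic concrete, here is a rough Python sketch of the two calculations as described above. It follows this post's description, not necessarily NIH's exact formula. The function names and the two little lists of scores are made up for illustration, and the half-up rounding is an assumption, since the rounding rule is the part I'm unsure about.

```python
import math

def impact_score(reviewer_scores):
    """Mean of the 1-9 reviewer scores, times 10, as a whole number."""
    mean = sum(reviewer_scores) / len(reviewer_scores)
    # Assumption: halves round up. The actual rounding rule is not confirmed here.
    return math.floor(mean * 10 + 0.5)

def percentile(score, base_scores):
    """Percent of applications in the base with a better (lower) score, rounded up."""
    better = sum(1 for s in base_scores if s < score)
    return math.ceil(100 * better / len(base_scores))

# Made-up three-round bases for two hypothetical study sections.
tough_section = [15, 22, 25, 30, 34, 38, 41, 45, 50, 60]
generous_section = [12, 14, 18, 20, 22, 25, 27, 30, 35, 40]

score = impact_score([3, 3, 3, 3])  # four reviewers all voting 3
print(score, percentile(score, tough_section), percentile(score, generous_section))
```

The same impact score lands at very different percentiles depending on how the rest of the study section's applications were scored, which is exactly the vertical spread visible in the scatterplot.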

For those of us fascinated by inside NIH grants baseball, this is some serious fucken catnip, as there is a fuckton of interesting stuff in there. One of the most fascinating things to me is the difference it reveals in scoring behavior between study sections.

For example, looking at "milestone" impact scores of 20 and 30 reveals dramatic differences in "score inflation" in different study sections. One study section only scored 3% of its grants better than 20, while another study section scored 20% of its grants better than 20. And one study section only scored 10% of its grants better than 30, while another scored 38% of its grants better than 30.

What would be truly fucken fascinating would be to redraw this scatterplot, with the dots representing funded grants drawn in green (for money!) and the dots representing unfunded grants in brown (for poop!).

10 responses so far

  • DrugMonkey says:

    Agreed, this is one of the most fantastic datasets out of NIGMS yet. Along with other implications, it should be used with all new investigators just getting their first sets of scores to show them how things vary from section to section. Absolutist concepts of a "good" or "bad" score are simply the wrong way to think about it.

    It is also worth a good BS session about whether all sections *should* have exactly the same calibrations or not.

  • physioprof says:

    The nearly thirty percentile range for impact score of thirty is mindblowing. That kind of huge variation in group scoring activity should be very frightening to those whose applications are reviewed in Special Emphasis Panels and percentiled against all-CSR.

  • DrugMonkey says:

    whose applications are reviewed in Special Emphasis Panels and percentiled against all-CSR

    word. I've had SEP scores percentiled against the CSR base as well as against the parent study section. I never understood why the difference. Any thoughts? Could have been a change in policy enacted at some point but if so, I missed the notice on this....

  • DrugMonkey says:

    I am also amused by the number with overall scores of 10-15 or so. When the new instructions came down my Chair claimed that Scarpa or somebody emphasized in the chairs meeting that 1.0 scores were to be reserved for the best evah, once in a reviewing lifetime apps. Looks like not every study section got this message since these are scores from a single round of review and a mere 654 apps and all...

  • physioprof says:

    Many SEPs don't have a single "parent" study section.

  • DrugMonkey says:

    Sure. and some do. are you thinking this is the only difference? When it is all applications from a single parent study section then the percentile is against the study section base, else against the CSR base?

  • physioprof says:


  • So what's the long term solution? Some kind of normalization of the scores relative to the median score meted out by the study section?

  • pinus says:

    Is there a way to get linear regressions from different study sections in order to confirm that greatly differing scoring standards are being used? I am not sure that this would be useful or even make sense.

  • DrugMonkey says:

    ICs could try z-scores compared to percentiles but one suspects a big part of the issue is differing shapes of the score distributions...
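A quick sketch of the z-score point raised in the comments, assuming two invented study sections with the same mean score but differently shaped distributions. The helper names and the numbers are hypothetical, and population standard deviation is just one possible choice.

```python
import statistics

def z_score(score, section_scores):
    """Distance of a score from the section mean, in standard deviations."""
    mean = statistics.mean(section_scores)
    sd = statistics.pstdev(section_scores)  # assumption: population SD
    return (score - mean) / sd

def fraction_better(score, section_scores):
    """Fraction of the section's applications with a better (lower) score."""
    return sum(1 for s in section_scores if s < score) / len(section_scores)

# Made-up data: both sections have a mean impact score of 30.
symmetric_section = [10, 20, 30, 40, 50]
skewed_section = [20, 20, 20, 20, 70]

for name, scores in [("symmetric", symmetric_section), ("skewed", skewed_section)]:
    print(name, z_score(30, scores), fraction_better(30, scores))
```

An impact score of 30 sits exactly at the mean in both sections, so the z-score is identical, yet the share of applications that beat it differs. That is the sense in which z-scores and percentiles can disagree when the score distributions have different shapes.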
