Rockey explains the percentiling of tied NIH scores

Jun 24 2013 Published under NIH, NIH funding

Admittedly I hadn't looked all that hard, but I was previously uncertain as to how NIH grants with tied scores were percentiled. Since the percentiles are incredibly important* for funding decisions, this became a serious question after the new scoring approach (reviewers vote integer values from 1 to 9, and the average is multiplied by 10 for the final score; lower is better), which was designed to generate more tied scores.

The new system poses the chance that a lot of “ranges” for the application are going to be 1-2 or 2-3 and, in some emerging experiences, a whole lot more applications where the three assigned reviewers agree on a single number. Now, if that is the case and nobody from the panel votes outside the range (which they do not frequently do), you are going to end up with a lot of tied 20 and 30 priority scores. That was the prediction anyway.
NIAID has data from one study section that verifies the prediction.
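
To see where the ties come from, here is a minimal Python sketch of the scoring arithmetic (average of the 1-9 votes, times 10). The function name `overall_impact_score` is made up for illustration, and the round-half-up step is an assumption about how a fractional result becomes a whole-number score.

```python
def overall_impact_score(votes):
    """Average the panel's 1-9 integer votes and multiply by 10 (lower is better)."""
    if not votes or any(v not in range(1, 10) for v in votes):
        raise ValueError("votes must be integers from 1 to 9")
    total, n = sum(votes), len(votes)
    # Assumption: round half up to a whole number. Integer arithmetic is used
    # so that floating-point error cannot nudge a borderline score.
    return (20 * total + n) // (2 * n)


# Two different applications, two unanimous panels: identical scores.
app_a = overall_impact_score([2] * 20)
app_b = overall_impact_score([2] * 18)
print(app_a, app_b)
```

When every voter stays inside a one-number range, every such application lands on exactly the same score, which is the tie problem in a nutshell.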

As a bit of an aside, we also learned along the way that percentile ranks are always rounded UP.

Percentiles range from 1 to 99 in whole numbers. Rounding is always up, e.g., 10.1 percentile becomes 11.
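
As a small sketch of that rounding rule in Python: the helper name `round_up_percentile` is hypothetical, and clamping to the 1-99 range is an assumption based on the stated range, not a documented NIH procedure.

```python
import math


def round_up_percentile(raw_percentile):
    """Round a raw percentile up to a whole number, kept within 1..99."""
    # math.ceil always rounds toward the next whole number, so 10.1 -> 11,
    # while an exact whole number such as 10.0 stays at 10.
    rounded = math.ceil(raw_percentile)
    # Assumption: values are clamped to the 1-99 range quoted above.
    return min(99, max(1, rounded))


print(round_up_percentile(10.1))
```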

So you should be starting to see that the number of applications assigned to your percentile base** and the number that receive tying scores are going to occasionally throw some whopping discontinuities into the score-percentile relationship.

Rock Talk explains:

However, as you can see, this formula doesn’t work as is for applications with tied scores (see the highlighted cells above) so the tied applications are all assigned their respective average percentile.

In her example, the top applications in a 15 application pool received impact scores of 10, 11, 19, 20, 20, 20.... This is a highly pertinent example, btw. Reviewers concurring on a 2 overall impact is very common, and a 2 represents a score that is potentially in the hunt for funding***.

In Rockey's example, these tied applications block out the 23, 30 and 37 percentile ranks in this distribution of 15 possible scores. (The top score gets a 3%ile rank, btw. Although this is an absurdly small example of a base for calculation, you can see the effect of base size...10 is the best possible score and in an era of 6-9%ile paylines the rounding-up takes a bite.) The average is assigned so all three get 30%ile. Waaaaay out of the money for an application that has the reviewers concurring on the next-to-best score? Sheesh. In this example, the next-best-scoring application averaged a 19, only just barely below the three tied 20s and yet it got a 17%ile for comparison with their 30%ile.

You can just hear the inchoate screaming in the halls as people compare their scores and percentiles, can't you?

Rockey lists the next score above the ties as a 28 but it could have just as easily been a 21. And it garners a 43%ile.

Again, cue screaming.
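
The tie-averaging in that example can be sketched in a few lines of Python. This assumes the Rock Talk formula Percentile = 100 * (Rank - 0.5) / N with N = 15, and uses only the top seven scores mentioned here (the rest of the pool isn't needed, since rank depends only on how many applications scored better). The function name `percentiles_with_ties` is made up for illustration. The sketch returns unrounded values; the figures quoted above (3, 17, 30, 43) match these rounded to the nearest whole number.

```python
from itertools import groupby


def percentiles_with_ties(top_scores, base_size):
    """Return (score, percentile) pairs for the best-scoring applications.

    top_scores: impact scores of the best applications (lower is better);
                must include every member of any tie group it touches.
    base_size:  N, the number of applications in the percentile base.
    """
    results = []
    rank = 1  # rank of the first application in the current tie group
    for score, group in groupby(sorted(top_scores)):
        count = len(list(group))
        # Percentile each tied application would get from its own rank...
        raw = [100 * (r - 0.5) / base_size for r in range(rank, rank + count)]
        # ...then every member of the tie group is assigned the average.
        average = sum(raw) / count
        results.extend((score, average) for _ in range(count))
        rank += count
    return results


example = percentiles_with_ties([10, 11, 19, 20, 20, 20, 28], base_size=15)
for score, pct in example:
    print(score, round(pct, 2))
```

The three 20s occupy ranks 4, 5 and 6 (raw values of about 23.3, 30 and 36.7), so all three come out at 30, while the 19 just ahead of them sits at about 16.7 and the 28 just behind at about 43.3.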

Heck, I'm getting a little screamy myself, just thinking about sections which are averse to throwing 1s for Overall Impact and yet send up a lot of 20 ties. Instead of putting all those tied apps in contention for consideration, they are basically guaranteeing none of them get funded because they are all kicked up to their average percentile rank. I don't assert that people are intentionally putting up a bunch of tied scores so that they will all be considered. But I do assert that there is a sort of mental or cultural block at going below (better than) a 2, and for many reviewers, a vote of 2 means they think this application should be funded.

In closing, I am currently breaking my will to live by trying to figure out the possible percentile base sizes that let X number of perfect scores (10s) receive 1%iles versus being rounded up to 2%iles and then what would be associated with the next-best few scores. NIAID has posted an 8%ile payline and rumours of NCI working at 5%ile or 6%ile for next year are rumbling. The percentile increments that are permitted, based on the size of the percentile base and their round-up policy, become acute.
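
A rough Python sketch of that base-size question, under stated assumptions: the Rock Talk formula Percentile = 100 * (Rank - 0.5) / N applies, tied applications get their average percentile, and the round-up happens after the averaging (that ordering is an assumption). With X perfect scores tied at ranks 1 through X, the averaged raw percentile works out to 50 * X / N. The function names are hypothetical.

```python
def top_tie_percentile(num_perfect, base_size):
    """Rounded-up percentile shared by `num_perfect` tied best scores."""
    # Ranks 1..X average to (X + 1) / 2, so the averaged raw percentile is
    # 100 * ((X + 1) / 2 - 0.5) / N = 50 * X / N.
    # -(-a // b) is ceiling division on integers, which avoids float error.
    return max(1, -(-50 * num_perfect // base_size))


def smallest_base_for_one_percentile(num_perfect, max_base=2000):
    """Smallest base size N at which the tied perfect scores get a 1%ile."""
    for n in range(num_perfect, max_base + 1):
        if top_tie_percentile(num_perfect, n) == 1:
            return n
    return None  # not reachable within max_base


for x in (1, 2, 3):
    print(x, smallest_base_for_one_percentile(x))
```

Under these assumptions a lone perfect score needs a base of at least 50 applications to land at 1%ile, and each additional tied 10 adds another 50 to the required base.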
__
*Rumor of a certain IC director who "goes by score" rather than percentile becomes a little more understandable with this example from Rock Talk. The swing of a 20 Overall Impact score from 10%ile to 30%ile is not necessarily reflective of a tough versus a softball study section. It may have been due to the accident of ties and the size of the percentile base.

**Typically the grants reviewed in the current round of that study section plus the two prior rounds for that study section.

***IME, review panels have a reluctance to throw out impact scores of 1. The 2 represents a hesitation point for sure.

16 responses so far

  • meshugena313 says:

    This is insane. But the last 2 rounds have supposedly had the new uncompressed scoring, with the percentile "calculated" (not sure that's the appropriate word, with the insanity described above) using only 1 or 2 rounds of data. I wonder if it reduced the tie scores and the strange results?

    From my experience with my latest proposal just reviewed, I was scored with a seriously uncompressed #, but percentiled reasonably lower. Not fundable, but whatever instruction they gave reviewers seems to have spread out the range.

  • meshugena313 says:

    WTF: In my quick read of Rockey's post, it also seems like the percentile is calculated with only the study section ranking, with the score having no influence:

    Percentile = 100 * [ (Rank - 0.5) / N ]

  • meshugena313 says:

    So I guess the scores from a few rounds are used to adjust the rank of proposals? Alright, that's a reasonable way to do it.

  • drugmonkey says:

    Normally it is the scores from a given study section for a rolling three rounds. The present round and the two prior rounds. there was recently a re-start which you noted, but as far as I know they'll be back to three rolling rounds soon enough (next round, right?).

    and yes, that is the point of the percentile. to take mean scoring variation across study sections out of the picture.

  • Physician Scientist says:

    Does anyone actually understand what the "re-start" actually did? Did they more forcefully tell reviewers to spread the score? Did they multiply the initial priority score by some factor to correct for compression? What exactly was done?

    note: I got a 36 Priority score with a 17th percentile on my last grant application, so I know the scores were better spread.

  • drugmonkey says:

    yeah, from what I've been seeing lately, a 4 average on initial scoring is right around the triage line. So a 36 eventual score getting a 17%ile does seem like a recalibration to me.

    ymmv, different sections already differed, etc, etc

  • Grumble says:

    Before we all start wailing and gnashing our teeth, please reflect for a moment about whether you (or anyone you know) has *ever* received a score that indicated that reviewers really liked the application (say, 25 or lower) coupled with a surprisingly high percentile (say, 35 or higher for a score of 20-25).

    Not that I'm the world's expert, but I've simply never heard the complaint that "the score was great but my grant didn't get funded because the percentile was too high."

  • DrWorms says:

    Grumble,

    Yes, I've gotten a 15 on a grant and been in the 11th percentile, and the grant missed the funding line. While not exactly what you are asking, that still rankled. When I spoke to the Program Officer, he sighed and said that study section routinely compressed their scores. This was about 3 years ago when they switched over to the new score system, but still it can happen.

    Study sections are certainly an imperfect tool, but I have yet to hear of a compelling alternative. It's impossible to ask a single person to read 7-10 grants, but then be able as a group to differentiate the 10th best from the 11th best grant out of 100. Especially since grants get assigned to a specific study section by people who are not necessarily experts.

    Still, overall, I think the NIH system is vastly better than NSF. Most of my colleagues who have submitted to NSF are livid about the 1x/year submission deadline. Largely because that means that only people who haven't submitted a grant that cycle can review, which leaves them with a small group from which to select peer reviewers.

  • DrugMonkey says:

    This is an exaggerated example, Grumble. Clearly. But 20s that vary from 9%ile to 15%ile? Common and a critical gulf.

  • Grumble says:

    Yeah, DM, but there is absolutely no difference in potential impact - zero, zilch, none, nichts, nada - between a 9th and 15th percentile grant, or for that matter between a score of 19 and 21. Yes, it's an important distinction when the funding cut-off is 10th percentile, but as DrWorms points out, the system is incapable of making fine distinctions. Therefore, who gets funded is up to random chance already (at least within those fine distinctions). Adding more significant digits to the ranking system wouldn't change that randomness.

    In other words, it doesn't matter if, under the current system, your 9th percentile gets funded and my 11th percentile doesn't, whereas under an alternative system, the outcomes would have been reversed. They were both equally good grants.

    This is actually a reasonably good argument in favor of what some ICs do, which is fund everything up until a certain percentile, and then the program staff picks and chooses in a "grey zone" up until the 20th percentile or so.

  • Pinko Punko says:

    Presumably there will never be as small a base as in her example for most regular mechanisms, but a huge bunch up at 3 is killer. You really have to pray for a 29 and not a 31.

  • AP says:

    I had a 20, which was a 13th percentile which ended up missing the payline by 1%.

    I heard that there was a grant that got a 10 in the same study section, but was 6th (!) percentile. So basically, you had to get a perfect score to be funded.

  • drugmonkey says:

    "This is actually a reasonably good argument in favor of what some ICs do, which is fund everything up until a certain percentile, and then the program staff picks and chooses in a 'grey zone' up until the 20th percentile or so."

    Yes and I have defended the role of Program in this process many times on this blog.

    Just so long as we all understand that the "picks and chooses" process remains highly influenced by the priority score, going by the available information.

  • drugmonkey says:

    "You really have to pray for a 29 and not a 31."

    One vote outside the range will do it when the assigned reviewers all agree...

  • This is a highly pertinent example, btw.

    Dude, you are fucken high. This example is completely non-illustrative of *anything* having to do with scoring patterns and percentiles other than the arithmetic.

    First, percentile bases always have a *ton* more than 15 applications. Second, the absurd score compression in this example–with 11/15 applications (73%) receiving impact scores of 30 or better–is never, ever, ever going to happen in a real study section. Rather, impact scores of 30 generally are centered around 15%ile, at least in properly functioning study sections. (This is anecdotal, and also fits with the data published by some ICs on scoring patterns.)

    The only point of this example is to explain how percentiles are calculated, not to provide a realistic example of how scoring patterns and percentiles intersect. At least I sure hope that is the only point of this example! Because otherwise it is grossly misleading.

  • DrugMonkey says:

    I was referring to ties at 20 due to post discussion scores in complete agreement at 2. It happens a lot.
