Your Grant In Review: Voting Outside of the Range

Sep 29 2008. Filed under Grant Review, Peer Review

In a footnote to a prior post I noted that a single grant reviewer was unlikely to have a very large impact on the fate of a specific NIH grant proposal. I've been thinking about this in terms of one of the more technical aspects of grant review as conducted by the NIH study section: voting outside of the range.


As a very brief overview, the NIH grant application is assigned to three reviewers who assess the merit, write a critique and submit a suggested score prior to the study section meeting. Permissible scores range from 1.0 (the most meritorious) to 5.0 (the least meritorious). Only the applications for which the three assigned scores average in the top X% of the entire panel's allocation for a given round are discussed at the actual meeting*. (At present the X is running about 40% in my section and from what I hear, many other normal CSR sections.) If an application is to be discussed the three reviewers start by declaring these preliminary scores, discuss the application and then finish with declaring a post-discussion score.
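To make the triage arithmetic concrete, here is a minimal sketch in Python. The application names, scores, and the `triage` helper are all invented for illustration; the real CSR process is procedural, not scripted, but the selection logic is just this:

```python
# Sketch (hypothetical numbers) of the triage step: each application gets
# three preliminary scores from its assigned reviewers; only those whose
# average falls in the best X% of the round are discussed at the meeting.

def triage(prelim_scores, discuss_fraction=0.40):
    """prelim_scores: dict mapping application id -> list of the assigned
    reviewers' preliminary scores (1.0 best .. 5.0 worst).
    Returns the set of application ids that make the discussion list."""
    averages = {app: sum(s) / len(s) for app, s in prelim_scores.items()}
    # rank from most meritorious (lowest average) to least meritorious
    ranked = sorted(averages, key=averages.get)
    cutoff = max(1, round(len(ranked) * discuss_fraction))
    return set(ranked[:cutoff])

# Five toy applications; with X = 40%, the best two are discussed.
apps = {
    "A": [1.2, 1.4, 1.3],   # avg 1.30
    "B": [2.0, 2.2, 2.1],   # avg 2.10
    "C": [1.5, 1.6, 1.4],   # avg 1.50
    "D": [3.0, 2.8, 3.2],   # avg 3.00
    "E": [2.5, 2.4, 2.6],   # avg 2.50
}
print(triage(apps))  # {'A', 'C'}
```

Note that, per the first footnote, the real process has escape hatches (any panel member can pull an application up for discussion), so the cutoff is softer than this sketch implies.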
Following the discussion and declaration of post-discussion scores from the three reviewers, the entire panel votes a score for the application. The panel is expected to vote scores within the post-discussion range. In some cases however, other members of the panel may choose to vote a score that is either better or worse than the range. Any member of the panel is free to do so. The CSR instructs reviewers (through the SRO) that they must declare their intentions if they are going to vote outside of the range.
I've seen some variability on this within a single panel. Sometimes the expectation seems to be that any deviation requires a declaration; at other times we were told that 0.2 outside the range was the criterion. The point of the declaration seems to be that the CSR wants to ensure that any issues relevant to the assessment of scientific merit have been discussed openly. That makes a certain amount of sense: you don't want part of the panel voting on an issue that only they have considered. There also appears to be an error-correction role, in the event that a reviewer puts down an unintended score for a given application.
In practical terms, I have seen outside-the-range declarations occur maybe 5 times per meeting max and in (to my recollection) all cases the reviewers in question felt that the issues had been raised during the discussion, they were just coming to a different conclusion regarding the merit score. I should note that I see this happening in both the more-meritorious and less-meritorious directions. Now I am hearing that there may be a new directive coming down from on high insisting on additional formality and back-checking each outside-the-range score to make sure it was actually intended and a reason supplied.
This kind of thing gets me wondering: why? What is the magnitude of the problem, and what are the sources of complaint? Are there really that many unintended scores gumming up the works and having a categorical effect on the score of an application? Is this part of answering suspicions that covert reviewer behavior is torpedoing (or saving) grants?
I scared up a very brief invented score set to give me a toehold. No doubt there are fancier dummy analyses that could be run, but don't look at me for that. I started with a 25-member panel in which the post-discussion range was 1.3-1.5, which resulted in ten 1.3 votes, six 1.4 votes and nine 1.5 votes. This averages to 1.4 (1.396, to be precise), resulting in a 140 priority score on the summary statement.
Suppose one person assigns an out-of-range 180: the priority score goes to 141.
Okay, how about four people feeling a little more negative at 180: 144.
Sure, this score difference can be critical, but it really isn't that big of an effect compared with some other grant review / grant preparation issues and mistakes. Furthermore, I must note that while it is not uncommon for multiple people to say they are going outside the range, I can't recall hearing more than four for a single application.
Now, how about two real jerks who take it to the triage line of 250: 148.
Much more likely to have a qualitative effect on outcome. Now, I have little insight beyond myself as to how far outside reviewers will go in a situation like this, but based on a feel for the panel members, I would expect this to be rare. They are not generally mad and punitive in these situations, just feeling that the discussion and critiques did not match the eventual scores, or were out of whack with other discussed applications in that round, or something. This results in modest changes in assigned scores, I would think. Still, it is possible that I am too optimistic in this regard.
Harder to move in the good direction, of course. Two 100 (perfect) scores move the average to 137; four of them, to 135. And I think it would be vanishingly rare for four 100 votes to result from a post-discussion low end of 130!
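The scenarios above can be reproduced in a few lines of Python. One assumption, not stated explicitly in the post, is needed to make the numbers come out as quoted: each out-of-range vote replaces a vote at the nearest end of the 1.3-1.5 range (worse-direction outliers replace 1.5s, better-direction outliers replace 1.3s). With that caveat, a sketch:

```python
# Back-of-the-envelope check on the invented 25-member panel above.
# Out-of-range voters are modeled as converting votes from the nearest
# end of the post-discussion range (an assumption for illustration).

def priority(votes):
    """Panel average x 100, rounded to the nearest whole priority score."""
    return round(100 * sum(votes) / len(votes))

panel = [1.3] * 10 + [1.4] * 6 + [1.5] * 9   # 25 members, range 1.3-1.5
print(priority(panel))                        # 140

def with_outliers(votes, n, outlier, replaced):
    """n panelists vote `outlier` instead of their in-range `replaced` score."""
    out = list(votes)
    for _ in range(n):
        out.remove(replaced)
        out.append(outlier)
    return out

print(priority(with_outliers(panel, 1, 1.8, 1.5)))  # one 180 vote -> 141
print(priority(with_outliers(panel, 4, 1.8, 1.5)))  # four 180 votes -> 144
print(priority(with_outliers(panel, 2, 2.5, 1.5)))  # two triage-line 250s -> 148
print(priority(with_outliers(panel, 2, 1.0, 1.3)))  # two perfect 100s -> 137
print(priority(with_outliers(panel, 4, 1.0, 1.3)))  # four perfect 100s -> 135
```

Swapping in your own "plausible" panel compositions and outlier counts is a one-line change to `panel` or the `with_outliers` arguments.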
Okay, I realize the permutations of score ranges and hypothesized out-of-range votes are enormous, and anyone is free to conduct similar analyses with their own "plausible" scenarios. But at least for me, I am satisfied that the qualitative impact of outlier votes on eventual grant disposition (funded/not funded) is not huge.
This reiterates for me that when one suspects a single voter (not an assigned reviewer; that's another question) of ruining an application's chances with a covert out-of-range vote, well, this seems unlikely to happen. Among other issues, it should be clear that the CSR pays attention to out-of-range voting. This analysis does little, however, to explain why there needs to be an increased focus on this issue. It may be that some unusual shenanigans are going on, but my analysis suggests that it would take something pretty egregious, a mini-conspiracy of like-minded voters or a very divergent score, to have an effect. If so, there are some serious problems with the probity of the people reviewing. Alternately, it may be that this effort results more from heat (senior PI complaints) than from light (CSR reviewing data). In which case, as always, I would like to see the data before we do anything too drastic**.
__
*Any panel member can insist on discussion of any proposal, no matter how dismal the score. There are also some additional niceties having to do with nominating or de-nominating an application for discussion or triage/streamlining. Also, in some cases there may be a different number of reviewers assigning scores.
**I don't like the idea of increasing the social burden for voting outside the range, myself. It occurs rarely enough as it is, and I would think there is already a hurdle, especially for the less-experienced reviewer. I mean, think about it: your first time on a panel and nobody has voted outside the range on the first dozen or two applications? Are you going to be the one? After receiving a set of severe-sounding instructions from the SRO? Ha!


  • dajokr says:

    Over at our shop, the SRO allows us to go 0.4 on either side of the range instead of 0.2. We usually get to consensus so even outliers will have little effect on the final priority score as you describe. There have only been a couple of cases in several cycles that anyone has declared they were going out of the range. And our SRO looks very closely to police that - I once received a post-meeting phone call that I had accidentally gone outside the range without declaring such (for the record, I was more enthusiastic than the range).
    The only issue I've seen arise that may be of interest to your readers or other SS panelists is when a panel cannot come to consensus. I've witnessed the range staying at 1.8-2.8 and panelists were urged to "vote their conscience" at either end of the spectrum rather than vote the average. I have no data to indicate whether or not the final score ends up at 240 or whether it gets closer to 180.
    I'll also routinely take up your first asterisk: I often will request that a just-barely unscored application be discussed, especially if from a new or junior investigator. I feel that these folks need the benefit of the discussion and resumé provided by the SRO.

  • Another biomedical researcher says:

    Like dajokr, I have seen 0.5 on either side of the range considered OK. Which means that voters can stay within the allowable range and still have a substantial impact by being strategic. If two proposals have a 1.5-1.7 range, clearly all voters will have an impact on funding, and some 2.2s (or 2.0s) for one, combined with several 1.2s for the other, will have a significant chance of determining funding (and these games will always be played both positively and negatively).
    My main study section (the one I send most proposals to, as opposed to the one I have served on) has a rather dysfunctional composition of about half each of two fields that do not exhibit substantial mutual respect - think a "classical" scientific discipline versus an "upstart" subdiscipline (the latter more directly biomedical (but more academic in nature), but the former having had an enormous biomedical impact indirectly (through subsequent application by industry)). Both fields have vocal partisans that think that the other field is not relevant in the 21st century (and the partisans may even constitute a majority of the field). I have been told that these "my discipline is more important than your discipline" games, while not universal, take place routinely among a subset of reviewers (on both sides) (told by colleagues on this study section). These conflicts pose problems both in expanding the range, and in gamesmanship in voting. But CSR apparently doesn't see a need to resolve these conflicts, since we are usually in the same academic divisions within the same academic departments.
    One other note: I have been triaged, as a new investigator, when my application was at the 30th percentile (as related to me by a close friend who is the relevant program officer at NIH). It's nice that some study sections discuss up to the 40th percentile, or that some study sections recognize that new investigators will particularly benefit from a discussion, but certainly don't assume this situation to be universal.
    And if you're in one of those study sections, well, it sucks to be you. I am fully amused that NIH saw fit to award three of the 2008 Pioneer/New Innovator (DP1/DP2) awards to youngish faculty (late-game assistant or newly minted associate professors) that routinely submit applications to this study section. All three are also in that "upstart" subdiscipline. The one considered irrelevant by half of a study section.

  • PhysioProf says:

    I've witnessed the range staying at 1.8-2.8 and panelists were urged to "vote their conscience" at either end of the spectrum rather than vote the average. I have no data to indicate whether or not the final score ends up at 240 or whether it gets closer to 180.

    Which, of course, makes no fucking difference AT ALL, because neither of those scores is even close to fundable nowadays.
    Nothing pissed me off more than the last study section I served on--reviewing K99s--in which we spent an inordinate amount of time discussing and arguing about distinguishing the scores of applications that were NOT EVEN FUCKING CLOSE TO BEING FUNDABLE, and spent almost no time at all trying to distinguish among the scores of the top six applications, only two or three of which were gonna get funded.
    Hello, asshole study section chair! You're playing a game in which the top six out of 30 apps are obvious to everyone, but in which only the top three of those six are gonna get funded. Spending more than one motherfucking millisecond trying to distinguish between the second six and the third six is FUCKING RIDICULOUS. And FAILING to carefully distinguish within the top six between the top three and the second three is a DISGRACE!!!!!

  • whimple says:

    And FAILING to carefully distinguish within the top six between the top three and the second three is a DISGRACE!!!!!
    Yes, but is it even possible to reliably pick the best three of the top six?

  • DrugMonkey says:

    It may not be perfect but KomradePhysioProf is right that we might as well spend the majority of the discussion time on the ones that are clearly in the decision zone. I concur with the frustration that sometimes we spend endless amounts of time on something that is clearly not going to be competitive. I never get this.
    Well, actually I do. People want to be involved when they travel all that way. Now suppose their best application is going to be a 180 and the rest are triaged. Are they going to simply say nothing or take their chance to DoSomething...?

  • whimple says:

    It's also not totally clear how far down the barrel program might pick up applications, so stuff out of the fundable range might nevertheless wind up funded, with some frequency determined by the out-of-range score. There's also the consideration that some (most?) of these applications are going to be seen again, so the initial discussion can serve as a familiarization exercise, if nothing else.

  • DrugMonkey says:

    whimple@#6, a very excellent point. I have no specific data other than the odd anecdote. However, I have heard on more than one occasion of Program saying something to the effect that a 200 was a big watershed in terms of their ability to pick up grants: "If you'd just come in under 200 we might be able to do something" and the like. I am certain the actual cutoffs vary from IC to IC, from round to round, and possibly from IC subdivision to subdivision.
    This is pretty meta, though. Given the pronounced tendency of the panels to clump scores around the perception of what is necessary to get funded, it is no leap to think that if reviewers appreciated this, there would be another blip just under 200, say. I haven't been paying attention to this, but I certainly haven't noticed anything glaring about people fighting hard in the zone of scores that are higher than the perceived fundable line.

  • PhysioProf says:

    People want to be involved when they travel all that way. Now suppose their best application is going to be a 180 and the rest are triaged. Are they going to simply say nothing or take their chance to DoSomething...?

    Do we want to talk about the fact that the members who punish the rest of the study section--and the applicants--with the futile exercise of "saying their piece" and "making their points" even when it is a complete utter absolute waste of time seem to almost always be the ones from Eastern Southern Gipip State University for whom study section is probably the only time in their lives that anyone ever listens to what they have to say?

  • SS says:

    >>a single grant reviewer was unlikely to have a very large impact on the fate of a specific NIH grant proposal
    DrugMonkey, your comment is correct once the discussion starts - but if the single reviewer is one of the 3 assigned to read the proposal and review it, then his/her score has a huge effect. If the score is below threshold, as you write, then the proposal won't even be discussed. The threshold is determined as the average of just 3 reviewers. And in some cases only 2 reviewers submit their scores on time [I've seen this on the study section of which I'm a member], and the 3rd reviewer just goes along with the suggestion to "streamline" the proposals that don't make it.
    So I think a single reviewer can have a huge impact - can kill a proposal, in fact.

  • DrugMonkey says:

    Indeed, SS, which is why I included a caveat.

    This reiterates for me that when one suspects that one voter (not an assigned reviewer, that's another question) is ruining one application's chances by a covert out-of-range vote, well, this seems unlikely to happen.

    I was trying to get at voter behavior outside of the range here, not at the recommended range itself. When the three reviewers do not agree to a substantial extent, sure, things get crazy and usually to the detriment of the application.
