Your Grant In Review: Discussion Order by Preliminary Score

Sep 10 2009 Published by under Grant Review, NIH

In a recent discussion over at Medical Writing, Editing & Grantsmanship, Comrade PhysioProf observed:

Much more important than the change in application format-and implemented with absolutely no community input that I am aware of-is the new study section policy that applications are discussed in the order of the preliminary scores, starting with the best and stopping when about 40% of the apps have been discussed or time runs out. This gives even more power to the assigned reviewers, as there is no longer even lip service given to the decision to triage, and no opportunity for a non-assigned reviewer to rescue an application from triage.

Prior to this new initiative, applications were reviewed by the study section in an order that did not depend on the initial priority score. This always seemed to be a good thing to me. My thinking was based on the generic idea that randomizing conditions prevents any consistent biases related to review order. The underlying hypothesis, on reflection, is that the discussion of a given application is influenced by the discussion of the prior application(s) and by its timing within the two days allocated for discussion. (Would you request that your application be reviewed at the end of the first long day?)
The new procedure is to review grants (grouped by mechanism or type) in the order of the initial priority score. CPP apparently thinks this is a bad thing.


As a reminder, research grant applications to the NIH are generally assigned to 3 reviewers who supply detailed commentary and analysis and a priority score one week before the committee meeting. The average of these three scores becomes the initial priority score. In the course of discussion (led by the assigned reviewers but with input from the rest of the ~20-30 member panel), the three reviewers may adjust their views in supplying the post-discussion scores, which then define the post-discussion range. The entire panel then votes priority scores, generally within the post-discussion range but with variation permitted*. The average of the entire panel then turns into the priority score for the application.
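For readers who prefer arithmetic to prose, the scoring flow just described can be sketched roughly as follows. This is an illustration only, not anything published by the NIH or CSR: the function names and sample numbers are hypothetical, and the sketch assumes that lower scores are better and that the discussion cutoff is a flat 40% of applications, per CPP's "about 40%".

```python
from statistics import mean


def initial_priority_score(assigned_scores):
    """Average of the assigned reviewers' pre-meeting scores."""
    return mean(assigned_scores)


def post_discussion_range(revised_scores):
    """Range defined by the assigned reviewers' post-discussion scores."""
    return min(revised_scores), max(revised_scores)


def final_priority_score(panel_votes):
    """Average of the votes from the entire panel."""
    return mean(panel_votes)


def discussion_order(applications, fraction=0.4):
    """Return the applications that get discussed, best first.

    `applications` maps a (hypothetical) application name to the list of
    assigned-reviewer scores. Assumption: lower scores are better, and
    `fraction` stands in for the "about 40%" cutoff.
    """
    ranked = sorted(
        applications,
        key=lambda name: initial_priority_score(applications[name]),
    )
    cutoff = int(len(ranked) * fraction)
    return ranked[:cutoff]


# Made-up example: five applications, three assigned reviewers each.
example = {
    "A": [2.0, 3.0, 4.0],
    "B": [1.0, 2.0, 3.0],
    "C": [4.0, 4.0, 4.0],
    "D": [3.0, 3.0, 4.0],
    "E": [5.0, 5.0, 5.0],
}
print(discussion_order(example))
```

Note that in this sketch the ordering depends only on the initial scores, so an application just below the cutoff never reaches the panel-vote step at all.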
Obviously, the opinion of the three assigned reviewers has a very influential bearing on the ultimate score of the application. There are essentially three ways in which the initial (and independent) evaluation of the three assigned reviewers can be changed from the approximate middle of the range.
First, you can have a panel which sides strongly with one reviewer over another or out and out revolts against all three of the reviewers when it comes time to vote. I have seen this latter happen, btw. Infrequently, yes, but sometimes a large number of panel members indicate they are voting outside of the post-discussion range.
Second, you can have reviewers being significantly swayed from their pre-discussion score during the course of discussion. I would hate to put any estimate on the frequency and the magnitude of swaying but it does happen fairly often. Based on my experiences on panels, anyway.
Third, you can have the reviewers being influenced during the "read" phase in the week prior to the meeting when all the initial scores and critiques are made available online. Reviewers may be influenced either by the other reviewers' critiques and scores for the application in question or by a general re-calibration through reading the criticisms and scores for other applications assigned to the panel.
I do not have any firm way to determine how often these shifts from the very initial, independent priority score of the three assigned reviewers occurred. All I can offer is my subjective experience of "quite frequently". And as to whether modification from the initial stance made the essential qualitative difference (funded, not funded)? We're not permitted to keep notes so I have to rely on memory but I'd say heck yes. In both directions. (Including in a more indirect way: i.e., one application being reviewed or scored a few score of points better which likely positioned the subsequent revision for a fundable score.)
The present drive to review the applications in the order of initial priority (best to worst) has the potential to minimize score movement**, in my view. So I agree with CPP. How so?
The SRO has to harden the review order several days in advance of the meeting so that the attached program officers can be notified of approximately when to listen in on the discussion. So any movement of the assigned reviewers during the latter part of the read phase is not able to affect the ordering. Therefore if people start adopting a mindset of the review order matching the initial priority score, the other panel members are going to have an internal ranking that is not necessarily sensitive to the actual pre-discussion scores as altered (potentially) from the initial, independent scores.
The next issue gets at CPP's speculation and here I am talking out of my hat. The ordering of review makes it obvious to all where a given application being discussed stands. If you are down at about the 30th percentile of discussed apps, it is unlikely that anything you say is going to move that sucker into the fundable range. This increases the "why bother" demotivation. Once you get past about 2pm of the first day you have this demotivator correlated with the low-blood-sugar and exhaustion demotivators. I just think that the prior discussion ordering, which did not pay attention to initial priority score, permitted these factors to be less influential. Sure, one can have a running mental tally of scores and score ranges but this takes some mental effort. It will not be automatically clear that the range being discussed puts the present app in the 25th percentile versus the 20th percentile, which is perhaps critical if you think your comments may sway the panel about 5 percentile points in the good direction. A move from 25th to 20th percentile is probably meaningless but 20th to 15th may make all the difference in the world of programmatic pickups. It might make the difference for the revised application as well: even though reviewers are not supposed to anchor by the priority score of the prior version, this is hard to escape.
As CPP notes, this is one of those review things that was just put in place without much announcement or discussion. Kind of like a prior push to sharply reduce the number of assistant professors on panels. It contrasts in that way with certain other initiatives that were discussed extensively or at least announced with great fanfare and rationale. One hopes that it was based on a good reason and the limitations, such as I outline here, were considered.
__
*the intent to vote outside the range has to be declared.
**there have been some mutterings over the past couple of years about doing away with discussion altogether. Essentially sticking with the initial scores of a limited set of reviewers. It is not inconceivable to me that the ordered-discussion move is actually intended to minimize movement of scores through discussion.

13 responses so far

  • One SRO told me that part of the reason for using this order is so that applications with similar merit will be discussed close together in time, with the intent that this will facilitate making the finer distinctions of merit.
    In my opinion, this is a total crock, as it relies on the totally bogus assumption that there *exist* any fine distinctions of merit between grants. I am convinced that any distinction of merit finer than decile ranking is 100% arbitrary.

  • neurolover says:

    "In my opinion, this is a total crock, as it relies on the totally bogus assumption that there *exist* any fine distinctions of merit between grants. I am convinced that any distinction of merit finer than decile ranking is 100% arbitrary."
    A very important statement, one that even has psychological support, no? And yet, everyone in charge of making make-or-break decisions insists on thinking that they can rank merit more finely than that (grants, college admissions, paper reviews, . . .).
    I think decile rankings are very much the best that we can do, and sometimes even think that quintiles are really the best.
    We insist on thinking we can do better than that (at least institutionally) because to do otherwise means admitting that 1M+ awards get made based on various forms of bias (some potentially intentionally, but lots of unintentional biases about field, method, individuals, universities, writing styles, personalities, . . . ).
    And, if we really admit that we can't tell the difference between 8th & 18th percentile, what's the solution? We can try to assume that biases are random, affecting different people, different fields, different ideas, different styles at different times. But, that's not true.

  • DrugMonkey says:

    And, if we really admit that we can't tell the difference between 8th & 18th percentile, what's the solution?
    Competing biases is the solution here, as with many other areas in which personal bias is part of the process.
    The CSR knows this and has it institutionalized in the requirements for reviewer diversity. It is just the junior folk who are systematically excluded. And of course we can always argue about the application of the existing diversity rules...

  • whimple says:

    There's no data that suggests better scoring grants produce objectively better science so from the perspective of the general public funding the work it doesn't really matter how grants are scored. Grant ranking QC is one big circular argument so it's hard to get very excited about this kind of change one way or the other.

  • whimple says:

    Actually, the people who should be upset are the people that are already funded. For winners under the old system, all change is bad since it represents a chance to become a loser under the new system.

  • Beaker says:

    Doesn't this new system also create problems connected to the re-calibration of reviewer initial scores until reviewers learn the new scoring system? I had a colleague tell me about going to a recent study section where a couple of "very well seasoned" reviewers (read: codgers) arrived with their stack of applications all with preliminary scores like 1.6, 1.8, 2.2, 2.4, 3.0. It was pointed out to them that it was highly unlikely that all of those grants in their stack were between "exceptional" (a perfect grant) and "excellent." They re-calibrated during the course of the study section, but anybody who has not done so beforehand may be screwing over (or helping out) the particular grants they are asked to review.

  • qaz says:

    At the last study section I was on, my SRO said that they did the reordering because they had found a trend: proposals discussed at the end tended to lose more range (get scored worse) relative to their original scores than proposals discussed early on. The thought was that this was due to exhaustion. So they wanted the grants which were more likely to get funded to get the discussion while people were fresh and less likely to say, "whatever, just trash it."
    PS. I'm not defending this. Just reporting what I was told. At this point, I don't like anything about the new system. I found that the more I used the old system, the more it made sense to me. (Sure, it had its problems, but they were consistent problems that we understood as reviewers and as applicants.) But the more I use the new system, the less it makes sense to me.

  • Neuro-conservative says:

    I heard an idea floating around NIH corridors last year: quickly revisit the scores at the end of session to make sure that there has been no calibration drift over the course of the study section. I suppose that idea is now off the table?

  • msphd says:

    What Neuro-conservative said.
    I've also noticed the reverse trend, similar to gymnastics scores at the Olympics, judges/reviewers tend to hold back at the beginning, no matter how good the grant is, to leave a little room at the top in case there are any surprise standout performances. This usually means that if your grant really is outstanding, but discussed first, your score will be artificially lower than it should have been.
    I'm not sure I see how merit can be decided without discussion. In my experience, no one person is really qualified to fully evaluate a grant - except in the rare cases where they also have a conflict of interest.
    If that's really true generally, and we assume that everyone properly recuses themselves (ha ha ha) then discussion is required to decide merit. Another reason is that whole subjectivity aspect that everyone likes to pretend scientists are immune to (but we're not).
    To rank-order the grants prior to discussing them, with no chance to re-calibrate, seems, as someone else put it, to be a system for "screwing over (or helping out) the particular grants they are asked to review."

  • qaz says:

    By the way:

    Kind of like a prior push to sharply reduce the number of assistant professors on panels.

    Does anyone know what's up with this? I was recently told that there's a new rule that NRSA panels cannot include untenured professors. Anyone know why?

  • tatil says:

    thank you very good.

  • microfool says:

    N-c:

    quickly revisit the scores at the end of session to make sure that there has been no calibration drift over the course of the study section.

    This only works if there are no conflicts of interest in the room, otherwise there would have to be a whole out of the room/into the room rigmarole. (On the other hand, NIH seems to have gotten over the chance for leak of info to COI just by placing the review in this order.)
    And if the scores did drift, what does the panel do? Re-score everything? Nominate grants for rescoring with a super-majority required to rescore a grant? Sounds like an administrative nightmare ripe for exploitation by the old boys network (let's make sure our boys are at the top), but it might allow for more accurate transmittal of the panel's intent for ranking.

  • DrugMonkey says:

    Qaz, you are kidding, right?
    Senior folks who found they had to revise a grant for the first time ever show a tendency to blame 'all those asst profs on panels' (or worse a specific person) without evidence or understanding of the task facing reviewers at present. Throw in some legit diff of opinion about the grant review/selection process (see this blog) and baddabing the whining goes supersonic.
    The real question is why Scarpa ate up the complaints hook, line and sinker...
    [added: I have observations on this topic in this prior post]
