Revision strategy, with an eye to the new A2 as A0 policy at NIH

Occasionally, Dear Reader, one or another of you solicits my specific advice on some NIH grant situation you are experiencing. Sometimes the issues are too specific to be of much general use, but this one is at least grist for a discussion of how to proceed.

Today's discussion starts with the criterion scores for an R01/equivalent proposal. As a reminder, the five criteria are ordered as Significance, Investigator, Innovation, Approach and Environment; the scores below follow that order. The first round for this proposal ended up with:

Reviewer #1: 1, 1, 1, 3, 1
Reviewer #2: 3, 1, 1, 3, 1
Reviewer #3: 6, 2, 1, 8, 1
Reviewer #4: 2, 1, 3, 2, 1

From this, the overall outcome was.... Not Discussed. Aka, triaged.
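Just to put numbers on the split (criterion scores do not formally determine the preliminary impact score, but a crude per-reviewer average makes the disparity plain), a quick back-of-the-envelope sketch:

```python
# Crude per-reviewer averages for the A0 criterion scores above.
# Criteria order: Significance, Investigator, Innovation, Approach, Environment.
from statistics import mean

a0_scores = {
    "Reviewer #1": [1, 1, 1, 3, 1],
    "Reviewer #2": [3, 1, 1, 3, 1],
    "Reviewer #3": [6, 2, 1, 8, 1],
    "Reviewer #4": [2, 1, 3, 2, 1],
}

for reviewer, scores in a0_scores.items():
    print(f"{reviewer}: mean criterion score = {mean(scores):.1f}")
# Reviewer #1: 1.4, Reviewer #2: 1.8, Reviewer #3: 3.6, Reviewer #4: 1.8
```

Three reviewers clustered below 2 and one sitting at nearly double that is exactly the kind of categorical disparity I am talking about.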

As you might imagine, the PI was fuming. To put it mildly. Three pretty decent-looking reviews and one really, really unfavorable one. This should, in my opinion, have been pulled up for discussion to resolve the differences of opinion. It was not. That indicates that the three favorable reviewers were either somehow convinced by what Reviewer #3 wrote that they had been too lenient... or they were simply not convinced discussion would make a material difference (i.e., push it over the "fund" line). The two 3s on Approach from the first two reviewers are basically a "I'd like to see this come back, fixed" type of position. So they might have decided, screw it, let this one come back and we'll fight over it then.

This right here points to my problem with the endless queue of the revision traffic pattern and the new A2 as A0 policy that will restore it to its former glory. It should be almost obligatory to discuss significantly divergent scores, particularly when they make a categorical difference. The Chair and the SRO of the study section both know when the difference is between triaged and discussed, or between a maybe-fundable and a clearly-not-fundable score. The Chair could insist on resolving these types of situations. I think they should be obliged to, personally. It would save some hassle and extra rounds of re-review. It seems particularly called for when the majority of the scores lean in the better direction, because that is at least some minor indication that the revised version would have a good chance to improve in the minds of the reviewers.

There is one interesting, instructive point that reinforces one of my usual soapboxes. This PI had actually asked me before the review, when the study section roster was posted, what to do about reviewer conflicts. This person was absolutely incensed (and depressed) that a direct scientific competitor of the proposal had been brought on board. There is very little you can do, btw, 30 days out from review. That ship has sailed.

After seeing the summary statement, the PI had to admit that, going by the actual criticism comments, the one person with directly competing expertise was not Reviewer #3. Since the other three scores were actually pretty good, this reinforces my usual point: you cannot predict what a reviewer will think of your application from perceptions of competition or personal dis/like. You will often be surprised to find that the reviewer you assumed was out to screw your application over is pulling for it. Or at least, is giving it a score in line with the majority of the other reviewers. This appears to be what happened in this case.

Okay. So, as I may have mentioned, I have been reluctantly persuading myself that revising triaged applications is a waste of time. Too few of them make it over the line to fund. And in the recently ended era of A1-and-out... well, perhaps the time was better spent on a new app. In this case, however, I think there is a strong argument for revision. Three of the four criterion score sets (and we have to wonder why there were even four reviewers instead of the usual three) look to me like scores that would get an app discussed. The ND seems a bit of an unfair result, driven by the one hater. The PI agreed, apparently, and resubmitted a revised application. This time the criterion scores were:

Reviewer #1: 1, 2, 2, 5, 1
Reviewer #2: 2, 2, 2, 2, 1
Reviewer #3: 1, 1, 2, 2, 1
Reviewer #4: 2, 1, 1, 2, 1
Reviewer #5: 1, 1, 4, 7, 1

I remind you that we cannot assume any overlap in reviewers, nor that the reviewer numbers refer to the same people across rounds. In this case the grant was discussed at study section and ended up with a voted impact score of 26. The PI noted that this time the review panel included a second direct scientific competitor, in addition to the aforementioned first one.

Oh Brother.
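For anyone fuzzy on where a number like 26 comes from, the arithmetic is straightforward: every eligible member of the panel votes a whole number from 1 (best) to 9 (worst), and the final impact score is the mean of those votes multiplied by 10, rounded to the nearest whole number. A minimal sketch with invented votes (all we actually know is the resulting 26):

```python
# Final overall impact score = mean of eligible panel members' 1-9 votes,
# times 10, rounded. These particular votes are made up for illustration.
hypothetical_votes = [2, 2, 3, 2, 3, 3, 2, 3, 3, 3]

impact_score = round(10 * sum(hypothetical_votes) / len(hypothetical_votes))
print(impact_score)  # 26
```

A 26, in other words, means the panel's average vote landed between 2 and 3.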

I assure you, Dear Reader, that I understand the pain of getting reviews like this. Three reviewers throwing 1s and 2s is not only "surely discussed" territory but "probably funded" territory, especially for a revised application. Even the lone 5 on Approach from Reviewer #1 is something the other reviewers might perhaps have talked him/her down from. But to have two numbers in obvious triage territory thrown on Approach? A maddening split decision, leading to a score that is most decidedly on the bubble for funding.

My seat-of-the-pants estimation is that this one may require Program intervention to fund. I don't know for sure; I'm not familiar with the relevant paylines and likely success rates for this IC for this fiscal year.

Now, if this doesn't end up winning funding, I think the PI most certainly has to take advantage of the new A2 as A0 policy and put this sucker right back in. To the same study section. Addressing whatever complaints were associated with Reviewers #1's and #5's criticisms, of course. But you have to throw yourself on the mercy of the three "good" reviewers and anyone they happened to convince during discussion. I bet a handful of them will be sufficient to bring the next "A0" of this application to a fundable score even if the two less-favorable reviewers refuse to budge. I also bet there is a decent chance the SRO will see that last reviewer as a significant outlier and not assign the grant to that person again.

I wish this PI my best of luck in getting the award.


  • Fred says:

    Also, it seems to me that this "new A2 as A0 policy at NIH" could also serve as a "new A1 as A0 policy": get your feedback from Study Section A but, rather than go back to them, switch to a different Study Section B for a fresh start.

  • Susan says:

    The cumulative amount of time and work wasted (on both the PI and the other reviewers' parts) by reviewer #3 is astounding.

  • Comradde PhysioProffe says:

    The two 3s on Approach from the first two reviewers are basically a "I'd like to see this come back, fixed" type of position.

    Criterion scores on a triaged application are uninterpretable. If the study section is doing its job correctly, then by the time all of the scores have been spread among the discussed applications, initial 3s turn into 6s for those applications. But no one recalibrates criterion scores of triaged apps.

    In this case the grant was discussed at study section and ended up with a voted impact score of 26. The PI noted that this time the review panel included a second direct scientific competitor, in addition to the aforementioned first one.

    Oh Brother.

    If a study section is doing its job correctly, a 26 impact score should be around 7%ile. What was the %ile for this app?
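    For reference, the standard percentile arithmetic as I understand it: percentile = 100 x (rank - 0.5) / N, rounded to a whole number, where N is the size of the section's percentile base (typically the current round plus the two preceding rounds). A sketch with invented numbers:

    ```python
    # Hedged sketch of the NIH percentile formula; the rank and base size
    # below are invented for illustration.
    def nih_percentile(rank: int, base_size: int) -> int:
        return round(100 * (rank - 0.5) / base_size)

    # A 26 that ranked 10th in a hypothetical base of 150 applications:
    print(nih_percentile(10, 150))  # 6 -- the ~7%ile ballpark
    ```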

  • drugmonkey says:

    The cumulative amount of time and work wasted (on both the PI and the other reviewers' parts) by reviewer #3 is astounding.

    Agreed. At the least there would be a five-month delay, but more likely ten months (depending on when the summary statement showed up; summary statements for triaged applications come out after those for scored ones, so probably ten months). And that is IF the A1 funds. If it does not, the PI is facing another five months minimum.

    Although, to be fair, it is not obvious that the two original 3s on Approach would have resulted in a fundable score the first time. Maybe, maybe not.

  • Comradde PhysioProffe says:

    Although, to be fair, it is not obvious that the two original 3s on Approach would have resulted in a fundable score the first time.

    Dude, if the study section is doing its job correctly, and if one assumes that the preliminary Approach scores drove the preliminary impact scores, then those 3s would translate into 6s after discussion.

  • drugmonkey says:

    You have been reporting a type of score-spreading behavior that I have yet to see from any study section, PP.

  • CMDA says:

    This should be of interest to the insightful readers of this blog: The NIH is offering prizes for the best ideas for fixing two aspects of peer review. $10K first prize for each of the two topic areas! Enough for a new -80 or a family trip to Tahiti, all courtesy of the Center for Scientific Review.

    Challenge #2 is pertinent to the present discussion (making peer review more fair):

    Challenge #2
    Strategies to Strengthen Fairness and Impartiality in Peer Review
    Submit your idea on how to strengthen reviewer training methods to enhance fairness and impartiality in peer review. First and Second prizes will be offered for the best overall ideas. Additional details can be found at FRN Doc.2014-10203.
    See: http://public.csr.nih.gov/Pages/challenge.aspx

    (Many thanks to DM for the time invested in this blog and for hosting these discussions.)

  • Busy says:

    This is a bad consequence of highly competitive programs. They end up requiring unanimity for approval, since the evaluating committee usually has enough proposals with all-perfect scores that it doesn't even need to look at the ones where a single person dissented.

  • CMDA says:

    Update and clarification:
    Upon further reading, I see this contest (see my post above) does not address fairness in general, but rather fairness specifically towards minority applicants.

    (I was incorrect when I said this challenge was directly pertinent to the present discussion. Nevertheless, this issue has been discussed previously on this blog.)

  • drugmonkey says:

    It may be coming, like Winter, PP, but it is not yet here for my study sections of interest. 26s are not coming with sub-10%iles.

  • Eli Rabett says:

    The tea leaves say this is a highly respected PI at a top notch place, one of the chosen people, with an approach that people love or think sucks. Something like that is hard to get funded.

  • drugmonkey says:

    I never think Environment and Investigator are all that diagnostic. Severe compression of range, usually.

  • qaz says:

    In the study section that I usually serve on, there is no longer any direct relation between the specific scores (signif, approach, env, etc.) and the rank, because we actually rank our scores. So most people use the specific scores as a means of communicating with the investigator about the general quality of the grant, but if all the grants in the section that cycle are excellent (say, all have 1-2 signif), they will still cover the complete range from 1-8 (9 is generally still very rare, and only really appears on really screwed-up proposals). So, at least in my section, specific scores and overall scores are correlated, but you'll only see that correlation if you have access to all the proposals in the study section.

    Our SRO has promised to explain to applicants why they are getting 5's overall, but 2's in specific. Whether ze has or not, I don't know.

  • eeke says:

    I presume your PI knows this, but if he/she is concerned about competitors reviewing the application, this person can request in the cover letter that certain individuals be excluded from the review process.

    Also, the worst criterion scores corresponded to Approach. I remember a blog post by either you, Jeremy Berg, or Sally Rockey (sorry, I can't remember which) showing data indicating that the criterion score for Approach was the strongest predictor of whether the grant gets funded. Maybe Eli is right - some people think this PI's approach is great, and there is another group that thinks it sucks. That is extremely hard to overcome. I suffer the same problem.

  • Comradde PhysioProffe says:

    This is a bad consequence of highly competitive programs. They end up requiring unanimity for approval, since the evaluating committee usually has enough proposals with all-perfect scores that it doesn't even need to look at the ones where a single person dissented.

    This is exactly the problem that spreading the scores of all of the discussed applications across the entire range of 1-9 ameliorates. It has the unfortunate side effect of rendering the criterion scores of undiscussed applications meaningless, which is exactly why the criterion scores should be completely eliminated.

    As a member of a study section, you should want your scoring to have as much influence as possible on the ultimate funding decisions. Otherwise, why bother in the first place? If 20 percent of the applications receive "perfect scores", then all you are doing is throwing your hands in the air and saying to program staff, "Fucke itte. You figure out which half of these to fund." If you take your best 20 percent of grants and spread them across about half of the scoring range, then you are deciding which half of those will be inside the payline, and which half will not.

    To me it seems perverse for a study section to willingly abdicate their own scientific judgment to program staff. If that is how you feel, then why would you even volunteer to serve?
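    To make that concrete, here is a toy sketch of rank-based spreading (my own construction, not any official CSR procedure): the best discussed application anchors the top of the range and the rest are spaced evenly behind it, so Program sees a discriminable score for every discussed application.

    ```python
    # Toy rank-based score spreading; the function and numbers are
    # illustrative only, not an official procedure.
    def spread_scores(ranked_apps, lo=1.0, hi=5.0):
        """Map rank order onto evenly spaced scores in [lo, hi], reported x10."""
        n = len(ranked_apps)
        step = (hi - lo) / max(n - 1, 1)
        return {app: round((lo + i * step) * 10) for i, app in enumerate(ranked_apps)}

    discussed = [f"app{i}" for i in range(1, 9)]  # eight discussed apps, best first
    print(spread_scores(discussed))
    # {'app1': 10, 'app2': 16, 'app3': 21, 'app4': 27,
    #  'app5': 33, 'app6': 39, 'app7': 44, 'app8': 50}
    ```

    Scored that way, the gap between the third-best and fourth-best grant is visible to Program, instead of both arriving as yet another 15.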

  • Wowchem says:

    EE is right, the only score that matters from the individual reviews is Approach. Anything more than 2s kills you and indicates a flaw in the science. 2/5 reviewers saying your science is flawed is not good.

  • drugmonkey says:

    PP and qaz, if this score-spreading system is being used evenly, great. I agree totally that study sections need to provide discriminable scoring. However, if POs are getting differently calibrated scores from different sections, it encourages them to override scores/percentiles. They need to be on board too. I've heard rumor of higher-level Program staff who "don't go by percentiles". This suggests to me that they have been confused by the inconsistent scoring schemes.

  • meshugena313 says:

    I can confirm CPP's claim of score spreading in 2 sections; the PO on my proposals was telling me about it on Friday. Impact scores in the high 40s are coming in at 20-30%ile, from personal experience.

  • Comradde PhysioProffe says:

    The last time I had an R01 scored by a study section doing this correctly, a 46 impact was 18%ile. POs who are ignoring %iles and relying on impact scores when comparing R01s scored by a variety of different study sections should be fired.

  • drugmonkey says:

    That's like saying a program officer that doesn't really like to give ESIs the break should be fired. Or those that look for any excuse to sustain youngsters at the expense of oldsters should be. Or those that only listen to Ivy League PIs. It remains the case that study sections *advise* Program. They do not dictate funding. Thus, I fail to see how using the advice in one way or another is a firing offense.

  • drugmonkey says:

    Criterion scores on a triaged application are uninterpretable.

    You never addressed the real issue for the A0 review, which is that the initial criterion scores can be presumed to have at least some relevance to where the triage line was set. Also, the frustration here comes from the apparent score disparity. Do you not agree that significant pre-discussion score disparity should trigger discussion?

  • eeke says:

    I've read that applications that reviewers choose not to discuss are reported a week in advance of the study section meeting. Is this true? How is this dealt with when there is a disparity like this? Does the one reviewer who didn't like it convince the others of the deal-breaker?

    Also, @CPP, some institutes do not rely on percentile scores for whether a grant is funded. For example, see the paylines for NIAID: http://www.niaid.nih.gov/researchfunding/paybud/pages/paylines.aspx. Percentile scores are considered only for R01 applications; overall impact scores are used for all others.

  • drugmonkey says:

    eeke - reviewers are generally supposed to submit their preliminary scores a week in advance (along with their written critiques). This is called the Read phase, and the rest of the panel can read and consider ALL of the reviews and scores, should they choose to do so. Many will just focus on the other reviews for their assigned applications. At the meeting itself, any member of the panel can insist that any application be discussed.

    In my view, significant and qualitative disparities should be resolved by bringing the application up for discussion OR by reconsideration among the assigned reviewers during the Read phase. The latter happens more frequently than the former, in my experience.

    Several years back they started reviewing grants in the order of preliminary scores. I think this had a major quelling effect on willingness to raise triaged applications for discussion. One of the reasons I think that change was a bad idea.

  • drugmonkey says:

    Note: I strongly advise all grant review newcomers to read as many of the panel's critiques as they possibly can. Great way to get up to speed.

  • qaz says:

    DM - There is no question at this point that all of NIH is confused about scores. Proof is the simultaneous exhortation to "spread scores" and the offering of definitions of "what scores mean". These are fundamentally incompatible goals.

    I still say that the answer is to go back to the old system, where you had 9 categories between a 2 and a 3 (2.1, 2.2, ..., 2.9). Was a 1.5 better than a 1.6? Yes, marginally. Was a 1.5 better than a 3.0? Almost certainly.

    In the current system, score spreading is definitely not being used evenly. I can say this from serving on multiple study sections in the last few years, each of which had a completely idiosyncratic solution to the incompatible nature of score spreading and score meaning.

    Another real concern is situations where there isn't enough data to percentile properly, like special emphasis panels that combine scores across effectively different study sections, each with effectively different solutions to the spreading/meaning problem.

  • qaz says:

    PS. As a triage note, in my experience, most panelists check the scores and at least skim (1) the proposals assigned to them, (2) the proposals in their area, and (3) the proposals written by friends and enemies. That lets them participate in the discussion and bring proposals out of triage if they feel they need/want to.

    The other thing that quelled the willingness to bring grants out of triage has been the push to "only bring things out of triage if you think you might bring them into the fundable range". (Both chairs and SROs have said as much to me and others in the study sections I've been on.) Bringing something out of triage just to discuss it is seen as a definite faux pas. I think this stems in large part from the shift away from the view that grant review should help a PI write a better grant or do better science, and towards the view that the only purpose of grant review is to identify fundable grants.

    Personally, I don't think it was so horrible when we discussed all the grants. It cost an extra day, but I felt it was a better process.

  • old grantee says:

    Well, some people predicted that after Scarpa's departure everything would return to the way it was before. It seems that, little by little, the status quo ante has been and will be fully restored.

  • DrugMonkey says:

    Given that it is easier for program to pull a grant out of order if it is scored versus triaged, qaz...
