More Maddening "Instruction" for NIH Reviewers

Feb 04 2010 Published by DrugMonkey under Grant Review, Peer Review

The Feb 2010 edition of the Office of Extramural Research Nexus contains yet more explanation of the way reviewers are supposed to use the "overall impact" scoring field.

What we call the "Overall Impact" of the application is the compilation of the evaluation of the review criteria. As reviewers assess Overall Impact, they take into account the individual review criteria and provide an overall evaluation of the likelihood for the project to exert a sustained, powerful influence on the research field(s) involved. The following points provide some clarification between Significance and Overall Impact:
Overall Impact
* Takes into consideration, but is distinct from, the core review criteria (significance, investigator(s), innovation, approach and environment).
* Is not an additional review criterion.
* Is not necessarily the arithmetic mean of the scores for the five scored review criteria.
* Is the synthesis/integration of the five core review criteria that are scored individually and the additional review criteria, which are not scored individually.

Emphasis added, but I could just bold the whole damn thing. This is supposed to help reviewers? More importantly (I presume and assert) it is supposed to help reviewers to act more consistently with each other?
This is exactly the sort of thing that drives me crazy about CSR and the complete and utter mess that is reviewer instruction.


This does not help me to resolve how this new approach is supposed to work. In fact, deciding to make this Overall Impact score detached from the individual scoring totally undercuts the point of the new system.
The way they keep harping about the Overall Impact communicates to me that this is no different whatsoever from the prior approach in which the Preliminary Score suggested by a reviewer was supposed to be a "synthesis/integration of the five core review criteria and the additional review criteria". All the hoopla about a new scoring approach was supposedly justified on the basis of re-orienting reviewer behavior away from obsession with the minutiae of the experimental plan and towards Impact and Significance. Now they seem to be saying "yeah, never mind all that, you are still free to express whatever balance you see fit". Whut?
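To see why "not necessarily the arithmetic mean" is doing so much work in that bullet list, here is a toy sketch. The scores, the weights, and the weighted-rounding rule are all invented for illustration; NIH publishes no such formula. The point is only that two reviewers can hand in identical criterion scores and still legitimately report different Overall Impact scores, because each is free to weight the criteria however she sees fit, grant by grant.

```python
# Toy illustration (not NIH's actual method): Overall Impact as a
# reviewer-chosen weighted synthesis of the five criterion scores.
# NIH scores run 1 (best) to 9 (worst).

CRITERIA = ["significance", "investigator", "innovation", "approach", "environment"]

def overall_impact(criterion_scores, weights):
    """Collapse criterion scores into one 1-9 score using reviewer-chosen weights."""
    total = sum(weights[c] * criterion_scores[c] for c in CRITERIA)
    return round(total / sum(weights.values()))

scores = {"significance": 2, "investigator": 2, "innovation": 3,
          "approach": 5, "environment": 2}
# The plain arithmetic mean would be (2+2+3+5+2)/5 = 2.8, i.e. about a 3.

sig_heavy = dict.fromkeys(CRITERIA, 1); sig_heavy["significance"] = 5
app_heavy = dict.fromkeys(CRITERIA, 1); app_heavy["approach"] = 5

print(overall_impact(scores, sig_heavy))  # 2 -- a significance-driven reviewer
print(overall_impact(scores, app_heavy))  # 4 -- an approach-obsessed reviewer
```

Same criterion scores, different Overall Impact. That is more or less the loophole being complained about here.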
Look. CSR. Can we talk? Reviewers, especially when just starting out, are happy to do the job you ask of them. They are looking to do a good job and they are very smart people who are able to follow instructions. If you give them vague and often contradictory instruction, they are left to their own biases of circumstance. Mostly having to do with how they have been treated by study sections over the initial years of their own grant seeking. Modified by what the geezertariat in their home department has to say about proper grant reviewing behavior. Leavened by random insanity. In a narrow subset of cases, properly educated on grant review approach because they are devotees of YHN.
This all adds up to variance. Which is great for some aspects of review; we want some diversity on the science part. But what we don't want is a lot of diversity over essentially irrelevant stuff like grantsmithing decisions over what to emphasize and what not to emphasize in the application. Over hard and fast StockCritique "rules" like "too many notes" or "did not exhaustively review potential pitfalls".
Do you think they ever test whether their initiatives, instructions and what not actually change reviewer behavior?
[h/t: PhysioProf]

34 responses so far

  • I wonder what kind of perceived reviewer fuck-ups led to this "guidance" from CSR?

  • neurolover says:

    Is this an instruction for the reviewers, or the reviewees? It seems to me that someone was complaining because their "overall impact score" didn't make sense in light of the individual 5 scores, and this "explains." (basically by saying that the overall impact score contains some additional magic criterion).
    But, that explanation presumes that overall impact scores aren't a new thing.

  • That's a good point, neurolover. Maybe this was a response to applicant whining: "My criterion scores were all 2s and 3s, but I got a 41 final score!"

  • DrugMonkey says:

    could be, neurolover. An entirely predictable outcome, let me just say. I was certainly predicting this was going to be a big new window for recipient whining about the individual scores not matching the outcome...
    It's as if these people at CSR never ever talk to anyone on the front lines of either review or being reviewed...

  • What confuses me is the dichotomy of Overall Impact in that it "takes into consideration, but is distinct from, the core review criteria," but then "is not an additional review criterion." (Actually, the rest of it that you wanted to bold also makes no sense to me.)
    Erm, if it is "distinct from" then isn't it an additional criterion?
    I grok the point that "Approach" should not be the major criterion for scoring but, sadly or rightly so, it still seems to be in practice. Currently, I find that Overall Impact is still used as the Summary. Nothing has really changed except for the pleading that reviewers not solely score based on Approach.
    The guidance is so non-committal and ambiguous that it provides no helpful information to either the reviewer or the applicant.

  • neurolover says:

    "It's as if these people at CSR never ever talk to anyone on the front lines of either review or being reviewed..."
    I think it's more fundamental than that. Reviewers are being asked to make rankings within the variability of their ability to make rankings (that is, we've said you can tell the top 25% or so apart, but not much beyond that). If that's the case, the system is essentially going to be "unfair"/"biased"/"whatever you want to call it." Then reviewees are going to complain, because, essentially, for some, arbitrary decisions have been made. But CSR has to have it all make sense, so they send out directives, assuring everyone it makes sense. No one's willing to just admit that the reviewing system cannot make the merit rankings to the decimal places required for the funding decisions.
    Can't remember what our blog owners think of this logic, but it's what I think. With college/undergraduate admissions, I actually think the system would be *improved* by randomizing the admissions after selecting the top 25% (i.e., randomly pick the top 10% of applicants after limiting the pool to the top 25%). I'm not sure I'm willing to go that far with grant applications, 'cause I think that's where the bias might be set by programmatic directions (i.e., establishing new researchers, or new research directions, or buttressing old ones). But random might be OK, too.
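    As a toy sketch of that randomize-above-a-threshold idea (the function name, cutoffs, IDs, and scores below are all invented for illustration): rank the field, keep the top 25%, then draw the funded 10% at random from that pool.

```python
# Toy sketch of "randomize after selecting the top 25%" (illustrative only).
import random

def lottery_select(apps, scores, pool_frac=0.25, fund_frac=0.10):
    """apps: list of IDs; scores: dict ID -> merit score (higher = better)."""
    ranked = sorted(apps, key=lambda a: scores[a], reverse=True)
    pool = ranked[: max(1, int(len(apps) * pool_frac))]  # review can discriminate this far
    n_fund = max(1, int(len(apps) * fund_frac))          # ...but no further, so draw lots
    return random.sample(pool, n_fund)

apps = [f"app-{i:03d}" for i in range(100)]
scores = {a: random.gauss(50, 10) for a in apps}
funded = lottery_select(apps, scores)  # 10 awards drawn at random from the top 25
print(sorted(funded))
```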

  • whimple says:

    They should explain it as, "Overall Impact = How badly do you want this to get funded?"
    1 = Really, really badly. If this doesn't happen, everyone can just quit.
    2 = Definitely fund this.
    3 = Fund if there's enough money (which there usually won't be). Fund the resubmission.
    4 = Don't fund this, but encourage a resubmission.
    5 = Don't fund this, but I'd entertain the possibility of a resubmission.
    6 = Don't fund this. There is some potential, but learn how to write a grant before you come back.
    7 = Don't fund this. You suck.
    8 = Don't fund this. You really suck.
    9 = This is total garbage. Don't ever waste my time with anything from this person again.

  • neuropop says:

    "That's a good point, neurolover. Maybe this was a response to applicant whining: "My criterion scores were all 2s and 3s, but I got a 41 final score!""
    That's exactly what happened to my first submission.

  • Anonymous says:

    Hey guys,
    I think that the explanatory documents provided by the OER are helpful in that they provide the basis for a common language and understanding. They might not be perfect but certainly don’t appear maddening to me.
    What about the "overall impact"? Is it a maddening addition? I think that we all use it, perhaps without being aware of it, in our everyday life. For example: "I just finished reading this paper...", "I attended this seminar...", "I heard the President's State of the Union address"; so what's my general assessment? How do I feel about it?
    A general assessment/overall impact reflects PRIORITIES and, I think, could be helpful in a selection process or selection setting (if the priorities are clearly defined). Reviewers come to review with their field-specific scientific priorities, as well as those established by NIH. I assume that when a scientist is asked to review for NIH or generously offers herself to do so, there is an implicit and honest acceptance of NIH priorities. Personally, I think that this acceptance is critical to overcoming our natural human tendency to be over-guided by personal priorities and should help eliminate, to a certain degree, the somewhat unavoidable subjectivity in evaluating/scoring a grant.
    (It is my understanding that the NIH establishes priorities/programmatic directions in consultation with the scientific community, diverse Advisory Committees and guided by Congressional mandates).
    Yes, I agree that the bottom line for overall impact is what whimple said. He/she said it just much better.

  • (1) The main point of the distinction between Overall Impact and the criterion scores is that each reviewer is being asked to use her own judgment in relative weighting of each of the criteria in arriving at an Overall Impact. Even for a given reviewer, this weighting could be different for each grant being reviewed.
    (2) When I was at one of the Enhancing Peer Review confabs being led by Keith Yamamoto and that dentist dude (I forget his name), a very senior neuroscientist stood up and said that peer review should be used to identify the top 25% of grants, and then a lottery to decide which grants among that select group get funded. The room burst into applause. The problem with this idea is that it almost certainly violates Federal Law.

  • DrugMonkey says:

    Anonymous@#9-
    I think you are missing the point that the "Overall Impact" is precisely what was being generated as the reviewers' scores in the immediate past. The sub-scores are new.
    It is my contention that nothing new has been added, just additional confusion, by inventing these criterion scores. Keep in mind that reviewers were supposed to consider 5 key criteria before as well and [somehow, through an uninstructed mysterious black box] integrate their review of the criteria into a single priority score.

  • Solomon Rivlin says:

    How the hell do you expect a score on an outcome predicted for a project that has not been performed yet? Even worse, how the hell can you predict the impact? Reminds me of the banner "MISSION ACCOMPLISHED."

  • My understanding is that the separate criterion scoring was supposed to improve "guidance" from the study section to the applicant. In other words, it was supposed to reduce this old kind of applicant reaction: "All the reviewers said that I am an 'outstanding' investigator, my approaches are 'very well supported by preliminary data', and the hypotheses being tested are 'extremely interesting', but I only got a 183!?!? WTF!?!?"
    All it did, of course, is substitute a new kind of WTF!?!? for that old one.

  • DrugMonkey says:

    How the hell do you expect a score on an outcome predicted for a project that has not been performed yet? Even worse, how the hell can you predict the impact?
    See, we agree on some things Sol.
    Now why don't you tell us how you would like to structure grant review at the NIH. What should reviewers be looking at?

  • Solomon Rivlin says:

    DM,
    Let me try to illustrate to you why I hold that the whole NIH review system is crooked and there is really not a good way to review grant proposals on either their predicted outcome or impact.
    Think Mama Bell pre-1980: a monopoly on phones and phone service in the US. Think of those who argued that this monopoly must be broken, that more, smaller companies should get involved in the game, such that:
    1. Phones will be cheaper; people will be able to buy them rather than rent them from Mama Bell; phones will be smaller and lighter and of different colors; and, a prediction that could be described as 'going out on a limb,' phones could be wireless such that they will not be tied to their base with a long curly wire.
    2. Phones will be wireless; phones will have their own memory, caller ID, call waiting, answering service; more than two phones will be able to converse between themselves; phones will travel with you anywhere in the world, while using satellites for communication; cell phones will have cameras, video, personal computers; add here anything your imagination can come up with.
    The NIH review process, of course, will approve and fund the proposal with the predictions listed in 1. It will completely reject those listed in 2 as outrageous, looney, absolutely undoable.

  • Scumlin, you fucking dumbshit, DrugMonkey isn't asking you for yet another sniveling whinefest "illustrating" how everything in the entire universe sucks infinitely except for you. We are all well aware of your megalomaniacal paranoid delusions by now.
    He is asking you to tell us HOW YOU WOULD STRUCTURE PEER REVIEW TO MAKE IT BETTER. Capisce?

  • DrugMonkey says:

    there is really not a good way to review grant proposals on either their predicted outcome or impact.
    To reiterate, we agree.
    Now what would you suggest as a strategy by which the NIH/CSR should review proposals and award research funds?

  • Namnezia says:

    Now what would you suggest as a strategy by which the NIH/CSR should review proposals and award research funds?

    How about having smaller grants be the norm, say 3/4 the size of the original awards (still for 5 years), but then support a greater number of awards so that a much higher proportion of proposals is funded. Then use the investigator's productivity during the last grant cycle as evidence of "return on investment" as a major indicator for renewing these grants. You know... spread the wealth. More grants would be funded and a more forgiving review system could be put in place.
    Or you could make the review process completely opaque like they do at the NSF, where you have no idea who is scoring your grant and how they develop their rankings. Somehow I hear far fewer complaints about their review process. Having received funding from both sources, I'd say the NSF, although more mysterious, seems to be run more efficiently as far as reviews go, and the program officers seem to have a lot more power in making funding decisions. But again this is just my perception, I may be totally off.
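    Back-of-the-envelope arithmetic on the smaller-grants idea above (the budget and award sizes are invented for illustration; no institute's real numbers): cutting each award to 3/4 size stretches a fixed pot to about a third more awards.

```python
# Hypothetical numbers for illustration only; not any institute's real budget.
budget = 120_000_000              # pot available per funding cycle
full_award = 250_000 * 5          # one 5-year award at $250k/yr

n_current = budget // full_award              # 96 awards at full size
n_smaller = budget // int(full_award * 0.75)  # 128 awards at 3/4 size
print(n_current, n_smaller)                   # ~33% more grants funded
```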

  • neurolover says:

    I prefer NSF's system, where the section actually ranks the applications in numerical order (and the groupings actually fall into the categories whimple suggests, except with more G-rated language).
    I guess I'd go further than NSF, and not rank within the groups, and allow people to just say they can't tell the difference between the grants.
    Not sure what to do about how much the ranking/scoring is affected by the primary reviewers, and what their biases are about the kinds of work that should be done in broadly directed sections.

  • microfool says:

    I will suggest that this sort of language and policy can only be the result of design-by-committee, and almost certainly arose during the Enhancing Peer Review Process, which is probably why this is being communicated by OER, which contains within it an office for review policy and procedures.
    IIRC, the major feature of criterion scores was to provide numerical feedback on unscored applications. As CPP so aptly put it:

    All it did, of course, is substitute a new kind of WTF!?!? for that old one

  • microfool says:

    categories whimple suggests, except with more G-rated language.

    NO KIDDIN!! He used the F-word over and over again in the context of review!

  • qaz says:

    As someone who had only just come into the old system recently when they threw this new 1-9 cr*p at us, I found my reactions to the two systems to be very different WTFs. With the old system, my first reaction was all of the usual WTF, but as I learned to deal with it, it made more sense to me. On the other hand, with the new system, the more I see of the reviews (on both sides), the less it makes sense to me. It is more quantized, making it harder for committees to make fine judgments (the committee I was most recently on just put about 1/4 of the grants into the "3" category). But it gives the illusion of detail by providing a numerical score. The simplest fix would be to drop the new scores as a failed experiment and go back. The better fix would be to truly divide the system into four categories: "Must fund", "Fund if possible", "I want to see it again", and "Triage" (or "Trash"). After that, it's all up to program.
    PS. I actually like Sol's example of breaking up AT&T. It was broken up (grant was funded) because the applicant proposed that we would get a small but significant advance. In fact, the applicant went off and made some wonderful breakthroughs. That's pretty much a description of the current grant process. You promise small but significant, and then you go change the world. Partially, that's because if you had told people that they would have wireless phones that surf the web (what's the web? this is 1974.), they'd have laughed at you. The problem is it's hard to tell the promise that "phones will surf the web" from "everyone will have jet-packs". But it's definitely convincing that "phones will be cheaper". And everyone in the process knows that in addition to "phones will be cheaper", it's likely lots of cool other stuff will happen too.
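    qaz's quantization complaint is easy to simulate (the distribution parameters below are invented, not data): push noisy judgments of many grants onto a 1-9 integer scale and most of the field piles up in a few middle bins, which is exactly the "1/4 of the grants in the '3' category" effect.

```python
# Toy simulation of integer-score pile-up on the 1-9 scale (parameters invented).
import random
from collections import Counter

random.seed(1)
merit = [random.gauss(5, 1.5) for _ in range(80)]        # "true" quality of 80 grants
observed = [m + random.gauss(0, 1.0) for m in merit]     # add reviewer noise
reported = [min(9, max(1, round(x))) for x in observed]  # force onto 1-9 integers

print(Counter(reported).most_common())  # a few bins hold most grants: big exact ties
```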

  • Solomon Rivlin says:

    "Now what would you suggest as a strategy by which the NIH/CSR should review proposals and award research funds?"
    DM,
    Since CPP's blood pressure is about to reach the danger zone, I'll not elaborate, again, on why the NIH review system that attempts to predict outcome and impact will never work. The whole idea of prediction is a casino idea, except that the NIH casino is trying to make safe bets and still losing the house. I have indicated numerous times before that funding of scientific research should be left to individual institutions. If the government seeks to subsidize research, then divide the subsidy equally among all research institutions based on several clear and logical criteria. Funding of applied projects in the pre-medical and clinical arenas should be left to corporations. Universities can work with the pharmaceutical and medical corporations, as they do now to a certain extent, but on a much larger scale. Only when scientific research is both done and controlled by scientists will we see science fulfill its full potential. When politicians and administrators determine the direction, the scope, the focus and the predictability of scientific research, we find ourselves with a huge cadre of frustrated PDs, PIs and many other qualified scientists who run around like chickens without heads trying to figure out how to survive as scientists by fantasizing about the possible outcome and impact of research projects they themselves are not sure about.

  • singingscientist says:

    Getting back to the comment about what the scores really mean and what perceived reviewer transgression led to the change in the system, I wanted to make two points.
    The first is that the system can't really be fair because science is like fashion: some years one protein/system/disease/technique/etc. is in fashion and then suddenly it's not. Those who keep abreast of the science fashions do better. So however you tell reviewers to score, they will still be influenced by the pop culture of science. I think I am, too...
    The second is that somewhere in the 4-6 range of the scoring system is an alternative interpretation. It may be either "don't fund but I'd entertain a resubmission" OR "I think this is solid but I just can't make myself care."
    That's my two cents.

  • DrugMonkey says:

    You are not making sense Sol. You *hate* Uni administrators and assert they are all corrupt. Why put local administrators in charge of the $$?

  • If the government seeks to subsidize research, then divide the subsidy equally among all research institutions based on several clear and logical criteria.

    Shitlin, you blithering freak, what do you propose those "clear and logical criteria" would be, and who is going to apply them? Can you really truly be this fucking stupid?

  • Solomon Rivlin says:

    DM,
    The corruption of administrators is a direct outcome of the luxury deals they receive, as if they are the ones bringing in the dough from the NIH and other sources. In my university, and I'm sure it is similar in others, the dean of the medical school who preceded the last one came from a big-name U in NY in 1997, and the contract he signed with the university included, besides his $320,000 salary, a bonus of up to $450,000 (a lot of dough 12 years ago), which was described as 3% of IDC of extramural grants. Can you believe it? For every grant a faculty member was awarded, 3% of the IDC went to the bank account of the dean, up to $450,000. Yes, the dean, an asshole who never, not even once, had a meeting with the medical students during his 5-year stint. The dean whom everyone wanted to get rid of for his poor performance was instead promoted to Chancellor of the medical school (a new position that was created especially for him) and vice president of health affairs. Only by isolating him completely in his new position and leaving him out of any major decisions did he get the message and leave, but not before he signed an agreement that his salary would continue to be paid until the end of the fiscal year (a total of 10 months of salary for which he wasn't even present at the university). You should have seen the list of applicants for the vacated job of dean. They came from all the big-name universities around the country. They did not care that the university is a second-tier one, or maybe even third-tier; they all wanted the big, fat check. Money corrupts people, and scientists are not immune. Administrators should work for and on behalf of the scientists, and their salaries should never be as high as or higher than those of the people without whom they wouldn't have a job. The system today is upside down, where the administrators are kings and scientists-mentors-teachers are peons.
    Oh, BTW, that dean was not the first choice of the faculty members and the search committee, but he was the first choice of the university president.

  • Shtlin, you fuck-up, do you even know how to read? In what universe could you possibly interpret DrugMonkey's comment as a solicitation for your bajillionth logorrheic explosion about how horrible university administrators are and how the ones at your former institution are particularly horrible. Everyone already knows all about your paranoid fantasies.
    Now answer the affirmative questions you have been asked: (1) Why put local administrators in charge of the $$? (2) What do you propose those "clear and logical criteria" for allocating research funds to universities would be, and who is going to apply them? (3) Can you really truly be this fucking stupid?

  • Solomon Rivlin says:

    CPP, I thought that by now you would listen to some of Mozart's music to help you relax.
    If the government is going to subsidize scientific research based on criteria such as size of student body, number of research scientists or any other criteria that would make such subsidy equitable, there will be no incentive to hire pompous-ass administrators who claim that thanks to them the NIH money is coming to their university. Thus, when universities know in advance how much money the government sends their way, only administrators that are necessary to run the university business will be hired. All those that are now fishing around the bowl of NIH money will be gone. Are you that stupid, CPP, that you could not understand these simple axioms from my previous two comments?

  • DrugMonkey says:

    Ok, so we have federal disbursement based on the number of undergrads, grads, PIs or some such. Leaving the details of why this wouldn't encourage, say, more expansion of sham "faculty" appointments to boost federal $$ rank aside for the moment...
    at the local level, who decides how the $$ should be allocated? equal $$ to all PIs regardless of type of research? w00t Good times for the undergrads-hitting-a-button cognitive psychologists!!!!!!!!!!
    sorry, no more gene sequencing eleventy shit though. bye, bye fMRI. probably can't afford mouse colonies either... higher animals? fugeddaboutit. biocontainment? pass.
    we'll end up biasing for dudes who mostly sit around pontificating and don't really do much in the way of money burning experimental science....

  • Solomon Rivlin says:

    DM, I see that you have great confidence in your colleagues. In essence, you're saying that you prefer corrupted administrators over corrupted scientists.

  • qaz says:

    Sol #31 - Administrators don't make decisions about who gets funded. Currently, scientists do. That's what study section is all about. At NIH, program actually has very little(*) power over who gets funded. At many institutes, the percentile scores pretty much define who gets funded. But you know all that. Instead of letting this debate descend into who's "corrupted" and who's "honest", how about we try to figure out how to help the system get the best money to the best scientists to do the best work. Your recommendation sounds like a recipe for corruption, putting a lot of money in the hands of administrators. At least in the current system, the money (direct costs at least) goes pretty much into the labs of scientists.
    * not zero, I know.
    The problem with laying out explicit $$/student or $$/faculty member is that it makes it clear how to game the system.
    PS. The fact that your university was stupid enough to fall for some administrator claiming that he would get more NIH money should not be taken as an implication that the system is corrupt. I recommend that you move to a different university where they are not stupid enough to offer a dean 3% of the indirect costs.

  • Joe says:

    C'mon, DM, is this so hard to figure out? The NIH intramural/administrative budget ballooned over the past 15 years. In order for NIH staff to justify themselves, their doughnut- and coffee-fueled meetings, and the bloated NIH administrative budgets in general, they must produce statements/documents/policies/guidelines.
    The instructions you lament are basically the poo of obese bureaucracy.

  • revere says:

    I'm getting ready to be reviewed, not to review, so I've paid a lot of attention to this. And frankly, I don't find it that confusing. The only score that gets reported at the meeting is the Overall Impact score. The point of the scored criteria is that they explicitly signal what kinds of things a reviewer is supposed to take into account. If the whole is greater than the sum of its parts, then the Overall Impact score can be greater than any of the scored criteria. That makes sense to me.
    More problematic, I think, are the page limits, which put a heavier burden on reviewers because there is less explanation and give an advantage to geezers like me; and the single-digit score, which is more difficult for reviewers.
