Archive for the 'Fixing the NIH' category

Senator Murray and Representative DeLauro Want to Know What NIH Is Doing About Sexual Harassment

Readers of this blog will not need too much reminder that sexual harassment and sex-based workplace discrimination are very much a problem in academic science. We have seen numerous cases of this sort of academic misconduct reach the national and sometimes international press in the past several years. Indeed, recent discussions on this blog have mentioned the cases of Thomas Jessell and Inder Verma as well as three cases at Dartmouth College.

In these cases, and ones of scientific fraud, I and others have expressed frustration that the NIH does not appear to use what we see as its considerable power of the purse and bully pulpit to discourage future misconduct. My view is that since an NIH award is a privilege and not a right, the NIH could do a lot to help their recipient institutions see that taking cases of misconduct more seriously is in their (the recipient institution's) best interest. They could pull the grants associated with any PI who has been convicted of misconduct, instead of allowing the University to appoint a replacement PI. They could refuse to make any new awards or, less dramatically, make any exception pickups if they aren't happy with the way the University has been dealing with misconduct. They could focus on training grants or F-mech fellowships if they see a particular problem in the treatment of trainees. Etc. Lots of room to work since the NIH decides all the time to fund this grant and not that grant for reasons other than the strict order of review.

Well, two Democratic members of Congress have sent a letter (PDF) to NIH Director Francis Collins gently requesting* information on how NIH is addressing sexual harassment in the workplace. And the overall message is in line with the above belief that NIH can and should play a more active role in addressing sexual misconduct and harassment.

As pointed out in Mike the Mad Biologist's post on this letter, these two Congresspeople have a lot of potential power if the Democrats return to the majority.

are ranking members of committees that oversee NIH funding–and if the Democrats take back the House or Senate, would be the leaders of those committees.

One presumes that the NIH will be motivated to take this seriously and offer up some significant response. Hopefully they can do this by what seems a rather optimistic deadline of 8/17/2018, given the letter was dated 8/06/2018.

The first 6 listed items to which NIH is being asked to respond seem mostly to do with the workings of Intramural NIH, both Program and the IRP. Those are of less interest as a dramatic change, important as they are.

Most importantly, the letter puts the NIH squarely on the hook for the way that it ensures that the extramural awardee institutions are behaving. Perhaps obviously, the power of NIH to oversee issues of harassment at all of the Universities, Institutes and companies that they fund is limited. The main point of justification in this letter is the NOT-OD-15-152: Civil Rights Protections in NIH-Supported Research, Programs, Conferences and Other Activities.

To give you a flavor:

Federal civil rights laws prohibit discrimination on the basis of race, color, national origin, disability, and age in all programs and activities that receive Federal financial assistance, and prohibit discrimination on the basis of sex in educational programs or activities conducted by colleges and universities. These protections apply in all settings where research, educational programs, conferences, and other activities are supported by NIH, and apply to all mechanisms of support (i.e., grant awards, contracts and cooperative agreements). The civil rights laws protect NIH-supported investigators, students, fellows, postdocs, participants in research, and other individuals involved in activities supported by NIH.

The notice then goes on to list several specific statutes, some of which are referenced in footnotes to the letter.
The Murray/DeLauro letter concentrates on the obligation recipient institutions have to file an Assurance of Compliance with the Health and Human Services (NIH's parent organization) Office of Civil Rights and the degree to which NIH exercises oversight on these Assurances.

I think the motivations of Senator Murray and Rep DeLauro are on full display in this passage (emphasis added).

"It therefore appears that NIH's only role...is confirming...institution has signed, dated, and mailed the compliance document....

This lack of engagement from NIH is particularly unacceptable in light of disturbing news reports that cases of sexual harassment in the academic sciences often involve high profile faculty offenders whose behavior is considered an 'open secret'.

...colleagues may have warned new faculty and students.....but institutions themselves take little to no action."

It is on.

__
*demanding


When NIH uses affirmative action to fix a bias

Jul 20 2018. Published under Anger, Fixing the NIH, NIH, NIH Careerism

We have just learned that in addition to the bias against black PIs when they try to get research funding (Ginther et al., 2011), Asian-American and African-American K99 applicants are also at a disadvantage. These issues trigger my usual remarks about how NIH has handled observed disparities in the past. In the spirit of pictures being worth more than words we can look up the latest update on success rates for RPG (a laundry list of research grant support mechanisms) broken down by two key factors.
First up is the success rate by the gender of the PI. As you can see very clearly, something changed in 2003. All of a sudden a sustained advantage for men disappeared. Actually two things happened. This disparity was "fixed" and the year after success rates went in the tank for everyone. There are a couple of important observations. The NIH didn't suddenly fix whatever was going on in study section, I guaranfrickentee it. I also guarantee there were no magic changes in the pipeline or female PI pool or anything else. I guarantee you that the NIH decided to equalize success rates by heavy handed top-down affirmative action policies in the nature of "make it so" and "fix this". I do not recall ever seeing anything formal so, hey, I could be way off base. If so, I look forward to any citation of information showing a change in the way they do business that coincided directly with the grants submitted for the FY2003 rounds.
The second thing to notice here is that women's success rates never exceeded that for men. Not for fifteen straight Fiscal Years. This further supports my hypothesis that the bias hasn't been fixed in some fundamental way. If it had been fixed, this would be random from year to year, correct? Sometimes the women's rates would sneak above the men's rates. That never happens. Because of course when we redress a bias, it can only ever just barely reach statistically indistinguishable parity and if god forbid the previously privileged class suffers even the tiniest little bit of disadvantage it is an outrage.
Finally, the fact that success rates went in the tank in 2004 should remind you that men enjoyed the advantage all during the great NIH doubling! The salad days. Lots of money available and STILL it was being disproportionately sucked up by the advantaged group. You might think that when there is an interval of largesse that systems would be more generous. Good time to slip a little extra to women, underrepresented individuals or the youth, right? Ha.

Which brings me to the fate of first-time investigators versus established investigators. Oh look, the never-funded were instantly brought up to parity in 2007. In this case a few years after the post-doubling success rates went in the toilet but more or less the same pattern. Including the failure of the statistically indistinguishable success rates for the first timers ever, in 11 straight years of funding, to exceed the rates for established investigators. Because of affirmative action instead of fixing the bias. As you will recall, the head of the NIH at that time made it very clear that he was using "make it so" top-down heavy handed quota based affirmative action to accomplish this goal.

Zerhouni created special awards for young scientists but concluded that wasn't enough. In 2007, he set a target of funding 1500 new-investigator R01s, based on the previous 5 years' average.

Some program directors grumbled at first, NIH officials say, but came on board when NIH noticed a change in behavior by peer reviewers. Told about the quotas, study sections began “punishing the young investigators with bad scores,” says Zerhouni.

"quotas".

I do not recall much in the way of discussing the "pipelines" and how we couldn't possibly do anything to change the bias of study sections until a new, larger and/or better class of female or not-previously-funded investigators could be trained up. The NIH just fixed it. ish. permanently.

For FY2017 there were 16,954 applications with women PIs. 3,186 awards. If you take the ~3% gap from the interval prior to 2003, this means that the NIH is picking up some 508 research project grants from women PIs via their affirmative action process. Per year. If you apply the ~6% deficit enjoyed by first time investigators in the salad days you end up with 586 research project grants picked up by affirmative action. Now there will be some overlap of these populations. Women are PI of about 31% of applications in the data for the first graph and first timers are about 35% for the second. So very roughly women might be 181 of the affirmative action newbie apps and newbies might be 178 of the affirmative action women's apps. The estimates are close. So let's say something like 913 unique grants are picked up by the NIH just for these two overt affirmative action purposes. Each and every Fiscal Year.
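The arithmetic in the paragraph above can be laid out as a rough sketch. Every input is a figure quoted in the post; the function and key names are mine and purely illustrative, and the 586 first-timer figure is taken as stated rather than re-derived, since the post does not give the underlying application count.

```python
# Back-of-envelope sketch of the pickup estimate, using only figures quoted
# in the post. Function and key names are illustrative, not from any NIH source.

def estimate_pickups(women_apps, gender_gap, newbie_pickups, women_share, newbie_share):
    """Estimate yearly pickups for women PIs and first-timers, net of overlap."""
    women_pickups = women_apps * gender_gap
    # Two ways of guessing the overlap between the two groups.
    women_among_newbie_pickups = newbie_pickups * women_share
    newbies_among_women_pickups = women_pickups * newbie_share
    # Subtract the overlap once so nobody is counted twice.
    unique = women_pickups + newbie_pickups - women_among_newbie_pickups
    return {
        "women_pickups": women_pickups,
        "women_among_newbie_pickups": women_among_newbie_pickups,
        "newbies_among_women_pickups": newbies_among_women_pickups,
        "unique": unique,
    }

estimate = estimate_pickups(
    women_apps=16_954,   # FY2017 applications with women PIs
    gender_gap=0.03,     # ~3% success-rate gap before 2003
    newbie_pickups=586,  # pickups implied by the ~6% first-timer deficit, as quoted
    women_share=0.31,    # women as a share of applications
    newbie_share=0.35,   # first-timers as a share of applications
)
for name, value in estimate.items():
    print(f"{name}: {value:.1f}")
```

The results land within a grant or so of the numbers in the text; the small differences come down to where you round.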

Because of the fact that, for example, African-American PIs of research grants or K99 apps represent such tiny percentages of the total (2% in both cases), the number of pickups that would be necessary to equalize success rate disparities is tiny. In the K99 analysis, it was a mere 23 applications across a decade. Two per year. I don't have research grant numbers handy but if we use the data underlying the first graph, this means there were about 1,080 applications with African-American PIs in FY2017. If they hit the 19% success rate this would be about 205 applications. Ginther reported about a 13% success rate deficit, working out to 55% of the success rate enjoyed by white applicants at the time. This would correspond to a 10.5% success rate for black applicants now, or about 113 applications. So 92 would be needed to make up the difference for African-American PIs assuming the Ginther disparity still holds. This would be less than one percent of the awards made.
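Here is the same estimate as a short sketch. The 1,080 applications, the 19% success rate and the 55% relative rate come from the paragraph above. The total number of awards is not stated in the post, so the last step assumes that 1,080 is 2% of all applications and that the overall success rate is also 19%; treat that denominator as a guess.

```python
# Sketch of the shortfall estimate for African-American PIs, using the
# figures quoted in the post. Names are illustrative.

def shortfall(applications, parity_rate, relative_rate):
    """Awards at parity, awards under the disparity, and the gap between them."""
    at_parity = applications * parity_rate
    with_disparity = at_parity * relative_rate
    return at_parity, with_disparity, at_parity - with_disparity

at_parity, with_disparity, needed = shortfall(
    applications=1_080,  # FY2017 applications with African-American PIs
    parity_rate=0.19,    # overall success rate used in the post
    relative_rate=0.55,  # Ginther: ~55% of the white applicants' success rate
)

# ASSUMPTION (not in the post): total awards are inferred from the 2% share
# of applications and the same 19% success rate.
total_awards = (1_080 / 0.02) * 0.19
share_of_awards = needed / total_awards

print(f"awards at parity: {at_parity:.0f}")
print(f"awards under the disparity: {with_disparity:.0f}")
print(f"pickups needed: {needed:.0f}")
print(f"share of all awards: {share_of_awards:.2%}")
```

Under those assumptions the pickups needed come to a bit under one percent of awards, in line with the claim above.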

Less than one percent. And keep in mind these are not gifts. These are making up for a screwjob. These are making up for the bias. If any applicants from male, established or white populations go unfunded to redress the bias, they are only losing their unearned advantage. Not being disadvantaged.


Racial Disparity in K99 Awards and R00 Transitions

Oh, what a shocker.

In the wake of the 2011 Ginther finding [see archives on Ginther if you have been living under a rock] that there was a significant racial bias in NIH grant review, the concrete response of the NIH was to blame the pipeline. Their only real dollar, funded initiatives were to attempt to get more African-American trainees into the science pipeline. The obvious subtext here was that the current PIs, against whom the grant review bias was defined, must be the problem, not the victim. Right? If you spend all your time insisting that since there were not red-fanged, white-hooded peer reviewers overtly proclaiming their hate for black people that peer review can't be the problem, and you put your tepid money initiatives into scraping up more trainees of color, you are saying the current black PIs deserve their fate. Current example: NIGMS trying to transition more underrepresented individuals into faculty ranks, rather than funding the ones that already exist.

Well, we have some news. The Rescuing Biomedical Research blog has a new post up on Examining the distribution of K99/R00 awards by race authored by Chris Pickett.

It reviews success rates of K99 applicants from 2007-2017. Application PI demographics broke down to nearly 2/3 White, ~1/3 Asian, 2% multiracial and 2% black. Success rates: White, 31%, Multiracial, 30.7%, Asian, 26.7%, Black, 16.2%. Conversion to R00 phase rates: White, 80%, Multiracial, 77%, Asian, 76%, Black, 60%.

In terms of Hispanic ethnicity, 26.9% success for K99 and 77% conversion rate, neither significantly different from the nonHispanic rates.

Of course, seeing as how the RBR people are the VerySeriousPeople considering the future of biomedical careers (sorry Jeremy Berg but you hang with these people), the Discussion is the usual throwing up of hands and excuse making.

"The source of this bias is not clear...". " an analysis ...could address". "There are several potential explanations for these data".

and of course
"put the onus on universities"

No. Heeeeeeyyyyyuuullll no. The onus is on the NIH. They are the ones with the problem.

And, as per usual, the fix is extraordinarily simple. As I repeatedly observe in the context of the Ginther finding, the NIH responded to a perception of a disparity in the funding of new investigators with immediate heavy handed top-down quota based affirmative action for many applications from ESI investigators. And now we have Round2 where they are inventing new quota based affirmative action policies for the second round of funding for these self-same applicants. Note well: the statistical beneficiaries of ESI affirmative action policies are white investigators.

The number of K99 applications from black candidates was 154 over 10 years. 25 of these were funded. To bring this up to the success rate enjoyed by white applicants, the NIH need only have funded 23 more K99s. Across 28 Institutes and Centers. Across 10 years, aka 30 funding cycles. One more per IC per decade to fix the disparity. Fixing the Asian bias would be a little steeper, they'd need to fund another 97, let's round that to 10 per year. Across all 28 ICs.
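For anyone who wants to check the K99 numbers, here is a minimal sketch. The inputs (154 applications, 25 funded, a 31% success rate for white applicants, 28 ICs, 10 years) are from the post; rounding up to the next whole award is my choice, not necessarily how the original analysis did it.

```python
import math

# Sketch: how many more K99 awards would bring one group up to a target rate.
# Rounding up is an assumption; the source analysis may have rounded differently.

def extra_awards_needed(applications, funded, target_rate):
    """Additional awards needed for `funded / applications` to reach `target_rate`."""
    return max(0, math.ceil(applications * target_rate) - funded)

black_apps, black_funded = 154, 25
white_rate = 0.31

observed_rate = black_funded / black_apps
extra = extra_awards_needed(black_apps, black_funded, white_rate)

print(f"observed success rate: {observed_rate:.1%}")
print(f"extra awards needed over the decade: {extra}")
print(f"per IC over the decade (28 ICs): {extra / 28:.2f}")
print(f"per year across all ICs (10 years): {extra / 10:.1f}")
```

The same function would work for the Asian applicant numbers, but the post does not give the application count for that group, so I have left it out rather than guess.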

Now that they know about this, just as with Ginther, the fix is duck soup. The Director pulls each IC Director aside in a quiet moment and says 'fix this'. That's it. That's all that would be required. And the Directors just commit to pick up one more Asian application every year or so and one more black application every, checks notes, decade and this is fixed.

This is what makes the NIH response to all of this so damn disturbing. It's rounding error. They pick up grants all the time for reasons way more biased and disturbing than this. Saving a BSD lab that allegedly ran out of funding. Handing out under the table Administrative Supplements for gawd knows what random purpose. Prioritizing the F32 applications from some labs over others. Ditto the K99 apps.

They just need to apply their usual set of glad handing biases to redress this systematic problem with the review and funding of people of color.

And they steadfastly refuse to do so.

For this one specific area of declared Programmatic interest.

When they pick up many, many more grants out of order of review for all their other varied Programmatic interests.

You* have to wonder why.
__
h/t @biochembelle

*and those people you are trying to lure into the pipeline, NIH? They are also wondering why they should join a rigged game like this one.


NIH Ginther Fail: Do the ersatz reviews recapitulate the original reviews?

A bit in Science authored by Jocelyn Kaiser recently covered the preprint posted by Forscher and colleagues which describes a study of bias in NIH grant review. I was struck by a response Kaiser obtained from one of the authors on the question of range restriction.

Some have also questioned Devine’s decision to use only funded proposals, saying it fails to explore whether reviewers might show bias when judging lower quality proposals. But she and Forscher point out that half of the 48 proposals were initial submissions that were relatively weak in quality and only received funding after revisions, including four that were of too low quality to be scored.

They really don't seem to understand NIH grant review where about half of all proposals are "too low quality to be scored". Their inclusion of only 8% ND applications simply doesn't cut it. Thinking about this, however, motivated me to go back to the preprint, follow some links to associated data and download the Excel file with the original grant scores listed.

I do still think they are missing a key point about restriction of range. It isn't, much as they would like to think, only about the score. The score on a given round is a value with considerable error, as the group itself described in a prior publication in which the same grant reviewed in different ersatz study sections ended up with a different score. If there is a central tendency for true grant score, which we might approach with dozens of reviews of the same application, then sometimes any given score is going to be too good, and sometimes too bad, as an estimate of the central tendency. Which means that on a second review, the scores for the former are going to tend to get worse and the scores for the latter are going to tend to get better. The authors only selected the ones that tended to get better for inclusion (i.e., the ones that reached funding on revision).
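A toy simulation makes the selection problem concrete. To be clear, nothing here uses the Forscher data: the normal distributions, the noise level and the cutoffs are all assumptions chosen for illustration, with lower scores being better as on the NIH scale. Revision is modeled as nothing more than a second noisy review of the same application, which is a deliberate simplification.

```python
import random
from statistics import mean

# Toy model: each application has a "true" score (central tendency) and each
# review adds independent noise. All parameters are illustrative assumptions.

def simulate(n=20_000, noise_sd=1.0, seed=0):
    rng = random.Random(seed)
    apps = []
    for _ in range(n):
        true_score = rng.gauss(0, 1)                     # lower = better
        first = true_score + rng.gauss(0, noise_sd)      # first review
        second = true_score + rng.gauss(0, noise_sd)     # second review
        apps.append((true_score, first, second))
    return apps

apps = simulate()

# 1) Regression to the mean: unusually good first scores get worse on re-review.
lucky = [a for a in apps if a[1] < -1.5]
lucky_first = mean(a[1] for a in lucky)
lucky_second = mean(a[2] for a in lucky)

# 2) Selection: among applications with the same middling first score, the ones
#    that later reach the "fundable" range differ in true quality from the rest.
middling = [a for a in apps if -0.5 < a[1] < 0.5]
funded_later = [a for a in middling if a[2] < -1.0]
never_funded = [a for a in middling if a[2] >= -1.0]
true_funded_later = mean(a[0] for a in funded_later)
true_never_funded = mean(a[0] for a in never_funded)

print(f"lucky pile: first review {lucky_first:.2f}, second review {lucky_second:.2f}")
print(f"middling pile, later funded: true score {true_funded_later:.2f}")
print(f"middling pile, never funded: true score {true_never_funded:.2f}")
```

In this sketch the middling applications that later land in the fundable range have better true scores, on average, than those with the same first-round score that never do. Picking only the eventually funded ones is therefore not a sample of "low quality" applications.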

Another way of getting at this is to imagine two grants which get the same score in a given review round. One is kinda meh, with mostly reasonable approaches and methods from a pretty good PI with a decent reputation. The other grant is really exciting, but with some ill considered methodological flaws and a missing bit of preliminary data. Each one comes back in revision with the former merely shined up a bit and the latter with awesome new preliminary data and the methods fixed. The meh one goes backward (enraging the PI who "did everything the panel requested") and the exciting one is now in the fundable range.

The authors have made the mistake of thinking that grants that are discussed, but get the same score well outside the range of funding, are the same in terms of true quality. I would argue that the fact that the "low quality" ones they used were revisable into the fundable range makes them different from the similar scoring applications that did not eventually win funding.

In thinking about this, I came to realize another key bit of positive control data that the authors could provide to enhance our confidence in their study. I scanned through the preprint again and was unable to find any mention of them comparing the original scores of the proposals with the values that came out of their study. Was there a tight correlation? Was it equivalently tight across all of their PI name manipulations? To what extent did the new scores confirm the original funded, low quality and ND outcomes?

This would be key to at least partially counter my points about the range of applications that were included in this study. If the test reviewer subjects found the best original scored grants to be top quality, and the worst to be the worst, independent of PI name then this might help to reassure us that the true quality range within the discussed half was reasonably represented. If, however, the test subjects often reviewed the original top grants lower and the lower grants higher, this would reinforce my contention that the range of the central tendencies for the quality of the grant applications was narrow.

So how about it, Forscher et al? How about showing us the scores from your experiment for each application by PI designation along with the original scores?
__
Patrick Forscher, William Cox, Markus Brauer, and Patricia Devine. No race or gender bias in a randomized experiment of NIH R01 grant reviews. Created on: May 25, 2018 | Last edited: May 25, 2018; posted on PsyArXiv


NIH Ginther Fail: This is not anything like real grant review

May 31 2018. Published under Fixing the NIH, NIH, Underrepresented Groups

I recently discussed some of the problems with a new pre-print by Forscher and colleagues describing a study which purports to evaluate bias in the peer review of NIH grants.

One thing that I figured out today is that the team that is funded under the grant which supported the Forscher et al study also produced a prior paper that I already discussed. That prior discussion focused on the use of only funded grants to evaluate peer review behavior, and the corresponding problems of a restricted range. The conclusion of this prior paper was that reviewers didn't agree with each other in the evaluation of the same grant. This, in retrospect, also seems to be a design that was intended to fail. In that instance designed to fail to find correspondence between reviewers, just as the Forscher study seems constructed to fail to find evidence of bias.

I am working up a real distaste for the "Transformative" research project (R01 GM111002; 9/2013-6/2018) funded to PIs M. Carnes and P. Devine that is titled EXPLORING THE SCIENCE OF SCIENTIFIC REVIEW. This project is funded to the tune of $465,804 in direct costs in the final year and reached as high as $614,398 direct in year 3. We can, I think, fairly demand a high standard for the resulting science. I do not think this team is meeting a high standard.

One of the papers (Pier et al 2017) produced by this project discusses the role of study section discussion in revising/calibrating initial scoring.

Results suggest that although reviewers within a single panel agree more following collaborative discussion, different panels agree less after discussion, and Score Calibration Talk plays a pivotal role in scoring variability during peer review.

So they know. They know that scores change through discussion and they know that a given set of applications can go in somewhat different directions based on who is reviewing. They know that scores can change depending on what other ersatz panel members are included and perhaps depending on how the total number of grants are distributed to reviewers in those panels. The study described in the Forscher pre-print did not convene panels:

Reviewers were told we would schedule a conference call to discuss the proposals with other reviewers. No conference call would actually occur; we informed the prospective reviewers of this call to better match the actual NIH review process.

Brauer is an overlapping co-author. The senior author on the Forscher study is Co-PI, along with the senior author of the Pier et al. papers, on the grant that funds this work. The Pier et al 2017 Res Eval paper shows that they know full well that study section discussion is necessary to "better match the actual NIH review process". Their paper shows that study section discussion does so in part by getting better agreement on the merits of a particular proposal across the individuals doing the reviewing (within a given panel). By extension, not including any study section type discussion is guaranteed to result in a more variable assessment. To throw noise into the data. Which has a tendency to make it more likely that a study will arrive at a null result, as the Forscher et al study did.

These investigators also know that the grant load for NIH reviewers is not typically three applications, as was used in the study described in the Forscher pre-print. From Pier et al 2017 again:

We further learned that although a reviewer may be assigned 9–10 applications for a standing study section, ad hoc panels or SEPs can receive assignments as low as 5–6 applications; thus, the SRO assigned each reviewer to evaluate six applications based on their scientific expertise, as we believed a reviewer load on the low end of what is typical would increase the likelihood of study participation.

I believe that the reviewer load is critically important if you are trying to mimic the way scores are decided by the NIH review process. The reason is that while several NIH documents and reviewer guides pay lip service to the idea that the review of each grant proposal is objective, the simple truth is that review is comparative.

Grant applications are scored on a 1-9 scale with descriptors ranging from Exceptional (1) to Very Good (4) to Poor (9). On an objective basis, I and many other experienced NIH grant reviewers argue, the distribution of NIH grant applications (all of them) is not flat. There is a very large peak around the Excellent to Very Good (i.e., 3-4) range, in my humble estimation. And if you are familiar with review you will know that there is a pronounced tendency of reviewers, unchecked, to stack their reviews around this range. They do it within reviewer and they do it as a panel. This is why the SRO (and Chair, occasionally) spends so much time before the meeting exhorting the panel members to spread their scores. To flatten the objective distribution of merit into a more linear set of scores. To, in essence, let a competitive ranking procedure sneak into this supposedly objective and non-comparative process.

Many experienced reviewers understand why this is being asked of them, endorse it as necessary (at the least) and can do a fair job of score spreading*.

The fewer grants a reviewer has on the immediate assignment pile, the less distance there need be across this pile. If you have only three grants and score them 2, 3 and 4, well hey, scores spread. If, however, you have a pile of 6 grants and score them 2, 3, 3, 3, 4, 4 (which is very likely the objective distribution) then you are quite obviously not spreading your scores enough. So what to do? Well, for some reason actual NIH grant reviewers are really loath to throw down a 1. So 2 is the top mark. Gotta spread the rest. Ok, how about 2, 3, 3...er 4 I mean. Then 4, 4...shit. 4, 5 and oh 6 seems really mean so another 5. Ok. 2, 3, 4, 4, 5, 5. phew. Scores spread, particularly around the key window that is going to make the SRO go ballistic.

Wait, what's that? Why are reviewers working so hard around the 2-4 zone and care less about 5+? Well, surprise surprise that is the place** where it gets serious between probably fund, maybe fund and no way, no how fund. And reviewers are pretty sensitive to that**, even if they do not know precisely what score will mean funded / not funded for any specific application.

That little spreading exercise was for a six grant load. Now imagine throwing three more applications into this mix for the more typical reviewer load.

For today, it is not important to discuss how a reviewer decides one grant comes before the other or that perhaps two grants really do deserve the same score. The point is that grants are assessed against each other. In the individual reviewer's stack and to some extent across the entire study section. And it matters how many applications the reviewer has to review. This affects that reviewer's pre-discussion calibration of scores.

Read phase, after the initial scores are nominated and before the study section meets, is another place where re-calibration of scores happens. (I'm not sure if they included that part in the Pier et al studies, it isn't explicitly mentioned so presumably not?)

If the Forscher study only gave reviewers three grants to review, and did not do the usual exhortation to spread scores, this is a serious flaw. Another serious and I would say fatal flaw in the design. The tendency of real reviewers is to score more compactly. This is, presumably, enhanced by the selection of grants that were funded (either on the version that was used or in revision) which we might think would at least cut off the tail of really bad proposals. The ranges will be from 2-4*** instead of 2-5 or 6. Of course this will obscure differences between grants, making it much much more likely that no effect of sex or ethnicity (the subject of the Forscher et al study) of the PI would emerge.

__
Elizabeth L. Pier, Markus Brauer, Amarette Filut, Anna Kaatz, Joshua Raclaw, Mitchell J. Nathan, Cecilia E. Ford and Molly Carnes, Low agreement among reviewers evaluating the same NIH grant applications. 2018, PNAS: published ahead of print March 5, 2018, https://doi.org/10.1073/pnas.1714379115

Elizabeth L. Pier, Joshua Raclaw, Anna Kaatz, Markus Brauer, Molly Carnes, Mitchell J. Nathan and Cecilia E. Ford. ‘Your comments are meaner than your score’: score calibration talk influences intra- and inter-panel variability during scientific grant peer review, Res Eval. 2017 Jan; 26(1): 1–14. Published online 2017 Feb 14. doi: 10.1093/reseval/rvw025

Patrick Forscher, William Cox, Markus Brauer, and Patricia Devine. No race or gender bias in a randomized experiment of NIH R01 grant reviews. Created on: May 25, 2018 | Last edited: May 25, 2018 https://psyarxiv.com/r2xvb/

*I have related before that when YHN was empaneled on a study section he practiced a radical version of score spreading. Initial scores for his pile were tagged to the extreme ends of the permissible scores (this was under the old system) and even intervals within that were used to place the grants in his pile.

**as are SROs. I cannot imagine a SRO ever getting on your case to spread scores for a pile that comes in at 2, 3, 4, 5, 7, 7, 7, 7, 7.

***Study sections vary a lot in their precise calibration of where the hot zone is and how far apart scores are spread. This is why the more important funding criterion is the percentile, which attempts to adjust for such study section differences. This is the long way of saying I'm not encouraging comments naggling over these specific examples. The point should stand regardless of your pet study sections' calibration points.


NIH Ginther Fail: A transformative research project

May 29 2018. Published under Fixing the NIH, NIH, Underrepresented Groups

In August of 2011 the Ginther et al. paper published in Science let us know that African-American PIs were disadvantaged in the competition for NIH awards. There was an overall success rate disparity identified, as well as a related need for funded PIs to revise their proposals more frequently to become funded.

Both of these have significant consequences for what science gets done and how careers unfold.

I have been very unhappy with the NIH response to this finding.

I have recently become aware of a "Transformative" research project (R01 GM111002; 9/2013-6/2018) funded to PIs M. Carnes and P. Devine that is titled EXPLORING THE SCIENCE OF SCIENTIFIC REVIEW. From the description/abstract:

Unexplained disparities in R01 funding outcomes by race and gender have raised concern about bias in NIH peer review. This Transformative R01 will examine if and how implicit (i.e., unintentional) bias might occur in R01 peer review... Specific Aim #2. Determine whether investigator race, gender, or institution causally influences the review of identical proposals. We will conduct a randomized, controlled study in which we manipulate characteristics of a grant principal investigator (PI) to assess their influence on grant review outcomes...The potential impact is threefold; this research will 1) discover whether certain forms of cognitive bias are or are not consequential in R01 peer review... the results of our research could set the stage for transformation in peer review throughout NIH.

It could not be any clearer that this project is a direct NIH response to the Ginther result. So it is fully and completely appropriate to view any resulting studies in this context. (Just to get this out of the way.)

I became aware of this study through a Twitter mention of a pre-print that has been posted on PsyArXiv. The version I have read is:

No race or gender bias in a randomized experiment of NIH R01 grant reviews. Patrick Forscher, William Cox, Markus Brauer, Patricia Devine. Created on: May 25, 2018 | Last edited: May 25, 2018

The senior author is one of the Multi-PIs on the aforementioned funded research project and the pre-print makes this even clearer with a statement.

Funding: This research was supported by 5R01GM111002-02, awarded to the last author.

So while yes, the NIH does not dictate the conduct of research under awards that it makes, this effort can be fairly considered part of the NIH response to Ginther. As you can see from comparing the abstract of the funded grant to the pre-print study there is every reason to assume the nature of the study as conducted was actually spelled out in some detail in the grant proposal. Which the NIH selected for funding, apparently with some extra consideration*.

There are many, many, many things wrong with the study as depicted in the pre-print. It is going to take me more than one blog post to get through them all. So consider none of these to be complete. I may also repeat myself on certain aspects.

First up today is the part of the experimental design that was intended to create the impression in the minds of the reviewers that a given application had a PI of certain key characteristics, namely on the spectra of sex (male versus female) and ethnicity (African-American versus Irish-American). This, I will note, is a tried and true design feature for some very useful prior findings. Change the author names to initials and you can reduce apparent sex-based bias in the review of papers. Change the author names to African-American sounding ones and you can change the opinion of the quality of legal briefs. Change the sex or apparent ethnicity of the name on job resumes and you can change the proportion called for further interviewing. Etc. You know the literature. I am not objecting to the approach, it is a good one, but I am objecting to its application to NIH grant review and the way they applied it.

The problem with application of this to NIH Grant review is that the Investigator(s) is such a key component of review. It is one of five allegedly co-equal review criteria and the grant proposals include a specific document (Biosketch) which is very detailed about a specific individual and their contributions to science. This differs tremendously from the job of evaluating a legal brief. It differs tremendously from reviewing a large stack of resumes submitted in response to a fairly generic job. It even differs from the job of reviewing a manuscript submitted for potential publication. NIH grant review specifically demands an assessment of the PI in question.

What this means is that it is really difficult to fake the PI and have success in your design. Success absolutely requires that the reviewers who are the subjects in the study both fail to detect the deception and genuinely develop a belief that the PI has the characteristics intended by the manipulation (i.e., man versus woman and black versus white). The authors recognized this, as we see from page 4 of the pre-print:

To avoid arousing suspicion as to the purpose of the study, no reviewer was asked to evaluate more than one proposal written by a non-White-male PI.

They understand that suspicion as to the purpose of the study is deadly to the outcome.

So how did they attempt to manipulate the reviewer's percept of the PI?

Selecting names that connote identities. We manipulated PI identity by assigning proposals names from which race and sex can be inferred 11,12. We chose the names by consulting tables compiled by Bertrand and Mullainathan 11. Bertrand and Mullainathan compiled the male and female first names that were most commonly associated with Black and White babies born in Massachusetts between 1974 and 1979. A person born in the 1970s would now be in their 40s, which we reasoned was a plausible age for a current Principal Investigator. Bertrand and Mullainathan also asked 30 people to categorize the names as “White”, “African American”, “Other”, or “Cannot tell”. We selected first names from their project that were both associated with and perceived as the race in question (i.e., >60 odds of being associated with the race in question; categorized as the race in question more than 90% of the time). We selected six White male first names (Matthew, Greg, Jay, Brett, Todd, Brad) and three first names for each of the White female (Anne, Laurie, Kristin), Black male (Darnell, Jamal, Tyrone), and Black female (Latoya, Tanisha, Latonya) categories. We also chose nine White last names (Walsh, Baker, Murray, Murphy, O’Brian, McCarthy, Kelly, Ryan, Sullivan) and three Black last names (Jackson, Robinson, Washington) from Bertrand and Mullainathan’s lists. Our grant proposals spanned 12 specific areas of science; each of the 12 scientific topic areas shared a common set of White male, White female, Black male, and Black female names. First names and last names were paired together pseudo-randomly, with the constraints that (1) any given combination of first and last names never occurred more than twice across the 12 scientific topic areas used for the study, and (2) the combination did not duplicate the name of a famous person (i.e., “Latoya Jackson” never appeared as a PI name).

So basically the equivalent of blackface. They selected some highly stereotypical "black" first names and some "white" surnames which are almost all Irish (hence my comment above about Irish-American ethnicity instead of Caucasian-American. This also needs some exploring.).

Sorry, but for me this heightens concern that reviewers deduce what they are up to. Right? Each reviewer had only three grants (which is a problem for another post) and at least one of them practically screams in neon lights "THIS PI IS BLACK! DID WE MENTION BLACK? LIKE REALLY REALLY BLACK!". As we all know, there are not 33% of applications to the NIH from African-American investigators. Any experienced reviewer would be at risk of noticing something is a bit off. The authors say nay.

A skeptic of our findings might put forward two criticisms: .. As for the second criticism, we put in place careful procedures to screen out reviewers who may have detected our manipulation, and our results were highly robust even to the most conservative of reviewer exclusion criteria.

As far as I can tell their "careful procedures" included only:

We eliminated from analysis 34 of these reviewers who either mentioned that they learned that one of the named personnel was fictitious or who mentioned that they looked up a paper from a PI biosketch, and who were therefore likely to learn that PI names were fictitious.

"who mentioned".

There was some debriefing which included:

reviewers completed a short survey including a yes-or-no question about whether they had used outside resources. If they reported “yes”, they were prompted to elaborate about what resources they used in a free response box. Contrary to their instructions, 139 reviewers mentioned that they used PubMed or read articles relevant to their assigned proposals. We eliminated the 34 reviewers who either mentioned that they learned of our deception or looked up a paper in the PI’s biosketch and therefore were very likely to learn of our deception. It is ambiguous whether the remaining 105 reviewers also learned of our deception.

and

34 participants turned in reviews without contacting us to say that they noticed the deception, and yet indicated in review submissions that some of the grant personnel were fictitious.

So despite their instructions discouraging participants from using outside materials, significant numbers of them did. And reviewers turned in reviews without saying they were on to the deception when they clearly were. And the authors did not, apparently, debrief in a way that could definitively say whether all, most or few reviewers were on to their true purpose. Nor does there appear to be any mention of asking reviewers afterwards whether they knew about Ginther, specifically, or disparate grant award outcomes in general terms. That would seem to be important.

Why? Because if you tell most normal decent people that they are to review applications to see if they are biased against black PIs they are going to fight as hard as they can to show that they are not a bigot. The Ginther finding was met with huge and consistent protestation on the part of experienced reviewers that it must be wrong because they themselves were not consciously biased against black PIs and they had never noticed any overt bias during their many rounds of study section. The authors clearly know this. And yet they did not show that the study participants were not on to them. While using those rather interesting names to generate the impression of ethnicity.

The authors make several comments throughout the pre-print about how this is a valid model of NIH grant review. They take a lot of pride in their design choices in many places. I was very struck by:

names that were most commonly associated with Black and White babies born in Massachusetts between 1974 and 1979. A person born in the 1970s would now be in their 40s, which we reasoned was a plausible age for a current Principal Investigator.

because my first thought when reading this design was "gee, most of the African-Americans that I know who have been NIH funded PIs are named things like Cynthia and Carl and Maury and Mike and Jean and.....dude something is wrong here.". Buuuut, maybe this is just me and I do know of one "Yasmin" and one "Chanda" so maybe this is a perceptual bias on my part. Okay, over to RePORTER to search out the first names. I'll check all time and for now ignore F- and K-mechs because Ginther focused on research awards, iirc. Darnell (4, none with the last names the authors used); LaTonya (1, ditto); LaToya (2, one with middle / maiden? name of Jones, we'll allow that and oh, she's non-contact MultiPI); Tyrone (6; man one of these had so many awards I just had to google and..well, not sure but....) and Tanisha (1, again, not a president surname).

This brings me to "Jamal". I'm sorry but in science when you see a Jamal you do not think of a black man. And sure enough RePORTER finds a number of PIs named Jamal but their surnames are things like Baig, Farooqui, Ibdah and Islam. Not US Presidents. Some debriefing here to ensure that reviewers presumed "Jamal" was black would seem to be critical but, in any case, it furthers the suspicion that these first names do not map onto typical NIH funded African-Americans. This brings us to the further observation that first names may convey not merely ethnicity but something about subcategories within this subpopulation of the US. It could be that these names cause percepts bound up in geography, age cohort, socioeconomic status and a host of other things. How are they controlling for that? The authors make no mention that I saw.

The authors take pains to brag on their clever deep thinking on using an age range that would correspond to PIs in their 40s (wait, actually 35-40, if the funding of the project in -02 claim is accurate, when the average age of first major NIH award is 42?) to select the names and then they didn't even bother to see if these names appeared on the NIH database of funded awards?

The takeaway for today is that the study validity rests on the reviewers not knowing the true purpose. And yet they showed that reviewers did not follow their instructions for avoiding outside research and that reviewers did not necessarily volunteer that they'd detected the name deception*** and yet some of them clearly had. Combine this with the nature of how the study created the impression of PI ethnicity via these particular first names and I think this can be considered a fatal flaw in the study.
__

Race, Ethnicity, and NIH Research Awards. Donna K. Ginther, Walter T. Schaffer, Joshua Schnell, Beth Masimore, Faye Liu, Laurel L. Haak, Raynard Kington. Science 19 Aug 2011: Vol. 333, Issue 6045, pp. 1015-1019
DOI: 10.1126/science.1196783

*Notice the late September original funding date combined with the June 30 end date for subsequent years? This almost certainly means it was an end of year pickup** of something that did not score well enough for regular funding. I would love to see the summary statement.

**Given that this is a "Transformative" award, it is not impossible that they save these up for the end of the year to decide. So I could be off base here.

*** As a bit of a sidebar there was a Twitter person who claimed to have been a reviewer in this study and found a Biosketch from a supposedly female PI referring to a sick wife. Maybe the authors intended this but it sure smells like sloppy construction of their materials. What other tells were left? And if they *did* intend to bring in LGBTQ assumptions...well this just seems like throwing random variables into the mix to add noise.

DISCLAIMER: As per usual I encourage you to read my posts on NIH grant matters with the recognition that I am an interested party. The nature of NIH grant review is of specific professional interest to me and to people who are personally and professionally close to me.


The Purchasing Power of the NIH Grant Continues to Erode

It has been some time since I made a figure depicting the erosion of the purchasing power of the NIH grant so this post is simply an excuse to update the figure.

In brief, the NIH modular budget system used for a lot of R01 awards limits the request to $250,000 in direct costs per year. A PI can ask for more but they have to use a more detailed budgeting process, and there are a bunch of reasons I'm not going to go into here that make the "full-modular" a good starting point for discussion of the purchasing power of the typical NIH award.

The full modular limit was put in place at the inception of this system (i.e., for applications submitted after 6/1/1999) and has not been changed since. I've used FY2001 as my starting point for the $250,000 and then adjusted it in two ways according to the year by year BRDPI* inflation numbers. The red bars indicate the reduction in purchasing power of a static $250,000 direct cost amount. The black bars indicate the amount the full-modular limit would have to be escalated year over year to retain the same purchasing power that $250,000 conferred in 2001.



The executive summary is that the NIH would have to increase the modular limit to $400,000** (corrected from $450,000) per year in direct costs for FY2018 in order for PIs to have the same purchasing power that came with a full-modular grant award in 2001.
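For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the two adjustments behind the red and black bars. The function names and the three placeholder rates are illustrative assumptions, not the actual BRDPI series. The only real inputs are the $250,000 full-modular limit and the corrected FY2018 equivalent of $413,020 given in the footnote below.

```python
from typing import Iterable

BASE_LIMIT = 250_000  # full-modular direct-cost limit per year, from the post


def cumulative_factor(annual_rates: Iterable[float]) -> float:
    """Compound year-over-year inflation rates (0.03 means 3%) into one factor."""
    factor = 1.0
    for rate in annual_rates:
        factor *= 1.0 + rate
    return factor


def escalated_limit(base: float, factor: float) -> float:
    """Black bars: dollars needed now to match the base-year purchasing power."""
    return base * factor


def eroded_value(base: float, factor: float) -> float:
    """Red bars: what a static award buys now, expressed in base-year dollars."""
    return base / factor


# Hypothetical placeholder rates, NOT the real BRDPI numbers.
example_factor = cumulative_factor([0.03, 0.03, 0.03])

# Cumulative factor implied by the post's corrected FY2018 figure of $413,020.
implied_factor = 413_020 / BASE_LIMIT

print(f"placeholder factor: {example_factor:.4f}")
print(f"implied FY2001-FY2018 factor: {implied_factor:.3f}")
print(f"static $250,000 in FY2001 dollars: {eroded_value(BASE_LIMIT, implied_factor):,.0f}")
```

To reproduce the figure itself, swap the placeholder list for the yearly BRDPI values from the NIH Office of Budget spreadsheet and apply both functions to each year's running factor.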
__
*The BRDPI inflation numbers that I used can be downloaded from the NIH Office of Budget. The 2017 and 2018 numbers are projected.

**I blew it. The BRDPI spreadsheet actually projects inflation out to 2023 and I pulled the number from 2021 projection. The correct FY2018 equivalent is $413,020.


Repost- Your Grant in Review: Competing Continuation, aka Renewal, Apps

May 11 2018 Published by under Fixing the NIH, NIH, NIH Careerism

Two recent posts discuss the topic of stabilizing NIH funding within a PI's career, triggered by a blog post from Mike Lauer and Francis Collins. In the latter, the two NIH honchos claim to be losing sleep over the uncertainty of funding in the NIH extramural granting system, specifically in application to those PIs who received funding as an ESI and are now trying to secure the next round of funding.

One key part of this, in my view, is how they (the NIH) and we (extramural researchers, particularly those reviewing applications for the NIH) think about the proper review of Renewal (formerly known as competing continuation) applications. I'm reposting some thoughts I had on this topic for your consideration.

This post originally appeared Jan 28, 2016.
___
In the NIH extramural grant funding world the maximum duration for a project is 5 years. It is possible at the end of a 5 year interval of support to apply to continue that project for another interval. The application for the next interval is competitively reviewed alongside of new project proposals in the relevant study sections, in general.

Comradde PhysioProffe addressed the continuation application at his Ftb joint. NIAID has a FAQ page.

The NIH Success Rate data shows that RPG success rates were 16.8% in 2013 and 18.1% in 2014. Comparable rates for competing continuation RPG applications were 35% in 2013 and 39% in 2014. So you can see why this is important.

I visited these themes before in a prior post. I think I covered most of the issues but in a slightly different way.

Today I want to try to get you folks to talk about prescriptives. How should a competing continuation / renewal NIH grant application be reviewed?

Now in my experience, the continuation application hinges on past-productivity in a way that a new application does not. Reviewers are explicitly considering the work that has been conducted under the support of the prior award. The application is supposed to include a list of publications that have resulted from the prior award. The application is supposed to detail a Progress Report that overviews what has been accomplished. So today I will be focusing on review mostly as it pertains to productivity. For reference, Berg's old post on the number of papers per grant dollar is here and shows an average output of 6 papers (IQR about 4-11) per $250K full modular award*.

Quoted bits are from my prior post.

Did you knock our socks off? This could be amazing ELEVENTY type findings, GlamourPub record (whether “expected” for your lab or not), unbelievably revolutionary advances, etc. If you have a record of this, nobody is going to think twice about what your Aims may have been. Probably won’t even give a hoot whether your work is a close match to the funding IC, for that matter.

We should probably separate these for discussion because after all, how often is a panel going to recognize a Nobel Prize type of publication has been supported by the award in the past 5 years? So maybe we should consider Glamour publications and amazing advances as two different scenarios. Are these going to push any renewal application over the hurdle for you even if the remaining items below are lacking? Does GlamMag substitute for direct attention to the experiments that were proposed or the Aims that guided the plan? In the extreme case, should we care if the work bears very little on the mission of the IC that has funded it?

Were you productive? Even if you didn’t WOW the world, if you’ve pumped out a respectable number of papers that have some discernible impact on a scientific field, you are in good shape. The more, the merrier. If you look “fabulously productive” and have contributed all kinds of interesting new science on the strength of your award(s), this is going to go down like gangbusters with the review panels. At this level of accomplishment you’d probably be safest at least doing stuff that is vaguely in line with the IC that has funded your work.

Assuming that Glam may not be in the control of most PIs but that pedestrian, workaday scientific output is, should this be a major credit for the continuation application? We don't necessarily have to turn this into a LPU sausage-slicing discussion. Let's assume a quality of paper commensurate with the kind of work that most PIs with competitive applications in that particular study section publish. Meets the subfield standard. How important should raw productivity be?

Were you productive in addressing your overall goals? This is an important distinction from the Specific Aims. It is not necessary, in my view, that you hew closely to Aims first dreamed up 7 years prior to the conclusion of the actual study. But if you have moderate, or disappointing, productivity it is probably next most-helpful that you have published work related to the overall theme of the project. What was the big idea? What was mentioned in the first three sentences of your Specific Aims page? If you have published work related to this broad picture, that’s good.

This one is tricky. The reviewers do not have the prior grant application in front of them. They have the prior Summary Statement and the Abstract as published on RePORTER. It is a decent bet the prior Aims can be determined but broader themes may or may not come across. So for the most part if the applicant expects the reviewers to see that productivity has aligned with overarching programmatic goals, she has to tell them what those were. Presumably in the Progress Report part of the continuation application. How would you approach this as a reviewer? Say the project wasn't overwhelmingly productive and didn't obviously address all of the Aims, but at least generated some solid work along the general themes. Are you going to be satisfied? Or are you going to downgrade the failure to address each Aim? What if the project had to can an entire Aim or two? Would it matter? Is getting "stuck" in a single Aim a death knell when it comes time to review the next interval of support? As a related question, what if the same exact Aim has returned with the argument of "We didn't get to this in the past five years but it is still a good idea"? Neutral? Negative? AYFK?

Did you address your original Specific Aims? ...this can be a big obsession of certain reviewers. Not saying it isn’t a good idea to have papers that you can connect clearly to your prior Aims. ... A grant is not a contract. It is quite natural in the course of actual science that you will change your approaches and priorities for experiments. Maybe you’ve been beaten to the punch. Maybe your ongoing studies tell you that your original predictions were bad and you need to go in a whole new direction. Maybe the field as a whole has moved on. ... You might want to squeeze a drop out of a dry well to meet the “addressed Aims” criterion but maybe that money, effort and time would be better spent on a new direction which will lead to three pubs instead of one?

My original formulation of this isn't quite right for today's discussion. The last part is actually more relevant to the preceding point. For today, expand this to a continuation application that shows that the prior work essentially covers exactly what the application proposed. With data either published or included as ready-to-submit Preliminary Data in the renewal. Maybe this was accomplished with only a few papers in pedestrian journals (Lord knows just about every one of my manuscript reviews these days gets at least one critique that calls for anywhere from 2 to 5 Specific Aims worth of data) so we're not talking about Glam or fabulous productivity. But should addressing all of the Aims and most if not all of the proposed experiments be enough? Is this a credit to a competing continuation application?

It will be unsurprising to you that by this point of my career, I've had competing continuation applications to which just about all of these scenarios apply, save Glam. We've had projects where we absolutely nailed everything we proposed to do. We've had projects get distracted/sidelined off onto a subsection of the proposal that nevertheless generated about the same number and quality of publications that would have otherwise resulted. We've had low productivity intervals of support that addressed all the Aims and ones that merely covered a subset of key themes. We've had projects with reasonably high productivity that have....wandered....from the specifics of the awarded proposal due to things that are happening in the subfield (including getting scooped). We've never been completely blanked on a project with zero related publications to my recollection, but we've had some very low productivity ones (albeit with excellent excuses).

I doubt we've ever had a perfect storm of sky-high productivity, all Aims addressed and the overarching themes satisfied. Certainly I have the review comments to suggest this**.

I have also been present during review panel discussions of continuation applications where reviewers have argued bitterly over the various productivity attributes of a prior interval of support. The "hugely productive" arguments are frequently over an application from a PI who has more than one award and tends to acknowledge more than one of them on each paper. This can also involve debates about so called "real scientific progress" versus papers published. This can be the Aims, the overall theme or just about the sneer of "they don't really do any interesting science".

I have for sure heard from people who are obsessed during review with whether each proposed experiment has been conducted (this was back in the days when summary statements could be fairly exhaustive and revealed what was in the prior application to a broader extent). More generally from reviewers who want to match publications up to the scope of the general scientific terrain described by the prior application.

I've also seen arguments about suggested controls or key additional experiments which were mentioned in the summary statement of the prior review, never addressed in the resulting publications and may still be a criticism of the renewal application.

Final question: Since the reviewers of the competing continuation see the prior summary statement, they see the score and percentile. Does this affect you as a reviewer? Should it? Especially if in your view this particular application should never have been funded at that score and is likely a Programmatic pickup? Do you start steaming under the collar about special ESI paylines or bluehair/graybeard insider PO backslapping?

DISCLAIMER: As per usual, I may have competing continuation applications under current or near-future review by NIH study sections. I am an interested party in how they are reviewed.
__
*This probably speaks to my point about how multi-award PIs attribute more than one grant on each paper. My experience has not been that people in my field view 5 papers published per interval of support (and remember the renewal application is submitted with the final year of funded support yet to go, if the project is to continue uninterrupted) as expected value. It is certainly not viewed as the kind of fabulous productivity that of course would justify continuing the project. It is more in line with the bare minimum***. Berg's data are per-grant-dollar of course and are not exactly the same as per-grant. But it is a close estimate. This blog post estimates "between 0.6 and 5 published papers per $100k in funding." which is roughly 1.5 to 12.5 per year of a full-modular NIH R01. Big range and that high number seems nigh on impossible to me without other funding (like free trainee labor or data parasitism).

**and also a pronounced lack of success renewing projects to go with it.

***I do not personally agree. At the point of submitting a competing continuation in year 4 a brand new research program (whether b/c noob PI or very new lab direction) may have really only been rocking for 2 years. And large integrated projects like a big human subjects effort may not even have enrolled all the subjects yet. Breeding, longitudinal development studies, etc - there are many models that can all take a long time to get to the point of publishing data. These considerations play....let us say variably, with reviewers. IME.
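The per-dollar conversion in the first footnote above is easy to check. A minimal sketch, assuming only the $250,000 per year full-modular figure and the quoted range of 0.6 to 5 papers per $100k (the function name is made up for illustration):

```python
FULL_MODULAR_PER_YEAR = 250_000  # direct costs per year, from the post


def papers_per_year(papers_per_100k: float,
                    annual_direct: float = FULL_MODULAR_PER_YEAR) -> float:
    """Convert a papers-per-$100k rate into papers per year of an award."""
    return papers_per_100k * annual_direct / 100_000


low = papers_per_year(0.6)   # bottom of the quoted range
high = papers_per_year(5.0)  # top of the quoted range
print(f"{low:.1f} to {high:.1f} papers per full-modular year")
```

A full-modular year is 2.5 units of $100k, so each end of the quoted range is simply multiplied by 2.5.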


Stability of funding versus the project-based funding model of the NIH

May 09 2018 Published by under Fixing the NIH, NIH, NIH Careerism, NIH funding

In response to a prior post, Morgan Price wonders about the apparent contrast of NIH's recent goal to stabilize research funding and the supposed "project-based" model.

I don't see how stability based funding is consistent with project-based funding and "funding the best science". It would be a radical change...?

NIH grants are supposed to be selected and awarded on the basis of the specific project that is proposed. That is why there is such extensive detailing of a very specific area of science, well specified Specific (not General!) Aims and a listing of specific experiments.

They are not awarded on the basis of a general program of research that seems to be promising for continued funding.

Note that there are indeed mechanisms of funding that operate on the program level to much greater extent. HHMI being one of the more famous ones of these. In program based award, the emphasis is on what the investigating team (and generally this means specifically the PI) has accomplished and published in recent years. There may be some hints about what the person plans to work on next but generally the emphasis is on past performance, rather than the specific nature of the future plan.

The recent handwringing from NIH concerns investigators that they have launched with special consideration for their newcomer status (e.g., Early Stage Investigator PI applications can be funded at lower priority scores / percentile ranks than would be needed by an established investigator). From their post:

if we are going to nurture meritorious, productive mid-career investigators by stabilizing their funding streams, monies will have to come from somewhere.

"Stabilizing", Morgan Price assumes, is the same thing as a radical change. It is not.

Here's the trick:

The NIH funding system has always been a hybrid which pays lip service to "project based funding" as a model while blithely using substantial, but variable, input from the "program based" logic. First off, the "Investigator" criterion of proposal review is one of 5 supposedly co-equal major criteria. The Biosketch (which details the past accomplishments and skills of the PI) is prominent in the application. This Biosketch lists both papers and prior research grant support* which inevitably leads to some degree of assessment of how productive the PI was with her prior awards. This then is used to judge the merit of the proposal that is under current review - sounds just a bit like HHMI, doesn't it?

The competing continuation application (called a Renewal application now) is another NIH beast that reveals the hybrid nature of the selection system. You are allowed to ask for no more than 5 years of support for a given project, but you can then ask for successive five year extensions via competitive application review. This type of proposal has a "Progress Report" and a list of papers resulting from the project required within the application. This, quite obviously, focuses the review in large part on the past accomplishment. Now, sure, the application also has to have a detailed proposal for the next interval. Specific Aims. Experiments listed. But it also has all of the prior accomplishments pushed into the center of the review.

So what is the problem? Why are Collins and Lauer proposing to make the NIH grant selection even more based on the research program? Well, times have changed. The figure here is a bit dated by now but I like to keep refreshing your view of it because NIH has this nasty tendency to truncate their graphs to only the past decade or so. The NIH does this to obscure just how good investigators had things in the 80s. That was when established investigators enjoyed success rates north of 40%. For all applications, not just for competing renewals. Many of the people who started their careers in those wonderful days are still very much with us, by the way. This graph shows that within a few years of the end of the doubling, the success rates for established investigators had dropped to about where the new investigators were in the 1980s. Success rates have only continued to get worse but thanks to policies enacted by Zerhouni, the established and new investigator success rates have been almost identical since 2007.
Interestingly, one of the things Zerhouni had to do was to insist that Program change their exception pay behavior. (This graph was recreated from a GAO report [PDF], page down to Page 56, PDF page 60.) It is relevant because it points to yet another way that the NIH system used to prioritize program qualities over the project qualities. POs historically were much more interested in "saving" previously funded, now unfunded, labs than they were in saving not-yet-funded labs.

Now we get to Morgan Price's point about "the best science". Should the NIH system be purely project-based? Can we get the best science one 5 year plan at a time?

I say no. Five years is not enough time to spool up a project of any heft into a well honed and highly productive gig. Successful intervals of 5 year grants depend on what has come before to a very large extent. Oftentimes, adding the next 5 years of funding via Renewal leads to an even more productive time because it leverages what has come before. Stepping back a little bit, gaps in funding can be deadly for a project. A project that has been killed off just as it is getting good is not only not the "best" science, it is hindered science. A lack of stability across the NIH system has the effect of making all of its work even more expensive because something headed off in Lab 1 (due to gaps in funding) can only be started up in Lab 2 at a handicap. Sure, Lab 2 can leverage published results of Lab 1 but not the unpublished stuff and not all of the various forms of expertise locked up in the Lab 1 staff's heads.

Of course if too much of the NIH allocation goes to sinecure program-based funding to continue long-running research programs, this leads to another kind of inefficiency. The inefficiencies of opportunity cost, stagnation, inflexibility and dead-woodery.

So there is a balance. Which no doubt fails to satisfy most everyone's preferences.

Collins and Lauer propose to do a bit of re-balancing of the program-based versus project-based relationship, particularly when it comes to younger investigators. This is not radical change. It might even be viewed in part as a selective restoration of past realities of grant funded science careers.

__
*In theory the PI's grants are listed on the Biosketch merely to show the PI is capable of leading a project something like the one under review. Correspondingly, it would in theory be okay to just list the most successful ones and leave out the grant awards with under-impressive outcomes. After all, do you have to put in every paper? no. Do you have to put every bit of bad data that you thought might be preliminary data into the app? no. So why do you have to** list all of your grants? This is the program-based aspect of the system at work.

**dude, you have to. this is one of those culture of review things. You will be looked up on RePORTER and woe be to you if you try to hide some project, successful or not, that has active funding within the past three years.


Addressing the Insomnia of Francis Collins and Mike Lauer

The Director of the NIH and the Deputy Director in charge of the office of extramural research have posted a blog post about The Issue that Keeps Us Awake at Night. It is the plight of the young investigator, going from what they have written.


The Working Group is also wrestling with the issue that keeps us awake at night – considering how to make well-informed strategic investment decisions to nurture and further diversify the biomedical research workforce in an environment filled with high-stakes opportunity costs. If we are going to support more promising early career investigators, and if we are going to nurture meritorious, productive mid-career investigators by stabilizing their funding streams, monies will have to come from somewhere. That will likely mean some belt-tightening in other quarters, which is rarely welcomed by the those whose belts are being taken in by a notch or two.

They plan to address this by relying on data and reports that are currently being generated. I suspect this will not be enough to address their goal.

I recently posted a link to the NIH summary of their history of trying to address the smooth transition of newly minted PIs into NIH-grant funded laboratories, without much comment. Most of my Readers are probably aware by now that handwringing from the NIH about the fate of new investigators has been an occasional feature since at least the Johnson Administration. The historical website details the most well known attempts to fix the problem. From the R23 to the R29 FIRST to the New Investigator check box, to the "sudden realization"* they needed to invent a true Noob New Investigator (ESI) category, to the latest designation of the aforementioned ESIs as Early Established Investigators for continued breaks and affirmative action. It should be obvious from the ongoing reinvention of the wheel that the NIH periodically recognizes that the most recent fix isn't working (and may have unintended detrimental consequences).

One of the reasons these attempts never truly work and have to be adjusted or scrapped and replaced by the next fun new attempt was identified by Zerhouni (a prior NIH Director) in about 2007. This was right after the "sudden realization" and the invention of the ESI. Zerhouni was quoted in a Science news bit as saying that study sections were responding to the ESI special payline boost by handing out ever worsening scores to the ESI applications.

Told about the quotas, study sections began “punishing the young investigators with bad scores,” says Zerhouni.

Now, I would argue that viewing this trend of worsening scores as "punishing" is at best only partially correct. We can broaden this to incorporate a simple appreciation that study sections adapt their biases, preferences and evolved cultural ideas about grant review to the extant rules. One way to view worsening ESI scores may have to do with the pronounced tendency reviewers have to think in terms of fund it / don't fund it, despite the fact that SROs regularly exhort them not to do this. When I was on study section regularly, the scores tended to pile up around the perceived payline. I've seen the data for one section across multiple rounds. Reviewers were pretty sensitive to the scuttlebutt about what sort of score was going to be a fundable one. So it would be no surprise whatsoever to me if there was a bias driven by this tendency, once it was announced that ESI applications would get a special (higher) payline for funding.

This tendency might also be driven in part by a "Get in line, youngun, don't get too big for your britches" phenomenon. I've written about this tendency a time or two. I came up as a postdoc towards the end of the R29 / FIRST award era and got a very explicit understanding that some established PIs thought that newbies had to get the R29 award as their first award. Presumably there was a worsening bias against giving out an R01 to a newly minted assistant professor as their first award**, because hey, the R29 was literally the FIRST award, amirite?

sigh.

Then we come to hazing, which is the even nastier relative of the "Don't get too big for your britches" phenomenon. Oh, nobody will admit that it is hazing, but there is definitely a subcurrent of this in the review behavior of some people that think that noob PIs have to prove their worth by battling the system. If they sustain the effort to keep coming back with improved versions, then hey, join the club kiddo! (Here's an ice pack for the bruising). If the PI can't sustain the effort to submit a bunch of revisions and new attempts, hey, she doesn't really have what it takes, right? Ugh.

Scientific gate-keeping. This tends to cover a multitude of sins of various severity but there are definitely reviewers that want newcomers to their field to prove that they belong. Is this person really an alcohol researcher? Or is she just going to take our*** money and run away to do whatever basic science amazeballs sounded super innovative to the panel?

Career gate-keeping. We've gone many rounds on this one within the science blog- and twittospheres. Who "deserves" a grant? Well, reviewers have opinions and biases and despite their best intentions and wounded protestations...these attitudes affect review. In no particular order we can run down the favorite targets of the "Do it to Julia, not me, JULIA!" sentiment. Soft money job categories. High overhead Universities. Well funded labs. Translational research taking all the money away from good honest basic researchers***. Elite coastal Universities. Big Universities. R1s. The post-normative-retirement crowd. Riff-raff plodders.

Layered over the top of this is favoritism. It interacts with all of the above, of course. If some category of PI is to be discriminated against, there is very likely someone getting the benefit. The category of which people approve. Our club. Our kind. People who we like who must be allowed to keep their funding first, before we let some newbie get any sniff of a grant.

This, btw, is a place where the focus must land squarely on Program Officers as well. The POs have all the same biases mentioned above, of course. And their versions of the biases have meaningful impact. But when it comes to the thought of "we must save our long term investigators" they have a very special role to play in this debacle. If they are not on board with the ESI worries that keep Collins and Lauer awake at night, well, they are ideally situated to sabotage the effort. Consciously or not.

So, Director Collins and Deputy Director Lauer, you have to fix study section and you have to fix Program if you expect to have any sort of lasting change.

I have only a few suggestions and none of this is a silver bullet.

I remain convinced that the only tried and true method to minimize the effects of biases (covert and overt) is the competition of opposing biases. I've remarked frequently that study sections would be improved and fairer if less-experienced investigators had more power. I think the purge of Assistant Professors effected by the last head of the CSR (Scarpa) was a mistake. I note that CSR is charged with balancing study sections on geography, sex, ethnicity, university type and even scientific subdomains...while explicitly discriminating against younger investigators. Is it any wonder if there is a problem getting the newcomers funded?

I suggest you also pay attention to fairness. I know you won't, because administrators invariably respond to a situation of perceived past injustice with "ok, that was the past and we can't do anything about it, moving forward please!". But this is going to limit your ability to shift the needle. People may not agree on what represents fair treatment but they sure as heck are motivated by fairness. Their perception of whether a new initiative is fair or unfair will tend to shape their behavior when reviewing. This can get in the way of NIH's new agenda if reviewers perceive themselves as being mistreated by it.

Many of the above mentioned reviewer quirks are hardened by acculturation. PIs who are asked to serve on study section have been through the study section wringer as newbies. They are susceptible to the idea that it is fair if the next generation has it just about as hard as they did and that it is unfair if newbies these days are given a cake walk. Particularly, if said established investigators feel like they are still struggling. Ahem. It may not seem logical but it is simple psychology. I anticipate that the "Early Established Investigator" category is going to suffer the same fate as the ESI category. Scores will worsen, compared to pre-EEI days. Some of this will be the previously mentioned tracking of scores to the perceived payline. But some of this will be people**** who missed the ESI assistance who feel that it is unfair that the generation behind them gets yet another handout to go along with the K99/R00 and ESI plums. The intent to stabilize the careers of established investigators is a good one. But limiting this to "early" established investigators, i.e., those who already enjoyed the ESI era, is a serious mistake.

I think Lauer is either aware, or verging on awareness, of something that I've mentioned repeatedly on this blog. I.e. that a lot of the pressure on the grant system - increasing numbers of applications, PIs seemingly applying greedily for grants when already well funded, the revision queuing traffic pattern hold - comes from a vicious cycle of the attempt to maintain stable funding. When, as a VeryEstablished colleague put it to me surprisingly recently, "I just put in a grant when I need another one and it gets funded" is the expected value, PIs can be efficient with their grant behavior. If they need to put in eight proposals to have a decent chance of one landing, they do that. And if they need to start submitting apps 2 years before they "need" one, the randomness is going to mean they seem overfunded now and again. This applies to everyone all across the NIH system. Thinking that it is only those on their second round of funding that have this stability problem is a huge mistake for Lauer and Collins to be making. And if you stabilize some at the expense of others, this will not be viewed as fair. It will not be viewed as shared pain.

If you can't get more people on board with a mission of shared sacrifice, or unshared sacrifice for that matter, then I believe NIH will continue to wring its hands about the fate of new investigators for another forty years. There are too many applicants for too few funds. It amps up the desperation and amps up the biases for and against. It decreases the resistance of peer reviewers to do anything to Julia that they expect might give a tiny boost to the applications of them and theirs. You cannot say "do better" and expect reviewers to change, when the power of the grant game contingencies is so overwhelming for most of us. You cannot expect program officers who still to this day appear entirely clueless about the way things really work in extramural grant-funded careers to suddenly do better because you are losing sleep. You need to delve into these psychologies and biases and cultures and actually address them.

I'll leave you with an exhortation to walk the earth, like Caine. I've had the opportunity to watch some administrative frustration, inability and nervousness verging on panic in the past couple of years that has brought me to a realization. Management needs to talk to the humblest of their workforce instead of the upper crust. In the case of the NIH, you need to stop convening preening symposia from the usual suspects, taking the calls of your GlamHound buddies and responding only to reps of learn-ed societies. Walk the earth. Talk to real applicants. Get CSR to identify some of your most frustrated applicants and see what is making them fail. Find out which of the apparently well-funded applicants have to work their tails off to maintain funding. Compare and contrast to prior eras. Ask everyone what it would take to Fix the NIH.

Of course this will make things harder for you in the short term. Everyone perceives the RealProblem as that guy, over there. And the solutions that will FixTheNIH are whatever makes their own situation easier.

But I think you need to hear this. You need to hear the desperation and the desire most of us have simply to do our jobs. You need to hear just how deeply broken the NIH award system is for everyone, not just the ESI and EEI categories.

PS. How's it going solving the problem identified by Ginther? We haven't seen any data lately but at last check everything was as bad as ever so...

PPS. Are you just not approving comments on your blog? Or is this a third rail issue nobody wants to comment on?
__
*I make fun of the "sudden realization" because it took me about 2 h of my very first study section meeting ever to realize that "New Investigator" checkbox applicants from genuine newbies did very poorly and all of these were being scooped up by very well established and accomplished investigators who simply hadn't been NIH funded. Perhaps they were from foreign institutions, now hired in the US. Or perhaps lived on NSF or CDC or DOD awards. The idea that it took NIH something like 8-10 years to realize this is difficult to stomach.

**The R29 was crippled in terms of budget, btw, and had other interesting features.

***lolsob

****Yep, that would be my demographic.

