Predictors of Grad School Publications

Jan 11, 2017 · Filed under Postgraduate Training

A new paper in PLoS ONE purports to report on the relationship between traditional graduate school selection factors and graduate school success.

Joshua D. Hall, Anna B. O’Connell, Jeanette G. Cook. Predictors of Student Productivity in Biomedical Graduate School Applications. PLoS ONE, published January 11, 2017. [Publisher Link]

The setup:

The cohort studied comprised 280 graduate students who entered the BBSP at UNC from 2008-2010; 195 had graduated with a PhD at the time of this study (July 2016), 45 were still enrolled, and 40 graduated with a Master's degree or withdrew. The cohort included all of the BBSP students who matriculated from 2008-2010.

The major outcome measure:

Publications by each student during graduate school were quantified with a custom Python script that queried PubMed using author searches for each student's name paired with their research advisor's name. The script returned XML attributes for all publications and generated the number of first-author publications and the total number of publications (including middle authorship) for each student/advisor pair.
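The described pipeline can be sketched roughly as follows. This is a hypothetical reconstruction, not the authors' actual script: the query format, function names, and XML handling are my assumptions, and the sketch parses PubMed-style XML already in hand rather than hitting the E-utilities endpoints over the network.

```python
# Hypothetical reconstruction of the tallying step described above.
# Query format and function names are assumptions, not the authors'
# actual script; parsing works offline on PubMed-style XML.
from xml.etree import ElementTree


def build_query(student: str, advisor: str) -> str:
    """Pair a student's name with their advisor's in a PubMed author search."""
    return f"{student}[Author] AND {advisor}[Author]"


def tally_publications(records_xml: str, student_last: str):
    """Count (first-author, total) publications for one student from
    PubMed-style XML such as the E-utilities efetch endpoint returns."""
    root = ElementTree.fromstring(records_xml)
    first, total = 0, 0
    for article in root.iter("PubmedArticle"):
        authors = [a.findtext("LastName", "") for a in article.iter("Author")]
        if not authors:
            continue
        total += 1
        if authors[0] == student_last:  # first listed author is first author
            first += 1
    return first, total
```

In the real script the query string would be sent to the esearch/efetch endpoints first; that network step is omitted here.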

For analysis they grouped the students into bins of 3+, 1-2 or 0 first author pubs with a '0+' category for zero first-author pubs but at least one middle-author publication.
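In code terms the binning rule amounts to something like this (the function name is mine, not the paper's):

```python
def productivity_bin(first_author: int, total: int) -> str:
    """Assign a student to the paper's productivity categories:
    '3+' or '1-2' by first-author count, '0+' for middle-author-only
    students, '0' for no publications at all."""
    if first_author >= 3:
        return "3+"
    if first_author >= 1:
        return "1-2"
    return "0+" if total > 0 else "0"
```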

OMG! Nothing predicts graduate school performance (especially those evil, evil, biased - I mentioned evil, right? - standardized scores).

Yes, even people who score below the 50th percentile on quantitative or verbal GRE land first-author publications! (Apple-polishing GPA kids don't seem to fare particularly well, either; plenty of first-author publications were earned by the 3.0-3.5 riff-raff.)

Oh bai the wai...prior research experience doesn't predict anything either.

Guess what did predict first author publications? Recommendation scores. That's right. The Good Old Boys/Girls Club of biased recommendations from undergraduate professors is predictive of the higher producing graduate students.

As the authors note in the Discussion, this analysis focused only on student characteristics. It could not account for the mentor lab, interaction of student characteristics with the mentor lab characteristics and the like.

I'll let you Readers mull this one over for a bit but I was struck by one thing.

We may be talking at cross purposes when we discuss how application metrics are used to predict graduate student success because we do not have the same idea of success in mind.

This analysis suggests the primary measure of success of a graduate student is the degree to which they succeeded in being a good data-monkey who produces a lot of publishable stuff within the context of their given research laboratory. And by this measure, nothing is very predictive, going by the Hall et al analysis, except the recommendation letter of those who are trying to assess the whole package from their varied perspectives of "I know it when I see it*".

Grad student publication number is, of course, related to who will go on to be a success as a creative independent scientist because of the very common belief that past performance predicts future performance. Those who exit grad school with zero pubs are facing an uphill battle to attain a faculty position. Those with 3+ first author pubs will generally be assumed to be more in the hunt as a potential future faculty member all along the postdoctoral arc.

Assuming all else equal.

This is another way we talk past each other about standardized scores, etc.

The choice of the PI who is trying to select a graduate student for their lab can assume "all else equal". Approximately. Same lab, same basic environment. We don't have this information from Hall et al. and I think it would be pretty difficult to do the study in a way that used same-lab as a covariate. You're just going to need a very large boat.

I think of it this way. Maybe there are some labs where everyone gets 3 or more first-author papers? Maybe there are some where it takes a very special individual indeed to get more than one in the course of graduate school? And without knowing if the student characteristics determine the host lab, we have to assume random (ish) assignment. Thus it could be the case that the better GREquant, for example, gives a slight advantage within lab but this is wiped out by the variability between-labs.
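This within-lab-versus-between-lab argument can be made concrete with a toy simulation. All numbers below are invented for illustration: give every student a small within-lab boost from a standardized score, but let lab baselines vary widely, and the pooled correlation shrinks relative to the lab-demeaned one.

```python
# Toy simulation of the argument above. All parameters are invented:
# a modest within-lab effect of a standardized score is diluted in
# pooled data when labs differ widely in baseline output.
import random
import statistics
from collections import defaultdict


def pearson(x, y):
    """Plain Pearson correlation, no external libraries."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


random.seed(1)
score, pubs, lab_of = [], [], []
for lab in range(30):
    baseline = random.gauss(2.0, 1.5)      # labs differ a lot in output
    for _ in range(10):
        g = random.gauss(0, 1)             # standardized admissions score
        p = baseline + 0.2 * g + random.gauss(0, 0.5)  # small within-lab effect
        score.append(g)
        pubs.append(p)
        lab_of.append(lab)

# Pooled correlation across all students, ignoring lab membership.
pooled_r = pearson(score, pubs)

# Within-lab correlation: demean both variables inside each lab first.
groups = defaultdict(list)
for g, p, l in zip(score, pubs, lab_of):
    groups[l].append((g, p))
gw, pw = [], []
for pairs in groups.values():
    mg = statistics.mean(g for g, _ in pairs)
    mp = statistics.mean(p for _, p in pairs)
    gw.extend(g - mg for g, _ in pairs)
    pw.extend(p - mp for _, p in pairs)
within_r = pearson(gw, pw)
```

With these invented parameters the within-lab correlation comes out clearly larger than the pooled one, which is the whole point: between-lab noise can swamp a real within-lab signal.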

The choice of a selection committee for graduate programs can be less confident about all else being equal. They have to ask what sort of student can be successful across all of the lab environments in the program. Or successful in the majority of them. The Hall et al. data say that many types can be. But we are still asking a question of whether the training environment is such an overwhelming factor that almost nothing about the individual matters. This seems to be the message.

If so, why are we bothering to select students at all? Why have them apply with any details other than the recommendation letters?

Maybe this is another place we are speaking at cross purposes. Some of us may still believe that the point of graduate school selection is to train the faculty (or insert any other specific career outcome if relevant) of tomorrow. Part of the goal, therefore, may be to select people on the basis of who we think would be best at that future role**, regardless of the variation in papers generated with first-author credit as a graduate student.

Is the Hall et al. paper based on a straw notion of "success"?

I think you've probably noticed, Dear Reader, that my opinion is that the career of grant-funded PI takes some personality characteristics that are not easily captured by the number of first-author pubs as a graduate student. Grit and resilience. Intrinsic motivation to git-er-done. Awareness of the structural, career-type aspects. At least a minimal amount of interpersonal skills.

What I am not often on about is the fact that I think that given approximately equal conditions, smarts matters. This is not saying that smarts is the only thing. If you are smart as all heck and you don't have what it takes to be productive or to take a hit, you aren't going to do well. It's the flip side. If two people do have grit and resilience and motivation...the smarter person is going to have an easier time of it or achieve more for the same effort**. On average.

And this is a test that is not performed in the new paper. Figuring out how to compare outcomes within laboratory groups might be an advance on this question.

*When I write recommendation letters for undergrads who have worked with me I do not have access to their standardized scores or grades. I have my subjective impressions of their smarts and industry from their work in my lab to go by. That's it. Maybe other people formally review a transcript and scores before writing a letter? I doubt it but I guess that is possible.

**Regarding that future role, again it may be a question of what is most important for success. Within our own lab, we are assuming that differential opportunity to get publications is not a thing. So since this part of the environment is fixed, we should be thinking about what is going to lead to enhanced success down the road, given conceivable other environments. From the standpoint of a Program, the same? or do we just feel as though the best success in our Program is enough to ensure the best success in any subsequent environment? The way we look at this may be part of what keeps us talking past each other about what graduate selection is for.

17 responses so far

  • Ola says:

    Probably their measure of "success" is sub-optimal, but my guess would be even if one built a more complex success parameter (e.g. taking into account NRSA or K99/R00 success, poster and travel awards, number of job interviews for post-doc, patents, invited talks, h-index 5 years after graduating, salary 5 years after PhD, time to write up, number of linked circle-jerk friends, and of course ownership of a successful blog) the result would be the same.

    This is just another microcosm of the fact that we as humans suck at judging each other (if you need proof of this basic truth, look at the election we just went through).

  • lylebot says:

    Does the paper address selection bias? What were the criteria by which students were admitted? Did students that were not admitted at UNC for whatever reason find success elsewhere? These seem like big questions regarding this sort of research.

  • Alex says:

    I will RTFA later, but off the top of my head I wonder if they accounted for range restriction. I would expect that someone who measures up poorly by one admissions variable is only getting in if they stand out as strong by some other measure. And someone who is really strong on some measure might get slack for some other weakness in the application. Those effects will suppress correlations.
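That suppression effect is easy to demonstrate with a toy selection model (all numbers invented for illustration): take two independent credentials, admit only applicants whose sum clears a cutoff, and the credentials come out negatively correlated among the admitted pool.

```python
# Toy illustration of selection-induced correlation (Berkson's paradox).
# Cutoff and sample sizes are invented for the sketch.
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation, no external libraries."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


random.seed(2)
# Two independent standardized credentials per applicant (e.g. GRE, GPA).
applicants = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5000)]
# Admit only those whose combined strength clears a cutoff.
admitted = [(a, b) for a, b in applicants if a + b > 1.5]

full_r = pearson([a for a, _ in applicants], [b for _, b in applicants])
admitted_r = pearson([a for a, _ in admitted], [b for _, b in admitted])
```

Among all applicants the two credentials are uncorrelated; among admittees they are strongly negatively correlated, exactly the "weak on one, strong on the other" pattern that flattens observed predictor-outcome relationships.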

  • drugmonkey says:

    Alex- I bet that you are right. I wonder if you could come up with a weighted summary metric that was correlated with the recommendation scores since presumably those opinions are some sort of Gestalt impression.

    lylebot- it's OA so you can check yourself. but I think that was mentioned as a caveat.

  • Boehninglab says:

    Let me get this straight, you hate grad school pubs as a metric, yet you highlight a study using this as a metric? I don't get it. Regardless, there are so many caveats to this study it is almost impossible to interpret.

  • qaz says:

    The Hall paper (which is making the rounds among grad school committees as a way to attack standardized scores) is based on incorrect statistical methodologies. Basically, they don't have the controls. Basically, they're looking at already selected data. That means even if there was a correlation, it might not show up because they are only looking at the top corner of their data. If we really want to know if any of these factors are predictors, some grad school would have to accept everyone and then determine which factors predict.

  • drugmonkey says:

    Boehninglab- are you referring to my saying it is wrong to make publications a requirement of the PHD?

    qaz- yes it is embarrassing the degree to which academics are misinterpreting these kinds of analyses. Or, misusing is perhaps more accurate.

  • Morgan Price says:

The mean percentile on the quantitative or verbal GRE was about 73% or roughly the top quarter. So yes, there is a lot of range restriction. On the other hand, if you thought that students with 90th percentile GREs would do better than those with 70th percentile -- apparently they don't (or the effect is small).

  • babyattachmode says:

    But what does number of publications tell me if you don't take into account the amount of data that goes into each publication (I'm actively avoiding the use of IF here)? If that one first author paper is a CNS paper, then doesn't that do waaaaaay more for someone than 3 papers in the Scandinavian Journal of Applied Bunnyhopping?

  • anon says:

    This study doesn't include me but it does include my lab mates and a lot of other people I know...

  • drugmonkey says:

    iBAM- Yes

  • enginoob says:

    PI mentorship probably trumps grad student admission stats many days. We used to joke that anyone could get a PhD/paper in a certain group in my program - the PI was a fantastic scientist and a really hands-on manager. A good GRE score doesn't mean you get a good mentor.

  • A Salty Scientist says:

    On the other hand, if you thought that students with 90th percentile GREs would do better than those with 70th percentile -- apparently they don't (or the effect is small).

Reminds me of the NIH Grant Lottery Paper. Even with haphazard criteria, we're probably pretty good at distinguishing the top 75th percentile candidates from the bottom 25th percentile. If we care to identify differences between [students, faculty hires, grantees] with 90th percentile "outcomes" vs 70th percentile outcomes, we need a much better experimental design than this study.

  • Alex says:

    A quick search through the paper shows no attention to range restriction. While it is true that the range of each variable is somewhat wide, we're looking at a single institution, so anyone who comes in weak on one variable must have done well on another to get in, and people who are quite strong on two variables are likely to have accepted offers from higher-ranked schools. (Nothing against UNC--it's a good place--but there's always a bigger fish.) And anybody who did poorly on multiple variables is unlikely to have gotten in.

    So while the ranges on individual variables are wide, the region of parameter space explored will almost certainly be fairly narrow.

    Of course, there's a subtext to all discussions of standardized tests: Equity concerns. And my response to equity concerns is simple. If you value equity and diversity and opportunity then you should accept that you'll have to give people a chance despite overall weaker application packages, invest a lot in extra help and mentoring and whatnot, and accept the costs and risks. If you won't accept costs and risks then you don't actually value it.

    Inevitably, somebody will say "Hey, I know plenty of people who did great despite weak application packages!" Yes. People can beat the odds. If you value giving people the opportunity to beat the odds then you'll admit them, invest in trying to help them beat the odds, and accept the risk that it won't always work out. That's a clear-eyed and productive path forward, more productive than denying that odds mean anything.

  • Mitch says:

    It's appropriate to note the range restriction issue, but that doesn't mean the study is useless. For one thing, all of science relies on flawed models. The key thing is that we make decisions based on the best available data. You can't just admit every student and see what happens, so this study is probably about as good as it could get for its purpose (maybe they could have delved into a few details more but the overall design probably cannot be topped - if you think it could be, then state how it could realistically be improved in a significant way, instead of this hit-and-run criticism). For another thing, this may have utility for PIs looking into potential students for their labs. In that case, the whole range is represented here (students already admitted and looking for labs). Access to scores, letters, etc. likely varies with institution, but I'm sure some PIs are privy to that info.

  • Artnsci says:

    When we do faculty searches, we look through applicants' CVs and count only the number of high impact papers. It's amazing how many applicants have 50 papers but only one in a journal that anyone has heard of--not impressive. Thus I find it bizarre that this study didn't even look at impact factors, but just counted papers.

  • JL says:

    Artnsci, I agree that it is weird to write a paper stating how quantitative metrics are poor predictors of another poor quantitative metric. I understand the difficulty in collecting the data, but when these students move on, they get letters from their mentors. I wonder if there are correlations in the strength of a letter to get into the program and the strength of a letter out of the program. The letters out will reflect that "gestalt" of how well the student did.

    Sadly they didn't make the data available. There's a statement about preventing sleuthing to figure out confidential student numbers, but it's still sad. I would have liked to do more analysis on the data.

Alex, it's not true that you either take the risk or you don't value equity. As has been discussed before, not everyone is in the same position to take risks. A 1-student lab takes a very different amount of risk on one student than a 10-student lab. Same for many other variables. Your absolutism is part of the problem, not the solution. No doubt there is a lot of empty talk on valuing equity, but you don't know why people make decisions and it's dangerous for you to make such broad claims.
