This is amazing. Strike that, AMAZING!
A paper published in PLoS ONE by Martin and colleagues examines the fate of R01 applications reviewed in 61 of the 172 standing study sections convened by the Center for Scientific Review of the NIH in a single round (the January 2009 Council round, with applications submitted Jun-Jul 2008 and reviewed in Oct-Nov 2008).
It is going to take me a bit to go through all the data, but let's start with Figure 1. This plots the preliminary scores (average of ~3 assigned reviewers) against the final priority score voted by the entire panel.
The first and most obvious feature is the tendency for discussion to make the best scores (lowest in the NIH scoring system) more extreme. I would suggest that this results from two factors. First, reviewers are reluctant (in my experience) to assign the best possible score prior to discussion. I don't understand this personally, but I guess I can grasp the psychology. People have the idea that perfection exists out there in some application, and they want to reserve some room so that they never end up having awarded a perfect score to a lesser application. Silly, but whatever. Once discussion starts and everyone is nodding along approvingly, it is easier to drift toward a more perfect score.
Second, there is a bit of the old "Fund that puppy NOW!" going on, particularly, I would estimate, for applications that were near misses on a prior version and have come back in for review. There can be a tendency to want to over-emphasize to Program staff that the study section found the application to be in the must-fund category.
Martin MR, Kopstein A, Janice JM (2010) An Analysis of Preliminary and Post-Discussion Priority Scores for Grant Applications Peer Reviewed by the Center for Scientific Review at the NIH. PLoS ONE 5(11): e13526. doi:10.1371/journal.pone.0013526