Pier and colleagues published a study purporting to evaluate the reliability of NIH-style peer review of grant applications. Related work that appears to be from the same study was published by this group in 2017.
From the supplement to the 2018 paper, we note that the reviewer demographics were 62% Asian and 38% white, with zero black or Hispanic reviewers. I don't know how that matches the panels that handle NCI applications, but I would expect some minimal black/Hispanic representation and a lot lower Asian representation to match my review panel experiences. The panels were also 24% female, which seems to match my memory of NIH stats for review running under 1/3 women.
Assistant Professors made up 17% of the reviewers. This is definitely a divergence from CSR practice. The only data I saw, from right around the time of Scarpa's great Purge of Assistant Professors, suggested a peak of 10% of reviewers. Given the way ad hoc / empaneled reviewer loads work, I think we can conclude that way fewer than 10% of reviews were coming from Assistant Professors. As you know, we are now a decade past the start of the purge and these numbers have to be lower. So the panel demographics are not similar.
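To make the reviewers-versus-reviews point concrete, here is a rough sketch of the arithmetic. Only the 10% headcount figure comes from the paragraph above. The per-reviewer loads are invented for illustration, on the assumption that Assistant Professors mostly served ad hoc with a lighter assignment load than empaneled members. The function name share_of_reviews is likewise just a label for this example.

```python
def share_of_reviews(headcount_share, load_ap, load_other):
    """Fraction of all written reviews that come from assistant professors.

    headcount_share: fraction of reviewers who are assistant professors
    load_ap:         reviews per assistant professor (hypothetical)
    load_other:      reviews per other reviewer (hypothetical)
    """
    ap_reviews = headcount_share * load_ap
    other_reviews = (1 - headcount_share) * load_other
    return ap_reviews / (ap_reviews + other_reviews)


# Equal loads: share of reviews equals share of reviewers.
equal = share_of_reviews(0.10, load_ap=9, load_other=9)

# Hypothetical lighter ad hoc load for assistant professors (3 vs 9 reviews).
lighter = share_of_reviews(0.10, load_ap=3, load_other=9)

print(f"equal loads:   {equal:.1%}")
print(f"lighter loads: {lighter:.1%}")
```

With any load for Assistant Professors below the load for everyone else, the share of reviews falls under the 10% share of reviewers. How far under depends entirely on the assumed loads.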
N.b., the 2017 paper says they surveyed the reviewers on similarity to genuine NIH review experience, but I can't find anywhere that it states the amount of review experience for the subjects. Similarly, while they all had to have been awarded at least one R01, we don't know anything about their experiences as applicants. Might be relevant. A missed opportunity would seem to be the chance to test reviewer demographics in the 2017 paper, which covers more about the process of review, calibration of scoring, agreement after discussion, etc.
The paper(s) also say that they tried to de-identify the applicants.
"All applications were deidentified, meaning the names of the PIs, any co-investigators, and any other research personnel were replaced with pseudonyms. We selected pseudonyms using public databases of names that preserved the original gender, nationality, and relative frequency across national populations of the original names. All identifying information, including institutional addresses, email addresses, phone numbers, and hand-written signatures were similarly anonymized and re-identified as well."
I am still looking, but I cannot find any reference to any attempt by the authors to validate whether the blinding worked, which is in and of itself a fascinating question. But for the purposes of the "replication" of NIH peer review, we must recognize that Investigator and Environment are two of five formally co-equal scoring criteria. We know that the NIH data show poor correlation of Investigator and Environment criterion scores with overall voted impact score (Approach and Significance are the better predictors), but these are still scoring criteria. How can this study attempt to delete two of these and then purport to be replicating the process? It is like they intentionally set out to throw noise into the system.
I don't think the review panels triaged any of the 25 proposals. The vast majority of NIH review involves triage of the bottom ~half of the assigned proposals. Reviewers know this when they are doing their preliminary reading and scoring.