Archive for the 'Replication' category

Group effects. or "effects".

Jul 22 2016 Published under Replication, ReplicationCrisis

How many times do we see the publication of a group effect in an animal model that is really just a failure to replicate? Or a failure to completely replicate?

How many of those sex-differences, age-differences or strain-differences have been subjected to replication?


Amgen continues their cherry picking on "reproducibility" agenda

Feb 05 2016 Published under Conduct of Science, Replication, ReplicationCrisis

A report by Begley and Ellis, published in 2012, was hugely influential in fueling current interest and dismay about the lack of reproducibility in research. In their original report the authors claimed that the scientists of Amgen had been unable to replicate 47 of 53 studies.

Over the past decade, before pursuing a particular line of research, scientists (including C.G.B.) in the haematology and oncology department at the biotechnology firm Amgen in Thousand Oaks, California, tried to confirm published findings related to that work. Fifty-three papers were deemed 'landmark' studies (see 'Reproducibility of research findings'). It was acknowledged from the outset that some of the data might not hold up, because papers were deliberately selected that described something completely new, such as fresh approaches to targeting cancers or alternative clinical uses for existing therapeutics. Nevertheless, scientific findings were confirmed in only 6 (11%) cases. Even knowing the limitations of preclinical research, this was a shocking result.

Despite the limitations identified by the authors themselves, this report has taken on a life of truthy citation, as if most biomedical science reports cannot be replicated.

I have remarked a time or two that this is ridiculous on the grounds the authors themselves recognize, i.e., a company trying to skim the very latest and greatest results for intellectual property and drug development purposes is not reflective of how science works. It is also ridiculous because until we know exactly which studies were involved, what the authors mean by "failed to replicate", and how hard they worked at it, there is no point in treating this as an actual result.

At first, the authors refused to say which studies or results were meant by this original population of 53.

Now we have the data! They have reported their findings! Nature announces breathlessly that "Biotech giant publishes failures to confirm high-profile science".

Awesome. Right?

Well, they published three of them, anyway. Three. Out of fifty-three alleged attempts.

Are you freaking kidding me, Nature? And you promote this like we're all cool now? We can trust their original allegation that 47 of 53 studies were unreplicable?

These are the data that have turned ALL OF NIH UPSIDE DOWN WITH NEW POLICY FOR GRANT SUBMISSION!

Christ what a disaster.

I look forward to hearing from experts in the respective fields these three papers inhabit. I want to know how surprising it is to them that these forms of replication failure occurred. I want to know the quality of the replication attempts and the nature of the "failure"- was it actually failure or was it a failure to generalize in the way that would be necessary for a drug company's goals? Etc.

Oh and Amgen? I want to see the remaining 50 attempts, including the positive replications.
__

Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012 Mar 28;483(7391):531-3. doi: 10.1038/483531a.


British Journal of Pharmacology issues new experimental design standards

Dec 23 2015 Published under Conduct of Science, Replication, ReplicationCrisis

The BJP has decided to require that manuscripts submitted for publication adhere to certain experimental design standards. The formulation can be found in Curtis et al., 2015.

Curtis MJ, Bond RA, Spina D, Ahluwalia A, Alexander SP, Giembycz MA, Gilchrist A, Hoyer D, Insel PA, Izzo AA, Lawrence AJ, MacEwan DJ, Moon LD, Wonnacott S, Weston AH, McGrath JC. Experimental design and analysis and their reporting: new guidance for publication in BJP. Br J Pharmacol. 2015 Jul;172(14):3461-71. doi: 10.1111/bph.12856.

Some of this continues the "huh?" response of this behavioral pharmacologist who publishes in a fair number of similar journals. In other words, YHN is astonished this stuff is not just a default part of the editorial decision making at BJP in the first place. The items that jump out at me include the following (paraphrased):

2. You should shoot for a group size of N=5 or above and if you have fewer you need to do some explaining.
3. Groups less than 20 should be of equal size and if there is variation from equal sample sizes this needs to be explained. Particularly for exclusions or unintended loss of subjects.
4. Subjects should be randomized to groups and treatment order should be randomized.
6.-8. Normalization and transformation should be well justified and follow acceptable practices (e.g., you can't compare a treatment group to the normalization control that now has no variance because of this process).
9. Don't confuse analytical replicates with experimental replicates in conducting analysis.
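Item 9 is the one that trips people up most often, so here is a minimal sketch of the distinction. The subject names and readings are made up for illustration; the point is only that repeated assay readings from one subject get collapsed to a single value, so the N that goes into the statistics is the number of subjects, not the number of readings.

```python
from statistics import mean

# Hypothetical data: each subject (an experimental replicate) was assayed
# three times (analytical replicates). All values are invented.
control = {
    "rat1": [10.1, 9.9, 10.0],
    "rat2": [12.2, 11.8, 12.0],
    "rat3": [8.0, 8.1, 7.9],
}

def collapse_replicates(group):
    """Average the analytical replicates so each subject contributes one value."""
    return [mean(readings) for readings in group.values()]

def pooled_readings(group):
    """The mistake: treat every reading as if it were an independent subject."""
    return [r for readings in group.values() for r in readings]

subject_means = collapse_replicates(control)
pooled = pooled_readings(control)

print("N if readings are pooled (inflated):", len(pooled))
print("N after collapsing to subjects:", len(subject_means))
```

Pooling triples the apparent sample size here without adding a single independent observation, which is exactly what a reviewer should object to.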

Again, these are the "no duh!" issues in my world. Sticky peer review issues quite often revolve around people trying to get away with violating one or other of these things. At the very least reviewers want justification in the paper, which is a constant theme in these BJP principles.

The first item is a pain in the butt but not much more than make-work.

1. Experimental design should be subjected to ‘a priori power analysis’....latter requires an a priori sample size calculation that should be included in Methods and should include alpha, power and effect size.

Of course, the trouble with power analysis is that it depends intimately on the source of your estimates for effect size- generally pilot or prior experiments. But you can select basically whatever you want as your assumption of effect size to demonstrate a range of sample sizes as acceptable. Also, you can select whatever level of power you like, within reasonable bounds along the continuum from "Good" to "Overwhelming". I don't think there are very clear and consistent guidelines here.
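To make that concrete, here is a rough sketch of how much the answer moves with the assumed effect size. It uses the textbook normal approximation for a two-sided, two-group comparison of means with equal group sizes, n per group = 2((z_{1-alpha/2} + z_{power}) / d)^2, where d is the standardized effect size (Cohen's d). That formula is my choice for illustration, not anything BJP specifies, and it runs slightly low compared with an exact t-based calculation.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison.

    Normal approximation; effect_size is Cohen's d (difference in means
    divided by the common standard deviation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Same design, different assumptions about effect size and power.
for d in (0.5, 1.0, 1.5, 2.0):
    print(f"d={d}: n={n_per_group(d)} at 80% power, "
          f"n={n_per_group(d, power=0.90)} at 90% power")
```

At alpha = 0.05 and 80% power, assuming d = 0.5 calls for 63 per group while assuming d = 1.5 calls for 7. Pick the pilot study that suits you and almost any group size can be "justified".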

The fifth one is also going to be tricky, in my view.

5. Assignment of subjects/preparations to groups, data recording and data analysis should be blinded to the operator and analyst unless a valid scientific justification is provided for not doing so. If it is impossible to blind the operator, for technical reasons, the data analysis can and should be blinded.

I just don't see how this is practical with a limited number of people running experiments in a laboratory. There are places this is acutely important, such as when human judgement/scoring measures are the essential data. Sure. And we could all do with a reminder to blind a little more and a little more completely. But this has disaster written all over it. Some peers doing essentially the same assay are going to disagree over what is necessary and "impossible" and what is valid scientific justification.
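For what it's worth, the bookkeeping half of blinding and randomization is cheap. Below is a minimal sketch, with a hypothetical function name and made-up subject and treatment labels, of randomly allocating subjects to equal-sized groups under opaque codes and randomizing the run order. Note what it does not solve: somebody other than the operator and analyst still has to prepare the coded treatments and hold the key, and that extra person is precisely the staffing problem.

```python
import random

def blinded_allocation(subject_ids, treatments, seed=None):
    """Return (operator_sheet, unblinding_key, run_order).

    Hypothetical helper for illustration only. operator_sheet maps each
    subject to an opaque code, unblinding_key maps codes to treatments
    (to be held by a third person), run_order is a random testing order.
    """
    if len(subject_ids) % len(treatments) != 0:
        raise ValueError("equal group sizes need a multiple of the number of treatments")
    rng = random.Random(seed)

    # Attach opaque codes to the treatments in random order.
    codes = [f"code-{chr(ord('A') + i)}" for i in range(len(treatments))]
    shuffled_treatments = list(treatments)
    rng.shuffle(shuffled_treatments)
    unblinding_key = dict(zip(codes, shuffled_treatments))

    # Equal-sized random assignment of subjects to codes.
    slots = codes * (len(subject_ids) // len(treatments))
    rng.shuffle(slots)
    operator_sheet = dict(zip(subject_ids, slots))

    # Randomize the order in which subjects are run.
    run_order = list(subject_ids)
    rng.shuffle(run_order)
    return operator_sheet, unblinding_key, run_order

subjects = [f"rat{i}" for i in range(1, 11)]
sheet, key, order = blinded_allocation(subjects, ["vehicle", "drug"], seed=1)
print(sheet)   # what the operator and analyst see
print(order)   # order of testing
```

The key stays in a drawer until the analysis is locked. Whether a three-person lab can spare someone to do that for every experiment is another matter.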

The next one is a big win for YHN. I endorse this. I find the practice of reporting any p value other than your lowest threshold to be intellectually dishonest*.


10. When comparing groups, a level of probability (P) deemed to constitute the threshold for statistical significance should be defined in Methods, and not varied later in Results (by presentation of multiple levels of significance). Thus, ordinarily P < 0.05 should be used throughout a paper to denote statistically significant differences between groups.

I'm going to be very interested to see how the community of BJP accepts* this.

Finally, a curiosity.

11. After analysis of variance post hoc tests may be run only if F achieves the necessary level of statistical significance (i.e. P < 0.05) and there is no significant variance inhomogeneity.

People run post-hocs after a failure to find a significant main effect on the ANOVA? Seriously? Or are we talking about whether one should run all possible comparison post-hocs in the absence of an interaction? (seriously, when is the last time you saw a marginal-mean post-hoc used?) And isn't this just going to herald the return of the pre-planned comparison strategy**?

Anyway I guess I'm saying Kudos to BJP for putting down their marker on these design and reporting issues. Sure I thought many of these were already the necessary standards. But clearly there are a lot of people skirting around many of these in publications, specifically in BJP***. This new requirement will stiffen the spine of reviewers and editors alike.

__
*N.b. I gave up my personal jihad on this many years ago after getting exactly zero traction in my scientific community. I.e., I had constant fights with reviewers over why my p values were all "suspiciously" p<0.05 and no backup from editors when I tried to slip this concept into reviews.

**I think this is possibly a good thing.

***A little birdy who should know claimed that at least one AE resigned or was booted because they were not down with all of these new requirements.


Thought of the day

Dec 05 2014 Published under Replication, ReplicationCrisis, Science Publication

One thing that always cracks me up about manuscript review is the pose struck* by some reviewers that we cannot possibly interpret data or studies that are not perfect.

There is a certain type of reviewer that takes the stance* that we cannot in any way compare treatment conditions if there is anything about the study that violates some sort of perfect, Experimental Design 101 framing, even if there is no reason whatsoever to suspect a contaminating variable. Even if, and this is more hilarious, there are reasons in the data themselves to think that there is no effect of some nuisance variable.

I'm just always thinking....

The very essence of real science is comparing data across different studies, papers, paradigms, laboratories, etc and trying to come up with a coherent picture of what might be a fairly invariant truth about the system under investigation.

If the studies that you wish to compare are in the same paper, sure, you'd prefer to see less in the way of nuisance variation than you expect when making cross-paper comparisons. I get that. But still....some people.

Note: this in some way relates to the alleged "replication crisis" of science.
__
*having nothing to go on but their willingness to act like the manuscript is entirely uninterpretable and therefore unpublishable, I have to assume that some of them actually mean it. Otherwise they would just say "it would be better if...". right?


Replication costs money

I ran across a curious finding in a very Glamourous publication. Being that it was in a CNS journal, the behavior sucked. The data failed to back up the central claim about that behavior*. Which was kind of central to the actual scientific advance of the entire work.

So I contemplated an initial, very limited check on the behavior. A replication of the converging sort.

It's going to cost me about $15K to do it.

If it turns out negative, then where am I? Where am I going to publish a one figure tut-tut negative that flies in the face of a result published in CNS?

If it turns out positive, this is almost worse. It's a "yeah we already knew that from this CNS paper, dumbass" rejection waiting to happen.

Either way, if I expect to be able to publish in even a dump journal I'm going to need to throw some more money at the topic. I'd say at least $50K.

At least.

Spent from grants that are not really related to this topic in any direct way.

If the NIH is serious about the alleged replication problem then it needs to be serious about the costs and risks involved.
__
*a typical problem with CNS pubs that involve behavioral studies.
