I stumbled back onto something I've been meaning to get to. It touches on the ethical use of animals in research, the oversight process for animal research and the way we think about scientific inference.
Now, as has been discussed here and there in the animal use discussions, one of the central tenets of the review process is that scientists attempt to reduce the number of animals wherever possible. Meaning without compromising the scientific outcome, the minimum number of subjects required should be used. No more.
We accept as more or less bedrock that a result is real if it meets the appropriate statistical test at the standard of p < 0.05. Meaning that if you were to repeat your sampling 100 times from the same underlying population, fewer than five of those times would you get a difference as large as yours by chance alone. From which you conclude it is likely that the populations are in fact different.
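To make that 5% concrete, here is a quick simulation sketch (hypothetical Python, using scipy's t-test; the group size and number of simulations are arbitrary). Draw two groups from the very same population over and over, and count how often the test comes up "significant" by chance alone:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group = 10_000, 8
false_positives = 0

for _ in range(n_sims):
    # Both groups come from the SAME underlying population: the null is true.
    a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

rate = false_positives / n_sims
print(f"Fraction 'significant' under the null: {rate:.3f}")
# Hovers around 0.05, which is exactly, and only, what p < 0.05 promises.
```

That 5% false-positive rate is the criterion. Nothing about a smaller observed p-value changes the criterion you set before running the experiment.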
There is an unfortunate tendency in science, however, to believe that if your statistical test returns p < 0.01, this result is better. Somehow more significant, more reliable or more... real. On the part of the experimenter, on the part of the supervising lab head, on the part of paper reviewers and on the part of readers. Particularly the journal club variety.
I think this is intellectually dishonest. I mean, fine, there may be some assays and data types (or experiments) that essentially require that you adopt a different criterion to accept a result as resulting from other than chance. But you should have consistent standards, and in the vast majority of cases that standard is going to be p < 0.05. Meaning that if pressed, you are willing to publish that result and willing to act as if you believe that result as firmly as you believe any other result. Trumpeting your p < 0.001 result as if it is somehow more real, however, is trying to say that you had a more stringent criterion in the first place. Which you most certainly did not. So it is dishonest. Within individual scientists, within fields and across science as a whole.
If p < 0.05 is the standard, then all else is gravy.
As anyone who has done any work with animals knows, in a whole bunch of cases you can lower that p-value simply by running more subjects. In fact it is not unheard of for PIs to tell their trainees to run a few more subjects to make the p-values (or error bars, same principle) "look better".
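This is trivially easy to demonstrate. A hypothetical sketch (Python with scipy; the effect size and group sizes are invented for illustration) simulating a fixed, real group difference shows the typical p-value falling as you pile on subjects, even though the underlying effect never changes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
effect = 1.0  # assumed true group difference, in SD units (pure illustration)

medians = {}
for n in (6, 12, 18):
    pvals = []
    for _ in range(2000):
        a = rng.normal(0.0, 1.0, size=n)
        b = rng.normal(effect, 1.0, size=n)
        pvals.append(stats.ttest_ind(a, b).pvalue)
    medians[n] = float(np.median(pvals))
    print(f"N={n:2d} per group: median p = {medians[n]:.4f}")
# Same effect, same world. Only the p-value "looks better".
```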
"Look better" means, "we don't actually make our inferences by statistics at all, what we actually believe in is the significant-by-error-bar-eyeball technique".
So imagine yourself on an Institutional Animal Care and Use Committee. One of the things you are supposed to evaluate is whether the number of rats proposed to study, say, mephedrone is excessive. Roughly speaking, let us stipulate that N=6 gives us the minimum power required to get a significant p-value; N=12 is robust. A decent chance of p < 0.01. But the PI is asking for N=18 per group so that the error bars look super tight and that notorious Reviewer #3 won't complain that the p < 0.05 doesn't seem real to him.
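Sitting on that committee, you could at least pin the Reduction question to numbers. Here is a hypothetical power sketch (Python/scipy; the assumed effect size of 1.5 SD is invented for illustration, not taken from any real mephedrone data) estimating the chance of detecting the effect at each proposed group size:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
effect = 1.5   # assumed true effect size (Cohen's d); pure illustration
alpha = 0.05
n_sims = 5000

power = {}
for n in (6, 12, 18):
    hits = sum(
        stats.ttest_ind(rng.normal(0.0, 1.0, n),
                        rng.normal(effect, 1.0, n)).pvalue < alpha
        for _ in range(n_sims)
    )
    power[n] = hits / n_sims
    print(f"N={n:2d} per group: estimated power ~ {power[n]:.2f}")
# Past the minimum adequately-powered N, the extra animals buy
# cosmetics, not inference.
```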
What group size are you going to approve? On what basis? How do you reconcile Reduction with the eyeball inference technique?
By all means take a stab at interpreting the graphical results derived from a repeated-measures study involving two timepoints. Which one is the significant result? [Update: I initially forgot to mention the bars are SEM]
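For the record, overlapping SEM bars tell you next to nothing about a repeated-measures comparison, because the bars ignore the within-subject pairing. A contrived sketch (Python/scipy; the data are invented) in which every subject shifts by about the same amount makes the point:

```python
import numpy as np
from scipy import stats

# Invented repeated-measures data: the same six subjects at two timepoints.
time1 = np.array([10.0, 14.0, 8.0, 12.0, 16.0, 9.0])
time2 = time1 + np.array([0.9, 1.0, 1.1, 1.0, 0.9, 1.1])  # consistent shift

sem1 = time1.std(ddof=1) / np.sqrt(len(time1))
sem2 = time2.std(ddof=1) / np.sqrt(len(time2))
print(f"time1: {time1.mean():.1f} +/- {sem1:.2f} SEM")
print(f"time2: {time2.mean():.1f} +/- {sem2:.2f} SEM")  # bars overlap heavily

# The eyeball technique effectively runs an unpaired comparison:
print(f"unpaired p = {stats.ttest_ind(time1, time2).pvalue:.3f}")
# The correct paired analysis uses the within-subject differences:
print(f"paired   p = {stats.ttest_rel(time1, time2).pvalue:.6f}")
```

The SEM bars overlap almost completely and the unpaired test is nowhere near significant, yet the paired test is overwhelmingly so, because every single subject moved in the same direction by nearly the same amount. Eyeballing the bars gets this exactly backwards.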