The Summer Olympics are finally upon us. No doubt there will be some interesting sports doping cases arising. While we're waiting, we might as well beat a dead horse and see if we can get anything out of it. The latest issue of Nature contains a commentary from Donald A. Berry on the "flawed statistics and flawed logic" of detecting sports doping. I'll get to that after the jump, but first, the Nature editorial team issued a fairly strident position:
Nature believes that accepting 'legal limits' of specific metabolites without such rigorous verification goes against the foundational standards of modern science, and results in an arbitrary test for which the rate of false positives and false negatives can never be known. By leaving these rates unknown, and by not publishing and opening to broader scientific scrutiny the methods by which testing labs engage in study, it is Nature's view that the anti-doping authorities have fostered a sporting culture of suspicion, secrecy and fear.
Preach on! [Update 8/7/08: roundup of commentary on this story from Trust but Verify blog]
Okay, back to the article.
I was struck by this comment:
Mass spectrometry requires careful sample handling, advanced technician training and precise instrument calibration. The process is unlikely to be error-free. Each of the various steps in handling, labelling and storing an athlete's sample represents opportunity for error.
which is familiar: I had previously pointed to a post at 49 percent that outlined exactly these issues with chemical analysis. Are we all on the same page yet? Even fancy-pants analysis using magic machines that go "ping" is not foolproof.
Then there is the question of statistical probability when you are dealing with any determination that has a false positive rate. Can you say "correction for multiple comparisons"?
Landis seemed to have an unusual test result. Because he was among the leaders he provided 8 pairs of urine samples (of the total of approximately 126 sample-pairs in the 2006 Tour de France). So there were 8 opportunities for a true positive -- and 8 opportunities for a false positive. If he never doped and assuming a specificity of 95%, the probability of all 8 samples being labelled 'negative' is the eighth power of 0.95, or 0.66. Therefore, Landis's false-positive rate for the race as a whole would be about 34%. Even a very high specificity of 99% would mean a false-positive rate of about 8%. The single-test specificity would have to be increased to much greater than 99% to have an acceptable false-positive rate. But we don't know the single-test specificity because the appropriate studies have not been performed or published.[emphasis added]
Figure 1: Plots show the distribution of 167 samples of the metabolites etiocholanone and 5 β-androstanediol (a, b), and androsterone and 5 α-androstanediol (c, d). Panels b and d show samples the French national anti-doping laboratory (LNDD) designate to be 'positive' (red crosses) or 'negative' (green dots); the values from Landis's second sample from stage 17 is shown as a blue dot. Axes display delta notation, expressing isotopic composition of a sample relative to a reference compound.
The point the author is making with the last sentence of that quote is that we (the public) have very little knowledge of how these cutoffs and criteria for a positive/negative decision about doping have been constructed. Even once you get past the question of whether the analytical part of the equation is "good", meaning the values are correct/true/accurate, the next question to ask is the interpretive one. How good is our knowledge of what various doping-related indices should look like under conditions of athletic stress similar to Stage 17 of the Tour de France? Remember, one has to have known doping and known non-doping samples to make the baselines, does one not? I would imagine that the population of samples known to be doped or known to be clean is vanishingly small: all they have access to is the smallish population of samples from the actual racers. Of course they don't know for sure who is and is not doping!
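The arithmetic in Berry's Landis example is easy to check for yourself. Here is a minimal Python sketch, assuming (as the quote implicitly does) that the 8 tests are independent and share the same single-test specificity. The function names and the 1% target in the last line are my own illustrative choices, not anything from the Nature piece.

```python
def familywise_false_positive_rate(specificity: float, n_tests: int) -> float:
    """Chance that a clean athlete gets at least one false positive
    across n_tests tests, ASSUMING the tests are independent."""
    return 1.0 - specificity ** n_tests


def required_specificity(target_rate: float, n_tests: int) -> float:
    """Single-test specificity needed to keep the overall false-positive
    rate at target_rate over n_tests independent tests."""
    return (1.0 - target_rate) ** (1.0 / n_tests)


N_SAMPLES = 8  # sample pairs Landis provided, per the quote

# The two specificities Berry uses as examples
for spec in (0.95, 0.99):
    rate = familywise_false_positive_rate(spec, N_SAMPLES)
    print(f"specificity {spec:.2f}: {rate:.1%} chance of at least one false positive")

# Hypothetical target: at most a 1% overall false-positive rate
print(f"needed single-test specificity: {required_specificity(0.01, N_SAMPLES):.4f}")
```

This comes out to roughly 34% and 8%, matching the figures in the quote. Under the same assumptions, holding the race-wide false-positive rate to 1% would need a single-test specificity of about 99.87%, which is the sort of number nobody can currently vouch for.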
Finally, you just have to review Figure 1.
During arbitration and in response to appeals from Landis, the LNDD provided the results of its androgen metabolite tests for 139 'negative' cases, 27 'positive' cases, and Landis's stage 17 results (see Fig. 1). These data were given to me by a member of Landis's defence team. The criteria used to discriminate a positive from a negative result are set by the World Anti-Doping Agency and are applied to these results in Fig. 1b and d. But we have no way of knowing which cases are truly positive and which are negative. It is proper to establish threshold values such as these, but only to define a hypothesis; a positive test criterion requires further investigation on known samples.[emphasis added]
No, I'm not really an expert, but let's just say it looks... like data. You know, messy. Something we'd like to know a lot more about before we could say "Oh yeah, I totally get where the cutoff is!" Now let us remind ourselves, Landis lost all of his appeals. So the relevant boards of review were being bombarded with these data and, one presumes, much more, presented by expert witnesses on both sides.
But still. This cycling fan would like to see a little more of the methods and validations for sports doping detection displayed. Bravo, Nature, for the editorial.