The Statistics of Sports Doping

Aug 06 2008 | Published under Cycling, Doping

The Summer Olympics are finally upon us. No doubt there will be some interesting sports doping cases arising. While we're waiting, might as well beat a dead horse and see if we can get anything out of it. The latest issue of Nature contains a commentary from Donald A. Berry on the "flawed statistics and flawed logic" of detecting sports doping. I'll get to that after the jump but first the Nature editorial team issued a fairly strident position:

Nature believes that accepting 'legal limits' of specific metabolites without such rigorous verification goes against the foundational standards of modern science, and results in an arbitrary test for which the rate of false positives and false negatives can never be known. By leaving these rates unknown, and by not publishing and opening to broader scientific scrutiny the methods by which testing labs engage in study, it is Nature's view that the anti-doping authorities have fostered a sporting culture of suspicion, secrecy and fear.

Preach on! [Update 8/7/08: roundup of commentary on this story from Trust but Verify blog]


Okay, back to the article.
I was struck by this comment:

Mass spectrometry requires careful sample handling, advanced technician training and precise instrument calibration. The process is unlikely to be error-free. Each of the various steps in handling, labelling and storing an athlete's sample represents opportunity for error.

which is familiar: I had pointed to a post at 49 percent that outlined exactly these issues with chemical analysis. Are we all on the same page yet? Even fancy-pants analysis using magic machines that go "ping" is not foolproof.
Then there is the question of statistical probability when you are dealing with any determination that has a non-zero false-positive rate. Can you say "correction for multiple comparisons"?

Landis seemed to have an unusual test result. Because he was among the leaders he provided 8 pairs of urine samples (of the total of approximately 126 sample-pairs in the 2006 Tour de France). So there were 8 opportunities for a true positive -- and 8 opportunities for a false positive. If he never doped and assuming a specificity of 95%, the probability of all 8 samples being labelled 'negative' is the eighth power of 0.95, or 0.66. Therefore, Landis's false-positive rate for the race as a whole would be about 34%. Even a very high specificity of 99% would mean a false-positive rate of about 8%. The single-test specificity would have to be increased to much greater than 99% to have an acceptable false-positive rate. But we don't know the single-test specificity because the appropriate studies have not been performed or published.[emphasis added]
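
Berry's arithmetic is easy to reproduce. Here is a minimal Python sketch of the same calculation, assuming (as he does for illustration) that the eight sample-pairs behave as independent tests sharing a single-test specificity:

```python
# Probability that a never-doped athlete gets flagged at least once across
# several tests, assuming each test is independent and has the same
# single-test specificity (Berry's simplifying assumption).

def race_false_positive_rate(specificity: float, n_tests: int) -> float:
    """P(at least one false positive) = 1 - specificity ** n_tests."""
    return 1.0 - specificity ** n_tests

for spec in (0.95, 0.99, 0.999):
    rate = race_false_positive_rate(spec, n_tests=8)
    print(f"specificity {spec:.3f} -> race-wide false-positive rate {rate:.1%}")

# Output (matches Berry's figures):
# specificity 0.950 -> race-wide false-positive rate 33.7%
# specificity 0.990 -> race-wide false-positive rate 7.7%
# specificity 0.999 -> race-wide false-positive rate 0.8%
```

The number that matters is not the single-test specificity but the chance of at least one false positive over however many times an athlete gets tested, and that climbs quickly.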

[Figure 1 from Berry (2008): Plots show the distribution of 167 samples of the metabolites etiocholanone and 5β-androstanediol (a, b), and androsterone and 5α-androstanediol (c, d). Panels b and d show the samples the French national anti-doping laboratory (LNDD) designated 'positive' (red crosses) or 'negative' (green dots); the value from Landis's second sample from stage 17 is shown as a blue dot. Axes display delta notation, expressing the isotopic composition of a sample relative to a reference compound.]
The point the author is making with this last is that we (the public) have very little knowledge of how these cutoffs and criteria for calling a doping test positive or negative have been constructed. Even once you get past the question of whether the analytical part of the equation is "good", meaning the values are correct/true/accurate, the next question to ask is the interpretive one. How good is our knowledge of what various doping-related indices should look like under conditions of athletic stress similar to Stage 17 of the Tour de France? Remember, one has to have known doping and known non-doping samples to make the baselines, does one not? I would imagine that the population of known doping and known non-doping samples is vanishingly small; all they have access to is the smallish population of samples from the actual racers. Of course they don't know for sure who is and is not doping!
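
For what it's worth, the validation Berry is calling for isn't conceptually hard; the hard part is obtaining the known samples. Here is an illustrative Python sketch on made-up data (the cutoff, the distribution, and the sample size are all invented for the example) showing how a population of known non-doping samples would let you attach a number, and an uncertainty, to the false-positive rate of any proposed criterion:

```python
# Illustrative only: estimating the false-positive rate of a candidate
# decision threshold from KNOWN-negative samples. The data are simulated;
# a real validation would need samples from athletes known not to have
# doped, collected under race-like conditions.
import random

random.seed(1)

# Pretend delta-notation measurements from 200 known-clean athletes
# (the mean, spread and units are invented for this sketch).
known_negatives = [random.gauss(-2.0, 1.2) for _ in range(200)]

CUTOFF = -4.0  # hypothetical criterion: call the sample 'positive' below this

false_positives = sum(1 for x in known_negatives if x < CUTOFF)
fp_rate = false_positives / len(known_negatives)

# Crude 95% binomial (Wald) interval, just to show how much uncertainty
# a validation set of this size leaves on the estimate.
se = (fp_rate * (1 - fp_rate) / len(known_negatives)) ** 0.5
print(f"estimated single-test false-positive rate: {fp_rate:.1%} "
      f"(+/- {1.96 * se:.1%} at ~95% confidence)")
```

The same goes for sensitivity, except that it needs known doping samples, which is exactly the population nobody seems to have characterized under race conditions.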
Finally, you just have to review Figure 1.

During arbitration and in response to appeals from Landis, the LNDD provided the results of its androgen metabolite tests for 139 'negative' cases, 27 'positive' cases, and Landis's stage 17 results (see Fig. 1). These data were given to me by a member of Landis's defence team. The criteria used to discriminate a positive from a negative result are set by the World Anti-Doping Agency and are applied to these results in Fig. 1b and d. But we have no way of knowing which cases are truly positive and which are negative. It is proper to establish threshold values such as these, but only to define a hypothesis; a positive test criterion requires further investigation on known samples.[emphasis added]

No, I'm not really an expert, but let's just say it looks... like data. You know, messy. Something we'd like to know a lot more about before we could say "Oh yeah, I totally get where the cutoff is!" Now let us remind ourselves: Landis lost all of his appeals. So the relevant boards of review were being bombarded with these data and, one presumes, much more, presented by expert witnesses on both sides.
But still. This cycling fan would like a little more of the methods and validations for sports doping detection displayed. Bravo Nature for the editorial.

13 responses so far

  • phisrow says:

    I hope that the WADA people have a strong commitment to justice and good science; otherwise this seems very unlikely to go well. In a situation of uncertainty and questioning by outsiders, the temptation to circle the wagons, deny the problem, and fall back on your authority is awfully strong. And completely the wrong thing to do.

  • JW Tan says:

    Doesn't the test of the B-sample help detect and weed out false positives?

  • Robert says:

    As I recall, the labs did an isotope test on the testosterone found in Landis' system, and found it was of exogenous origin. So maybe the application or interpretation of the test to Landis wasn't so inaccurate...

  • Ian says:

    "The Summer Olympics are finally upon us."
    Yawn. Let's get obsessive-compulsive about people we don't know and will forget in a few weeks, and then do it all again in four years' time like this year never happened...

  • Samia says:

    Eff the Olympics.

  • JSinger says:

    I would imagine that the population of known doping and known non-doping samples is vanishingly small; all they have access to is the smallish population of samples from the actual racers. Of course they don't know for sure who is and is not doping!
    There is certainly a reasonable sample of known positives accumulated from admitted users. True negatives are, as you say, a lot harder to come by.

  • DrugMonkey says:

    Doesn't the test of the B-sample help detect and weed out false positives?
    It helps, yes, but it depends on the source of the variance that leads to false positives. If that variance arises at the source (individual variation, sample contamination, etc.), the A and B samples will return the same false positive. And even if one is only dealing with random variability in the analysis, one just needs to adjust the statistical cutoffs; the points raised by Berry are still valid. Until one knows what the error rate is, one cannot estimate the probability of both the A and B samples turning up as false positives. (There's a toy simulation just after this comment illustrating the point.)
    the labs did an isotope test on the testosterone found in Landis' system, and found it was of exogenous origin. So maybe the application or interpretation of the test to Landis wasn't so inaccurate...
    Sure. I'm not arguing Landis was innocent of exogenous testosterone doping. I like to talk about the way the process leaves a bad taste in the mouth of fan-scientists, and hopefully in a way that gets non-scientists to consider that the nice and tidy representation of analytical science presented on CSI and the like is not that accurate. I think you can see from the Nature approach that I am not alone in this. I haven't looked into the isotope analyses, but I would be very surprised if there were anything less than a rock-solid black/white call on that one as well. It cuts down on the possible sources of error, no doubt. Throws most of the Landis 'defense' back to questioning the veracity of the lab, that sort of thing...
    Eff the Olympics.
    Dang it Samia, I was hoping you'd look into that testosterone / exogenous / endogenous isotope analysis stuff for us!!!
    There is certainly a reasonable sample of known positives accumulated from admitted users.
    Really? How many admitted testosterone (or EPO or that new EPO or ....) doped cyclists does WADA/whatever have blood samples from? How many timepoints before and after the admitted doping? Were these in the context of the middle of hard stage races? ....
    I'm not trying to be a denialist here. As I usually say, the system says Landis doped; I'm going with that. But the scientist and fan in me wants to see the validation data.
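
To put a toy number on the A/B-sample point above, here is a small simulation (all values invented; these are not WADA's thresholds or error rates). It contrasts a false positive driven by independent assay noise, which requiring both aliquots to test positive squashes, with one driven by error already present in the specimen before it was split, which the A and B aliquots simply inherit:

```python
# Toy simulation of A/B-sample confirmation for a clean athlete.
# "False positive" = measured value crosses a hypothetical cutoff.
# Both aliquots come from the same specimen, so any error introduced
# upstream of the split is shared by A and B.
import random

random.seed(42)
N = 100_000
CUTOFF = 3.0        # hypothetical decision threshold
TRUE_VALUE = 0.0    # the clean athlete's true value

def confirmed_positive(source_sd: float, assay_sd: float) -> bool:
    """True if BOTH the A and B aliquots exceed the cutoff."""
    specimen = TRUE_VALUE + random.gauss(0, source_sd)  # shared, pre-split error
    a = specimen + random.gauss(0, assay_sd)            # independent assay error
    b = specimen + random.gauss(0, assay_sd)
    return a > CUTOFF and b > CUTOFF

scenarios = [(0.0, 1.5, "assay noise only"),
             (1.5, 0.1, "specimen-level error")]
for source_sd, assay_sd, label in scenarios:
    fp = sum(confirmed_positive(source_sd, assay_sd) for _ in range(N)) / N
    print(f"{label:>20}: A+B confirmed false-positive rate ~ {fp:.2%}")
```

Both scenarios are tuned so that a single test is wrong a few per cent of the time; only in the first does the B-sample confirmation buy you much.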

  • DrugMonkey says:

    This story is getting a bit of play. A roundup of commentary on this story is up over at the Trust but Verify blog.

  • Samia says:

    I actually think everyone should boycott the Olympics. But, you know, I'm just CRAYZAY like that.

  • Nic says:

    Wouldn't the use of the biological passport system help prevent this issue in the first place?

  • DrugMonkey says:

    Wouldn't the use of the biological passport system help prevent this issue in the first place?
    It would help a lot with the individual-differences factor, yes. Assuming the passport was created with random surprise sampling over a lengthy period so that an athlete couldn't game their baseline...

  • KeithB says:

    I suggest that they use just-retired cyclists, or those on suspension for doping 8^), and dope them. Let them run practice races in as-real-as-possible conditions. You could even double-blind things: take 10 volunteers and dope some but not the others.

  • DrugMonkey says:

    KB FTW! I love it! Great idea, esp the suspended or confessed dopers.
