On learning from "doing the math"

Aug 20 2009. Published under Conduct of Science, Statistical Reasoning

In case you have been living under a rock (yet inexplicably reading this blog), Usain Bolt has now run both the 100 meter and 200 meter track events faster than anyone ever has. The margin of improvement in the 100 meter event (which came first) was enough to set the sports world abuzz. Naturally, sports fans are willing to talk endlessly about the most absurd minutiae and implications of such an event, in terms both pedestrian and embarrassingly overwrought.
YHN is no different.


I was struck, however, by one particular approach to analyzing Usain Bolt's performance in the 100 meter dash. As we see from this discussion, it is possible to take the world record 100 m times from the past several decades and construct a mathematical function that fits the data, more or less.
[Figure: 100 m world record times with a fitted curve (100mCurveFit.jpg). Source: see update below.]
This is a not-uncommon approach in science. First, graph your data and take a look. See if there is something orderly about the data that allow you to predict other outcomes or other data in a series. Perhaps use some mathematics to make those predictions somewhat more quantitative, general or precise. All well and good.
Just so long as you don't forget something important.
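The curve-fitting step described above can be sketched in a few lines. This is a minimal illustration, not Mureika's or Ethan Siegel's actual procedure: it assumes an exponential approach to a limiting time, t(y) = t_inf + a * exp(-k * (y - y0)), solves for t_inf and a by ordinary least squares at each candidate decay rate k, and keeps the k with the smallest squared error. The data are synthetic and noise-free (not real world-record times), so the recovered limit can be checked against the value that generated them; the function names are hypothetical.

```python
import math


def fit_exponential_limit(years, times, k_grid):
    """Least-squares fit of t(y) = t_inf + a * exp(-k * (y - y0)).

    For a fixed decay rate k the model is linear in t_inf and a, so those
    two are solved in closed form; the k in k_grid with the smallest sum of
    squared residuals wins. Returns (t_inf, a, k, y0).
    """
    y0 = years[0]
    n = len(years)
    mean_t = sum(times) / n
    best = None
    for k in k_grid:
        xs = [math.exp(-k * (y - y0)) for y in years]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0.0:
            continue  # degenerate k: the exponential term is constant
        sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times))
        a = sxt / sxx
        t_inf = mean_t - a * mean_x
        sse = sum((t_inf + a * x - t) ** 2 for x, t in zip(xs, times))
        if best is None or sse < best[0]:
            best = (sse, t_inf, a, k)
    if best is None:
        raise ValueError("no usable decay rate in k_grid")
    _, t_inf, a, k = best
    return t_inf, a, k, y0


def predict(t_inf, a, k, y0, year):
    """Extrapolate the fitted curve; only meaningful if conditions are unchanged."""
    return t_inf + a * math.exp(-k * (year - y0))


# Synthetic, noise-free "record times" generated from t_inf = 9.2 s,
# a = 1.4 s, k = 0.02 per year. These are NOT real world-record data.
YEARS = list(range(1910, 2001, 5))
TIMES = [9.2 + 1.4 * math.exp(-0.02 * (y - 1910)) for y in YEARS]
K_GRID = [i / 1000 for i in range(1, 101)]  # candidate decay rates 0.001..0.100

t_inf, a, k, y0 = fit_exponential_limit(YEARS, TIMES, K_GRID)
print(f"fitted limit: {t_inf:.2f} s, decay rate: {k:.3f} per year")
```

On these synthetic numbers the fit recovers the 9.20 s limit it was built from. The fitted t_inf is only a property of the chosen curve: nothing in the fit knows whether the conditions that produced the data will persist.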
This prediction approach has been blogged by Ethan Siegel over at Starts with a Bang. The trouble is, I think his post does a huge disservice both to sports fans and to communicating the essential conduct of science*. How so? Well, Ethan generated his own version of the above figure and then made several key observations.

Luckily, simply modeling this mathematically -- by an exponential -- will tell us what the world record progression ought to look like, and should tell us what the theoretical limit of the human body is. Not only that, but we can predict what the future record ought to be. What do we find?
mathematically, it looks like the theoretical limit of how fast humans can run the 100 meter dash is somewhere around 9.2 seconds, but it looks like we won't get there for hundreds of years.

[Figure: projected future 100 m world record (future100mrecord.jpg). See also Ethan Siegel's update.]
Okay, okay, he emphasized the mathematically part, almost as if the obvious caveats were about to emerge. "Under these conditions." That is about the most general way to put it: a prediction into the future only holds as long as the conditions under which the underlying data were generated continue to hold.
Sadly, Ethan fails to make this point and indeed sprints right into error (in the post, anyway).

So what do we learn, practically, from doing this math? That watching Usain Bolt run is like watching Bob Beamon's long jump in 1968; it's a record that should stand for at least a generation.

Gak. Let me get back to the proper way to look at the world record 100 m dash data and the predictive curve fitting. One of the best parts of science can be to look at your data and find something that looks funny**. To ask: Where do the data violate the trend?
Who cares about performances matching prediction? Violations of the expected are what is absolutely fascinating. Especially to the sports-fan scientists. Because now you get to engage in hypothesis generation. Which, among sports fans, is otherwise known as bar-stool bullshitting. In science, however, this is what allows you to move forward in new, perhaps unexpected, perhaps fascinating directions. Looking at the graph here, you might identify some outlying data points that make you wonder.
Why did 100 m dash times stagnate in the late seventies? A drop in public interest in track and field, changing the contingencies? A technological stall-out? Contingencies in other sports poaching what would have been the top sprinting talent?
Why did times start dropping again in the 80s-90s? [*cough*doping*cough*]
And now, how do we explain the performances of Usain Bolt?
Leaving pharmacological doping aside, since current testing has found no evidence of it in Bolt to date, we have a description, if not a mechanistic explanation. In a radio interview I heard on NPR, Professor Peter Weyand of Southern Methodist University [also quoted on the topic here] pointed out that Bolt is unusually tall for a world-class 100 m sprinter, and that he started the race as quickly as the more traditional (shorter) sprinter but then finished it faster (i.e., consistent with a taller runner). In short, this individual sprinter is a violation of previously existing conditions.
So first of all, we get a slap upside the head about the mathematical modeling. Conditions have changed. Our assumptions about a rule of the physiology of elite sprinters have changed. It is possible to find a tall man who starts the race as quickly as the previous 100 m champion phenotype. (Actually, once you accept this, is Bolt really so surprising?) We need to fit a new function.
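A crude way to ask "where do the data violate the trend?" is to fit a trend to the earlier points only, then measure how many residual standard deviations a new point sits from the prediction. The sketch below is a minimal illustration under stated assumptions: a straight-line trend (simpler than the exponential quoted earlier), a hypothetical synthetic series rather than real record times, and an arbitrary 3-standard-deviation threshold. The function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def violates_trend(xs, ys, new_x, new_y, threshold=3.0):
    """Flag new_y if it lies more than `threshold` residual SDs from the trend.

    The trend is fitted only to the earlier points (xs, ys); returns
    (flagged, z) where z is the new point's standardized residual.
    """
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom (2 fitted parameters).
    s = math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))
    z = (new_y - (intercept + slope * new_x)) / s
    return abs(z) > threshold, z


# Hypothetical series: a slow linear improvement with small alternating scatter.
YEARS = list(range(1970, 2001, 2))
TIMES = [10.2 - 0.005 * (y - 1970) + (0.01 if i % 2 == 0 else -0.01)
         for i, y in enumerate(YEARS)]

print(violates_trend(YEARS, TIMES, 2009, 10.00))  # close to trend: not flagged
print(violates_trend(YEARS, TIMES, 2009, 9.80))   # far below trend: flagged
```

A large |z| says nothing about why a point is off the trend (a new kind of athlete, new technology, doping); it only marks where the hypothesis generating should start.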
Will Bolt's record stand for a generation? Maybe. Maybe not. Perhaps highly promising young runners who coaches "know" are "too tall" to be 100 m runners will no longer be pushed into the 400 m and longer events, but will instead be trained for the 100 and 200 m? Or will coaches, when selecting taller runners, focus on the start of the sprint? Or is Usain Bolt simply an outlier?
Final points from the world of pro cycling. Miguel Indurain absolutely dominated the Tour de France in the early 1990s, winning five in a row. At 1.88 m (6 ft 2 in) and 80 kg (176 lbs) he was a violation of type; however, it was assumed that a generous cardiovascular endowment, consistent with type, enabled him to excel. Not knowing this, one might have assumed toward the end of the Indurain era that many subsequent Grand Tour cyclists would be big guys (who, btw, focused exclusively on the Tour events). There is no evidence of this yet, so we must assume he was indeed something of an outlier, with his cardiovascular superiority compensating for his large(r) size.
The world hour record is a performance benchmark more similar to track and field, in that it is ultimately one person against the clock under more or less fixed conditions. In this case, the 1990s brought some performances that violated the mathematical prediction from prior data (page down to the bottom of the Wikipedia article). The reason in this case was technological advances in the aerodynamic efficiency of bike and rider (clothing and positioning): a change in the conditions under which the data were collected. Ultimately the sanctioning body artificially restored the trendline by outlawing the technological advances.
Okay, irritation with the unthinking application of mathematics in a case which calls for, you know, thinking: scratched. These Bolt performances are awesome, aren't they?
[UPDATE: As Isis kindly points out in a comment, the first graph comes from a much more sophisticated modeling procedure than the second. I gave an erroneous impression by not detailing this. I would refer you to the website of one Jonas Mureika, Associate Professor of Physics, LMU-LA, which even includes some collected sprint-event datasets! I would suggest that Mureika's approach certainly adheres to my main points, even if MSM coverage appropriates the graphs for muddled interpretation. Still, none of this excuses Ethan Siegel's blog post analysis :-)]
__
*I am trying very hard not to make this about a certain I-have-a-hammer and I-know-how-to-use-it blindness of particular mathematically oriented scientific disciplines.
**My new favorite is the apparently well-worn cliché that science proceeds not through "Eureka!" but rather through "huh,...that's funny."
[h/t for some links: Isis]

14 responses so far

  • DM,
    Don't you think that modeling that predicts times decades into the future takes into account, at some level, violations of existing conditions, both environmental and personal? After all, there were rule-breakers that enabled the 0.8-second drop in the 100m record time over the past century.
    Anyway, Bolt's runs are indeed awesome.

  • Robert says:

    Until recently, scientists claimed that all world-class male sprinters had heights normally distributed in the range of 5'6" to 6'3" and that sprinters taller or shorter had little chance of being world-class.
    It's not clear why Bolt is an outlier and what the future trends will be.
    Perhaps there were no tall sprinters because tall athletes had always been told that they could not be sprinters, and so they didn't submit themselves to the years of rigorous training. In other words, maybe the scientists' assumption resulted in a self-fulfilling prophecy that "proved" the scientific "truth."
    And tall athletes tend to play the highest-paying sports, except perhaps in Jamaica, where there is a special emphasis on track.
    My guess is scientists made the mistake of thinking that athletes of all heights made equal efforts to be world class sprinters. But what if that assumption was always false?

  • DrugMonkey says:

    AM: sure... within some limit. The interesting thing is shooting the breeze about which change of circumstances violates the assumed conditions enough to require a new function. Tech is easy to ID: fancy swimsuits, aero bars in cycling, golf club designs... Selection of athletes, training practices, etc., are harder to ID.

  • The interesting thing about Mureika's model, which you show in the first figure, is that he has looked at how different factors influence his data: wind resistance. Altitude. Drag. Starting velocity. It's not as simple as taking some data, fitting a curve, and "mathematically" predicting a limit on performance. Mureika (who is in a computer science dept and is a fascinating dude) has literally built a mathematical model with its own variable constructs and looked at the influence these variables have on performance. The difference between what Ethan did and what Mureika did is that Ethan looked at the data and drew a curve. Mureika created his model, calculated the predicted values based on the variables, and then tested whether the data fit the model. At the time, they did. I imagine, if asked, Mureika might say that the new data do not support his model and that adding novel variables is appropriate.
    Mureika's 2001 model seems not to fit the most recent data, which would lead me to wonder what variable is missing from his equation?

  • DrugMonkey says:

    Thanks for the additional details on Mureika's analysis, Isis.
    I imagine a great deal of entertainment and cool prediction is available from considering not just the world record but perhaps the top 50 times recorded for each year?
    Hmm. Now and again the world doping folks in cycling come out with pronouncements about how they "know" there is some new doping product, they just haven't caught it yet. I wonder if analysis of lots of performance data for the pool of, e.g., professional cyclists, allows them to identify better-than-predicted trends...?

  • I am not as familiar with life in the cycling world as I am in the running world. If'n I were a betting woman, though, I would predict that is exactly how they do it.

  • Eli Rabett says:

    Ben Johnson springs to mind.
    The point is that the Jamaican team in general has stepped it up, which means that better athletics through better chemistry is out there as a real possibility. It is not outside the realm of reality.

  • Eric Johnson says:

    > Conditions have changed. Our assumptions about a rule of the physiology of elite sprinters have changed. It is possible to find a tall man who starts the race as quickly as the previous 100 m champion phenotype.
    Your analysis is slightly less penetrating than you think it is. Almost every person who moved the world record, not just Bolt, had some unique feature or features. Thus, we could have said "conditions have changed" after nearly every new world record. To the extent that it works, we expect the model to do fairly well even when these dozens or thousands of different variables are not specifically addressed.
    Meanwhile, the laws which make tall people unlikely to start really fast have of course not changed. And our assumptions about the starts of tall 100-m runners haven't necessarily changed; strictly speaking they were probably always probabilistic ones, of the form "a world-class man of height H has expectation E and standard deviation S for the time he needs to cover the first twenty meters."
    But yeah, the idea that the data should really nicely fit such a simple model, in the first place, is pretty questionable. I'll bet that if you looked at more data - that is, a lot of different sporting records - the average validity of models like this one probably wouldn't be so hot.
    That said, times have taken a heck of a dip recently. And there have been a lot of revelations, recently, about chemical cheating in sports. How are you supposed to look at anyone at the world-record level and say, "I'm 99% certain he's clean," or even 95% certain, that is, literally willing to take a bet on it at 19-to-1 odds? Unless you know the guy personally and quite closely, I don't see how you could. And my personal response is to not care about sports.

  • Eric Johnson says:

    I mean really... can anyone name just one world-class-victorious athlete for whom you'd give even just 4-to-1 odds that he's clean, i.e., that you are 80% certain? Can you name three or four? Ten?
    Remember, we are talking hard cash money (and experiments show that people are on average quite overconfident, quantitatively, in their predictions).
    Sounds like a pretty bad investment.

  • Eli, Eric,
    Or it may just be what Robert said, esp. in his last 2 paras. You have to consider the possibility that athletes like Randy Moss, Deion Sanders, etc. may have been rule-breaking sprinters if they had grown up in a country where track was the major sport and the possible ticket to riches.
    And Eric, I have one name for you: Eldrick Tiger Woods.

  • Eric Johnson says:

    Yeah, I'm sure there are some top athletes who are clean. And I feel bad for them; it really galls to be suspected of lying when you're not.
    Yet considering the supreme incentives for hiding one's doping, it's pretty hard to imagine that there aren't at least one or two never-busted doper champs for each one that has been busted, if not more. And so dang many have been busted, at least in cycling, track, and baseball. Would people still do it if there were an 80% chance of getting caught? 50%? That's not inconceivable, but it seems pretty unlikely.

  • antipodean says:

    I can't believe that on this scienceblog nobody has pointed out that you can't really model data like this by plonking down a best fit curve.
    It's world record data. It can't get worse. So the best fit line can't have random scatter around it. All of the observations are constrained by the previous observation.
    Would be a great graduate school class "what's wrong with this picture" question.

  • Eric Johnson says:

    > All of the observations are constrained by the previous observation.
    Your point is intriguing. Nevertheless, if data points do fit to a certain curve, they fit it. That's all there is to it. The fact that the data points are not independent may have some interesting consequences, but if so they are beyond me off the top of my head - and possibly beyond you as well.

  • Somewhat off-topic, but... the great mathematician Israel Gelfand, who also made important contributions to biology, died yesterday.
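antipodean's objection above (and Eric Johnson's reply) turns on how a world-record series is generated. A minimal sketch, using made-up numbers: if each year's best time scatters randomly around a slowly improving trend, the record series is simply the running minimum of those season bests. It never gets worse, and each value depends on every earlier one, so the deviations of records from a fitted curve are not the independent, two-sided scatter that ordinary least squares assumes. The trend, noise level, seed, and function name here are all hypothetical.

```python
import random
from itertools import accumulate


def record_progression(season_bests):
    """World-record series implied by a sequence of yearly best times.

    The record only moves when someone beats the standing mark, so the
    series is a running minimum and can never get worse.
    """
    return list(accumulate(season_bests, min))


# Hypothetical yearly bests: a slow linear improvement plus Gaussian scatter.
rng = random.Random(2009)  # fixed seed so the example is repeatable
YEARS = list(range(1950, 2010))
SEASON_BESTS = [10.4 - 0.008 * (y - 1950) + rng.gauss(0, 0.08) for y in YEARS]
RECORDS = record_progression(SEASON_BESTS)

# Season bests scatter on both sides of their trend; the record series never
# rises, and it moves only in the years a new mark is set.
new_marks = sum(1 for prev, cur in zip(RECORDS, RECORDS[1:]) if cur < prev)
print(f"{new_marks} of {len(YEARS) - 1} later years set a new record")
```

Whether a smooth curve still "fits" such a series is a separate question from whether its residuals can be treated as random noise; the running-minimum structure mainly undermines the latter.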
