
Thread: BGR data

  1. #11
    Bob (Master)
    Join Date
    Jan 2007
    Location
    Darkest eckythumpland
    Posts
    1,823
    I've got hold of a JavaScript statistics package and done a little analysis. The following listing groups the completions into bins (or buckets) of five years to make things a bit more manageable. Some of the early 1970s had only a handful of successes each year, so there simply aren't enough data points for a valid year-by-year analysis; even with the binning the numbers for that period are pretty low. The data used is from the years 1971 to 2022 inclusive.
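As a rough sketch of the five-year binning described above (the function names and the sample years are made up for illustration, and this is not necessarily how the listing was produced), each year can be rounded down to a multiple of five to get its bin label:

```javascript
// Label a year with its five-year bin, e.g. 1977 -> "1975 - 1979".
function fiveYearBinLabel(year) {
  const start = Math.floor(year / 5) * 5; // 1971 -> 1970, 1979 -> 1975
  return `${start} - ${start + 4}`;
}

// Count how many completions fall into each bin.
function countByFiveYearBin(years) {
  const counts = new Map();
  for (const year of years) {
    const label = fiveYearBinLabel(year);
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  return counts;
}

// Hypothetical completion years, NOT the real BGR records.
const sampleYears = [1971, 1974, 1975, 1979, 1980];
const binCounts = countByFiveYearBin(sampleYears);
console.log(Object.fromEntries(binCounts));
```

Note that this sketch would label the final bin "2020 - 2024", whereas the listing labels it "2020 - 2022" because the data stops at 2022.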

    If you aren't used to some of the terminology, here's a quick explanation:

    quartiles: these are the 25% and 75% points along the data, i.e. 50% of the times lie between these two points. The gap between them is a simple measure of spread, a rough counterpart to the standard deviation.
    mean: the arithmetical average - add up all the times and divide by the number of items.
    middle: simply the halfway point between min and max values. 1, 2, 5, 6, 10 would have a middle of 5.5
    median: the middle value in the series. 1, 2, 5, 6, 10 would have a median of 5
    Standard deviation: describes how the data is clustered around the mean. A low value means the data is mostly close to the mean. For a normal distribution about 68% of the values lie within this distance from the mean.
    skewness: describes how lopsided the distribution is. Zero (more practically, anything between -0.5 and 0.5) indicates a roughly symmetrical distribution in which the mean and median are about the same value. A value lower than -1 or higher than 1 indicates very skewed data. A negative value means the longer tail (the outliers) is to the left, a positive value means it is to the right.
    Kurtosis: describes how heavy the tails are compared with a normal distribution, which has a value of 3. Values below 3 mean lighter tails, i.e. fewer outliers; values above 3 mean heavier tails, i.e. more outliers.
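To make the definitions above concrete, here is a minimal sketch that computes each of them for the example series 1, 2, 5, 6, 10. The `describe` function is hypothetical and is not the statistics package used for the listing. It assumes population (divide by n) moments and linearly interpolated quartiles; a package may use different conventions, so its numbers could differ slightly.

```javascript
// Summary statistics for an array of numbers.
// Assumptions: population moments (divide by n) and linear
// interpolation between sorted values for the quantiles.
function describe(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / n;

  // Value at fraction p (0..1) along the sorted data.
  const quantile = (p) => {
    const pos = (n - 1) * p;
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
  };

  // k-th central moment: average of (value - mean)^k.
  const moment = (k) =>
    sorted.reduce((sum, v) => sum + (v - mean) ** k, 0) / n;
  const variance = moment(2);

  return {
    quartiles: [quantile(0.25), quantile(0.75)],
    min: sorted[0],
    max: sorted[n - 1],
    mean,
    median: quantile(0.5),
    middle: (sorted[0] + sorted[n - 1]) / 2,
    stdDeviation: Math.sqrt(variance),
    skewness: moment(3) / variance ** 1.5,
    kurtosis: moment(4) / variance ** 2, // a normal distribution gives 3
  };
}

const exampleStats = describe([1, 2, 5, 6, 10]);
console.log(exampleStats.middle, exampleStats.median, exampleStats.mean);
// prints: 5.5 5 4.8
```

For this small series the skewness comes out positive (the 10 sits out to the right) and the kurtosis is below 3.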

    The quartiles show a quite consistent pattern: the second value, the 75th percentile, is always between 23:20 and 23:40, that is, 25% of all completions take more than that time. The 25th percentile is more varied, generally between 22:00 and 22:30, but does appear to be dropping in the more recent buckets. This is likely due to the larger number of very fast times in recent years. Prior to the year 2000 there were just 19 sub-18hr times but there have been double that number since then.

    The skewness is consistently negative, i.e. the times bunch towards the slower end with a tail of faster times, which is to be expected: there are nearly twice as many finishing times in the 23:00 to 23:59 range as there are in the 22:00 to 22:59 range. (This halving actually continues down to around the 17hr mark.)

    This "strange" distribution might make any use of standard statistics a bit awkward. It comes about because any finishing time over 24hrs has been discarded - it's a bit like analysing times for the mile and ignoring any time over four minutes.

    Anyway, here's the data.

    [
    {
    "bin": "1970 - 1974",
    "count": 24,
    "stats": {
    "quartiles": [
    "21:51", // 25%
    "23:20" // 75%
    ],
    "max": "23:42",
    "min": "20:37",
    "mean": "22:24",
    "median": "22:22",
    "middle": "22:10",
    "stdDeviation": "01:02",
    "skewness": -0.264,
    "kurtosis": 1.770
    }
    },
    {
    "bin": "1975 - 1979",
    "count": 113,
    "stats": {
    "quartiles": [
    "22:01",
    "23:21"
    ],
    "max": "23:57",
    "min": "17:45",
    "mean": "22:26",
    "median": "22:57",
    "middle": "20:51",
    "stdDeviation": "01:18",
    "skewness": -1.375,
    "kurtosis": 4.281
    }
    },
    {
    "bin": "1980 - 1984",
    "count": 182,
    "stats": {
    "quartiles": [
    "22:24",
    "23:34"
    ],
    "max": "23:56",
    "min": "14:56",
    "mean": "22:43",
    "median": "23:13",
    "middle": "19:26",
    "stdDeviation": "01:19",
    "skewness": -2.377,
    "kurtosis": 10.679
    }
    },
    {
    "bin": "1985 - 1989",
    "count": 333,
    "stats": {
    "quartiles": [
    "22:05",
    "23:25"
    ],
    "max": "23:59",
    "min": "17:40",
    "mean": "22:40",
    "median": "22:56",
    "middle": "20:49",
    "stdDeviation": "01:02",
    "skewness": -1.223,
    "kurtosis": 4.555
    }
    },
    {
    "bin": "1990 - 1994",
    "count": 242,
    "stats": {
    "quartiles": [
    "22:27",
    "23:38"
    ],
    "max": "23:59",
    "min": "18:18",
    "mean": "22:50",
    "median": "23:09",
    "middle": "21:08",
    "stdDeviation": "01:04",
    "skewness": -1.651,
    "kurtosis": 5.937
    }
    },
    {
    "bin": "1995 - 1999",
    "count": 241,
    "stats": {
    "quartiles": [
    "22:43",
    "23:35"
    ],
    "max": "23:57",
    "min": "17:48",
    "mean": "22:58",
    "median": "23:15",
    "middle": "20:52",
    "stdDeviation": "00:58",
    "skewness": -2.580,
    "kurtosis": 11.489
    }
    },
    {
    "bin": "2000 - 2004",
    "count": 100,
    "stats": {
    "quartiles": [
    "22:36",
    "23:39"
    ],
    "max": "23:56",
    "min": "17:00",
    "mean": "22:54",
    "median": "23:16",
    "middle": "20:29",
    "stdDeviation": "01:04",
    "skewness": -2.372,
    "kurtosis": 11.263
    }
    },
    {
    "bin": "2005 - 2009",
    "count": 290,
    "stats": {
    "quartiles": [
    "22:13",
    "23:32"
    ],
    "max": "23:59",
    "min": "17:08",
    "mean": "22:38",
    "median": "23:00",
    "middle": "20:33",
    "stdDeviation": "01:15",
    "skewness": -1.84,
    "kurtosis": 6.88
    }
    },
    {
    "bin": "2010 - 2014",
    "count": 376,
    "stats": {
    "quartiles": [
    "21:59",
    "23:29"
    ],
    "max": "23:59",
    "min": "14:58",
    "mean": "22:33",
    "median": "22:56",
    "middle": "19:29",
    "stdDeviation": "01:18",
    "skewness": -1.77,
    "kurtosis": 7.344
    }
    },
    {
    "bin": "2015 - 2019",
    "count": 479,
    "stats": {
    "quartiles": [
    "21:48",
    "23:30"
    ],
    "max": "23:59",
    "min": "12:52",
    "mean": "22:24",
    "median": "22:49",
    "middle": "18:25",
    "stdDeviation": "01:30",
    "skewness": -2.0462,
    "kurtosis": 9.343
    }
    },
    {
    "bin": "2020 - 2022",
    "count": 329,
    "stats": {
    "quartiles": [
    "21:39",
    "23:23"
    ],
    "max": "23:57",
    "min": "12:22",
    "mean": "22:07",
    "median": "22:44",
    "middle": "18:10",
    "stdDeviation": "01:49",
    "skewness": -1.912,
    "kurtosis": 7.816
    }
    }
    ]
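The times in the listing are "hh:mm" strings, so any arithmetic on them needs a conversion to minutes and back. A small sketch, with hypothetical helper names, that reproduces the "middle" value of the 1975 - 1979 bin from its min and max (rounding half minutes up is an assumption):

```javascript
// Convert "hh:mm" to a number of minutes.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(":").map(Number);
  return hours * 60 + minutes;
}

// Convert minutes back to a zero-padded "hh:mm" string,
// rounding to the nearest whole minute.
function toHhMm(totalMinutes) {
  const rounded = Math.round(totalMinutes);
  const hours = Math.floor(rounded / 60);
  const minutes = rounded % 60;
  return `${String(hours).padStart(2, "0")}:${String(minutes).padStart(2, "0")}`;
}

// min and max copied from the 1975 - 1979 bin in the listing.
const bin1975 = { min: "17:45", max: "23:57" };
const middle1975 = toHhMm(
  (toMinutes(bin1975.min) + toMinutes(bin1975.max)) / 2
);
console.log(middle1975); // prints: 20:51
```

The same calculation on the 1970 - 1974 bin (20:37 and 23:42) lands on a half minute and rounds to 22:10, matching the listing.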
    Bob

    http://bobwightman.co.uk/run/bob_graham.php

    Without me you'd be one place nearer the back

  2. #12
    Bob
    Anyway, sometimes a picture paints a thousand words and all that.

    Here's age (on the x-axis) vs time (err, that'll be on the y-axis then). The lopsided distribution of times, i.e. nothing over 24hrs, that I mentioned is obvious. I'm not sure that the skewness would be removed even if we recorded *all* finishing times.

    ag-time.jpg

    No real pattern that I can discern (I've also checked the rank correlation and it's around 0.17, which is a very weak correlation at best) but if you draw a straight line from the 60yr-24hr intersection on the x-axis to the 30yr-12hr intersection at the top of the graph (or do as I did and put a ruler up against the screen) then there are very few times up and right of that line - I counted fewer than 25. So age does eventually make a difference. It's also easy to see that there are only 9 times quicker than 18hrs by those aged 40 or over.
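For anyone unfamiliar with rank correlation, here is a minimal sketch of Spearman's version: replace each value with its rank, then take the ordinary (Pearson) correlation of the ranks. Which rank correlation was actually used above isn't stated, so Spearman is an assumption, and the ages and times below are invented for illustration.

```javascript
// Rank values from 1..n, giving tied values the average of their ranks.
function averageRanks(values) {
  const order = values.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
  const ranks = new Array(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1][0] === order[start][0]) {
      end++;
    }
    const rank = (start + end) / 2 + 1; // average 1-based position of the tie group
    for (let k = start; k <= end; k++) ranks[order[k][1]] = rank;
    start = end + 1;
  }
  return ranks;
}

// Pearson correlation coefficient of two equal-length arrays.
function pearson(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
    syy += (ys[i] - meanY) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Spearman's rank correlation: Pearson correlation of the ranks.
function spearman(xs, ys) {
  return pearson(averageRanks(xs), averageRanks(ys));
}

// Hypothetical ages (years) and times (hours), NOT real BGR data.
const sampleAges = [25, 30, 35, 40, 45];
const sampleTimes = [22, 21, 23, 22.5, 23.5];
const sampleRho = spearman(sampleAges, sampleTimes);
console.log(sampleRho.toFixed(2)); // prints: 0.80
```

A value near +1 or -1 means the two rankings move together (or in opposition); a value near 0, such as the 0.17 quoted above, means there is little monotonic relationship.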

    Since 2012 I've been recording the number of previous attempts. Prior to this the data is very patchy and mostly depends on me noticing something like: "after two failed attempts ..." in an individual's report. I've 160 individuals recorded as having taken more than one attempt from 2012 onwards and have found 60 from the earlier records. As such you can't really compare this to the full data set, but of those on their second attempt only 6 had a time quicker than 21hrs, and of those on their third attempt only one managed the round in under 22hrs.

    I've also plotted the distribution of successful attempts across the year by week number. Up until around the millennium the vast majority of rounds were done between weeks 22 and 30, very roughly the months of June and July. Since then the "season" has expanded somewhat and the successes lie between weeks 18 and 35, so basically May to August, but there are still significant numbers four weeks either side of those. Of course there are also the handful of winter attempts outside these bounds.
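The week-numbering scheme behind the plot isn't stated. Assuming ISO 8601 weeks (weeks start on Monday and week 1 is the one containing the year's first Thursday), a date can be turned into a week number with a sketch like this; `isoWeekNumber` is a hypothetical helper:

```javascript
// ISO 8601 week number for a calendar date (month is 1-12).
// Works in UTC to avoid daylight-saving and time-zone shifts.
function isoWeekNumber(year, month, day) {
  const date = new Date(Date.UTC(year, month - 1, day));
  const dayOfWeek = date.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  // Move to the Thursday of the same ISO week; its year owns the week.
  date.setUTCDate(date.getUTCDate() + 4 - dayOfWeek);
  const yearStart = Date.UTC(date.getUTCFullYear(), 0, 1);
  const dayOfYear = (date.getTime() - yearStart) / 86400000 + 1;
  return Math.ceil(dayOfYear / 7);
}

console.log(isoWeekNumber(2023, 6, 1));  // prints: 22
console.log(isoWeekNumber(2023, 7, 30)); // prints: 30
```

Under that assumption, 1 June to 30 July 2023 spans weeks 22 to 30, which lines up with the "very roughly June and July" description.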

    year-weeknum.jpg

    Perhaps the main problem in trying to draw any sort of conclusion from the data is that there are so many variables. It's not like track and field athletics, where the conditions are pretty uniform and, for example, record times don't count if there's too much of a following wind. Here, if a strong contender experiences a stiff headwind for a couple of legs then they might not submit as fast a time as they otherwise would.

    End of geek mode (maybe)
    Last edited by Bob; 05-12-2023 at 08:03 PM.

  3. #13
    anthonykay (Master)
    Join Date
    Jan 2015
    Location
    Within sight of Leicestershire's Beacon Hill
    Posts
    2,443
    Bob, your first graph gives "Invalid attachment specified" when I click on the link. Can you try re-posting?
    In his lifetime he suffered from unreality, as do so many Englishmen.
    Jorge Luis Borges

  4. #14
    Bob
    Quote, originally posted by anthonykay:
    Bob, your first graph gives "Invalid attachment specified" when I click on the link. Can you try re-posting?
    Oops!

    Now fixed :-)

  5. #15
    anthonykay
    Quote, originally posted by Bob:
    Oops!

    Now fixed :-)
    Great. Regarding your observation that age makes a difference, this is also true at the younger end of the range: the youngest sub-18-hour round was by a 25-year-old, and younger runners (below 25 years) tend to take longer.

    Incidentally, I know who the 71-year-old is who completed in a little over 22 hours, but who was the 13-year-old who just scraped in below the 24 hours?

  6. #16
    Bob
    Ben Squibb way back in 1992 with a time of 23:51.

    Membership lists at http://bobgrahamclub.org.uk/index.php?page=members and record lists (including the youngest, oldest, etc) at http://bobgrahamclub.org.uk/index.php?page=records.

    There's a list of the fastest teenagers (23 of them to date); only one is under 20hrs, though the next was just 6mins over that mark.

    The lists are correct to the end of the year prior to the current one, i.e. they are currently to the end of 2022, on 1st Jan 2024 they will be good to the end of 2023 and so on going forward.
