
Thread: BGR data

  1. #11
    Master Bob's Avatar
    Join Date
    Jan 2007
    Location
    Darkest eckythumpland
    Posts
    1,832
    I've got hold of a JavaScript statistics package and done a little analysis. The following listing has the completions in bins or buckets of five years to make things a bit more manageable. Some of the early 1970s had only a handful of successes each year, so there simply aren't enough data points in a single year to make a valid analysis; even binned, the numbers are pretty low. The data used is from the years 1971 to 2022 inclusive.

    If you aren't used to some of the terminology, here's a quick explanation:

    quartiles: these are the 25% and 75% points along the data, i.e. 50% of the times lie between these two points. The gap between them is a simple measure of spread, a bit like the standard deviation.
    mean: the arithmetical average - add up all the times and divide by the number of items.
    middle: simply the halfway point between min and max values. 1, 2, 5, 6, 10 would have a middle of 5.5
    median: the middle value in the series. 1, 2, 5, 6, 10 would have a median of 5
    Standard deviation: describes how the data is clustered around the mean. A low value means the data is mostly close to the mean. For a normal distribution, 68.2% of the values lie within this distance of the mean.
    skewness: describes how symmetrical the distribution is. Zero (more practically between -0.5 and 0.5) indicates a symmetrical distribution where the mean and median are the same value. A value lower than -1 or higher than 1 indicates very skewed data. A negative value means the outliers are to the left, a positive value means the outliers are to the right.
    Kurtosis: describes how heavy the tails are compared with a normal distribution, which has a value of 3. Values below 3 mean fewer outliers (lighter tails), values above 3 mean more outliers in the tails.
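    For anyone who wants to try these definitions out, here is a minimal Python sketch using the 1, 2, 5, 6, 10 example above. It is not the JavaScript package used for the listing; the function name `describe`, the choice of population (divide by n) formulas and the "inclusive" quartile method are all assumptions, and a different package may give slightly different quartiles, skewness or kurtosis.

```python
import statistics


def describe(values):
    """Return the summary statistics defined above for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population standard deviation (divide by n); a package may divide by n - 1.
    sd = statistics.pstdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Moment-based skewness and kurtosis; a normal distribution scores 0 and 3.
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "quartiles": (q1, q3),
        "mean": mean,
        "middle": (min(values) + max(values)) / 2,
        "median": statistics.median(values),
        "stdDeviation": sd,
        "skewness": m3 / sd ** 3,
        "kurtosis": m4 / sd ** 4,
    }


summary = describe([1, 2, 5, 6, 10])
print(summary["middle"], summary["median"], summary["mean"])
```

    The finishing times would first need converting to a single number (minutes, say) before being passed in.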

    The quartiles show a quite consistent pattern, namely that the second value, the 75th percentile, is always between 23:20 and 23:40, that is, 25% of all completions take more than that time. The 25th percentile is more varied, generally between 22:00 and 22:30, but does appear to be dropping in the more recent buckets. This is likely due to the larger number of very fast times in recent years. Prior to the year 2000 there were just 19 sub 18hr times but there's been double that number since then.

    The skewness shows a consistent bunching of times towards the slower end, which is to be expected - there are nearly twice as many finishing times in the 23:00 to 23:59 range as there are in the 22:00 to 22:59 range. (This halving actually continues down to around the 17hr mark.)

    This "strange" distribution might make any use of standard statistics a bit awkward. It comes about because any finishing time over 24hrs has been discarded - it's a bit like analysing times for the mile and ignoring any time over four minutes.

    Anyway, here's the data.

    [
    {
    "bin": "1970 - 1974",
    "count": 24,
    "stats": {
    "quartiles": [
    "21:51", // 25%
    "23:20" // 75%
    ],
    "max": "23:42",
    "min": "20:37",
    "mean": "22:24",
    "median": "22:22",
    "middle": "22:10",
    "stdDeviation": "01:02",
    "skewness": -0.264,
    "kurtosis": 1.770
    }
    },
    {
    "bin": "1975 - 1979",
    "count": 113,
    "stats": {
    "quartiles": [
    "22:01",
    "23:21"
    ],
    "max": "23:57",
    "min": "17:45",
    "mean": "22:26",
    "median": "22:57",
    "middle": "20:51",
    "stdDeviation": "01:18",
    "skewness": -1.375,
    "kurtosis": 4.281
    }
    },
    {
    "bin": "1980 - 1984",
    "count": 182,
    "stats": {
    "quartiles": [
    "22:24",
    "23:34"
    ],
    "max": "23:56",
    "min": "14:56",
    "mean": "22:43",
    "median": "23:13",
    "middle": "19:26",
    "stdDeviation": "01:19",
    "skewness": -2.377,
    "kurtosis": 10.679
    }
    },
    {
    "bin": "1985 - 1989",
    "count": 333,
    "stats": {
    "quartiles": [
    "22:05",
    "23:25"
    ],
    "max": "23:59",
    "min": "17:40",
    "mean": "22:40",
    "median": "22:56",
    "middle": "20:49",
    "stdDeviation": "01:02",
    "skewness": -1.223,
    "kurtosis": 4.555
    }
    },
    {
    "bin": "1990 - 1994",
    "count": 242,
    "stats": {
    "quartiles": [
    "22:27",
    "23:38"
    ],
    "max": "23:59",
    "min": "18:18",
    "mean": "22:50",
    "median": "23:09",
    "middle": "21:08",
    "stdDeviation": "01:04",
    "skewness": -1.651,
    "kurtosis": 5.937
    }
    },
    {
    "bin": "1995 - 1999",
    "count": 241,
    "stats": {
    "quartiles": [
    "22:43",
    "23:35"
    ],
    "max": "23:57",
    "min": "17:48",
    "mean": "22:58",
    "median": "23:15",
    "middle": "20:52",
    "stdDeviation": "00:58",
    "skewness": -2.580,
    "kurtosis": 11.489
    }
    },
    {
    "bin": "2000 - 2004",
    "count": 100,
    "stats": {
    "quartiles": [
    "22:36",
    "23:39"
    ],
    "max": "23:56",
    "min": "17:00",
    "mean": "22:54",
    "median": "23:16",
    "middle": "20:29",
    "stdDeviation": "01:04",
    "skewness": -2.372,
    "kurtosis": 11.263
    }
    },
    {
    "bin": "2005 - 2009",
    "count": 290,
    "stats": {
    "quartiles": [
    "22:13",
    "23:32"
    ],
    "max": "23:59",
    "min": "17:08",
    "mean": "22:38",
    "median": "23:00",
    "middle": "20:33",
    "stdDeviation": "01:15",
    "skewness": -1.84,
    "kurtosis": 6.88
    }
    },
    {
    "bin": "2010 - 2014",
    "count": 376,
    "stats": {
    "quartiles": [
    "21:59",
    "23:29"
    ],
    "max": "23:59",
    "min": "14:58",
    "mean": "22:33",
    "median": "22:56",
    "middle": "19:29",
    "stdDeviation": "01:18",
    "skewness": -1.77,
    "kurtosis": 7.344
    }
    },
    {
    "bin": "2015 - 2019",
    "count": 479,
    "stats": {
    "quartiles": [
    "21:48",
    "23:30"
    ],
    "max": "23:59",
    "min": "12:52",
    "mean": "22:24",
    "median": "22:49",
    "middle": "18:25",
    "stdDeviation": "01:30",
    "skewness": -2.0462,
    "kurtosis": 9.343
    }
    },
    {
    "bin": "2020 - 2022",
    "count": 329,
    "stats": {
    "quartiles": [
    "21:39",
    "23:23"
    ],
    "max": "23:57",
    "min": "12:22",
    "mean": "22:07",
    "median": "22:44",
    "middle": "18:10",
    "stdDeviation": "01:49",
    "skewness": -1.912,
    "kurtosis": 7.816
    }
    }
    ]
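    As a quick sanity check on the listing, the "middle" of each bin should be halfway between its "min" and "max". The sketch below tests that for four of the bins. The helper names are made up for illustration, and the one-minute tolerance rests on the assumption that the listed values are rounded from times that include seconds.

```python
def to_minutes(hhmm):
    """Convert an "HH:MM" string such as "22:24" to whole minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# min, max and middle copied from four bins of the listing.
bins = [
    {"bin": "1970 - 1974", "min": "20:37", "max": "23:42", "middle": "22:10"},
    {"bin": "1975 - 1979", "min": "17:45", "max": "23:57", "middle": "20:51"},
    {"bin": "2000 - 2004", "min": "17:00", "max": "23:56", "middle": "20:29"},
    {"bin": "2015 - 2019", "min": "12:52", "max": "23:59", "middle": "18:25"},
]


def middle_gap(entry):
    """Minutes between the listed middle and (min + max) / 2."""
    computed = (to_minutes(entry["min"]) + to_minutes(entry["max"])) / 2
    return abs(computed - to_minutes(entry["middle"]))


for entry in bins:
    print(entry["bin"], middle_gap(entry))
```

    Every gap comes out at a minute or less, which is what you'd expect if the underlying times carry seconds.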
    Bob

    http://bobwightman.co.uk/run/bob_graham.php

    Without me you'd be one place nearer the back

  2. #12
    Master Bob's Avatar
    Join Date
    Jan 2007
    Location
    Darkest eckythumpland
    Posts
    1,832
    Anyway, sometimes a picture paints a thousand words and all that.

    Here's age (on the x-axis) vs time (err, that'll be on the y-axis then). The lopsided distribution of times, i.e. nothing over 24hrs, that I mentioned is obvious. I'm not sure that the skewness would be removed even if we recorded *all* finishing times.

    ag-time.jpg

    No real pattern that I can discern (I've also checked the rank correlation and it's around 0.17, which basically means no discernible correlation). However, if you draw a straight line from the 60yr-24hr intersection on the x-axis to the 30yr-12hr intersection at the top of the graph (or do as I did and put a ruler up against the screen) then there are very few times up and right of that line - I counted fewer than 25. So age does eventually make a difference. Also easy to see is that there are only 9 times quicker than 18hrs by those aged 40 or over.
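    For reference, a rank correlation can be computed along these lines. This sketch assumes Spearman's rho (the post doesn't say which rank correlation was used), and the ages and times are invented purely to show the mechanics.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average 1-based position of the tied run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up ages and finishing times in minutes, for illustration only.
ages = [28, 35, 41, 47, 52, 33]
minutes = [1310, 1402, 1355, 1421, 1398, 1380]
print(round(spearman(ages, minutes), 3))
```

    A value near +1 or -1 means the two rankings largely agree or largely oppose each other; a value near 0, like the 0.17 quoted, means age on its own says very little about time.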

    Since 2012 I've been recording the number of previous attempts. Prior to this the data is very patchy and mostly depends on me noticing something like: "after two failed attempts ..." in an individual's report. I've 160 individuals recorded as having taken more than one attempt from 2012 onwards and have found 60 from the earlier records. As such you can't really compare this to the full data set, but of those on their second attempt only 6 people had a time quicker than 21hrs, and of those on their third attempt only one person managed the round in under 22hrs.

    I've also plotted the distribution of successful attempts across the year by week number. Up until around the millennium the vast majority of rounds were done between weeks 22 and 30, very roughly the months of June and July. Since then the "season" has expanded somewhat and the successes lie between weeks 18 and 35, so basically May to August, but there are still significant numbers four weeks either side of those. Of course there are also the handful of winter attempts outside these bounds.
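    If you want to bucket your own dates by week number, something like the following would do it. It assumes ISO 8601 week numbering (the post doesn't say which scheme the plot uses), and the dates are hypothetical.

```python
from collections import Counter
from datetime import date


def week_number(day):
    """ISO 8601 week number (1 to 53) of a date."""
    return day.isocalendar()[1]


# Hypothetical round dates, for illustration only.
rounds = [date(2023, 5, 6), date(2023, 6, 1), date(2023, 6, 3), date(2023, 8, 30)]
per_week = Counter(week_number(d) for d in rounds)
for week in sorted(per_week):
    print(week, per_week[week])
```

    Under ISO numbering in 2023, week 18 starts on 1st May and week 35 ends on 3rd September, which fits the "May to August" description.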

    year-weeknum.jpg

    Perhaps the main problem in trying to draw any sort of conclusion from the data is that there are so many variables. It's not like track and field athletics, where the conditions are pretty uniform and record times don't count if there's too much of a following wind, etc. If a strong contender experiences a stiff headwind for a couple of legs then they might not be submitting as fast a time as they otherwise might.

    End of geek mode (maybe)
    Last edited by Bob; 05-12-2023 at 08:03 PM.

  3. #13
    Master
    Join Date
    Jan 2015
    Location
    Within sight of Leicestershire's Beacon Hill
    Posts
    2,539
    Bob, your first graph gives "Invalid attachment specified" when I click on the link. Can you try re-posting?
    In his lifetime he suffered from unreality, as do so many Englishmen.
    Jorge Luis Borges

  4. #14
    Master Bob's Avatar
    Join Date
    Jan 2007
    Location
    Darkest eckythumpland
    Posts
    1,832
    Quote Originally Posted by anthonykay View Post
    Bob, your first graph gives "Invalid attachment specified" when I click on the link. Can you try re-posting?
    Oops!

    Now fixed :-)

  5. #15
    Master
    Join Date
    Jan 2015
    Location
    Within sight of Leicestershire's Beacon Hill
    Posts
    2,539
    Quote Originally Posted by Bob View Post
    Oops!

    Now fixed :-)
    Great. Regarding your observation that age makes a difference, this is also true at the younger end of the range, e.g. with the youngest sub-18-hour round by a 25-year-old, and younger runners (below 25 years) tending to take longer times.

    Incidentally, I know who the 71-year-old is who completed in a little over 22 hours, but who was the 13-year-old who just scraped in below the 24 hours?

  6. #16
    Master Bob's Avatar
    Join Date
    Jan 2007
    Location
    Darkest eckythumpland
    Posts
    1,832
    Ben Squibb way back in 1992 with a time of 23:51.

    Membership lists at http://bobgrahamclub.org.uk/index.php?page=members and record lists (including the youngest, oldest, etc) at http://bobgrahamclub.org.uk/index.php?page=records.

    There's a list of the fastest teenagers (23 of them to date); only one is under 20hrs, though the next was just 6mins over that mark.

    The lists are correct to the end of the year prior to the current one, i.e. they are currently to the end of 2022, on 1st Jan 2024 they will be good to the end of 2023 and so on going forward.

  7. #17
    Master Bob's Avatar
    Join Date
    Jan 2007
    Location
    Darkest eckythumpland
    Posts
    1,832
    A minor site update (there's some big ones on the way!)

    Someone asked me about the percentage of runners faster than a given time. That felt worthy of a bit of work, so...

    From a search on http://bobgrahamclub.org.uk/index.php?page=members, if you click on a member's number you'll get the usual full set of info about their round(s). Each standard round now also shows the original qualifying and overall rankings and percentiles (i.e. at the time of their round), the current qualifying and overall rankings and percentiles, and how that round ranks by gender. An example (I've used dashes so the forum software doesn't scrunch things up):


    -------------------- Rank ------- Percentile
    Overall ------------ 250=/728 --- 66%
    Qualifying --------- 244/715 ---- 66%
    Current overall ---- 1020=/2864 - 64%
    Current qualifying - 989/2800 --- 65%
    Gender rank -------- 69/280 ----- 75%


    The above round is from 1991. What's interesting is that the percentile doesn't change much over time for most rounds. Very early rounds do change mostly because there's not that many rounds to compare against.
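    The percentiles in the example are consistent with taking the share of rounds ranked below the one in question, rounded to the nearest whole number. That formula is inferred from the five rows in the table, not taken from the site's code, so treat this as a sketch:

```python
def percentile(rank, total):
    """Share of rounds ranked below this one, as a whole-number percentage."""
    return round(100 * (1 - rank / total))


# (label, rank, total) taken from the example table; "=" there marks a tied rank.
example = [
    ("Overall", 250, 728),
    ("Qualifying", 244, 715),
    ("Current overall", 1020, 2864),
    ("Current qualifying", 989, 2800),
    ("Gender rank", 69, 280),
]
for label, rank, total in example:
    print(f"{label}: {percentile(rank, total)}%")
```

    Run against the table this gives 66, 66, 64, 65 and 75, matching the figures shown.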

  8. #18
    Master Bob's Avatar
    Join Date
    Jan 2007
    Location
    Darkest eckythumpland
    Posts
    1,832
    I mentioned that there were some big updates on the way. Well some are here...

    New ratification form

    Since I took on the role of membership secretary, ratification has been via an Excel spreadsheet filled out by the contender and sent to myself. It evolved over the years but essentially it was still a spreadsheet and required about 250 - 300 different cells to be filled in with data - most of these were timings at each summit. A lot of work on the part of the contender.

    From my viewpoint the spreadsheet was limited in that there was no way to enforce data validity - I couldn't use VBA macros since there was no guarantee that the user would actually use Excel to fill in the form. As a result the form had quite a lot of text stating things like: "Enter a list of membership numbers of those successful rounds you have helped on, no other text" and people would put "1234 leg 1"!! Not what was asked for, the excuse being "I thought it would be helpful". If it was going to be helpful we'd ask for it.
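    The "membership numbers, no other text" rule is the sort of thing that is easy to enforce in code. Here is a minimal Python sketch of such a check; the function name and the accepted separators (commas and spaces) are assumptions, and the real form runs in the browser, so this only illustrates the rule.

```python
import re

# Digits separated by commas and/or spaces, nothing else.
MEMBER_LIST = re.compile(r"\d+(?:[\s,]+\d+)*")


def parse_member_numbers(text):
    """Return the membership numbers in text, or raise ValueError if anything else is present."""
    cleaned = text.strip()
    if not MEMBER_LIST.fullmatch(cleaned):
        raise ValueError(f"expected membership numbers only, got {text!r}")
    return [int(number) for number in re.split(r"[\s,]+", cleaned)]


print(parse_member_numbers("1234, 2345 3456"))
try:
    parse_member_numbers("1234 leg 1")
except ValueError as error:
    print(error)
```

    The first call returns the three numbers; the "1234 leg 1" entry is rejected rather than silently accepted.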

    Browsers have moved on (a lot) and many features that used to be provided by third party software are now built in. This allowed me to start afresh and build in traps for incorrect data etc and make it a lot easier for the user to enter data. A lot of the data (existing member info, prolific supporters, club names) is already available on the site, the form just needed to be able to access it.

    The new form thus pulls in:

    • Some personal data from the registration process
    • Basic round info, again from the registration process
    • Optionally, GPX files or tracker data uploaded by the user
    • The membership list data and the supporter list data


    Typically a user will only have to enter their age, postcode and club for personal data. Uploading tracker data or a GPX file fills in all their schedule data and completes the round info. Having access to the data lists means many only have to enter a membership number or surname and click a box or two to list their support team along with details of existing members they may have helped or are related to. A couple of clicks and the form is generated and downloaded.

    Membership search

    The search facility on the membership lists page was, to put it kindly, a little basic. It was what I could easily put together using the tools at the time and just let you search by year, surname and club name. When someone succeeds on the round I send them a questionnaire asking what worked, what didn't and what new features they'd like to see.

    One feature requested by about half a dozen people was "I'd like to be able to see the entire membership list in one". This didn't make sense: even on a large screen I can only get about 40 rows at a time on screen. That's currently around 80 screens' worth! Asking the terrible six just exactly what it was that they wanted, I got just one response which boiled down to: "a better search facility".

    This was much easier than I anticipated - I'd already written some of the building blocks when doing the new ratification system - a couple of hours had the user interface mostly sorted and a further morning saw most of the back end stuff done. Each field/column has its own set of tests that make sense for the data. There's no limit to the number of tests that can be applied so you can really drill down into the data. The search "widget" also allows the user to sort the results how they see fit.
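    The idea of stacking any number of per-column tests and then sorting the result can be sketched like this. The `Member` fields, the records and the three tests are all made up for illustration, and the site's own code will be organised differently.

```python
from dataclasses import dataclass


@dataclass
class Member:
    number: int
    surname: str
    club: str
    year: int
    minutes: int  # finishing time in minutes


def search(members, tests, sort_key=None):
    """Keep the members that pass every test, optionally sorted."""
    found = [m for m in members if all(test(m) for test in tests)]
    return sorted(found, key=sort_key) if sort_key else found


# Made-up records, for illustration only.
members = [
    Member(1, "Smith", "Ambleside", 1991, 1350),
    Member(2, "Jones", "Keswick", 2015, 1290),
    Member(3, "Brown", "Keswick", 2019, 1410),
]
tests = [
    lambda m: m.club == "Keswick",
    lambda m: m.year >= 2012,
    lambda m: m.minutes < 23 * 60,
]
for m in search(members, tests, sort_key=lambda m: m.minutes):
    print(m.number, m.surname)
```

    Because each test is just a function of one record, adding another condition means appending to the list; nothing else has to change.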

    Then feature creep set in!

    I added a similar facility for pacers/support.

    The set of columns I'd chosen for the membership list results table was really just a "best guess". Some people might want different columns or different sets of columns depending on what their search was. No point in listing a club's name if your search was to look for a specific club, it's just using up screen width.

    So a "User options" page was added. There's a few minor items but the main one is that the user can drag and drop columns to create the table they wish. Again they can save each configuration with a name which can be used to associate a specific table layout with each named search.

    I added the ability to save searches with a suitable name. The code also automatically remembers a user's "last search" for them. That then ran into a problem - Apple Safari deletes a site's stored data after seven days if the user doesn't use the site. This is down to advertisers abusing the feature. That led to looking at storing users' data on the server. There's no easy way to do this for simple site visitors but for registered contenders it's relatively simple.

    So now the code automatically stores the user's data locally and, for registered users, copies it to the server (potentially sensitive data like email addresses and phone numbers are removed before being uploaded). When a user logs in, the code checks to see if their data is on the server and if so downloads it. This means that if a user logs in to a second browser or device they get data synchronisation.
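    The "remove sensitive data before uploading" step might look something like the sketch below. The key names `email` and `phone` and the shape of the stored data are assumptions; the post only says that email addresses and phone numbers are stripped.

```python
import copy

# Assumed field names; the post only mentions email addresses and phone numbers.
SENSITIVE_KEYS = {"email", "phone"}


def strip_sensitive(data):
    """Return a deep copy of data with sensitive keys removed at every level."""
    if isinstance(data, dict):
        return {k: strip_sensitive(v) for k, v in data.items() if k not in SENSITIVE_KEYS}
    if isinstance(data, list):
        return [strip_sensitive(item) for item in data]
    return copy.deepcopy(data)


# Hypothetical locally stored data.
local = {
    "lastSearch": {"club": "Keswick", "year": 2019},
    "contact": {"email": "someone@example.com", "phone": "01234 567890"},
}
upload = strip_sensitive(local)
print(upload)
```

    Working on a copy means the local data keeps the contact details while the uploaded version never contains them.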

    Phew! Apologies for waffling on!
    Last edited by Bob; 24-11-2024 at 05:21 PM.

  9. #19
    Master mr brightside's Avatar
    Join Date
    Jul 2009
    Location
    Loving it in the Pilates Studio
    Posts
    8,035
    Feature creep, eh. We could do with some of that setting in on the old forum, might attract a few over from FB.
    Luke Appleyard (Wharfedale)- quick on the dissent
