
Early days

Oh,
regarding the rail moves - actually it beggars belief that people can have such a poor understanding of basic geometry considering the Greeks worked most of this stuff out well over 2,000 years ago. You have an arc, the rail, on a bend - simplify matters and consider it a section of a circle. Now, what happens to the circumference of a circle if you increase or decrease its radius?

Answers, as you say, on a postcard....

Dave
 
I'm afraid you'll need to be both optimistic and patient!

It's not that important, something for the future maybe.

Regarding the maths required for rail movements, I received this from Jack Pyor, the clerk of the course at Huntingdon/Market Rasen, last August.

***************************************

Dear Mike,

Thank you for your email and apologies for my delayed response.

Whilst I appreciate the times do not appear to match the distance alterations I have re-calculated the added distance and I can confirm, it is correct.

For every yard the rail is moved out on a bend we multiply that by three. This is then multiplied by the number of times the bend is used during the race, giving us the overall added distance.

Stands Bend - 18yds x3 = 54yds, they only use this bend once.
Wood Bend - 14yds x3 = 42yds, they use this bend twice in a 2m3f chase. 84yds.

Totalling 138yds, whilst I appreciate this doesn’t address your particular query regarding the race won by Our Cat, I hope it goes some way in explaining how we calculate added distances.

Kind regards,

Jack Pyor

Clerk of the Course – Huntingdon & Market Rasen Racecourses

******************************************

Mike.
 
The circumference is 2 x Pi x r, so a 180 degree turn is Pi x r, and each extra yard of radius adds Pi (about 3.14) yards to it. Multiplying by 3 is therefore pretty close as an approximation, provided you are talking about 180 degrees of turn. It's not an exact art anyway as, obviously, they have to 'guesstimate' how far from the rail to measure.
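The geometry can be checked in a few lines. For a bend that turns through a given angle, moving the rail out by d yards adds (angle in radians) x d yards, whatever the bend's original radius. This is only a sketch: the function name is made up, and it assumes both bends in the clerk's email are circular arcs of a full 180 degrees, which the email does not actually state.

```python
import math

def added_distance(rail_move_yards, times_used=1, turn_degrees=180.0):
    """Extra yards run when a bend's rail is moved out.

    Arc length is radius * angle, so widening the radius by d adds
    angle * d regardless of the bend's original radius.
    Assumes the bend is a circular arc (a simplification).
    """
    return math.radians(turn_degrees) * rail_move_yards * times_used

# Figures from the clerk's email: (rail move in yards, times the bend is used)
bends = {"Stands Bend": (18, 1), "Wood Bend": (14, 2)}

rule_of_three = sum(3 * yards * uses for yards, uses in bends.values())
exact = sum(added_distance(yards, uses) for yards, uses in bends.values())

print(f"x3 rule: {rule_of_three} yds, using pi: {exact:.1f} yds")
```

On those assumptions the x3 rule gives the clerk's 138 yds and pi gives about 144.5 yds, so the shortcut runs roughly 4.5% short.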

The programming thing is fine, it's something that would use a bunch of routines that I wanted to develop for myself anyway. 'Scraping' websites isn't something I've ever tackled before but would be very useful, so I've already been doing a little bit on it as and when I can. The only issue is how complicated it might get. It would help enormously if Timeform would kindly publish the race times for each meeting on a single page (as the RP does with the normal results page and the analyse race times pages). As far as I can tell you can only get one race time per page downloaded, so on a day like Saturday the program would have to find around 60 different web addresses and download a single time from each (having downloaded around 1,200-1,300 lines of HTML you'd then extract about 20 characters from that entire page). I can't see it happening in Excel somehow, but computers are good at doing big things like that repeatedly, so it looks quite possible I'll manage it in time.
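The 'extract about 20 characters from the whole page' step might look something like this. The HTML fragment and the pattern are invented for illustration, since Timeform's real markup isn't shown in this thread; the only point is pulling a time such as '4m 16.5s' out of a long page with Python's re module.

```python
import re

# Hypothetical fragment standing in for one ~1,300-line race page.
page = """
<div class="result">
  <span>Winner: Example Horse</span>
  <span>Winning time: 4m 16.5s</span>
</div>
"""

# Matches times such as '4m 16.5s' or '59.8s' (minutes part optional).
TIME_PATTERN = re.compile(r"(?:\d+m\s*)?\d+(?:\.\d+)?s\b")

def find_race_time(html):
    """Return the first thing that looks like a race time, or None."""
    match = TIME_PATTERN.search(html)
    return match.group(0) if match else None

print(find_race_time(page))
```

A pattern this loose would probably need tightening against the real page, for example by anchoring it to whatever label sits next to the time.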

An alternative is that we both subscribe to the Timeform API, which I looked at on their website - a well written API can be a dream, allowing access to all sorts with a few lines of code - the Timeform API would offer everything they have to a coder, unfortunately at a starting price of £1000 a month. I felt it a tad expensive just so I could download some race times....

Dave
 
Okay,
late as ever - you'd think I'd have learned to stop fiddling with things by now....
Nottingham, Chelmsford (thought I'd try the AW) and Ffos Las for tomorrow. For the ratings, be careful with the 'top 2' section at the start of each card. The actual ratings are okay in the racecard section (ie the correct figures are there for top rating on turf and AW), but the top 2 bit is having problems sorting the two out. Although I've spent a couple of hours giving it a stiff talking to, the darn thing is still shoving the odd turf figure into what ought to be an AW-ratings-only bit for Chelmsford - basically listing the top 2 ratings regardless of whether they are turf or AW - so until I get that sorted please check ratings in the main section of the card.

If I can sort this in time I'll post an updated set of ratings tomorrow. For purposes of checking how the ratings are doing, if I can't correct the listings in time I'll base my analysis on the correct values from the race listings, rather than the top 2 section at the start.

Dave
 

Attachments

  • chelmsford2.csv
    22 KB · Views: 3
  • nottingham2.csv
    24 KB · Views: 0
  • card2_25july.xlsx
    98.4 KB · Views: 1
  • ffoslas2.csv
    22.8 KB · Views: 0
Sorry to be slow,
I've been sorting that ratings issue out, and it dragged on stupendously....
Much later than I wanted, but the full ratings set for today is now being attached - it should give the same info as reading the racecard section already did, I just find it handy to have the top ratings identified like this as well.

dicko14 Dunno is the short answer, I don't use RUK very much - if I get that time comparison thing going for Mike I'll see if it's possible to check other feeds. At the moment I use Horseracebase information, and as there's very occasionally a dubious value in there I then cross check against both the Racing Post and Timeform. If there's still any doubt I start looking elsewhere - if it's up on the ATR race replays I'll watch the race and hand time it, for example, to see if I can spot who's got the most sensible time for the race. You can access race times at RP and TF via free accounts, which is useful. One thing I find you have to watch is that some of the sites themselves are buying the data in from elsewhere, so you can get 2 or 3 sites agreeing on a time that is equally wrong!

Dave
 

Attachments

  • ratings25_july.csv
    5.1 KB · Views: 2
By the way dicko14, that RUK display is very similar to the Racing Post's times analysis that you get access to if you join their club thingy, in case you weren't familiar with it.

Right, tomorrow's efforts - Bath, Catterick, and Sandown

Out of yesterday's top2 lists the results were 27 runners on the flat with 4 winners, prices 1.51, 2.93, 5.9, 6.25 which I'd call pretty rubbish, 2/20 handicap winners and 2/7 non-handicap. It's still too early to derive anything meaningful from the numbers (except that the overall win rate is pants). Over the jumps 1 winner from 7 at odds of 6.4 so almost broke even.
I'm figuring I need at least a couple of weeks' worth of results to make head or tail of this, so bear with me while I see how things go....you can't draw sensible conclusions from a few dozen races, and although I have a few gut feelings I accept that this is probably just incipient wind....
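As a check on the arithmetic, the level-stakes outcome for those top2 lists can be worked out directly. This sketch assumes one point staked on every runner and that the quoted prices are decimal odds with the stake included; commission is ignored and the function name is just for illustration.

```python
def level_stakes(runners, winning_prices, stake=1.0):
    """Return (strike_rate, profit) for equal stakes on every runner.

    winning_prices are decimal odds (stake included), so each winner
    returns stake * price and every runner costs one stake.
    """
    returned = stake * sum(winning_prices)
    outlay = stake * runners
    return len(winning_prices) / runners, returned - outlay

flat_rate, flat_profit = level_stakes(27, [1.51, 2.93, 5.9, 6.25])
jump_rate, jump_profit = level_stakes(7, [6.4])

print(f"Flat:  {flat_rate:.1%} winners, {flat_profit:+.2f} points")
print(f"Jumps: {jump_rate:.1%} winners, {jump_profit:+.2f} points")
```

On those assumptions the flat selections finish about 10.4 points down and the jumpers 0.6 of a point down, which squares with 'almost broke even'.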

Dave
 

Attachments

  • bath2.csv
    27.3 KB · Views: 0
  • catterick2.csv
    29.9 KB · Views: 0
  • sandown2.csv
    19.7 KB · Views: 1
  • card2_26july.xlsx
    109.4 KB · Views: 0
dicko14

I never personally look at what times the RUK put up.

On the flat the times returned should be exactly the same. Over the jumps, Timeform and the Racing Post time the races from when the horses pass the start line.

Timeform Jim once stated on Channel 4 that jump races should be timed from when the flag goes down, which is the most stupid statement I have ever heard. They could be 100 yards away from the start; add that to the rail movements, and the race distance could be a furlong out before they have even started.

Mike.
 
davejb, @The Blues Brother
Thanks for your responses. Seems like you're going to have your work cut out. You're more in tune with the situation than the people who get paid to produce the official times are.
Thanks for sharing your figures.
 
That's doubtless true of Mike, but not myself!
Personally I think it's down to a mindset, many people question whether speed/time calculations have any merit - there's an article about wind assistance in today's RP online news section that certainly makes you think.... well it did, but it's not on there any more.... giving the suggestion that you'd mistakenly assume a fairly ordinary runner was a complete rocket due to breaking the course record with a strong tailwind. That, of course, is why we try so hard to get accurate timing data and to calculate really effective going allowances - to sort of allow us to extract the value of the horse's actual performance on the day with all the weather and going variations factored out.

If people don't think time/speed ratings have really got much validity, then it's not surprising they don't worry too much about getting it spot on. I have to agree with Mike (it'd be a brave man who didn't) that it is absolutely critical to have a good time and distance, and both are subject to the vagaries of human nature... rail movements and the like can (and do) often add significantly to a published race distance, which we can account for if that information is released. It's when the information isn't released, or nobody bothers checking, that the problems occur. Timing also goes awry - sometimes quite strangely (as per the race on ATR I mentioned the other day) - on the flat it's not exactly hard to figure that when the stalls open the average runner emerges like a bat out of hell, so starting and stopping the timer ought to be fairly straightforward and not subject to too great an error. Over the jumps though it's a bit different - frequently the runners kind of amble along towards a nice chap who consents to their departure, picking the actual moment to click your stopwatch could result in a couple of seconds (or worse) variation in a timing. Luckily, in percentage terms, this isn't too drastic - it can still cause a fair difference in rating however.

Anyhow, back to the daily analysis -
Yesterday there were 33 runners (others NR) of which 23 were on the flat with 3 winners, 2 winners from 6 over jumps.
What is becoming more obvious daily is that the odds are dramatically indicating whether to bet or not. The highest priced flat winner was BFSP 5.1, for jumps 2.76. Looking at the prices of those that did not win, some of them were gigantic - apart from the two winners over jumps the other 4 on that card went off at prices from 28 to 213.37! The last 3 days are too small a sample to make any serious deductions from, but they certainly indicate a possible trend: by only going with runners at odds of 5.0 (4/1) or less the win rate approaches something reasonable - the occasional big priced winner doesn't compensate for the large number of big priced losers.

At 4/1 or below the wins/runs over flat become 5/23 and over jumps 5/11.... in reality applying a price filter of 2/1 (3.0) would have caught every one of those winners and improved the win rates.

I'm not going to say that's it, top rated 4/1 or less and you're away, I'll continue to analyse results and see if that firms up more or - like many apparent correlations in data - just quietly fades away and turns out to be an anomaly. ('Commonsense', which is never common and seldom sensible, admittedly, would suggest that lower prices would invariably improve win rates, of course... the trick is to make a profit!)
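For anyone wanting to try the same price filter on their own records, a minimal sketch follows. The runner list is invented purely to show the mechanics and is not taken from the results above, and the function names are hypothetical. Fractional odds convert to decimal by adding one, so 4/1 is 5.0 and 2/1 is 3.0.

```python
from fractions import Fraction

def fractional_to_decimal(odds):
    """Convert '4/1' style odds to decimal odds (stake included)."""
    return float(Fraction(odds)) + 1.0

def filter_by_price(runners, max_decimal):
    """Keep runners priced at or below max_decimal; return (wins, runs)."""
    kept = [r for r in runners if r["price"] <= max_decimal]
    wins = sum(1 for r in kept if r["won"])
    return wins, len(kept)

# Made-up sample: decimal price and whether the horse won.
sample = [
    {"price": 2.5, "won": True},
    {"price": 4.8, "won": False},
    {"price": 12.0, "won": False},
    {"price": 34.0, "won": False},
]

cutoff = fractional_to_decimal("4/1")   # 5.0
print(filter_by_price(sample, cutoff))  # (1, 2)
```

Running the same records through several cutoffs side by side is a quick way to see whether an apparent improvement is just the base rate of short-priced horses winning more often.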

Dave
 
Hey ho,
off we go. Doncaster, Newbury and Sandown tomorrow - see attached.
 

Attachments

  • card2_27july.xlsx
    97 KB · Views: 0
  • doncaster2.csv
    25.7 KB · Views: 0
  • newbury2.csv
    21.5 KB · Views: 0
  • sandown2.csv
    21.2 KB · Views: 0
TheBluesBrother

Hi Mike, I sure hope this works - I only just learned (I hope) to use Dropbox so you can access this stuff, which is another first, because prior to that I had to learn how to turn my Python code into an exe style program that you can run without installing Python, which was the result of learning how to 'scrape' code from the Timeform site.... if all goes according to plan......

This link should, if I understood this correctly, allow you to download a file called scraper.zip that I made up earlier. It isn't the complete program, I want to make sure that this is all going to work as planned before I complete the data grabbing side of things, so I put together a sort of little demonstrator program that collects the race times for yesterday from Timeform, then lists them in a csv file you can load into Excel. My plan is to do the processing so all you need to do is look at the results, but there'll be nothing to stop you playing with the data in Excel if you want to do other things with it. So if you'd download and unzip the file, then see the instructions that follow below:

Dropbox - scraper.zip

When unzipped (into any convenient folder you like) you'll see a small collection of files, a dozen or so, plus a folder called 'library' - there's also a file called 'scraper' which is listed as an application, you double click it to run the program. The other files are needed to run the code, as are a number that should already exist in your Windows directory on your PC - errors could come up if they aren't there, but they should be.

When you double click 'scraper' a command window (a black 'DOS' style job) will appear for the duration of the run, this window will click back out of existence when it finishes its job.
In that window you'll see messages (mainly to reassure you it is working) saying it's found so many meetings, and that it is checking the times for each race in turn. When it finishes running it will say "Where did it go?" - This is something I added to stop the command box winking out of existence on exit, press return and the box will disappear. (ie It's just an 'I'll press a key when I'm ready' sort of thing...without this I found it a bit disconcerting that everything just winked out of existence as it was getting interesting...)

That's the program finished - if you look in the folder your scraper program is in you'll see a file called Timeform_racetimes.csv (there's one from today in the zip file as an example) - open it in Excel or a similar spreadsheet program and you'll have three columns - racecourse, race start time, racetime.

If this all works okay for you then I can get on and finish the rest off, which should be a bit easier than getting this first bit done as I've got most of the 'skills' I need for the job now thanks to the last couple of days' efforts. If all is okay I'll add the Racing Post times to the spreadsheet, and add a small menu to let you pick which date to get the times for. Would you like the race times converted to seconds as well? Currently they're in as '4m 16.5s' and similar, it's easy enough to program in a conversion routine to add a column saying 256.5.
As you want to compare RP to TF I presume you'd like a column listing the time difference between the two racetimes listed for each race.
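The conversion routine and the difference column could be as small as this. It is a sketch that assumes times always arrive in the '4m 16.5s' shape, with the minutes part optional for sprints; the function names and the sample RP time are hypothetical.

```python
import re

_RACETIME = re.compile(r"^\s*(?:(\d+)m)?\s*(\d+(?:\.\d+)?)s\s*$")

def racetime_to_seconds(text):
    """Turn '4m 16.5s' into 256.5; raise ValueError on anything else."""
    match = _RACETIME.match(text)
    if match is None:
        raise ValueError(f"unrecognised race time: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))

def time_difference(rp_text, tf_text):
    """RP time minus TF time in seconds, rounded to 2 decimal places."""
    return round(racetime_to_seconds(rp_text) - racetime_to_seconds(tf_text), 2)

print(racetime_to_seconds("4m 16.5s"))          # 256.5
print(time_difference("4m 17.1s", "4m 16.5s"))  # 0.6
```

Raising an error on an unexpected format, rather than returning zero, means a change in the site's layout shows up straight away instead of quietly producing nonsense differences.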

Is there anything else you'd like in there, basically if it's on the web page the times are on I can probably collect it. The program actually has to load a web page for each race individually with Timeform, so the speed it all runs at is down to your internet connection really - for example the test runs I did today loaded one page to collect the page addresses for each race, then looped through 38 individual pages to collect the time off each, which took maybe 2 minutes over sluggish ADSL.
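The two-stage fetch described here, one index page for the race addresses and then one page per race, can be sketched as follows. This uses Python 3's urllib.request rather than the Python 2 module in the actual program, and the patterns are placeholders, since the real Timeform page structure is not given in this thread.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

def fetch(url):
    """Download a page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def collect_race_times(index_url, link_pattern, time_pattern, fetch_page=fetch):
    """Load the index page, then visit each race page for its time.

    link_pattern must capture the race page address in group 1;
    time_pattern must capture the race time in group 1.
    """
    index_html = fetch_page(index_url)
    times = {}
    for href in re.findall(link_pattern, index_html):
        race_url = urljoin(index_url, href)
        found = re.search(time_pattern, fetch_page(race_url))
        times[race_url] = found.group(1) if found else None
    return times

# Placeholder patterns; the real ones depend on the site's markup.
LINK = r'href="(/result/[^"]+)"'
TIME = r"((?:\d+m\s*)?\d+(?:\.\d+)?s)\b"
```

Passing the page loader in as fetch_page means the loop can be tried against saved copies of pages first, without making 38 requests on every test run.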

Let me know if it works, and I'll get the rest done.
Dave
 
At 4/1 or below the wins/runs over flat become 5/23

I am not so sure that using odds to filter is the way to go. Yes you will show an improved strike rate but isn't that simply because lower odds selections win more races anyway?
For example from the last 23 runners that have run in a flat race with odds of 4/1 or below 6 of them have won!
 
davejb

Thanks for your hard work, due to the security setup on my PC, Norton will not let me run the scraper.exe even after I excluded it.

EDIT: Finally got it to work, I had to go into the antivirus settings plus exclusions.

Thanks

Mike.
 
TheBluesBrother Windows 10 warned me against running it, I told it to ignore. Then my antivirus asked if I wanted to let it go onto the internet, to which I said yes, and the scraper then did its job. So it's working OK here, maybe something odd with Norton?
 
Yes Ark,
ultimately though, over time (rather more than 3 days of course!) I'm looking for correlations, the price is just the first 'obvious' one that seems to be popping out. It's never going to work if I just go top rated plus low odds (actually top rated plus any one thing is probably unlikely) as the odds will limit the returns too much unless the win rate is very good. It may be that rating+odds+race distance band or something gets the win rate high enough to allow the odds to return a profit. On the other hand it may be that the outsiders with something to filter them would do the trick - I have noticed along the way that the ratings can throw up some rather big priced winners.
Personally I think it'll come down to using the ratings to have a short list, and then it's just form reading...

As for the scraper program - I have Norton installed myself, and it didn't flag any problems with the program itself or any of the code I had to download. Python is a language that has a strong userbase who have produced a large number of extensions to the core language - no doubt some geek would pull faces at my description, but to me it's very much like Linux in that the user community beaver away happily to improve it over time. My code is written in the latest version of Python 2. Python 3 is the latest version of Python itself and I'll doubtless switch to it at some point, but Python 2 is the version best supported by the 'teach yourself Python' type books/mags that I used to get to grips with it. The extensions used in my program are 2 very well used and known ones from the basic Python 2 setup itself - 're' and 'urllib2'.
I had to use one 'add on' program called py2exe which again has been around forever and is well known and used, it's pretty much a staple of the python community - it's the program that allows you to turn your code into a stand alone exe for others to run without their having to install Python on their PCs.
I suspect the security issues would be flagged as a result of the program accessing the web, as of course that is something that a virus might well try to do. Needless to say, it's not suspicious behaviour for a program that is meant to obtain information from websites!

As the program is running when you guys get it, I'll program the rest of it in. I'll add a column for the winner's name I think, so you can be happy it's the right race you are looking at.
So course - time - winner - racetime(TF) - TF in seconds - racetime(RP) - RP in seconds - difference between the two times is what I'm planning, any changes to that say so.
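A sketch of that column layout using Python's csv module. The row is a placeholder to show the shape, not a real result, and the helper and column names are hypothetical.

```python
import csv
import io

COLUMNS = ["course", "time", "winner", "racetime_TF", "TF_seconds",
           "racetime_RP", "RP_seconds", "difference"]

def write_times(rows, handle):
    """Write race rows (dicts keyed by COLUMNS) as CSV to an open file."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

# Placeholder row; a real run would fill these from the scraped pages.
example = [{
    "course": "Example Park", "time": "14:30", "winner": "Example Horse",
    "racetime_TF": "4m 16.5s", "TF_seconds": 256.5,
    "racetime_RP": "4m 17.1s", "RP_seconds": 257.1,
    "difference": 0.6,
}]

# In-memory buffer for the demo; for a real file use
# open("Timeform_racetimes.csv", "w", newline="") instead.
buffer = io.StringIO()
write_times(example, buffer)
print(buffer.getvalue())
```

Keeping both the original text and the seconds value for each time leaves the raw figures available for anyone who wants to check the conversion in Excel.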

Dave
 
Hi Donny
I'm glad you find the stuff of some use. You can do quite a bit in Excel, it's a pity that packages with Access in tend to be so pricey - I had a full Office XP years ago and spent quite a long time using Access, which kind of had Visual Basic embedded in it so you could write program code to interact with your database if Access itself didn't do what you wanted.
Python's quite a nice language to work in, there are a few odds and ends I'd like to see in it that aren't there, but on the whole I rather like it. Programming languages get easier as you learn each one - you learn the basic syntax, the order and way that programming languages like to work, and from then on it's often just a case of learning the word (command) to type to do pretty much the same thing as some other word (command) did in the first language you learned.... which becomes steadily easier to learn.

I can't begin to emphasise enough that the most important attribute for learning coding and producing working code is sheer bloody mindedness - I bludgeon the language until it does what I want, frequently in a very inelegant way! (Like most programmers I do not like revealing my inefficient, ugly way of doing things to others - there's always a suspicion they'll turn round and say 'but you could have done it in three lines like this...')

Dave
 