
Scraping RP Naps Table with Python (Idiot's Guide)

A mate of mine wants to start using Python for scraping various things from websites, so I said read a few examples, watch a few vids on YouTube, then give it a go with the RP naps table, which is easily obtainable.
The reason I said the naps table is that RP don't like you scraping it and have put in little things that make it a bit difficult for a newbie to scrape, so it keeps denying access.
Fair play, he gave it several goes before he asked for an idiot's guide to see where he was going wrong.
To cut a long story short, I've had a bit of free time today so knocked a guide up for him, and thought I'd put it on here as well, as there don't seem to be any basic Python how-to guides on here. If it helps one person get into scraping with Python then it's job done.

RP.JPG

The Out[29] is the return from line 35, where the output is proofed (you would delete that line after proofing).
Lines 6 & 8 are the important lines, as RP will deny access if you just try to get access via the plain .get(url) route.
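The code itself is only in the screenshot, so what follows is a rough sketch of the general idea rather than a copy of lines 6 & 8. It assumes the block is triggered by the default Python user agent, which is a common cause of a 403 but not confirmed here for RP. The URL, the header values and the function names are all placeholders I have made up for illustration; swap in the real naps table address yourself.

```python
import urllib.request

# Hypothetical placeholder URL, NOT the real naps table address.
NAPS_URL = "https://www.example.com/naps-table"

# Assumed browser-like headers; the exact values are illustrative only.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Build a request that carries browser-like headers instead of the
    default Python user agent that some sites reject."""
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, timeout=10):
    """Fetch the page and return its HTML as text.

    Raises urllib.error.HTTPError (e.g. 403) if the site still refuses.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (needs a real URL and a network connection):
# html = fetch_html(NAPS_URL)
```

If you are using the requests library instead, the equivalent is passing the same dictionary as requests.get(url, headers=BROWSER_HEADERS). Either way, sending the headers may or may not be enough on its own, depending on what checks the site runs.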

Here's a screenshot of the CSV file output it gives in Excel.

screen.JPG
 
I'll have a look later; it should be a bit simpler than that. No need for bs, pandas should tidy it all up with read_html. In fact, it's sacrilege going to Excel from a DataFrame.
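For anyone wondering what the read_html route looks like, here is a minimal sketch. It assumes you already have the page HTML as a string and that pandas has an HTML parser available (lxml, or bs4 with html5lib). The sample table is made-up placeholder data, not the real naps table, and first_table is just a name chosen for the example.

```python
import io

import pandas as pd

# Made-up placeholder table standing in for the fetched page HTML.
SAMPLE_HTML = """
<table>
  <tr><th>Tipster</th><th>Selection</th></tr>
  <tr><td>Tipster A</td><td>Horse 1</td></tr>
  <tr><td>Tipster B</td><td>Horse 2</td></tr>
</table>
"""


def first_table(html):
    """Parse every <table> in the HTML and return the first as a DataFrame."""
    # read_html returns a list of DataFrames, one per table it finds.
    tables = pd.read_html(io.StringIO(html))
    return tables[0]


df = first_table(SAMPLE_HTML)

# CSV text without the index column; df.to_csv("naps.csv", index=False)
# would write a file instead.
csv_text = df.to_csv(index=False)
print(df)
```

This only covers the parsing step. It does nothing about getting the page in the first place, so the HTML still has to be fetched in a way the site accepts.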
 

The object was not whether you need bs or read_html in pandas;
it was to get onto the RP naps page in the first place without getting a 403 response or access denied.
 