Cricket data sources and Python
NOTE
This is an old post and some of the information is likely out of date.
Where can I find cricket data in 2019?
Inspired by an enthralling Ashes series, I wanted to play with whatever cricket data I could find from free and commercial sources. Despite a bit of trawling, I couldn't find a single document detailing what was available.
So what sources are there?
I did a bit of Googling and also looked at what some of the obvious sites offered. Additionally, I looked for existing Python libraries and played around with them. I was more interested in crunching stats than in, for example, building widgets that show live scores, but I have kept an open mind.
Cricinfo
Every online person with a cricket habit will have heard of Cricinfo. It has been under the ESPN umbrella for a few years now, and various old questions on Stack Exchange and Quora suggest it used to have an API for developers that was later discontinued. That said, there are still some options for grabbing its data by leveraging feeds or scraping pages.
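For the scraping route, a minimal sketch using only the standard library's `html.parser` is shown below: it pulls the text of every table cell out of an HTML page, row by row. Fetching the page (for example with `urllib.request`) is left out, the class and function names are hypothetical, and the sample HTML in the usage note is made up; real Cricinfo markup may differ and can change without notice.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>, <b>) still lands in the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no cells
                self.rows.append(self._row)
            self._row = None


def extract_table_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `extract_table_rows("<table><tr><th>Player</th><th>Runs</th></tr><tr><td>A Batter</td><td>123</td></tr></table>")` gives `[["Player", "Runs"], ["A Batter", "123"]]`. A dedicated parser such as BeautifulSoup is more forgiving of messy markup; the standard library version just avoids extra dependencies.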
RSS Feeds
Cricinfo has a page listing its available RSS feeds. These can be parsed to get information on current matches, recent results and more.
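As a rough sketch, assuming the feeds follow the standard RSS 2.0 layout (`<rss><channel><item>` with `<title>` and `<link>` children), the standard library's `xml.etree.ElementTree` is enough to list the items. The function name is hypothetical, and no real feed URL is included here; you would pick one from Cricinfo's RSS page and download it yourself.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a (title, link) tuple for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    # RSS 2.0 nests entries as <rss><channel><item>...</item></channel></rss>
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items
```

`ET.fromstring` accepts either a string or the raw bytes returned by something like `urllib.request.urlopen(feed_url).read()`, where `feed_url` is a placeholder for whichever feed you choose.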
Python libraries
Searching on GitHub returns quite a few results, but most are old and unloved. A couple stood out, though (there are probably more of interest):
python-espncricinfo
python-espncricinfo is a library that can scrape all sorts of data from Cricinfo, including ball-by-ball data, player profiles, and much more. I've encountered a few bugs, though, so be prepared to build some error handling into your code.
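One way to build in that error handling is a small retry wrapper around each library call. The sketch below is generic and does not depend on python-espncricinfo itself: `fetch` stands for whatever call you are making (for example, constructing one of the library's match or player objects from a Cricinfo ID; check the project's README for the exact names). The helper name and its parameters are hypothetical.

```python
import logging
import time


def fetch_with_retries(fetch, *args, retries=3, delay=1.0, **kwargs):
    """Call fetch(*args, **kwargs), retrying when it raises.

    Returns the first successful result, or None if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(*args, **kwargs)
        except Exception as exc:  # scraping code can fail in many ways
            logging.warning("attempt %d/%d failed: %s", attempt, retries, exc)
            if attempt < retries:
                time.sleep(delay)  # back off briefly before trying again
    return None
```

Catching every `Exception` is deliberately broad for a flaky scraper; once you know which errors a particular call raises, narrowing the `except` clause is safer. Note also that returning `None` on failure is ambiguous if the wrapped call can legitimately return `None`.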
Cricbuzz
I'll confess to never really looking at Cricbuzz, but it is another popular site dedicated to cricket news, and there's a Python library for accessing it.
pycricbuzz
Although I haven't really looked at it, pycricbuzz does seem to have quite a few stargazers and forks.
Cricsheet
Cricsheet is an effort to create freely accessible ball-by-ball datasets for cricket matches. It took its inspiration from a project called Retrosheet, which attempts to do the same for MLB. It offers free data for a subset of historical matches in JSON, YAML, and CSV formats.
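To give a flavour of working with the JSON files, here is a minimal sketch that totals the runs each batter scored in one match. It assumes Cricsheet's JSON layout of `innings` → `overs` → `deliveries`, where each delivery has a `batter` name and a `runs` object whose `batter` field counts runs off the bat. Check these field names against a file you have downloaded, since the format may differ between versions. The function names are hypothetical.

```python
import json
from collections import Counter


def runs_by_batter(match):
    """Sum the runs off the bat credited to each batter in one match.

    Assumes the layout innings -> overs -> deliveries, with each delivery
    holding a "batter" name and a "runs" dict containing a "batter" count
    (extras are recorded separately and therefore ignored here).
    """
    totals = Counter()
    for innings in match.get("innings", []):
        for over in innings.get("overs", []):
            for delivery in over.get("deliveries", []):
                totals[delivery["batter"]] += delivery["runs"]["batter"]
    return totals


def load_match(path):
    """Load one Cricsheet JSON match file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage would look like `runs_by_batter(load_match("some_match.json")).most_common(5)`, where the filename is a placeholder for any downloaded match file.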
Cricket API
Cricket API is a premium service offering numerous API endpoints. I've not looked at it beyond that.
Other resources
Here are some other resources I have found but haven't explored in detail.
Criclabs (GitHub organisation)
[Criclabs](https://github.com/criclabs) aims to collect "all open-source cricket libraries at one place" and contains forked repos of many of the projects I've looked at.
Conclusions
If you want to analyse cricket beyond what you can do with Statsguru, Cricsheet is probably the easiest way to get started. You can also scrape further data from Cricinfo, Cricbuzz and similar sites.