About the project
I’ve long been frustrated that cricket doesn’t have an official, correct source for scorecards. Inconsistencies between scorecards have made it harder to keep the data for Cricsheet as accurate as possible, but they were an annoyance rather than a major blocker. However, when I decided to look into adding scorecard data as a new Cricsheet project, I discovered that the issue was far worse than I had imagined, and that discrepancies occurred much more regularly than I had expected. The idea for the Cricket Scorecard Accuracy Project was born.
The project compares the scorecards for all international and major domestic T20 competitions in 2020, and checks them for accuracy. Where there are discrepancies it also attempts to determine, if possible, which source is incorrect, and details what caused each one.
Scorecards were retrieved from ESPNcricinfo, CricketArchive, Big Bash, the ECB (for England and T20 Blast matches), the IPL, Cricingif (the official Pakistan Super League source), and the ICC, where relevant for each match. Each scorecard is parsed into a common format, and I then perform a variety of checks on each match, such as checking that the result is the same on all scorecards, that the same number of innings are listed, and that batter and bowler details match. Full details on the Checks Performed are available for the curious.
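To give a flavour of what a cross-source check might look like, here is a minimal sketch in Python. It is not the project's actual code: the `Scorecard` class, its fields, the `find_discrepancies` function, and the source names are all hypothetical, and it assumes each scorecard has already been parsed into a common format. It covers only two of the checks mentioned above (result and number of innings).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    """Hypothetical common format holding the fields compared below."""
    source: str
    result: str
    innings_count: int


def find_discrepancies(cards):
    """Return (field, {value: [sources]}) for each field the sources disagree on."""
    issues = []
    for field in ("result", "innings_count"):
        # Group the sources by the value they report for this field.
        values = {}
        for card in cards:
            values.setdefault(getattr(card, field), []).append(card.source)
        # More than one distinct value means at least one source differs.
        if len(values) > 1:
            issues.append((field, values))
    return issues


# Illustrative data only; the source names and results are made up.
cards = [
    Scorecard("source_a", "Team X won by 5 runs", 2),
    Scorecard("source_b", "Team X won by 5 runs", 2),
    Scorecard("source_c", "Team Y won by 5 runs", 2),
]
for field, values in find_discrepancies(cards):
    print(field, values)
```

Grouping sources by reported value makes the odd one out easy to spot, although a majority among sources does not by itself show which of them is correct.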
What would I like to see happen with scorecards, after doing this project?
Firstly, the ICC should ensure that the data they publish is correct. I’m still stunned that the governing body for international cricket has such a lax attitude to the scorecards of its matches, seeming happy to leave it to online sites and enthusiastic amateurs to find and correct issues. For a game that places such reverence on the career numbers of players, it is disappointing to see so little care paid to those numbers by the custodians of the international game.
Secondly, the organisers of the various T20 competitions should ensure that their data is correct. I blame them slightly less for not giving proper attention to the issue, as they’re not necessarily governing bodies, but they should still be performing this basic job.
Thirdly, the sources could perform extra checks of their data. Yes, they’re relying on data from different sources with different quality controls, but they could still perform some basic checks to ensure that scorecards are at least internally consistent. A simple example would be to validate that the wides listed against the bowlers actually add up to the wides figure given in the extras. A surprising number of scorecards wouldn’t pass even this basic check.
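The wides check is simple enough to sketch in a few lines of Python. This is an illustration only, assuming a scorecard exposes a wides figure per bowler and a wides figure in the innings extras; the `BowlerLine` class and `wides_consistent` function are hypothetical names, not part of any source's software.

```python
from dataclasses import dataclass


@dataclass
class BowlerLine:
    """Hypothetical bowling-figures row; only the fields needed here."""
    name: str
    wides: int


def wides_consistent(bowlers, extras_wides):
    """True if the bowlers' wides add up to the wides listed in the extras."""
    return sum(bowler.wides for bowler in bowlers) == extras_wides


# Illustrative figures only.
bowlers = [BowlerLine("Bowler A", 2), BowlerLine("Bowler B", 1)]
print(wides_consistent(bowlers, extras_wides=3))
print(wides_consistent(bowlers, extras_wides=4))
```

A failed check only shows that the scorecard disagrees with itself. It does not say whether the bowling figures or the extras are the incorrect part.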
Finally, I’d love to see a standard format for storing scorecard data. Different software could then output scorecards in that format, making it easier for others to share and read the data, and encouraging more robust checking of it. Also, I love open data formats!
Who did this?
My name is Stephen Rushe. I live in Belfast, have an interest in cricket statistics, and write software for a living (and fun). I run Cricsheet, a project to provide freely-available structured ball-by-ball data for international and T20 club matches. I’ve been mentioned in Wisden once!
How can you contact me?
The easiest way is to send an email to firstname.lastname@example.org, and wait for me to get back to you. If there is interest, I may set up a mailing list to allow discussion of the project.