Julian Schmidli, Duc-Quang Nguyen, Tania Boa, Luc Guillemot
Twenty years ago, he played his first professional match. Roger Federer has now won his 20th Grand Slam title. A data analysis of all the matches he has played reveals how he became the best tennis player of all time.
In tennis, every move and every point is tracked, but none of it is made accessible in a machine-readable way. Our aim was to analyze all available data from the last 20+ years of tennis and make it accessible to others as well. We did that by publishing not only the story but also the code we used (https://srfdata.github.io) and a making-of (https://medium.com/p/wie-wir-20-jahre-tennisdaten-analysiert-und-aufbereitet-haben-f4d3498580c3, in German).
The story was published in many languages at the same time in collaboration with Swissinfo (https://interactive.swissinfo.ch/2018_01_28_federer20/).
What makes this project innovative?
We at SRF Data are trying to set new standards when it comes to transparency and reproducibility. We publish not only all the code used for the analysis but also the original data, so that others can understand how we transformed and analyzed it.
Furthermore, data journalism and sports have seldom been combined in the way we did. We tried to keep the dimensions as simple as possible so that everybody, even a reader with no prior knowledge of tennis, can understand what we are trying to convey.
What was the impact of your project? How did you measure it?
With this story, we reached more than four times as many visitors as we usually do. By applying the data-journalistic approach to a popular topic like tennis, we tried to introduce more people to our way of doing journalism.
The feedback was extremely positive. Not only did important experts in the field of tennis analytics praise our story (https://twitter.com/StatsOnTheT/status/958162986786463750), but so did other data journalists (https://twitter.com/martinstabe/status/958254070367694848).
Source and methodology
The ATP (Association of Tennis Professionals) does not offer an API for its data. Even on request, it does not publish or sell any of its collected data in a machine-readable format. That's why we collaborated with Mileta Cekovic, a Serbian computer scientist who scraped all the available information from the ATP website and organized it in a database.
Furthermore, we used the deuce R package by Stephanie Kovalchik, which makes all of Jeff Sackmann's collected historical tennis data available for further analysis.
Ultimate Tennis Statistics by Mileta Cekovic is written in Java and was developed on Windows. Its backend is a PostgreSQL database, which we could also port to our Unix (Linux, Mac) computers. From there, we used R Markdown and the RPostgreSQL package to read data from the database.
In R, we used ggplot to analyze the data visually. With jsonlite, we exported only the data needed for the final chart into a JSON file that we could import into our frontend, which consists of a React stack and D3.
To update the data, we can scrape the latest data from the ATP website and simply rerun our R Markdown. It automatically exports new JSON files and updates the UI.
To translate the story into a total of 10 languages, we collaborated with Swissinfo. They entered their translated texts into a Google Drive spreadsheet, from which we downloaded them automatically and fed them into our translation functionality.
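The export step described above (query the database, keep only what one chart needs, write it to a JSON file for the frontend) can be illustrated with a minimal sketch. It is written in Python rather than the R (RPostgreSQL and jsonlite) used in the project, and it uses an in-memory SQLite database from the standard library as a stand-in for PostgreSQL so that it runs anywhere. The table name `matches`, its columns, the function names and the sample rows are all hypothetical placeholders and do not reflect the real Ultimate Tennis Statistics schema or real results.

```python
import json
import sqlite3


def demo_connection():
    """Create an in-memory database with made-up rows (assumption:
    the real project reads from a PostgreSQL database instead)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE matches (season INTEGER, opponent TEXT, won INTEGER)"
    )
    # Placeholder data for illustration only, not real match results.
    conn.executemany(
        "INSERT INTO matches VALUES (?, ?, ?)",
        [
            (2001, "Player A", 1),
            (2001, "Player B", 0),
            (2002, "Player A", 1),
        ],
    )
    return conn


def build_chart_data(conn):
    """Aggregate matches played and won per season for a single chart."""
    rows = conn.execute(
        """
        SELECT season, COUNT(*) AS played, SUM(won) AS wins
        FROM matches
        GROUP BY season
        ORDER BY season
        """
    ).fetchall()
    return [
        {"season": season, "played": played, "wins": wins}
        for season, played, wins in rows
    ]


def export_chart_json(conn, path):
    """Write only the aggregated chart data to a JSON file and return it."""
    data = build_chart_data(conn)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=2)
    return data
```

Calling `export_chart_json(demo_connection(), "chart.json")` writes one small file per chart. Rerunning it after the database has been refreshed regenerates that file, which mirrors the idea of rerunning the R Markdown to update the frontend.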