FiveThirtyEight is the most prominent data journalism site in the U.S., but in our fourth year since relaunching in 2014, we had to reinvent ourselves somewhat for the Trump era. We took on more forms of storytelling and more urgent stories -- and, of course, did it all in an empirically sound, data-driven way.
You want data?
We did a 12-part, statistically driven investigation of what really happened in the presidential election. We used pitch-tracking data to show that -- despite denials from Major League Baseball -- the baseballs were juiced. We stole a technique from machine learning to profile President Trump's most rabid online following. We helped create 12 new tests to measure Hollywood diversity. We used data to help readers pick a World Cup team to root for, and to show the loooooong recovery ahead for hurricane-struck areas. We created an NFL game that allowed users to try their own hand at modeling and compete against one another. We showed how the legacy of slavery still manifests itself in how people die in the United States. We gave readers real-time, updating data to help them make sense of their world -- on the NFL, Congress, the NBA, Trump, the Oscars, the job market and more.
Through it all, we’ve learned and grown as a newsroom. We’ve focused more on where we really add value: politics and policy, sports and science. There’s still a long way to go, but we’d suggest (with oodles of humility, of course) that no other site consistently produces data-driven journalism on such a variety of topics and in such quality and quantity. Maybe we should try to show that with data, but we’ll leave that judgment to you instead. 😉
What makes this project innovative?
Most newsrooms nowadays have journalists who work with data, but few claim the use of data and empiricism as guiding principles. This is what makes FiveThirtyEight really stand out. Everyone on staff believes in the power (and limits) of data journalism, and that fact informs how we approach every story. It also informs our newsroom makeup and workflow. We’ve built in-house tools to make charts and tables. We require our writers to be able to post data to GitHub. Or, to take a small example, we provide a link to a story’s underlying data right along with the article’s byline. In other words, we don’t just use data -- we’ve built a newsroom around the values that data journalism embodies.
What was the impact of your project? How did you measure it?
Fifty-seven million people visited FiveThirtyEight.com in 2017, according to Omniture (which is a bit higher than comScore’s figures). More importantly, those readers spent a total of 618,047,818 minutes on the site. About a third of visits were for more than five minutes -- an insanely high share by internet standards. That’s really how we judge our work: Is it engaging and intellectually challenging -- not only attracting a wide audience, but holding that audience’s attention?
Source and methodology
We use a variety of sources for data -- governmental, crowd-sourced, official, unofficial, academic, corporate, etc. We use ready-made data and we collect data ourselves. But everything is sanity-checked, verified and tested. We have a full-time quantitative editor who does much of that work, and editors, copy editors and reporters themselves are also expected to cast a suspicious eye toward all data.
We use a wide variety of technologies for our work at FiveThirtyEight.
For data analysis we mostly use R, but we also use Python, Ruby, Stata and Excel. For interactive visualizations we use D3 and Node.js. For static visualizations, we often use D3 or ggplot2 with Illustrator, as well as some internal web-based tools we built using Node.js and React. For databases and backend interfaces we often use Ruby on Rails with MySQL or Postgres. For mapping we mostly use QGIS.
And, of course, we do a ton of reporting and basic research.
We also have many different bots that help us with our work by keeping track of different data sources. They communicate with our various databases and predictive models and interact with our journalists via Slack.
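To give a rough sense of the pattern described above, here is a minimal Python sketch of a bot that notices when a data source changes and prepares a Slack message about it. It is an illustration under stated assumptions, not FiveThirtyEight's actual code: the names `fingerprint`, `SourceWatcher` and `slack_message` are hypothetical, the change detection (comparing content hashes) is one simple approach among many, and the step that fetches the data and posts to Slack is left to the caller.

```python
import hashlib
import json
from typing import Callable, Dict


def fingerprint(payload: bytes) -> str:
    """Return a stable digest of a data source's raw contents."""
    return hashlib.sha256(payload).hexdigest()


class SourceWatcher:
    """Hypothetical helper: remembers the last digest seen per source."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        # `notify` is whatever delivers the alert (assumed here; in practice
        # it might post to a Slack webhook).
        self._notify = notify
        self._seen: Dict[str, str] = {}

    def check(self, name: str, payload: bytes) -> bool:
        """Record the payload; alert and return True if it differs from last time."""
        digest = fingerprint(payload)
        previous = self._seen.get(name)
        self._seen[name] = digest
        # The first sighting of a source is a baseline, not a change.
        if previous is not None and previous != digest:
            self._notify(f"Data source '{name}' changed")
            return True
        return False


def slack_message(text: str) -> str:
    """Build a JSON body with the `text` field that Slack incoming webhooks accept."""
    return json.dumps({"text": text})
```

In use, a scheduler would call `check` with freshly downloaded bytes for each source, and the `notify` callback would send `slack_message(...)` to a webhook URL configured for the newsroom's channel.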