The Associated Press data journalism team combines news applications development, data engineering, news automation, data analysis and data visualization. We work with journalists across the world -- not only our AP colleagues, but also journalists at member and partner organizations. We cover every beat AP covers, from government and politics to environment and education, and we are dedicated to making data journalism more accessible to everyone while maintaining the high standards of all AP journalism.
Our work may appear as a full text story, a crucial paragraph in a longer story or as a graphic or interactive. We deliver election results maps and contribute to breaking news as well as long-term investigations and enterprise. We build tools used across the newsroom, allowing reporters who don't code to create their own interactive digital content. In many cases, the most important product we deliver is the data itself -- vetted and documented, with full details on our methodology -- for other news organizations to create a version of the story that is right for their audience. For the past year, we've been working with the data sharing platform data.world to make the data behind our work available, which has led to hundreds of text and visual stories that would not otherwise have been possible.
The team has also made critical contributions to the wider data journalism community in the past year. For the first time, the AP Stylebook included a complete section on data journalism. The chapter, written by members of the data team, covers everything from evaluating data sources to acquiring data to writing with numbers. It also sets basic standards for reproducible analysis and data visualization.
The AP has also started looking for opportunities to collaborate with member news organizations as well as those outside of journalism on issues of vital public interest. Our first effort in this space is Sunshine Hub, an online tool that brings together AP journalists, member news organizations, and First Amendment advocates from around the country to keep tabs on state laws that could affect access to government information. Through the portal, users can identify and tag relevant state legislation, track its progress through the legislature and discuss bills with other members to identify cross-state trends.
The AP data team began as an offshoot of our interactive graphics team, with four members and an editor. In the past four years, that team has grown to 11 data journalists with specialties ranging from statistics and demographics to devops and full stack development. We're still a relatively small team given the size of our organization and the number of news organizations we serve, but we multiply our efforts by helping others tell their own stories with the data.
What makes this project innovative?
Over the past year, AP's data team has applied advanced techniques including natural language processing, network analysis and machine learning on projects examining topics such as the effect of charter schools on segregation, the lack of diversity in Trump judicial nominees, the potential flood risk from climate change at Superfund sites and the outsized toll of teen gun violence in some smaller cities. And we have devised novel visualization techniques to make complex concepts such as the "efficiency gap" used to measure partisan gerrymandering easier to comprehend.
Perhaps most importantly, the data and methodology for each of these projects were shared on data.world with thousands of AP member news organizations, giving them access to vetted data and story ideas they might not otherwise have been able to pursue. AP data journalism team members guided reporters from hundreds of news organizations in using data to bolster their local reporting, hosting webinars to walk them through analysis and even pre-writing SQL queries to help them make sense of the numbers. Over a 12-month period, the AP data team distributed roughly two dozen datasets on everything from county-level opioid prescribing to the more than 4,000 grants given by the NRA to individual schools and community groups.
In every case, whether the techniques were common or cutting-edge, they have served the story first, and we've made our methodology clear enough that other news organizations have been able to build their work on our analysis with confidence.
To support this work, we built and open sourced a toolkit for managing data journalism projects that helps us rapidly start new projects, share data and review one another's work. We have also developed automation solutions and tools that have expanded our output of graphics and stories, freeing AP graphic artists and reporters to focus their efforts on journalism with maximum impact.
What was the impact of your project? How did you measure it?
We measure our impact by watching our stories spread across the front pages and sites of our members and customers. Our data releases have routinely led to dozens of local stories from AP members, each with its own form of impact in user engagement and local response.
We also measure impact in governmental change and response: A partnership with Reveal from The Center for Investigative Reporting on modern-day redlining has led to separate investigations by the Pennsylvania attorney general and state treasurer. An analysis of Superfund sites in areas prone to flooding has prompted a GAO investigation, which is still ongoing. A story on federal judiciary diversity sparked a line of questioning to Attorney General Jeff Sessions in a subsequent congressional hearing. A story analyzing NRA grant funding led a number of local school districts to refuse NRA money in future years.
In the past year, other news organizations have queried our data sets more than 6,000 times. Six separate data distributions were accessed by at least 100 unique news organizations, sparking dozens of stories on school segregation, sexual misconduct in statehouses, mortgage loan redlining, NRA grants, Superfund sites in flood-prone areas and FEMA public claims appeals. AP and member reporters used Sunshine Hub to identify and track more than 150 transparency-related bills in the 2017 legislative session alone, leading to a series of Sunshine Week stories highlighting these issues.
Source and methodology
Most of the data sets used in our projects have been obtained using public records requests. In some cases, we have downloaded or scraped publicly available data. In a handful of cases we have developed our own data sets, whether by aggregating state data to the national level or by having reporters make individual public records requests in each state. The data team has leveraged our news organization’s presence in all 50 states to gather important data on conflicts of interest among state legislators, statehouse policies regarding sexual harassment and juvenile life sentence cases.
In rare cases, when the data does not otherwise exist, we have worked with data from a non-governmental source. One example is our work with data collected by the Gun Violence Archive. For this series, a partnership with the USA TODAY Network, we began with the data provided by this nonprofit and then used public records requests and our own news searches to verify the cases from which we drew the stories.
Our primary toolkit for data analysis is R, which we support with our open source data project tool called datakit. Datakit manages project structure, so that our data journalists can easily generate new projects and maintain the same organizational conventions across projects. It also manages interaction with our issue tracking system, data servers and the data sharing platform data.world. It’s language agnostic, so we can easily plug in R, Python or Ruby and focus on using the right tool for the job.
We have used Python for machine learning tasks and Neo4j for network analysis. Our databases are typically built using Postgres or SQLite. We use D3 for interactive visualizations, including our election night maps. We have built data administration tools for data entry and tracking, as well as public-facing news applications, using Ruby on Rails and Django. While most of our geographic analysis has been done in R, we have also used QGIS for some tasks, as well as Mapbox and Esri mapping tools for visualizations.