Project description

The Associated Press, long a provider of election data through its vote tabulation, saw an opportunity to serve news organizations of all sizes and to democratize data journalism by sharing the data behind AP’s national stories with other news organizations. Whereas a typical data-driven project may yield a handful of stories and a graphic or two, AP knew that getting this data into the hands of local journalists, and giving them the support they needed to find their own stories in it, would have a dramatic multiplying effect on the work of AP’s data team. More than that, through the data sharing program, AP has created a new network of data journalists who have begun sharing ideas and data with one another.

By sharing cleaned and vetted datasets, AP has saved local journalists the hours of painstaking preparatory work that a data project requires: finding the data, interviewing the data source, finding weaknesses and errors in the data, and comparing the dataset with other available data to ensure its integrity.

What makes this project innovative?

Sharing data has meant more than simply posting spreadsheets. For each data project, AP provides extensive documentation, including methodology, background to the story and guidelines for finding a local story. Built into the sharing process are measures that make finding stories in data as easy as possible for non-experts. Data webinars provide a walk-through of the data and answer journalists’ questions. The AP national story is available in advance, so that national trends can help guide local reporting. Built into the distribution platform are tools that let the non-expert quickly access and visualize just the data relevant to their audience, so that reporters on deadline can quickly see whether the national trend applies locally or whether their city is an outlier.

What was the impact of your project? How did you measure it?

When the Detroit News wanted to confront HUD Secretary Ben Carson over a proposed rent hike for recipients of federal housing subsidies, they were able to be precise: renters would see a 20% increase across the board under the proposal. The Secretary walked the proposal back on the spot. When news organizations in dozens of cities across the country wanted to dig into the effects of unfair lending practices in their communities, as exposed in an investigation from the Center for Investigative Reporting in partnership with The Associated Press, they were able to compare precise data for their cities to other cities across the country. Attorneys general in five states and the District of Columbia have all launched investigations of the practice, along with local investigations in other cities spurred on by the reporting. When AP broke the news that the teenager charged in the Parkland school shooting had been trained in an air rifle squad that received funding from an NRA Foundation grant, reporters across the country were able to look into their own school systems’ receipts from the NRA, and several school districts, including Broward County, Florida (where Marjory Stoneman Douglas is located) and Denver, Colorado, decided to stop applying for these grants.

Each of these stories illustrates the reach and impact of AP’s data sharing program in 2018 -- helping to bring the power of data journalism to bear for news organizations across the country, and enabling accountability reporting that affects the daily lives of readers in these communities. This is work that many news organizations -- with staff stretched thin from budget cuts and lacking the technical expertise to wrangle large data sets and mine them for important local stories -- would have been unable to do on their own.

The crisis in local news over the past decade coincided with the emergence of data as an increasingly powerful source for journalists. Just when data was exploding in size and the collection of data loomed ever larger in every aspect of public and private life, local news organizations, facing shrinking budgets and smaller staffs, were increasingly limited in their capacity to make use of these rich sources.

In 2018, AP provided 16 robust data sets for localization. They have been downloaded nearly 1,400 times by journalists and editors, reaching more than 300 journalists at outlets ranging from local newspapers and public television and radio stations to the largest news networks in the U.S. Data sharing projects have routinely led to dozens of localizations, including text stories, videos, interactives and graphics. AP also extended the program to universities, providing ready-to-use data sets for the next generation of journalists. AP provided data to identify shelters where child migrants were being detained, examine local spending of federal funds to combat opioid addiction, track the flow of refugees from countries facing new restrictions and explore the effect of the #MeToo movement on statehouses across the country. In every case, dozens of news organizations were able to bring these national stories home and hold their representatives accountable.

Source and methodology

Sources and methodologies varied across the projects.

For the local life expectancy story, AP used data from the United States Small-Area Life Expectancy Project. Using a statistical technique (local Moran), AP was able to confirm that life expectancy is not randomly distributed across census tracts in the U.S., but that high or low life expectancy in one tract is often related to the life expectancy in neighboring tracts. Using this technique, AP was able to identify clusters of tracts where the life expectancy was statistically similar and categorize the clusters in four ways. Interesting stories may be found where tracts don't match their neighbors.

For the NRA grants project, AP used public data from IRS Form 990s. AP used fuzzy string matching and manual review to identify all elementary, middle and high schools -- and programs within those schools -- that received funding from the NRA's grant program.

For the proposed HUD rent increase project, AP collaborated with the nonpartisan Center on Budget and Policy Priorities. CBPP has access to 2016 household-level renter data from HUD under a research agreement. The impact was calculated by applying the proposed rent formula directly to each household and comparing the result with current rent. CBPP aggregated the household data at the state level and for the 100 largest metropolitan statistical regions for use by AP and its members.

For the project on thirty years of climate change, AP used data from the U.S. Climate Divisional Dataset provided by the NCDC/NOAA. NOAA collects daily temperature data from 10,325 U.S. land stations in the Global Historical Climatology Network. The project also included emissions data from the U.S. Energy Information Administration. In calculating the temperature difference between the present day and the first half of the 20th century, AP adopted the methodology used by the U.S. Global Change Research Program in the 2017 Climate Science Special Report. For each geography, a base average was created using monthly temperature data from 1901 to 1960. An average was then created with monthly temperature data from 1988 to 2017. The difference between the present 30-year period and 1901-1960 is the temperature change. To calculate warming trends, AP used a linear regression model to draw a line of best fit through annual measures of temperature from 1988 to 2017. The resulting slope coefficient is interpreted as the yearly change in temperature for a particular geography.
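
The two climate calculations described above (the change against the 1901-1960 base, and the 1988-2017 trend) can be sketched in a few lines of Python. This is an illustrative sketch only, not AP's actual code: the data layout (a dictionary keyed by year and month for one geography), every function name, and the synthetic series are assumptions made for the example, and the annual measure is assumed here to be the mean of that year's monthly values.

```python
from statistics import mean

# Assumed layout: {(year, month): temperature} for a single geography.

def period_mean(monthly, start_year, end_year):
    """Average all monthly values whose year falls in [start_year, end_year]."""
    return mean(t for (year, _m), t in monthly.items()
                if start_year <= year <= end_year)

def temperature_change(monthly):
    """Recent 30-year average (1988-2017) minus the 1901-1960 base average."""
    return period_mean(monthly, 1988, 2017) - period_mean(monthly, 1901, 1960)

def annual_means(monthly, start_year, end_year):
    """Collapse monthly values to one mean per year (an assumption of this sketch)."""
    by_year = {}
    for (year, _m), t in monthly.items():
        if start_year <= year <= end_year:
            by_year.setdefault(year, []).append(t)
    return {year: mean(vals) for year, vals in sorted(by_year.items())}

def ols_slope(xs, ys):
    """Slope of the least-squares line of best fit through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def warming_trend(monthly):
    """Yearly change in temperature: regression slope over 1988-2017."""
    annual = annual_means(monthly, 1988, 2017)
    return ols_slope(list(annual.keys()), list(annual.values()))

def synthetic_series():
    """Invented data for demonstration only; not real NOAA measurements."""
    series = {}
    for year in range(1901, 1961):
        for month in range(1, 13):
            series[(year, month)] = 50.0
    for year in range(1988, 2018):
        for month in range(1, 13):
            series[(year, month)] = 51.0 + 0.05 * (year - 1988)
    return series

if __name__ == "__main__":
    demo = synthetic_series()
    print("temperature change:", temperature_change(demo))
    print("yearly trend:", warming_trend(demo))
```

Running the same functions once per geography (for example, per climate division or state) would give one temperature change and one yearly trend for each, which is the shape of the result the methodology describes.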

Technologies Used

Analysis was primarily performed in R or Python. Visualizations were built with the ggplot2 package in R, and interactive visualizations with D3 in JavaScript.

Project members

Meghan Hoyer, Larry Fenn, Angel Kastanis, Nicky Forster, Michelle Minkoff, Justin Myers, Dan Kempton, Bob Weston, Seth Rasmussen, Serdar Tumgoren

