Project description

In the category of Portfolio, The Wall Street Journal is proud to nominate Paul Overberg, whose data skills underpinned a wide range of important and impactful projects: identifying a little-noticed demographic trend in the U.S. that has huge implications for how people here live; helping to document fraudulent comments about pending government policies; and assessing the economy of South Central Los Angeles (California) 25 years after riots nearly burned the neighborhood to the ground.

The "One Nation, Divisible" project addressed the growing divide between prospering urban areas and waning small-town America, a reversal from a couple of decades ago, when urban areas were the troubled spots. These stories grew out of two 2016 Wall Street Journal projects – one on the opioid crisis and another on the economic roots of American political discontent.

For "One Nation, Divisible," Paul led an exhaustive analysis of economic and social statistics and academic journals to reach the conclusion that many of the economic and social ills that plagued U.S. cities in the 1980s and 1990s had become worse in rural areas since the turn of the century. These included teen pregnancy, disability, divorce, maternal death, cancer mortality, unemployment and stagnant incomes.

With this quantitative base, Paul was able to help his colleagues focus reporting in places that best told of the human plight, including Kenton, Ohio; Caledonia, Mo.; West Branch, Mich.; and McMinnville, Tenn.

Paul’s programming skills were also the backbone of an investigative project called Hidden Influence. This article was part of a series on the stealthy ways the powerful shape law and policy, especially in Washington. The article documented how a crucial part of the democratic process – public comment on proposals by federal agencies – has been swamped by people and groups using bots to fill up online dockets. The automation of public comment is designed to support advocacy and fund-raising, slow regulation and bolster potential litigation against a proposed rule.

Paul helped to document how:
+ Thousands of bot-generated comments are filed using stolen IDs, including those of dead people and people who oppose the stance falsely attributed to them.
+ Lobbyists’ software can spawn thousands of programmed variations of a comment to evade duplicate-detecting docket software at regulatory agencies.
+ Trade associations and affiliated lobbyists misled customers and employees in order to use their identities to generate fake comments.
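
The second point above, programmed variations written to slip past duplicate detection, can be illustrated with a minimal sketch. It is not the method the Journal used, which this entry does not describe. It assumes a common text-comparison technique: break each comment into overlapping word sequences ("shingles") and score pairs by Jaccard similarity. The function names, the shingle length and the threshold are all hypothetical choices.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in a lowercased comment."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two comments have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(comments, threshold=0.5, k=3):
    """Return index pairs of comments whose similarity meets the threshold.

    The threshold and k are illustrative assumptions, not tuned values.
    """
    sets = [shingles(c, k) for c in comments]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Made-up comments: the first two differ by a single swapped word.
sample = [
    "I strongly urge the commission to repeal these burdensome rules on internet providers",
    "I strongly urge the commission to repeal these harmful rules on internet providers",
    "Please keep net neutrality protections in place for consumers",
]
print(near_duplicates(sample))
```

An exact-match check would treat the first two sample comments as different, while the similarity score groups them. Comparing every pair is only practical for small samples; a docket of millions of comments would need an indexing scheme on top of this.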

Finally, Paul contributed to the success of many colleagues with demographic analysis. One example: For an anniversary story on the 1992 riots in South Central Los Angeles, he created a custom compilation of 1990 Census data to match current boundaries and allow comparisons. In another example, he helped document that Americans are relocating to retirement hot spots, a sign that the nation’s 74 million baby boomers (those born between 1946 and 1964) have dug out from the 2007-09 recession that locked many of them in place when home and stock values plummeted.

What makes this project innovative?

One Nation, Divisible: Paul helped build this project from the start to rest on primary sources: social statistics from many sectors; academic research in demography, economics, political science and urban studies; and interviews with rural residents themselves. He also critically reviewed and ran data tests on numerous schemes used to categorize U.S. settlement patterns, from dense cities to frontier homesteads. This prompted The Wall Street Journal to set them aside in favor of a bespoke approach that balances accuracy, resolution and simplicity.

Hidden Influence/Fake Comments: Several tech-savvy bloggers had written about the likelihood of bots’ presence in the massive docket (more than 22 million comments) of a proposal by the Federal Communications Commission to roll back rules binding Internet service providers. But Paul’s work helped to prove it. His analytical and storytelling skills helped to build a clear, convincing case. His work extracting email addresses from the FCC docket provided the grist that The Wall Street Journal needed to conduct massive surveys. Thousands of responses from real people – many with comments of outrage about their stolen and misrepresented identities – proved the nature and scope of the fraud, which the investigation showed was occurring in other agencies’ dockets as well. Paul also performed spatial and textual analyses, wrote portions of the story and supervised the design of graphics.
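
The email-extraction step mentioned above might look roughly like the following sketch. It assumes each docket record is a dictionary with a free-text field holding the filer's contact details. The field name `contact_text`, the simplified address pattern and the sample records are hypothetical, not taken from the FCC's actual data format.

```python
import re
from collections import Counter

# Deliberately simple pattern; real-world address validation is more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(records, field="contact_text"):
    """Count how often each lowercased email address appears across records."""
    counts = Counter()
    for record in records:
        text = record.get(field) or ""
        for address in EMAIL_RE.findall(text):
            counts[address.lower()] += 1
    return counts


# Made-up records for illustration only.
records = [
    {"contact_text": "Contact: Jane.Doe@example.com"},
    {"contact_text": "jane.doe@example.com, bob@example.org."},
    {"contact_text": "no address given"},
    {},
]
emails = extract_emails(records)
print(emails.most_common())
```

Lowercasing before counting folds together addresses that differ only in capitalization, so each person on a survey mailing list is contacted once. The counts also show which addresses were attached to many comments.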

What was the impact of your project? How did you measure it?

One Nation, Divisible: These stories were among the most widely read stories on our website for weeks. The stories also generated reaction in Congress. Eleven members of the House of Representatives submitted a package of bills aimed at alleviating some of the ills plaguing rural communities.

Hidden Influence/Fake Comments: Members of Congress, citing the WSJ investigation, called on the FCC to add safeguards against fraudulent comments. The watchdog arm of Congress, the Government Accountability Office, this year will investigate the FCC’s information security controls, including those for comments. We also heard from Dr. John Woolley, professor of political science at the University of California at Santa Barbara and co-director of the American Presidency Project. He said he would make the story an assignment for his graduate seminar. “As a social scientist it makes me envious of the resources you were able to draw on and creativity of the research,” Dr. Woolley said in an email.

Source and methodology

One Nation, Divisible: Paul downloaded, cleaned and analyzed data from a variety of government and non-government organizations. But before that, he read extensively in the academic literature about geographical frameworks for analyzing the United States. He reviewed several frameworks used by agencies and analysts before adapting one created by the National Center for Health Statistics. Then he standardized numerous county-level data series to this framework to create a common core of data for all six major chapters in the series.

-- Review of academic literature in urban planning, urban economics, sociology, public health, demography
-- Interviews/correspondence with experts in urban-rural studies, including unpublished work
-- Data analysis:
+ U.S. Census Bureau decennial census summary data and microdata (1930-2010)
+ U.S. Census Bureau annual county-level population estimates and components (1980-2016)
+ U.S. Census Bureau American Community Survey summary data and microdata (2005-16)
+ U.S. Centers for Disease Control detailed mortality data (1999-2015)
+ U.S. Census Bureau County Business Patterns data (2000-2015)
+ U.S. National Center for Health Statistics county-level urban categorization data (1983-2013)
+ Institute for Health Metrics and Evaluation/University of Washington enhanced disease mortality data (1980-2014)
+ U.S. Bureau of Justice Statistics National Crime Victimization Survey data (1993-2015)
+ U.S. Agriculture Department Supplemental Nutrition Assistance Program participation data (1990-2015)
+ U.S. Centers for Medicare and Medicaid Services hospital cost reports (2000-2015)
+ U.S. Social Security Administration disability program participation data (2000-2015)
+ National Campaign to Prevent Teen and Unplanned Pregnancy analysis of restricted-access birth data from National Center for Health Statistics (1990-2010)
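
Standardizing county-level series to one urban-rural framework, as described above, amounts to attaching a category to every county and then aggregating within categories. The sketch below is an illustration under stated assumptions, not the Journal's code. The category labels, column names and all numbers are made up, and it aggregates raw counts before computing a rate so that large and small counties are weighted by population.

```python
from collections import defaultdict


def rate_by_category(county_rows, county_to_category, per=100_000):
    """Sum events and population within each category, then compute a rate.

    county_rows: dicts with hypothetical keys "fips", "events", "population".
    county_to_category: maps a county FIPS code to a category label.
    """
    events = defaultdict(float)
    population = defaultdict(float)
    for row in county_rows:
        category = county_to_category.get(row["fips"])
        if category is None:
            continue  # county missing from the classification table
        events[category] += row["events"]
        population[category] += row["population"]
    return {
        category: per * events[category] / population[category]
        for category in population
        if population[category] > 0
    }


# Illustrative classification and data; none of these values are real.
categories = {"01001": "rural", "01003": "rural", "06037": "large metro"}
rows = [
    {"fips": "01001", "events": 10, "population": 50_000},
    {"fips": "01003", "events": 30, "population": 150_000},
    {"fips": "06037", "events": 500, "population": 10_000_000},
    {"fips": "99999", "events": 7, "population": 1_000},  # unclassified
]
rates = rate_by_category(rows, categories)
print(rates)
```

Once every series is keyed to the same category table, measures from different agencies can be compared on one footing across chapters.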

Hidden Influence/Fake Comments:
Sources: Comments and associated metadata filed on various federal regulatory dockets, including those of the Consumer Financial Protection Bureau, the Federal Communications Commission and the Federal Energy Regulatory Commission.

South Central Los Angeles riots:
Sources: Census Bureau, National Historical Geographical Information System/Minnesota Population Center.
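
Matching 1990 Census data to current boundaries is commonly done with a crosswalk, a table that says what share of each old area falls in each new area. The sketch below assumes such a table is available as (old_id, new_id, weight) rows with weights summing to 1 for each old area. The identifiers and figures are hypothetical, and this entry does not state the exact procedure Paul followed.

```python
from collections import defaultdict


def reallocate(old_counts, crosswalk):
    """Redistribute counts from old areas to new areas using crosswalk weights.

    old_counts: maps an old area ID to a count (for example, population).
    crosswalk: iterable of (old_id, new_id, weight) rows.
    """
    new_counts = defaultdict(float)
    for old_id, new_id, weight in crosswalk:
        new_counts[new_id] += old_counts.get(old_id, 0) * weight
    return dict(new_counts)


# Made-up example: old area A is split between new areas X and Y.
counts_1990 = {"A": 1000, "B": 400}
crosswalk = [
    ("A", "X", 0.75),
    ("A", "Y", 0.25),
    ("B", "Y", 1.0),
]
counts_current = reallocate(counts_1990, crosswalk)
print(counts_current)
```

Because the weights for each old area sum to 1, the total is preserved, which gives a quick check that no area was dropped in the conversion.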

Sources: Census Bureau 2017 county and metro population estimates and their components, as well as three sets of county classifications created by the U.S. Office of Management and Budget, National Center for Health Statistics and Agriculture Department.

Technologies Used

On these projects, Paul primarily used SQL Server, Access, Python, QGIS and R to collect, store, clean and analyze data. Because the projects required extensive collaboration with non-technical colleagues, he used Excel to share data; to communicate findings, including graphics; and to guide how stories took shape. He also used an online data tool provided by the Centers for Disease Control (CDC Wonder) to analyze mortality data.


