This entry is a selection of in-depth investigative data journalism pieces on political misinformation published over the past year. It is part of a larger body of work that has been featured or referenced in more than 200 major news publications, broadcast television and radio outlets, and research and policy reports. The approach taken in this portfolio combines large-scale 'medium' data analysis, straightforward interpretation, and a focus on transparency and reproducibility in findings. Data work and research conclusions from the posts submitted here have been included in two New York Times front-page stories, three Washington Post front-page stories, a March Guardian print edition cover, Wired magazine's 11,000-word March 2018 feature story, and hundreds more major investigative and political reporting pieces. Results and recommendations from this body of work have also appeared in charitable foundation and industry reports and are cited in a growing range of scholarly work. On October 9, 2017, stories drawing on different branches of the data in this portfolio appeared on the front pages of the Washington Post and New York Times print editions on the same day. The Facebook, Instagram, and Twitter visualizations on Tableau, which are connected to the posts submitted here, are approaching 75,000 views, and one of the linked data sets ('IRA Facebook pages') was the number one shared data repository on data.world in 2017. [https://blog.data.world/our-top-10-datasets-and-projects-of-2017-e058fda31d8d]
What makes this project innovative?
The portfolio submitted to this year's DJA is arguably the single largest collection of cross-platform investigative data journalism on election misinformation and politically focused media manipulation to date. It is vast in scope, sourced from and linked to large quantities of open source data, and relies primarily on reproducible methods and freely available and/or open source tools. Each of the posts and linked data sets has been cited, downloaded, and re-used by journalists, researchers, and policymakers. The Tableau Public sets contain numerous complex data visualizations, including analytics charts, time breakdowns, and public propaganda impact graphs. The portfolio also includes large-scale immersive network graphs: a map of the ecosystem of misinformation and political ad tech (accompanied by a 570-page downloadable report on ad tech) and a 'breaking data' investigation into the YouTube conspiracy ecosystem and the disinformation surfaced through algorithmic recommendations for 'crisis actors'. Additionally, the portfolio contains the first comprehensive analysis of how the Internet Research Agency's Twitter accounts used mainstream and local media outlets, showing the 'troll' accounts' broad news-linking patterns. There is also an in-depth analysis and explanation of Cambridge Analytica's use of sentiment-mining and geolocation scripts related to Twitter data collection and voter targeting, and a post covering Facebook's Graph API in technical detail, meant as an explanatory data piece for journalists covering the March 2018 Cambridge Analytica Facebook revelations. Last but not least, there is a deep data journalism investigation of the Internet Research Agency's presence on Instagram, based on the author's archival efforts paired with a three-month analysis effort.
What was the impact of your project? How did you measure it?
The work published in this selection of investigative data explorations is unequivocally critical of large and typically 'closed' platforms. For better or for worse, the data and evidence presented here have contributed to policy decisions and disclosures, including Facebook's October 6, 2017 admission that Instagram was part of the Internet Research Agency's campaign around the 2016 United States election; a decision by YouTube to add Wikipedia links to conspiracy-related videos; and the removal of a number of identified highly offensive and racist search suggestions. Work in this portfolio has been directly referenced in election investigations by United States Senators, including Sen. Dianne Feinstein, and cited at Congressional hearings. The project as a whole sheds light on the role of mainstream media, Twitter, and YouTube and highlights the need for more transparency in technology and social platforms. It is not meant to be persuasive, but rather to provide a rich resource based on open data that can serve as a template for increased understanding. As time passes, I believe the range of data and investigative work shown in this portfolio will have additional impact on policy and public understanding. I also intend for this work to help journalists and newsrooms directly, acting as a bridge between reporting and investigation, traditional data journalism, and hands-on academic research.
Source and methodology
Each investigation uses data sources and methods selected to suit the subject of analysis. Methods are clearly identified in each post, and each post links to openly shared data repositories so that the data can be downloaded and the findings reproduced. All data and tools, with the exception of Tableau Desktop/Professional (free for academic research and teaching use), are free or open source.
Analysis tools include Google Apps, Gephi, Tableau Desktop/Professional, ThreatCrowd, and Maltego; data access methods include CrowdTangle, Social Blade, Klear, and the YouTube, Instagram, and Twitter APIs; data presentation tools include Gephi, Sheets, and Tableau Public.
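For readers who want to try a similar workflow, here is a minimal sketch of one way account-to-outlet link data could be prepared for a network tool such as Gephi. It is an illustration only and is not the code used in these investigations. The function names, the sample rows, and the choice of a Source,Target,Weight edge table are all assumptions made for this example.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse


def link_domain(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def build_edges(rows):
    """Count (account, domain) pairs from (account, url) rows."""
    counts = Counter()
    for account, url in rows:
        domain = link_domain(url)
        if domain:  # skip rows whose URL has no recognizable host
            counts[(account, domain)] += 1
    return counts


def edges_to_csv(counts):
    """Serialize edge counts as a Source,Target,Weight CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (account, domain), weight in sorted(counts.items()):
        writer.writerow([account, domain, weight])
    return buffer.getvalue()


# Hypothetical sample rows, not taken from the shared data sets.
sample_rows = [
    ("account_a", "https://www.example.com/story-1"),
    ("account_a", "https://example.com/story-2"),
    ("account_b", "https://news.example.org/local"),
    ("account_b", "not a url"),
]

edges = build_edges(sample_rows)
print(edges_to_csv(edges))
```

Each row of the resulting table is one weighted edge from an account to a linked domain. Collapsing full URLs to domains is a simplification chosen here to keep the graph readable; an analysis of individual stories would keep the full URL instead.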