Jens Finnäs, founder and data journalist at Journalism++ Stockholm
Måns Magnusson, statistician, Linköping University
Newsworthy is a news service that helps local newsrooms find stories in data. Newsworthy monitors statistical databases and notifies subscribing newsrooms when it finds local trends and anomalies in the data: a theft peak, a new trend in housing prices or a temperature record. We call these items newsleads. Newsleads are findings in data that journalists should look into.

When we find a newslead we generate a short text, a chart and an Excel sheet with the underlying data. The goal is to give the local reporter all the context needed to either publish a simple news story or delve into further research on the topic.

Newsworthy was launched as an independent news service in Sweden in the fall of 2017.
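To make the idea concrete, a newslead can be thought of as a small structured record from which the text, chart and spreadsheet are rendered. The sketch below is purely illustrative; the class, field names and template are assumptions, not Newsworthy's actual data model.

```python
# Hypothetical sketch of a newslead record and a templated headline.
# All names and the example figures are illustrative.
from dataclasses import dataclass

@dataclass
class Newslead:
    region: str
    metric: str
    period: str
    value: float
    change_pct: float  # change vs. the comparison period, in percent

    def headline(self) -> str:
        """Render a short, publishable one-liner from the structured data."""
        direction = "rose" if self.change_pct >= 0 else "fell"
        return (f"{self.metric} in {self.region} {direction} "
                f"{abs(self.change_pct):.0f}% in {self.period}")

lead = Newslead("Östersund", "Property sales", "Q3 2017", 214, 35.0)
print(lead.headline())
```

The same record would also feed the chart and the Excel export, so text, graphics and underlying data always agree.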
What makes this project innovative?
We have seen plenty of examples of automated journalism (or “robot journalism”) in the last couple of years. But whereas most of these examples have focused on quantitative content creation, Newsworthy tries to automate the process of finding news in data.

The team behind Newsworthy consists of leading data journalists and a PhD student in statistics. We use Bayesian statistical models to detect significant trends and anomalies, allowing us to process haystacks of data and find the interesting needles.

Newsworthy is a news service that aims to be completely code-driven. Everything we do is reproducible next month, next quarter, next year or whenever the source dataset is updated. This allows us to scale the service over time with minimal labour.

Being code-driven also means we can be fast and accurate. Within an hour of the publication of new data we are able to send our subscribers newsleads. And by working in code we minimize the risk of errors.
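As a rough illustration of Bayesian anomaly detection of this kind (Newsworthy's actual analysis is done in R and is not shown here), one simple approach models historical monthly counts with a Gamma-Poisson model and flags the current value when it is extreme under the posterior predictive distribution (a negative binomial). Everything in this sketch, including the priors and the example data, is an assumption for illustration.

```python
# Illustrative sketch, not Newsworthy's actual model: flag a count as
# anomalous under a Gamma-Poisson model. Historical counts give a Gamma
# posterior over the Poisson rate; the posterior predictive for a new
# observation is negative binomial.
import math

def neg_binom_pmf(k, r, p):
    """Negative binomial pmf: P(X = k) with shape r and success prob p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)

def anomaly_tail_prob(history, current, a0=1.0, b0=1.0):
    """P(X >= current) under the posterior predictive given `history`."""
    a = a0 + sum(history)   # Gamma posterior shape
    b = b0 + len(history)   # Gamma posterior rate
    p = b / (b + 1.0)       # negative binomial success probability
    cdf = sum(neg_binom_pmf(k, a, p) for k in range(current))  # P(X < current)
    return 1.0 - cdf

# Twelve months of (made-up) theft counts, then a suspicious spike:
history = [12, 9, 14, 11, 10, 13, 12, 8, 11, 10, 12, 9]
print(anomaly_tail_prob(history, 25))  # very small tail prob -> candidate alarm
print(anomaly_tail_prob(history, 12))  # unremarkable -> no alarm
```

A small tail probability means the new value is hard to explain under the historical pattern, which is exactly the kind of needle worth surfacing to a reporter.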
What was the impact of your project? How did you measure it?
Since October 2017 we have offered local newsrooms across Sweden a trial version of the service. We have delivered thousands of newsleads to reporters in most of the large local news groups, including public service broadcasters. For each batch of leads delivered, we see stories published that would likely not have been discovered and written without Newsworthy. Sveriges Radio Jämtland was, for example, able to connect a peak in property sales to public procurement of social housing: http://sverigesradio.se/sida/artikel.aspx?programid=78&artikel=6835325

Apart from the “regular” newsleads sent out when we find exceptional trends and peaks, we also do in-depth reports on specific topics. A public example can be seen at www.newsworthy.se/sv/report/crime/ [Swedish], where we analyzed 20 years of reported Swedish crimes to produce 1,244 detailed local reports.

The project has been presented as a conference paper at the Stanford University Computational Journalism conference (see links).
Source and methodology
Newsworthy works on a plethora of public sources such as Brå, Arbetsförmedlingen, SCB, SMHI, Försäkringskassan, Svensk Mäklarstatistik, Eurostat, and others. The list of sources is constantly expanding. The workflow consists of:

1. Scrapers, which collect statistics and store them in a central database in a standardized format (building on another J++ project, Statscraper, see below).
2. The “Robot Detective”, which does the statistical analysis. The detective applies a number of statistical algorithms to the data, searching for complex anomalies like trend breaks, but also simple things like new extremes. The output is an “alarm”.
3. The “Robot Reporter”, which puts the alarm in context. The robot reporter checks whether the alarm looks newsworthy, taking e.g. earlier alarms and neighbouring regions into account, and if so, what makes the alarm journalistically interesting. The reporter compiles all the data needed to describe the event. The output is a “lead”.
4. The “Robot Writer”, which produces texts, charts and Excel sheets from a lead.
5. A user management system, where journalists can view newsleads from their region and sign up for email alerts.

As an offspring of the project we have developed Statscraper, a Python framework for building statistical scrapers: https://github.com/jplusplus/statscraper/
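The stages above can be sketched as a chain of small functions passing standardized records along. This is a minimal illustration only: the function names, record fields and the placeholder detection rule are assumptions, not the Newsworthy or Statscraper API.

```python
# Hypothetical sketch of the scraper -> detective -> reporter -> writer
# pipeline. All names, fields and the threshold rule are illustrative.

def scrape(source):
    """Stage 1: return datapoints in a standardized shape."""
    return [{"region": "Jämtland", "metric": "property sales",
             "period": "2017-09", "value": 214}]

def detect(datapoints):
    """Stage 2 (Robot Detective): emit alarms for anomalous values.
    A real detective would apply statistical tests, not a fixed threshold."""
    return [{"type": "new_extreme", "datapoint": dp}
            for dp in datapoints if dp["value"] > 200]

def contextualize(alarm):
    """Stage 3 (Robot Reporter): judge newsworthiness and attach context."""
    dp = alarm["datapoint"]
    return {"alarm": alarm, "newsworthy": True,
            "context": f"Highest {dp['metric']} on record for {dp['region']}"}

def write(lead):
    """Stage 4 (Robot Writer): render a short text from the lead."""
    dp = lead["alarm"]["datapoint"]
    return f"{lead['context']} ({dp['period']}: {dp['value']})."

for alarm in detect(scrape("some source")):
    print(write(contextualize(alarm)))
```

The point of the chain is that each stage only consumes the standardized output of the previous one, which is what makes the whole service reproducible whenever a source dataset is updated.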
Scrapers: Python
Statistical analysis: R
Journalistic analysis: Python
Text and graph templating: Node.js