After an election there are two tasks for journalists: first, report the results instantly (which is easy), and second, interpret the results (which is not that easy). We did both in an automated way, as fast as possible, for all election districts in Bavaria and Hesse in autumn 2018. Our approach was auto-generated texts and visualizations based on the results of every single district. We built a statistical model that compared each district’s results with the results of all other districts. Depending on the outcome, the appropriate sentences were formulated.
The starting point was a request from an SEO editor a few weeks prior to the state elections in Bavaria. He asked for one HTML page for every district, which would improve the search ranking immensely. We said yes, sure, but we would make it great, not just a dump of percentages and colored bars. We took a user-centered approach and wanted to answer one question: How did my district vote compared to the other districts?
The result is a map and a search bar as the entry point to the individual web pages. Right after a district is chosen, we show a bar chart comparing the district’s results with the statewide result. Then we start our analysis: our algorithm creates an individual set of sentences that apply to the specific district. The main goal is to classify the voting pattern: Did a district vote in an extraordinary way? Similarly to the statewide result? Just slightly differently?
Moreover, we wrote statements when a party’s result was among the ten best or ten worst results. We also checked whether parties that managed to enter parliament fell short of the necessary five percent of the votes in a specific district. And we pointed out local peculiarities among very small parties that play no role in statewide reporting.
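These checks boil down to ranking one party's results across all districts. As a rough sketch of the idea, under assumed data structures (the actual pipeline was written in R, as described in the methodology below), this could look like:

```python
def rank_flags(results_by_district, district, threshold=5.0):
    """For one party: flag whether this district is among the ten best or
    ten worst results statewide, or below the five-percent threshold."""
    ordered = sorted(results_by_district,
                     key=results_by_district.get, reverse=True)
    pos = ordered.index(district)  # 0 = best result statewide
    flags = []
    if pos < 10:
        flags.append("top ten result")
    if pos >= len(ordered) - 10:
        flags.append("bottom ten result")
    if results_by_district[district] < threshold:
        flags.append("below five percent")
    return flags
```

Each flag then triggers its own family of sentences in the text generation step.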
We considered a large number of cases and translated them into natural language with a level of quality and degree of detail that no editor could have delivered for 91 districts on the evening of election day.
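To illustrate the translation step in principle — the actual system was written in R and produced German sentences — a toy version of the opening-plus-fact pattern might look like this. The deviation classes and English phrasings here are invented for the sketch:

```python
import random

# Invented example classes and openings; the real system used German
# phrasings and far more variants per class.
OPENINGS = {
    "extraordinary": ["Remarkable:", "Striking:"],
    "slight": ["Slightly different from the statewide average:"],
    "typical": ["In line with the rest of the state:"],
}

def build_sentence(deviation_class, fact, rng=random):
    """Pick an opening that signals the judgment, then append the fact.
    Openings are interchangeable, so sentences combine freely."""
    return rng.choice(OPENINGS[deviation_class]) + " " + fact
```

Because each opening works independently of its neighbors, any combination of generated sentences still reads as a coherent paragraph.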
What makes this project innovative?
Oftentimes, automated texts lack what journalism is all about: interpretation and wider context. We wanted to offer useful interpretation, not just copy and publish plain data that can be found anywhere on the internet and on TV. To do so, we created a mathematical model to tackle automated journalism’s biggest challenge: the translation from numbers to meaning. After analyzing historical election results, we chose the Jenks natural breaks algorithm as the central piece for finding deviations and similarities. Its big advantage is that the class breaks are derived from the data itself: it clusters values by minimizing the variance within groups while maximizing the variance between groups. We didn’t use anything that could be labelled with buzzwords like Machine Learning or Artificial Intelligence. Instead, we built a classical rule-based decision system.

The hardest part was constructing German sentences good enough to be published in Süddeutsche Zeitung, a newspaper that is proud of its high text quality. We wanted the sentences to sound so natural that a user wouldn’t recognize them as automatically generated. The key was sentence openings that signal a judgment of what is coming, yet remain independent of the preceding and following sentences. For example, one bullet point has 18 possible sentences that can show up in different combinations. Juggling all the possible outcomes was challenging.

We considered this project an experiment: it was our first project with automated text generation in real time on election night. It worked very well, both technically and in terms of content. Since we could not know in advance whether our model for interpretation would work, we sat ready to fine-tune the parameters. That was not necessary; all sentences made sense. So we used the software again a few weeks later for the state election in Hesse.
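Jenks natural breaks is equivalent to optimal one-dimensional classification and can be solved exactly by dynamic programming. As a rough illustration of the idea — not the authors' R code, which relied on tidyverse packages — a minimal Python sketch:

```python
def natural_breaks(values, k):
    """Split 1-D data into k classes minimizing within-class variance
    (Fisher-Jenks natural breaks via dynamic programming)."""
    xs = sorted(values)
    n = len(xs)
    # Prefix sums give an O(1) within-class sum-of-squares cost.
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def sse(i, j):  # sum of squared deviations of xs[i:j] from its mean
        s = ps[j] - ps[i]
        return ps2[j] - ps2[i] - s * s / (j - i)

    INF = float("inf")
    # dp[i][j] = minimal cost of splitting the first i values into j classes
    dp = [[INF] * (k + 1) for _ in range(n + 1)]
    back = [[0] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for m in range(j - 1, i):
                c = dp[m][j - 1] + sse(m, i)
                if c < dp[i][j]:
                    dp[i][j] = c
                    back[i][j] = m

    # Walk back through the split points to recover the classes.
    classes, i = [], n
    for j in range(k, 0, -1):
        m = back[i][j]
        classes.append(xs[m:i])
        i = m
    return list(reversed(classes))
```

Minimizing the within-class cost automatically maximizes the between-class separation, which is exactly the property the model exploits to decide whether a district's result is typical or an outlier.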
What was the impact of your project? How did you measure it?
Election coverage is low-hanging fruit: the traffic numbers are always very high. Still, our automated coverage surprised us all. Let’s put it this way: we used Mapbox, and the bill was way higher than expected. About 1.5 million visitors used the app.
Source and methodology
All scripts for data wrangling, text generation and chart creation are written in R, using packages from the tidyverse. The output is PNG files for the plots and a JSON file with the sentences. The input data with the election results was very easy to access: during election night, the press agency dpa pushed JSON files with the results to our server. As soon as we received the JSON file for a certain district, we were able to build that district’s webpage. We decided on a rule-based text creation process built around the Jenks algorithm, and we made the rules transparent in the methodology section. It says: “The rules of the algorithm work according to these specifications:

- All results of a party in the 91 voting circles and throughout Bavaria are sorted in ascending order and classified into five groups.
- Within a group, the differences between the individual election results of a party are minimized.
- Between the groups, the differences of the voting circle results are maximized.
- Then, for each party, we calculate the difference between the result in the voting circle and the result in Bavaria as a whole.
- The higher the sum of these differences, the more unusually a constituency has voted.”
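The final step of those rules — summing the per-party deviations into an unusualness score — fits in a few lines. A minimal sketch, with invented party names and percentages rather than the real dpa feed (the production code was R, not Python):

```python
def unusualness(district, statewide):
    """Sum of absolute per-party deviations (in percentage points) between
    a district's result and the statewide result; higher = more unusual."""
    return sum(abs(district[p] - statewide[p]) for p in statewide)

# Illustrative numbers only, not real election results.
statewide = {"A": 37.0, "B": 18.0, "C": 12.0, "D": 10.0}
district  = {"A": 45.0, "B": 9.0,  "C": 16.0, "D": 12.0}
score = unusualness(district, statewide)
```

Sorting all 91 districts by this score is what lets the system say whether a district voted extraordinarily or roughly in line with the state.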
Katharina Brunner, Felix Ebert and Martina Schories