Project description

Presidential inauguration speeches set expectations for the new administration and signal its guidelines, as well as its continuities and breaks with previous governments. This matters even more when the speech comes from a disruptive president such as Jair Bolsonaro.

To better understand the first words of Brazil’s new president, we compared his inauguration speech with those of the 18 previous presidents, including those of the military regime. The comparison offered an empirical way to check whether he would echo the ideas of the military presidents, as many people expected based on his own statements during the election campaign.

We found that Bolsonaro was among the presidents who devoted the most attention to greetings, and that he barely spoke about the economy or foreign affairs.

The report was published only a few hours after the inaugural speech. It gave readers who missed the ceremony a concise summary of his ideas and showed which previous president’s speech his most closely resembled.

What makes this project innovative?

The project combined documentary research and statistical analysis to compare Bolsonaro with his predecessors. For the documentary research, the transcriptions were retrieved from the Presidential Library, the Electoral Justice and academic publications. We also tracked down recordings of the speeches, so that readers could not only read them but also hear how they were delivered. The most recent ones were available on YouTube, and a few of the older ones had been posted in historical archives. All the older speeches were prepared before Bolsonaro’s inauguration and processed into a dataset, with code to handle the text analysis.

The next challenge was to compare the speeches quantitatively. Our solution was to use topic modelling to sort all the collected text into comparable categories. On inauguration day, we quickly transcribed the new speeches and added them to the model, producing measures comparable to those of previous presidents. Together, the two approaches formed a mixed method, combining qualitative and quantitative tools, that presented a unique historical panorama of Brazil’s presidents. Using a statistical tool, we sorted 63,000 words into eight categories to give a straightforward overview of multiple speeches. The audience had access to a wide variety of content, such as infographics summarizing the findings, and text excerpts and audio clips illustrating these historical speeches.
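How topic-model output could become comparable per-president category shares can be sketched briefly. The following is a minimal Python illustration (the project itself used R), under assumptions that are ours rather than the project’s: each excerpt has a vector of topic proportions, like the theta matrix an stm fit returns; each topic is assigned to exactly one category; and excerpts are weighted by word count. The excerpts, proportions and three category names are made up for illustration, whereas the project used eight categories.

```python
from collections import defaultdict

# Hypothetical excerpts: president, word count, and estimated topic
# proportions (each row sums to 1, as in a topic model's theta matrix).
excerpts = [
    {"president": "A", "words": 300, "theta": [0.7, 0.2, 0.1]},
    {"president": "A", "words": 100, "theta": [0.1, 0.1, 0.8]},
    {"president": "B", "words": 200, "theta": [0.2, 0.6, 0.2]},
]

# Hypothetical expert labelling: topic index -> category.
topic_category = {0: "greetings", 1: "economy", 2: "foreign affairs"}


def category_shares(excerpts, topic_category):
    """Return {president: {category: share}}, weighting excerpts by length."""
    weighted = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for ex in excerpts:
        totals[ex["president"]] += ex["words"]
        for topic, prop in enumerate(ex["theta"]):
            category = topic_category[topic]
            # Expected number of words this excerpt devotes to the category.
            weighted[ex["president"]][category] += prop * ex["words"]
    return {
        pres: {cat: w / totals[pres] for cat, w in cats.items()}
        for pres, cats in weighted.items()
    }


shares = category_shares(excerpts, topic_category)
for president, cats in sorted(shares.items()):
    print(president, {c: round(s, 3) for c, s in sorted(cats.items())})
```

Weighting by word count makes a president’s share reflect how many words went to each category rather than how many excerpts did; a plain average of the theta rows would be a simpler alternative that treats short and long excerpts equally.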

What was the impact of your project? How did you measure it?

The project performed above Folha de S. Paulo’s average in both audience and retention time. Folha, a national newspaper, is one of the largest media outlets in Brazil. In the print edition, the project was featured on the front page.

Source and methodology

The qualitative research was carried out weeks ahead of Inauguration Day. The speeches were found in the online archives of the Presidential Library and the Electoral Justice, and in academic publications. We saved each speech either as a single PDF file with OCR or as plain text. To obtain comparable measures across presidents, we decided to use topic modelling. A topic model estimates clusters of words (topics) that frequently appear together, and estimates the proportion of each topic in each excerpt of text. The modelling required specific text processing to reduce each word to its stem (radical), and cleaning to remove repetitive and meaningless words, such as dates or document headers. We labelled each estimated topic using a typology of presidential speech themes and ideas developed by an academic expert. The labelling was based on the most frequent word stems in each topic and on text samples. On Inauguration Day, we re-estimated the model, this time including Bolsonaro’s freshly transcribed speech. The model output was turned into infographics to help readers understand the results, and we also wrote short texts highlighting the main findings. For each theme, we also presented a sample audio clip from historical speeches.
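The cleaning and stemming step can be illustrated with a minimal Python sketch. The project’s own scripts were written in R, where stm’s textProcessor() function bundles similar steps. Everything below is a hypothetical stand-in: the stopword and month lists are tiny samples, and the suffix-stripping rule is a crude substitute for a proper Portuguese stemmer such as Snowball. A real pipeline would also strip document headers.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword list; a real one is much longer.
STOPWORDS = {"a", "o", "os", "as", "de", "do", "da", "dos", "das", "e", "em",
             "no", "na", "que", "para", "com", "um", "uma", "por"}

# Custom stopwords for date residue (month names, already without accents).
MONTHS = {"janeiro", "fevereiro", "marco", "abril", "maio", "junho", "julho",
          "agosto", "setembro", "outubro", "novembro", "dezembro"}

# Crude suffix stripping standing in for a real Portuguese stemmer.
# Longer suffixes come before the shorter ones they end with ("acoes" before "es").
SUFFIXES = ("amentos", "amento", "acoes", "acao", "mente", "idade",
            "ismo", "ista", "es", "s")


def strip_accents(text):
    """Remove diacritics, e.g. 'nação' -> 'nacao'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(word, min_stem=3):
    """Cut the first matching suffix if at least min_stem letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop digits and punctuation, remove stopwords, then stem."""
    text = strip_accents(text.lower())
    tokens = re.findall(r"[a-z]+", text)  # letters only: numbers and dates vanish
    return [stem(t) for t in tokens
            if t not in STOPWORDS and t not in MONTHS and len(t) > 2]


print(preprocess("A nação agradece, em 1 de janeiro de 2019, "
                 "as transformações econômicas."))
# ['nacao', 'agradece', 'transform', 'economica']
```

The date and function words disappear, and "transformações" is cut to the stem "transform". A crude rule like this also shows why a real stemmer matters: it does not merge every related form (for example "nação" and "nações").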

Technologies Used

To recover the text from some PDFs, we used Tesseract, an open-source optical character recognition (OCR) engine available for Ubuntu 18.04 (Bionic Beaver). For text processing and data analysis, we wrote scripts in R 3.5.0. The topic modelling used the stm package for R.

Project members

Marina Merlo, Simon Ducroquet, Rogerio Pilker and Thiago Almeida
