Especiales Datasketch
Country: Colombia
Organisation: Datasketch
Best data journalism team
Artificial intelligence, Interactive, Collaboration, Personalisation
Juan Pablo Marin Diaz
Team Members
Maria Isabel Magaña, Verónica Toro, Juliana Galvis, Camila Achuri, Ana Hernández, David Daza, Andrea Cervera
Project Description
We wanted to showcase the great things you can do with data. We started with small, simple pieces, but we wanted to build something with more impact. That's why we decided to try in-depth pieces on different topics.
The first one started from a random question we heard: Do the poorest neighborhoods in Bogotá have fewer trees? So we built Árboles de Bogotá. We took the public census data of trees in Bogotá, 1.2M trees in total, and told different stories with maps, narratives, and statistics. We even tried to get users to help us build the largest botanical catalogue of trees in Bogotá, without much success. Finally, we gathered audio stories from citizens through WhatsApp and allowed users to explore all the trees through a web app.
The second story in the series was about understanding the problem of violence against women. We moved beyond femicides and built many databases from multiple sources to tackle the issue. We even made an online survey that allowed women to compare themselves to other women who shared their story of violence. What we liked the most was that we ended up making a public intervention with an NGO to put pressure on the government to act on this issue.
The third story was about keeping our culinary heritage alive with new digital tools. For 30 years the Ministry of Culture has published a book about traditional dishes by region in Colombia. We scraped the PDF and created a database of all the dishes and ingredients of Colombian cuisine. Stories from chefs and grandmas put heart into the numbers behind the ingredients.
Finally, the most recent one is about opening up data so voters can make better decisions in Colombia's congressional elections. We used different apps and in-depth investigations to unveil circles of power and obscure relations between politicians. For the first time in our country, data about candidates' criminal investigations, contracts, and even traffic tickets was available for every Colombian to download.
Besides the fun of building these special reports, we aim to teach journalists and enthusiasts what is doable with data so that they use the tools we are building, from an open database of political connections to an open source Tableau alternative.
What makes this project innovative?
Our team is made up of 4 technical people and 4 from the humanities. While the tech team moves towards learning how to tell stories with data, the humanities team moves towards learning programming and using our internal tools to simplify the data workflow. With such a small team we have been able to create many interesting things: not only the stories, but also the software that supports our operations, from public databases to packages that help us build beautiful static websites with data visualizations without the need for programmers.
We've experimented a lot and have had a few hits: a software package to make maps by copying and pasting, stories told as if you were texting, automatic data cleaning routines, citizens sending us their audio stories through WhatsApp, and women filling out a 30-minute survey to help other women with the violence they are facing.
What's most innovative about our approach is that we are changing how data is perceived in newsrooms in Colombia. Because we are so open with our data and software, other newsrooms are starting to look out of fashion, at least to the small niche that knows us so far. We are leading a paradigm shift in which journalists stop competing on data sources and start competing on analysis and storytelling. By the time they get there, we will be ready with the software they will need on a daily basis.
What was the impact of your project? How did you measure it?
Even though our special reports are online, we have managed to have an impact offline through the different interventions we make with data, from big mosaics showing the number of femicides in Colombia to tour buses telling stories of corruption around the city. Our initiatives with analog data visualization have been replicated in different cities in Latin America, like Buenos Aires, Quito, México and San José de Costa Rica. Our digital stories have been featured in newspapers in Colombia and were even featured on a TV show about open data initiatives with impact. We don't want to count these as successes, though; we want them only to be a push in our quest for Data Literacy and Data Driven Decision Making at all levels, especially when the public interest is involved. We want to make citizens aware of data and the power it has to make a better society. That will be our ultimate impact. There is still a long way to go, but we are getting there with steady steps.
Source and methodology
Let me tell you a short story. I've had the chance to work on high-impact data projects, like organizing the database of all the victims of 50 years of Colombian conflict. In that case I analyzed this large database. I remember seeing the number 14 once. It was much more than a simple number; it was the story of a woman who was sexually assaulted 14 times. On the other side, you have people who are not even counted as a number. That is the case, among others, of the LGBTI community in Colombia. According to official statistics, among those 200,000 victims of the Colombian conflict there are only 2,400 registered LGBTI victims. The numbers are clearly hugely underreported, and the reasons behind it are heartbreaking.
So, in every project we work on, we move between two extremes: having a large database of untold stories and, on the other side, having no database at all, only untold numbers. We start by collecting the available databases, through freedom of information requests or scraping. When we don't have the data, we build it ourselves or run surveys. In every case we start by finding the human angles, either from the averages or from a specific person's story. We then create different interactives and usually have descriptive graphs for every story. We also always open the data and make it available for download under an open license.
Technologies Used
We do almost everything in R. We create our own packages to make it more efficient for us to build static data-driven journalism (ddj) sites, we bundle many different visualization types into packages, and we are building our own visualization recommendation engine. JavaScript is always in the mix: we build the interactives with JavaScript + R and the CMS sites with Node.js. In terms of databases, we like Google spreadsheets when updating data live is necessary, graph databases like Neo4j to structure data and provide a backend for our own APIs, and SQLite for static databases ready to be explored.