1. Leprosy of the Land: interactive map of illegal amber mining sites in Ukraine
We used a deep learning model to search satellite images covering 70,000 km² of northern Ukraine for traces of illegal amber mining. For the first time, we estimated the environmental impact caused by crowds of prospectors searching for gems. Since 2012, thousands of hectares of forest and agricultural land have turned into desert. We used machine learning to detect satellite images showing amber mining and presented the first interactive map of these sites. The current English version of the project was published in April 2018, after we changed and retrained our model and published a new version of the map following additional reviews and feedback from domain experts.
2. Presidential Election-2019 in Ukraine («Poll of polls» model)
To counter the bad habit of politicians who use biased polls as political advertising, we applied a “poll of polls” model, for the first time in Ukrainian elections. In this way we tried to estimate the genuine level of support for politicians from different polls. We published the data along with the interactive charts. The results of our model (as of two days before voting) are in close agreement with the results of the National Exit Poll for the first round of the presidential election.
3. The Storeys of Loneliness
Kyiv is reminiscent of New York at the end of the 19th century, when the city was being built without any regulations. In theory there are rules and city plans, but in practice developers build as many floors as they can. We decided to compare the height of buildings in different cities, using 3D models of five cities (Kyiv, Manhattan, Berlin, Bucharest and Warsaw).
4. 0007: Like Bond, James Bond
We analyzed millions of records from the car registration database from 2013 to 2018 and found an abnormally high number of plates with “symmetrical” or “nice” 4-digit combinations, such as 7777 or 0001. Our investigation confirmed that many car owners pay substantial sums of money to buy such numbers, either from officials, as a bribe, or through middlemen. The approximate size of this gray market is at least tens of millions of USD.
5. Black Friday in Ukraine
From May to November 2018, we collected weekly data on the prices of goods (clothing, laptops, smartphones, equipment, etc.) from popular Ukrainian online shops. We compared these prices with the Black Friday ads. Only 7% of declared discounts were honest. The charts in the project display 7,500 products.
6. The Closest School of the District
In 2018 the school admissions system changed, making it harder to get into a “more prestigious” school. Though the new system is fairer for the average family, reformers faced another problem: a lack of schools in new residential areas. We created a map of the zones of Kyiv public schools, with the addresses that belong to each school. Our map shows the inconsistencies of the new school system: better schools are assigned fewer houses.
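The Black Friday comparison in project 5 can be illustrated with a minimal sketch. It is not the code used in the project: the `Offer` structure, the 5% tolerance, and the definition of an "honest" discount (the declared "old" price is not inflated above the typical pre-sale price, and the sale price is actually below it) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Offer:
    # Hypothetical record for one product; field names are illustrative.
    history: List[float]        # weekly prices observed before the sale
    declared_old_price: float   # crossed-out "old" price shown in the ad
    sale_price: float           # advertised Black Friday price


def is_honest_discount(offer: Offer, tolerance: float = 0.05) -> bool:
    """Assumed definition: the declared old price is close to the typical
    pre-sale price, and the sale price is really lower than that price."""
    if not offer.history:
        return False  # no earlier observations, so nothing to compare with
    usual = median(offer.history)
    not_inflated = offer.declared_old_price <= usual * (1 + tolerance)
    really_cheaper = offer.sale_price < usual
    return not_inflated and really_cheaper


def honest_share(offers: List[Offer]) -> float:
    """Fraction of offers whose declared discount passes the check."""
    if not offers:
        return 0.0
    return sum(is_honest_discount(o) for o in offers) / len(offers)


offers = [
    Offer([100, 100, 100], declared_old_price=100, sale_price=80),   # real discount
    Offer([100, 100, 100], declared_old_price=150, sale_price=100),  # inflated old price
    Offer([100, 100, 100], declared_old_price=150, sale_price=90),   # cheaper, but overstated
]
share = honest_share(offers)
```

Using the median of the weekly observations rather than the last observed price makes the check less sensitive to a shop raising the price shortly before the sale; a different baseline would give a different share.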
What makes this project innovative?
All of the projects in this portfolio start from the idea of studying a topic for the first time or giving a new view of an existing topic. The first project in the portfolio was accepted for oral presentation at the Computation + Journalism Symposium, Miami 2019, as “a state of the art example of the use of satellite imagery for journalism”. Our model, based on a deep neural network, allowed us to automatically search an area of around 70,000 square kilometers and build a map of illegal amber mining. With the story "Black Friday" we tried, for the first time, to prove or disprove the popular belief that Black Friday discounts are fake. The project about registration plates gave, for the first time, data-based evidence of a huge “vanity fair” gray market for “special” numbers. The model of aggregated polls is the first successful example of a probabilistic (Bayesian) approach to the problem of biased polls in Ukraine. In "The Storeys of Loneliness" we used an unusual type of chart, a “city bar chart”, to compare the “height profiles” of different cities.
What was the impact of your project? How did you measure it?
“Leprosy of the Land” was very well received and was republished many times as a shorter story (for example, by a Norwegian broadcaster and by the GIJ Network). François Chollet, the author of the Keras deep learning library, retweeted information about the project to his 138K followers as an interesting application of neural networks for the public good. Besides the presentation at the Computation + Journalism Symposium 2019, we gave several public lectures on the methodology, attended by more than a hundred potential users. Our interactive map was used as one of the arguments during public discussions with government officials.
Another project, "Model of aggregated polls for presidential election", drew attention from international bodies, ambassadors and political experts (we detected dozens of retweets from this audience). Some prominent Ukrainian sociologists endorsed our approach and the way we presented polling results. Our readers got a clearer picture of the polling data. The project has reached 745 Facebook shares, is actively discussed on social media, and has reached more than 30K unique users.
These 6 projects in the portfolio have a longer watch time than average: 1.5 to 2 times longer than all data journalism projects last year. The total number of Facebook shares for these projects is about 2.4 thousand.
Source and methodology
For "Leprosy of the Land" we used machine learning to search ~450,000 satellite images. We used a first set of satellite images from known locations of illegal amber mining to create an initial training dataset. Each image in this set was labeled as positive (with traces of amber mining) or negative (without such traces). Next we applied transfer learning with the ResNet50 deep learning network and trained an XGBoost binary classifier on top of the feature vectors extracted from each image. The detailed methodology (with code examples) is open-sourced: https://github.com/texty/amber-methodology.
Model of aggregated polls for the 2019 presidential election in Ukraine: we applied a Bayesian hierarchical state-space model to aggregate results from different polls (a kind of “poll of polls” model). This approach allowed us to interpolate levels of support for periods when there were no polls, and to shrink results from different pollsters, with their different methodologies and biases, towards “aggregate” values. We collected data on each available poll result from press releases and published it along with the interactive charts.
For the rest of our projects we used open data from the government, combined it with other data, and conducted journalistic investigations based on it (plate numbers, schools). We also actively used data scraping for the projects about Black Friday prices and special plate numbers. For “The Storeys of Loneliness”, map data from OpenStreetMap was used. For the project about schools, we created the map data on each school's coverage manually, based on data from information requests.
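The idea behind the “poll of polls” model can be illustrated with a much simpler stand-in: a one-candidate local-level (random walk) Kalman filter. This is a sketch under stated assumptions, not the Bayesian hierarchical model used in the project: it ignores pollster-specific biases, treats every poll as a simple random sample, and the names `Poll`, `aggregate_polls`, `drift_sd` and the prior values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Poll:
    day: int          # day index within the campaign window
    share: float      # reported support for the candidate, in percent
    sample_size: int  # number of respondents


def poll_variance(share: float, n: int) -> float:
    """Sampling variance of a reported share (in percent squared),
    assuming a simple random sample."""
    p = share / 100.0
    return 100.0 ** 2 * p * (1.0 - p) / n


def aggregate_polls(
    polls: List[Poll],
    n_days: int,
    drift_sd: float = 0.3,     # assumed daily drift of true support, in points
    prior_mean: float = 10.0,  # assumed vague prior on initial support
    prior_var: float = 100.0,
) -> List[Tuple[float, float]]:
    """Return (mean, variance) of estimated support for each day."""
    by_day: Dict[int, List[Poll]] = {}
    for poll in polls:
        by_day.setdefault(poll.day, []).append(poll)

    mean, var = prior_mean, prior_var
    path: List[Tuple[float, float]] = []
    for day in range(n_days):
        if day > 0:
            var += drift_sd ** 2  # support may drift, so uncertainty grows
        for poll in by_day.get(day, []):
            obs_var = poll_variance(poll.share, poll.sample_size)
            gain = var / (var + obs_var)        # weight given to the new poll
            mean += gain * (poll.share - mean)  # pull the estimate towards it
            var *= 1.0 - gain                   # a poll reduces uncertainty
        path.append((mean, var))
    return path


polls = [Poll(day=0, share=20.0, sample_size=1000),
         Poll(day=4, share=22.0, sample_size=1000)]
path = aggregate_polls(polls, n_days=6)
```

On days without polls the estimate is carried forward with growing variance, and each new poll moves the estimate only part of the way towards its reported figure. A full model of the kind described above would additionally estimate per-pollster effects and smooth over the whole period rather than filtering forward only.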
Anatoliy Bondarenko, Nadja Kelm, Yevheniya Drozdova, Roman Kulchynskyy, Vlad Herasymenko, Yaroslava Tymoshchuk, Nadiia Romanenko, Mykola Dobysh