Project description

We downloaded 26 million records of property registration data. DR's Investigative Data Desk has collected all publicly available information about properties and land in the public register called Tinglysningen (the Danish land register). The information was obtained from 30 August 2016 to 20 October 2016. 48 virtual servers in the cloud were used to pull data on every property in Denmark into a database at DR. The purpose of collecting data on all properties in Denmark was to enable the editorial staff to analyze and understand property values, ownership and borrowing, and to convey new knowledge to the Danes.

All the information can already be viewed on tinglysning.dk by searching for an address or a land parcel. What was extraordinary is that DR's Investigative Data Desk gathered the information in a form that allows the editors to analyze property data across the entire data set rather than one property at a time.

The work was awarded the Nordic Data Journalism Award for Best Feature 2018, with the following words from the jury: "Danmarks Radio has analyzed and mapped the Danish housing market based on a spectacular data set. The data set was harvested by the editorial staff at Danmarks Radio. The stories are told in a down-to-earth language with impressive visualizations."

What makes this project innovative?

There were several bumps in the road during the process. The biggest hurdle is that there is a limit on the number of requests that can be made per IP address (a unique address on the internet) per day. The other obstacles were long response times, crashes on specific queries, and limited capacity on Tinglysningen's servers. To minimize the risk of overloading Tinglysningen's servers, we chose to run only two servers at a time each hour. Without the IP restriction, one server would have been ample. Instead we created 48 virtual servers with a cloud provider, along with a NoSQL database and a key-value store.

The database was configured and loaded with a complete list of all addresses in Denmark. Address data is available from aws.dk, which contains every address in Denmark; the data is maintained in Danmarks Adresseregister (DAR) under the Danish Agency for Data Supply and Efficiency. The copy of the register fills 26 million rows in an encrypted database. The database contains information about 3.5 million ownership records, 4.7 million creditors, 3.5 million debtors and 7.9 million easements.
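As an illustration of the first step, here is a minimal, hypothetical Node.js sketch of how a nationwide address list could be fetched from the public address web service and prepared for later querying. The endpoint (an assumed DAWA URL under aws.dk), the field names and the saveAddresses helper are assumptions for illustration, not DR's actual code.

    // Hypothetical sketch: fetch every Danish address from the public
    // address web service and keep the fields needed for later lookups.
    // Endpoint and field names are assumptions, not DR's actual setup.
    const https = require('https');

    function fetchJson(url) {
      return new Promise((resolve, reject) => {
        https.get(url, (res) => {
          let body = '';
          res.on('data', (chunk) => (body += chunk));
          res.on('end', () => resolve(JSON.parse(body)));
        }).on('error', reject);
      });
    }

    async function loadAddressList() {
      // Assumed DAWA endpoint returning all Danish access addresses as JSON.
      // The real list is several million rows, so in practice a streaming
      // download would be preferable to buffering everything in memory.
      const addresses = await fetchJson('https://dawa.aws.dk/adgangsadresser?struktur=mini');
      // Keep only an id and a printable address for the work queue.
      return addresses.map((a) => ({ id: a.id, text: a.betegnelse }));
    }

    loadAddressList().then((list) => {
      console.log('Loaded ' + list.length + ' addresses');
      // In the real project the list was inserted into a cloud database;
      // here it would be handed to a (hypothetical) saveAddresses(list) helper.
    });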

What was the impact of your project? How did you measure it?

We published more than 30 articles and infographics with the data as source. The stories covered the properties of Denmark, the value of all properties, the wealth of lords and barons, the Danish housing tax system, and even a story that located the “castles” of outlaw bikers and gangs in Denmark.

Source and methodology

There were several bumps in the road during the process. The biggest hurdle is that there is a limit on the number of requests that can be made per IP address (a unique address on the internet) per day. The other obstacles were long response times, crashes on specific queries, and limited capacity on Tinglysningen's servers. To minimize the risk of overloading Tinglysningen's servers, we chose to run only two servers at a time each hour. Without the IP restriction, one server would have been ample. Instead we created 48 virtual servers with a cloud provider, along with a NoSQL database and a key-value store.

The database was configured and loaded with a complete list of all addresses in Denmark. Address data is available from aws.dk, which contains every address in Denmark; the data is maintained in Danmarks Adresseregister (DAR) under the Danish Agency for Data Supply and Efficiency.

A program developed for the purpose was run on all 48 servers, querying Tinglysningen once per address. Each address query returns a number of entries in either a dossier or a register book, and each of these was retrieved and saved. The start of the harvest was communicated to Tinglysningen's IT provider, and for good measure our requests included our contact information.

Two months later, the full list of addresses had been processed and the data collected. The data was downloaded at DR Byen, where it was loaded into a relational database to connect the records and create an overview. Entries with implausible values, e.g. an interest rate of 450,000 percent, were filtered out before the final analysis. The database is encrypted, so only the editorial team's programmer has access to the information. All data handling complies with the rules for processing sensitive data.
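To make the workflow concrete, below is a minimal, hypothetical sketch of the per-server loop described above: take an address, ask the register, save whatever entries come back, and keep going even when a query fails. The function names (nextAddress, queryRegister, saveEntries) and the pacing are assumptions for illustration, not the project's actual code.

    // Hypothetical per-server worker loop. nextAddress(), queryRegister()
    // and saveEntries() are illustrative stand-ins for the real project's
    // queue, Tinglysningen lookup and database write.
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    async function runWorker({ nextAddress, queryRegister, saveEntries }) {
      for (;;) {
        const address = await nextAddress();          // pull one address from the queue
        if (!address) break;                          // queue empty: we are done
        try {
          const entries = await queryRegister(address); // one lookup per address
          await saveEntries(address, entries);          // store every entry returned
        } catch (err) {
          // The public API crashed fairly often, so a single failed address
          // must never take the whole server down; log it and move on.
          console.error('Failed for ' + address.id + ': ' + err.message);
        }
        await sleep(5000); // pause between requests to stay under the daily IP limit
      }
    }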

Technologies Used

Since scraping is limited to about 5,000 requests per IP per day, 48 servers were created in the Microsoft Azure cloud service. Each server was a small Windows server; the actual CPU usage was very low. All addresses in Denmark were downloaded from a public datastore, scrubbed, and inserted into a DocumentDB instance in Azure. We used Redis to communicate with, queue, and control the instances.

A Node.js application was created, along with a deployment routine so that the application would auto-update and handle errors gracefully. Handling errors became quite important, since the public data API crashed relatively often. The routine was simple: get an address from DocumentDB, ask for data, save the data in another table. After two months of running, we had the data.

A dump from DocumentDB was downloaded and imported into a local MySQL server, where the data was refined and cleansed. Most queries were made from a simple phpMyAdmin interface and exported to Excel. Some queries were too big to do in one go, so several tables with derived data were created to make queries run quickly. Some queries used lists of nobility, politicians etc.; here VBA in Excel was used to process those lists and query the database for each entry.

Graphics and visual identity were created in Adobe Illustrator. Some were used in static form and others in a slideshow created with Hype. An interactive component was hand-coded in JavaScript: essentially a calculator that outputs data based on the user's input. Finally, a map was created using D3, along with the usual tools such as Babel, Grunt and Webpack.
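As a rough illustration of how Redis can queue and coordinate the scraper instances, here is a minimal Node.js sketch using the "redis" client (v4 API). The key names, the control flag and the fetchFromRegister and saveEntries helpers are assumptions for illustration; the project's actual queueing scheme is not documented here.

    // Hypothetical sketch of Redis as the shared work queue and control
    // channel between scraper instances. Key names and helpers are
    // illustrative assumptions.
    const { createClient } = require('redis'); // npm package "redis"

    async function runInstance(fetchFromRegister, saveEntries) {
      const redis = createClient({ url: process.env.REDIS_URL });
      await redis.connect();

      for (;;) {
        // A shared flag lets the newsroom pause every instance at once.
        if ((await redis.get('scraper:paused')) === '1') {
          await new Promise((r) => setTimeout(r, 60000));
          continue;
        }
        // Pop one address from the shared queue; stop when it is empty.
        const raw = await redis.rPop('scraper:addresses');
        if (!raw) break;

        const address = JSON.parse(raw);
        try {
          const entries = await fetchFromRegister(address);
          await saveEntries(address, entries);
          await redis.incr('scraper:done');   // simple progress counter
        } catch (err) {
          // Put the address back so another instance can retry it later.
          await redis.lPush('scraper:addresses', raw);
          await redis.incr('scraper:errors');
        }
      }
      await redis.quit();
    }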

Project members

MADS RAFTE HEIN (designer), JENS LYKKE BRANDT (developer), BO ELKJÆR (journalist), ALEXANDER HECKLEN (journalist), KRESTEN MORTEN MUNKSGAARD (journalist)
