Projects submitted to the Data Journalism Awards competition

Right here you will find a list of all the projects submitted to the Data Journalism Awards competition.  

Easy Money
Country: Canada
Organisation: The Globe and Mail
Investigation of the year
Public institutions
Team Members
The investigation was conducted by business reporter Grant Robertson and data journalist Tom Cardoso, both of The Globe and Mail, Canada’s national newspaper.
Project Description
For decades, Canada has been criticized throughout the world for being weak on white-collar crime. Massive stock frauds go unpunished, while lesser-known cases slip by unnoticed, and the people who make millions from these schemes get away with a slap on the wrist, sometimes only to reoffend. But how big is the problem? How easy is it to steal from investors? What tactics are criminals using to flout the system so easily? And why is Canada seemingly worse off than some other countries? Easy Money was a year-long data investigation into white-collar crime, revealing how fraudsters commit securities offences, make millions, escape with minimal punishment, and then do it all again. The series analyzed four aspects of securities crime across a set of investigative articles, with accompanying charts and graphs to illustrate the data collected. The first story probed the high number of repeat offenders in Canada’s capital markets who appear to be systematically gaming the system in the name of big profits, with little threat of serious punishment. The investigation revealed for the first time that as many as 1 in 9 white-collar criminals in Canada are repeat offenders, showing that deterrence efforts aren’t working. Of those, more than 63 per cent were caught in one jurisdiction, sanctioned, then committed new crimes in a different jurisdiction, showing that gaps in the system are easily exploited. The second story analyzed the tactics these serial offenders use to exploit such weaknesses. The data revealed that surprisingly simple strategies, such as using a fake name, yielded remarkably effective results. The story also painted a picture of the disparity between Canadian and U.S. enforcement, exploring one prominent case where American authorities caught a securities criminal whom Canadian investigators had been unable to bring to justice, or even to prove existed.
The third story exposed the systemic problem of unpaid fines in Canada’s stock markets, producing the first accurate national assessment of how many of these sanctions criminals are ignoring. By amassing 30 years of data from across the country, the story revealed that more than $1.1-billion worth of penalties had gone unpaid. Regulators argue the fines are difficult to collect. But the reporters tracked down numerous assets held by criminals that the authorities appear to have missed, including a sprawling home outside Toronto that an offender hid from the law. After being informed of the oversight by the journalists, the regulator announced it was suddenly collecting on the asset. Part four looked at the lack of justice for victims, including one who was treated with indifference after blowing the whistle on a significant fraud. The story included comments from an admitted criminal who said he feels no remorse for his crimes and has no fear of being caught, which backed up what the data analysis showed.
What makes this project innovative?
This work was a first in Canadian business and data journalism, gathering and assembling 30 years of data from different jurisdictions so that it could be analyzed for the public on a national scale. This had never been done before. Given that Canada’s securities industry is regulated by a patchwork of 13 different commissions, the country lacks the kind of national analysis that was conducted in this project. The scarcity of such data has, for years, helped obscure problems within the country’s policing of the financial markets. During the investigation, the reporters created what we believe is the first truly national, publicly available picture of white-collar crime enforcement in the country. A long-held hunch that Canada had an enforcement problem was the genesis of the idea, but data reporting techniques made it possible to digitally analyze thousands of files quickly, in a way that wasn’t available even a decade ago. This made the investigation a reality – and a necessity for the public good. Finally, the key innovation in this investigation was the creation of a brand-new statistic – a national securities crime recidivism rate. The final figure, 11.1 per cent, was developed entirely through the combination of R, JavaScript and Python, which were used to collect, classify and analyze source data and apply a rigorous methodological approach. Due to the patchwork securities regulation system in Canada, the data had never been analyzed in this way, and the calculation of a recidivism figure had never been attempted.
What was the impact of your project? How did you measure it?
The articles drew an immediate response from the Trudeau government, the provincial governments, and the federal Department of Finance, which said it had taken note of the investigation’s findings and would convene a national conversation with the provinces about what could be done to tackle the problem. This conversation is expected in 2018. As well, the governments of the two largest jurisdictions – Ontario and Alberta, which represent 75 per cent of the capital markets in Canada – issued statements within days of publication calling the report’s findings unacceptable and pledging to find solutions to the issues raised. The journalists intend to hold these provinces, as well as the federal government, to those pledges. The investigation also showed that criminals with unpaid fines are flouting regulators by ignoring those sanctions, even though they have assets hiding in plain sight. After the reporters tracked down a large, high-value property owned by an offender who was refusing to pay his fines, the country’s largest securities commission announced it was moving to claim the asset in place of the unpaid fines.
Source and methodology
The analysis was done in code, in three parts. First, data was scraped from the Canadian Securities Administrators, an umbrella organization that represents Canada’s 13 capital markets regulators from a public relations perspective, along with data from individual provincial regulators, through a combination of JavaScript and R scripts. Second, a Python script classified whether each entry was likely to be a person or a company. Third, the bulk of the analysis was done using the statistical programming language R (and R Notebooks in particular) to build a reproducible process. The analysis proceeded to filter, sort, categorize and group the data to answer a series of questions, such as “what is the recidivism rate across Canada?” and “how many repeat offenders have been found to operate in multiple provinces or jurisdictions?” To answer the key question of what the national recidivism rate was for securities crime, reporters worked from the basic assumption that repeat offenders would have at least two sanctions in the database. The Globe took people in the database found to be in “breach of order” (i.e., found to have violated a previous order by a securities regulator) and used their data, namely the minimum amount of time between sanctions (the median of the bottom decile, to be precise), to establish a time “floor” with which to filter the database. By filtering for people with multiple offences, using this time variable and eliminating “reciprocal” orders (i.e., orders issued by one regulator that are then pre-emptively mirrored by another), the reporters arrived at a final recidivism rate of 11.1 per cent. To confirm that the algorithm developed in the analysis was correctly identifying repeat offenders, reporters pored through more than 80 individuals’ case files to determine whether they were in fact repeat offenders, and found clear instances of recidivism in at least 25 per cent of cases.
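The filtering logic described above can be illustrated with a minimal sketch. It is not the reporters' code: the original analysis was done in R, and this Python version only approximates the steps as described. The record layout, the helper names (`rec`, `time_floor`, `recidivism_rate`), the choice of everyone in the database as the denominator, and the sample records are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def rec(person, jurisdiction, when, reciprocal=False, breach_of_order=False):
    """Build one sanction record (hypothetical layout, not the real schema)."""
    return {"person": person, "jurisdiction": jurisdiction, "date": when,
            "reciprocal": reciprocal, "breach_of_order": breach_of_order}


def sanction_dates_by_person(records):
    """Group non-reciprocal sanction dates by person, oldest first."""
    grouped = defaultdict(list)
    for r in records:
        if not r["reciprocal"]:  # drop orders that only mirror another regulator
            grouped[r["person"]].append(r["date"])
    return {person: sorted(dates) for person, dates in grouped.items()}


def gaps_in_days(dates):
    """Days between consecutive sanctions for one person."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


def time_floor(records):
    """Median of the bottom decile of minimum gaps among breach-of-order people."""
    breach_people = {r["person"] for r in records if r["breach_of_order"]}
    dates = sanction_dates_by_person(records)
    min_gaps = sorted(
        min(gaps_in_days(dates[p]))
        for p in breach_people
        if len(dates.get(p, [])) >= 2
    )
    if not min_gaps:
        raise ValueError("no breach-of-order person has two or more sanctions")
    bottom_decile = min_gaps[: max(1, len(min_gaps) // 10)]
    return median(bottom_decile)


def recidivism_rate(records, floor_days):
    """Share of people with two sanctions at least floor_days apart."""
    dates = sanction_dates_by_person(records)
    repeat = {p for p, ds in dates.items()
              if any(gap >= floor_days for gap in gaps_in_days(ds))}
    everyone = {r["person"] for r in records}  # assumed denominator
    return len(repeat) / len(everyone), repeat


# Invented sample data; jurisdiction is carried along but unused here.
SAMPLE = [
    rec("A", "ON", date(2001, 1, 1)),
    rec("A", "BC", date(2003, 1, 1), breach_of_order=True),
    rec("B", "AB", date(2005, 1, 1)),
    rec("B", "AB", date(2005, 1, 11)),   # too close together to count
    rec("C", "QC", date(2008, 6, 1)),
    rec("D", "ON", date(2009, 3, 1)),
    rec("D", "BC", date(2009, 9, 1), reciprocal=True),
    rec("E", "ON", date(2010, 1, 1)),
    rec("E", "ON", date(2011, 2, 5), breach_of_order=True),
]

floor_days = time_floor(SAMPLE)
rate, repeat_offenders = recidivism_rate(SAMPLE, floor_days)
print(floor_days, rate, sorted(repeat_offenders))
```

In this sketch the time floor keeps pairs of sanctions issued close together (likely the same case handled twice) from being counted as recidivism, while reciprocal orders are removed before any gaps are measured.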
Technologies Used
The reporters used a combination of JavaScript (Node.js in particular) and R to create scrapers that could download all of the records across both the Canadian Securities Administrators and individual securities regulators. The person-or-company classifier was a script built using Python. The statistical programming language R (along with RStudio) was then used to build a reproducible cleaning and analysis of the datasets that were downloaded and classified using JavaScript and Python. R Notebooks were used to create a reproducible workflow that could be shared with others for independent verification of the investigation’s findings.
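The submission does not describe how the person-or-company classifier worked, so the following is only a guess at one simple approach: a keyword heuristic over the entry name. The marker list, the digit rule for numbered companies, the function name `classify_entity` and the example names are all assumptions made for illustration, not details from the investigation.

```python
import re

# Hypothetical list of words that suggest a corporate entity.
COMPANY_MARKERS = {
    "inc", "incorporated", "ltd", "limited", "corp", "corporation", "llc",
    "fund", "trust", "holdings", "capital", "group", "partners",
    "investments", "management",
}


def classify_entity(name: str) -> str:
    """Return 'company' or 'person' for a sanctioned entry's name."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    if any(token in COMPANY_MARKERS for token in tokens):
        return "company"
    if any(ch.isdigit() for ch in name):
        # Assumed rule: numbered companies contain digits, personal names do not.
        return "company"
    return "person"


# Invented example names.
for entry in ["Acme Holdings Ltd.", "Jane Q. Public", "1234567 Ontario Inc."]:
    print(entry, "->", classify_entity(entry))
```

A heuristic like this would misclassify some entries, such as a person whose surname matches a marker word, so its output would need manual spot checks before feeding an analysis.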