The team is currently made up of Caelainn Barr, editor of the data projects team, and Pamela Duncan, data journalist on the team. The work of Helena Bengtsson, editor of the data projects team until November 2017, is also included in this application. The data projects team is unique in that our primary focus is on collaboration with other reporters and visual designers in the newsroom: almost all of the work we do is a collaborative effort and the stories usually originate from data. The reporters, developers and editors we collaborated with for the entries named above are Josh Holder, Niko Kommenda, Nicola Davis, Denis Campbell, Anna Bawden, Caroline Bannock, Richard Adams, Rachel Obordo, Alex Hern, Cath Levett, Paul Scruton and Nick Hopkins.
In the year covered by the awards we have worked on many projects, big and small. The pieces we have submitted include Beyond the Blade, the Paradise Papers, Laundromat, the Colour of Power and an analysis of the deprivation in Kensington & Chelsea in the aftermath of the fire at Grenfell Tower. The projects included in this application showcase the work of a small team of two to three people but one which is passionate about data journalism and the contribution it can make in the newsroom and to society as a whole.
Beyond the Blade
At the beginning of 2017 the Guardian began Beyond the Blade, a project to explore knife crime in the UK. To truly understand the issues around knife crime the team needed data. We wanted to know: who are the children and teenagers dying in the UK from knives? The answer could help us and our audience understand more about knife crime and the people it affects. The data team created a structure for reporters to track all the deaths in 2017, but we also wanted to know whether the longer-term trends in knife deaths could tell us anything about the nature of the crimes.
Caelainn Barr spoke to statisticians, police officers and criminologists and found that the data exists but isn’t published. It is contained in the Homicide Index, a record-level data set of every homicide in England and Wales, maintained by the Home Office. Barr submitted a Freedom of Information (FOI) request to the Home Office for the number of homicides broken down by ethnicity, gender, five-year age band and police force area, going back to 1977. In case the Home Office didn’t reply, Caelainn also submitted FOI requests to all 45 territorial police forces in the UK.
Although some police forces released data, the coverage gave an incomplete picture. The Home Office failed to release the data, and Caelainn successfully challenged this through the Information Commissioner, which resulted in the Home Office releasing the data. To track deaths during 2017, Caelainn created a dataset by counting the deaths as they happened, sourcing information from police reports, news clippings and Google alerts. By combining the data sourced through FOI and our own dataset we had a complete picture of knife deaths in the UK over 40 years.
Caelainn analysed the data and found that 2017 was on track to be one of the worst years for deaths of children and teenagers in four decades. The figures also challenged the perception of knife crime as an issue that predominantly affects young black men. The data showed that most deaths in London were of young black men, but outside the capital the picture is very different. The findings caught the attention of policy-makers and upended the narrative that knife crime is primarily a problem among the black community. The data collected by the team also served as the basis for exploring the lives of the victims of knife crime and the circumstances surrounding their deaths.
Colour of Power
Colour of Power, a collaboration between the Guardian’s Pamela Duncan and Operation Black Vote, aimed to investigate the makeup of Britain’s top political, financial, judicial, cultural and security figures almost two decades after the 2000 Race Relations Act and seven years after the 2010 Equality Act. We expected the number of people from black, Asian and minority ethnic (BAME) backgrounds to be low but were still shocked at the findings. Of the 1,049 individuals listed, just 36 (3.4%) were from ethnic minorities. Just seven (0.7%) were BAME women. This, as the piece points out, represents a “grotesque disconnect” with the composition of the UK population, almost 13% of which is of a minority background.
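The percentages quoted above follow directly from the three counts in the text. A minimal sketch of that arithmetic, assuming each share is taken against the full list of 1,049 individuals and rounded to one decimal place (the function and variable names are illustrative, not taken from the project):

```python
def share(part: int, total: int) -> float:
    """Return part as a percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Counts as quoted in the text above.
TOTAL_LISTED = 1049
BAME = 36
BAME_WOMEN = 7

bame_share = share(BAME, TOTAL_LISTED)
bame_women_share = share(BAME_WOMEN, TOTAL_LISTED)

print(f"BAME: {bame_share}% of those listed")
print(f"BAME women: {bame_women_share}% of those listed")
```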
For the Laundromat investigation, a leaked cache of bank documents showed a money trail stretching from Azerbaijan to the UK, where money had been laundered through a string of Scottish companies. Our data analysis, in collaboration with an international group of investigative reporters through OCCRP, revealed how the money made its way into the UK, what it was used for and how a little-known company structure, the Scottish Limited Partnership, has become a prime vehicle for laundering assets into the European Union.
Another project aimed to find out how often electroconvulsive therapy is carried out in England each year. The received wisdom is that this poorly understood treatment is on the wane. However, when the Guardian investigated, we soon found that the existing statistics were out of date, “experimental” or not comparable across years. In an attempt to bridge the data gap the Guardian requested figures from every English NHS Mental Health Trust under the Freedom of Information Act. The dataset, compiled by Pamela Duncan in collaboration with Nicola Davis, provided a more comprehensive picture than any other such collection within the past 20 years.
Oxford accused of social apartheid
Helena Bengtsson analysed offers and admissions data obtained through hundreds of Freedom of Information requests sent to individual Oxford and Cambridge colleges. Analysing the admissions data broken down by ethnicity, she found that nearly one in three Oxford colleges (10 out of 32) failed to admit a single black British A-level student in 2015. It was the first time the university had released such figures since 2010. Oriel College offered only one place to a black British A-level student in six years. Similar data released by Cambridge revealed that six colleges there failed to admit any black British A-level students in the same year.
The death of dialect?
Data journalism is about much more than providing numbers for a story: in November 2017 data, this time in the form of words donated by the general public, was at the heart of a story Pamela Duncan wrote on British dialect. We collaborated with the British Library to find two new words in the UK dialect by calling on the Guardian’s reader community to tell us about the local dialect in their area. A total of 1,200 words and phrases were contributed, 920 of which were unique entries. The British Library found that many of the words and phrases were previously recorded as dialect in sources including the Oxford English Dictionary, the English Dialect Dictionary and the Dictionary of Scots Dialect.
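Separating unique entries from the total number of contributions is a deduplication step. The sketch below shows one plausible way to do it. It assumes that entries differing only in capitalisation or spacing count as the same word, which is an assumption rather than the team's documented rule. The sample words are hypothetical and are not drawn from the reader submissions.

```python
import re


def normalise(entry: str) -> str:
    """Lower-case, trim and collapse internal whitespace so near-identical spellings match."""
    return re.sub(r"\s+", " ", entry.strip().lower())


def unique_entries(submissions):
    """Return distinct submissions in first-seen order, keeping the first spelling seen."""
    seen = {}
    for entry in submissions:
        key = normalise(entry)
        if key and key not in seen:
            seen[key] = entry.strip()
    return list(seen.values())


# Hypothetical sample data for illustration only.
submissions = ["Mardy", "mardy ", "ginnel", "Ginnel", "nesh", "  bairn", "twitten"]
distinct = unique_entries(submissions)

print(len(submissions), "submitted;", len(distinct), "unique")
```

A stricter or looser matching rule (for example, treating spelling variants as one word) would change the unique count, so the rule is worth stating alongside any published figure.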
The Paradise Papers were primarily based on leaked data, which Helena Bengtsson cross-referenced against property records, company ownership data and global sanctions lists during her time at the Guardian. The findings served as the basis for leads and further reporting, which Guardian and other ICIJ journalists used to write stories, including a number of front page pieces.
How Russian trolls infiltrated UK media
In November 2017 data team members Pamela Duncan and Helena Bengtsson, in collaboration with technology journalist Alex Hern, reported that members of a Russian “troll army” were quoted more than 80 times across British-read media outlets before Twitter revealed their identity and banned them. The Guardian scraped the archives of 14 British news organisations, including the Telegraph, Daily Mail and the BBC, for every usage on their websites of any name from the list of 2,752 Twitter profiles flagged by Twitter as having been run by the Internet Research Agency. The study also examined three US-based news media organisations (BuzzFeed, the Huffington Post and Breitbart) that have substantial British readerships.
Wealth and poverty in Grenfell Tower's borough
Caelainn Barr set out to map deprivation in the Royal Borough of Kensington and Chelsea in the immediate aftermath of the fire at Grenfell Tower, which killed 71 people. Although many of our projects are medium- to long-term, there are times when we are required to turn around stories in a hurry while adhering to the high standards expected of the Guardian data projects team. This data exercise, undertaken within 36 hours of the fire, mapped deprivation ward by ward and revealed the stark inequalities in what is one of the UK’s richest boroughs.
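The matching step described above, checking article text for any name on a list of flagged profiles, can be sketched with a single compiled regular expression. This is an illustration under stated assumptions, not the method the Guardian used. It assumes article text has already been scraped into a dictionary keyed by URL, and the handles and URLs are invented placeholders.

```python
import re


def build_handle_pattern(handles):
    """Compile one case-insensitive regex that matches any handle as a whole @mention."""
    # Longest handles first, so a short handle never shadows a longer one that starts the same way.
    escaped = sorted((re.escape(h.lstrip("@")) for h in handles), key=len, reverse=True)
    # The lookahead rejects matches that continue into a longer, unlisted handle.
    return re.compile(r"@(" + "|".join(escaped) + r")(?![A-Za-z0-9_])", re.IGNORECASE)


def find_mentions(articles, handles):
    """Return (url, handle) pairs for every mention of a listed handle in the article text."""
    pattern = build_handle_pattern(handles)
    hits = []
    for url, text in articles.items():
        for match in pattern.finditer(text):
            hits.append((url, match.group(1).lower()))
    return hits


# Hypothetical handles and articles, for illustration only.
flagged = ["example_troll", "example_troll_1"]
articles = {
    "https://example.org/a": "As @Example_Troll_1 tweeted on Monday, the mood was tense.",
    "https://example.org/b": "An unrelated account, @example_trollish, also commented.",
    "https://example.org/c": "Quoting @example_troll: 'first'. Later, again, @example_troll.",
}

for url, handle in find_mentions(articles, flagged):
    print(url, handle)
```

Matching on whole mentions keeps false positives down, but every hit would still need a manual check before a quote is counted.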
What makes this project innovative?
The Guardian Data Projects team is a small team located in the heart of the Guardian newsroom. Collaboration is at the very core of what our team does. The team currently consists of data projects editor Caelainn Barr and data journalist Pamela Duncan but, for most of 2017, the team was headed by Helena Bengtsson (whose work is also reflected in this application). To the best of our knowledge the data projects team is the only all-female data team in the world.
We have developed our team to work closely with reporters from across the newsroom, involving ourselves in projects from the start, where possible, rather than delivering a data table or numbers for an article when it’s published. We focus on data analysis and investigations, collaborating with reporters, editors and the visuals team on stories that range from revealing international corruption to daily breaking news, where data can bring depth and context other sources cannot. We use data to find the subjects we should be reporting on, where to go to report, who to talk to, and what questions to ask. Some of the best pieces of data journalism we work on may not contain any numbers at all, but data is invariably at the core of the stories, features and longer-term projects we are involved in.
We come from a variety of backgrounds: Caelainn Barr has a background in investigations and has worked at the Wall Street Journal and the Bureau of Investigative Journalism; Pamela Duncan started the datablog at the Irish Times. Helena Bengtsson worked as a programmer in the early 90s before studying journalism. She previously worked for many years with the investigative team at the Swedish national broadcaster SVT. We work in close collaboration with the Guardian’s visuals team, which produces graphics and develops interactive digital formats for our journalism.
What was the impact of your project? How did you measure it?
It is not always possible to trace the impact of every story. However, the Beyond the Blade project is a clear example of how data journalism can have a wider impact. The story attracted the attention of policy-makers, prompting questions in the House of Commons and discussion at select committee hearings. Following the reporting, the team was invited to speak to the Solicitor General, Robert Buckland, about making better use of the homicide data held by the government in informing policy decisions. The story was the culmination of many months of investigation and demonstrated that effective policy cannot be made without understanding the issues behind knife crime.
Source and methodology
We spend time gathering data, usually from several different sources, and then structure, clean and analyse it to find the story. This can involve using access to information laws, including the FOI Act 2000 in the UK, to uncover previously unpublished figures, as well as scraping, converting and cleaning unstructured information into data we can analyse for stories. Wherever possible, the Guardian data projects team sets out its methodology online in an article accompanying our journalism. Articles outlining the methodology allow us to be transparent about our approach and inform our audience as to how we have completed our work.
The team works with a variety of software tools to analyse data, primarily Excel, SQL, QGIS and R (using RStudio). Although we do work with existing data sets to find stories, as in the case of examining inequality in Kensington and Chelsea, we increasingly source data from various unstructured sources and create data sets for our stories. To do this we also make extensive use of the FOI Act in the UK and access to information laws across Europe, as well as scraping and document conversion tools. Among the other techniques and tools we have used for the above stories are basic terminal code, regular expressions, ABBYY FineReader, OpenRefine, Webscraper.io, Overview, Google Sheets, DownThemAll! and csvkit.
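As a rough illustration of turning unstructured text into analysable data with regular expressions, the sketch below parses free-text lines of the form "name: count" into rows and writes them out as CSV. It uses Python's standard `re` and `csv` modules as a stand-in for the tools listed above. The line format, the field names and the sample text are all assumptions made for the example, not the layout of any real FOI response.

```python
import csv
import io
import re

# Assumed line shape: a name, then ":" or "-", then a count that may contain thousands separators.
LINE = re.compile(r"^\s*(?P<name>.+?)\s*[:\-]\s*(?P<count>[\d,]+)\s*$")


def parse_lines(raw_text):
    """Extract {name, count} rows from free text, skipping lines that do not fit the pattern."""
    rows = []
    for line in raw_text.splitlines():
        match = LINE.match(line)
        if match:
            count = int(match.group("count").replace(",", ""))
            rows.append({"name": match.group("name"), "count": count})
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "count"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical input, for illustration only.
raw = """
Organisation A: 1,204
Organisation B - 87
(figures supplied on request)
Organisation C :  15
"""

rows = parse_lines(raw)
print(to_csv(rows))
```

Lines that do not match are silently skipped here. In practice it would be safer to log them, so that nothing is dropped from the dataset unnoticed.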