Project description

Refugees Are is a news analysis platform that aims to better understand the narrative shaped around refugees and migrants in the news and to highlight xenophobia, which hinders their integration into new societies and their efforts to start new lives. Every day the platform extracts news from the open source GDELT project, which tags articles as refugee-related; it then crawls the news websites and applies sentiment analysis, topic modeling, and location extraction to gain further insight. Finally, the results are visualized on a map and in graphs to explain the daily situation to the public, who can vote on whether they find a news item xenophobic. These votes help the technologies behind the platform improve in accuracy and reach better automatic detection. The platform also helps increase empathy among general users, who get to read about and engage with the situation of refugees on a daily basis. It can also benefit NGOs that are working to integrate refugees or raise funds to support them.

This project was published as an exploration project at the Office of Innovation at UNICEF and continued as part of a Master's thesis at the National University of Ireland, Galway. After it won the Techfugees Global Challenges Awards in Social Inclusion, it is getting support to be turned into a startup.

What makes this project innovative?

The project looks at the language used by the media against refugees around the world from an unbiased point of view, with no hidden agenda. It uses the latest technologies in computational linguistics and data visualization to tell the story, split by day, in a way that is easier to grasp. It offers different charts and graphs depending on the issue it is trying to highlight. Right now, there are no other platforms dedicated to analyzing news that focuses only on refugees and migrants.

What was the impact of your project? How did you measure it?

The project was well received publicly and won the Techfugees Global Challenges Awards, which helped spread the word and brought more people to use it and even vote on the news. According to Google Analytics, the website is attracting returning users (12%) and the average session time is 40 seconds. This is considered relatively good since the website is still new and being improved. As the platform grows and harnesses the value of the crowd vote, its impact can be measured better through the accuracy of the classifier behind the sentiment score of the news articles.
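One way to make that measurement concrete is to treat the majority crowd vote on each article as a reference label and compute the share of articles on which the classifier agrees. The sketch below is illustrative only: the function names, the label strings, and the sample data are assumptions made for this example, not the platform's actual code or results.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common vote for an article, or None on a tie or no votes."""
    if not votes:
        return None
    ranked = Counter(votes).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no clear crowd judgement
    return ranked[0][0]


def crowd_accuracy(predictions, votes_by_article):
    """Share of articles where the classifier label matches the crowd majority.

    Articles without a clear majority are skipped. Returns None if no
    article can be compared.
    """
    compared = 0
    agreed = 0
    for article_id, predicted in predictions.items():
        reference = majority_label(votes_by_article.get(article_id, []))
        if reference is None:
            continue
        compared += 1
        if predicted == reference:
            agreed += 1
    return agreed / compared if compared else None


# Hypothetical data, for illustration only.
predictions = {"a1": "negative", "a2": "neutral", "a3": "negative"}
votes_by_article = {
    "a1": ["negative", "negative", "neutral"],
    "a2": ["negative", "negative"],
    "a3": ["negative", "neutral"],  # tie, so this article is skipped
}
accuracy = crowd_accuracy(predictions, votes_by_article)
```

Skipping ties keeps ambiguous articles from counting for or against the classifier; a stricter variant could also require a minimum number of votes per article.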

Source and methodology

1- Extract daily news related to refugees from (open source news dataset), starting with one month of data (June 2018)
2- Extract locations from the article
3- Apply sentiment analysis to classify it as a positive, negative, or neutral article
4- Extract topics in the news related to refugees
5- Extract the most common words occurring with "refugees"
6- Visualize it in a way that is easy for the public to understand
7- Let the public help identify negative news around refugees by voting on each article
8- Check the results and accuracy of the classifier and keep training to get better results
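Steps 3 and 5 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes a polarity score in [-1, 1] is already available (such as the one TextBlob reports), and the thresholds, the stop-word list, the window size, and the sample headlines are all hypothetical.

```python
import re
from collections import Counter

# Assumed cut-offs; the source does not state which thresholds it uses.
POSITIVE_THRESHOLD = 0.1
NEGATIVE_THRESHOLD = -0.1

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "are", "is", "at", "for"}


def label_from_polarity(polarity):
    """Map a polarity score in [-1.0, 1.0] to one of three labels (step 3)."""
    if polarity > POSITIVE_THRESHOLD:
        return "positive"
    if polarity < NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def cooccurring_words(texts, target="refugees", window=3):
    """Count words appearing within `window` tokens of `target` (step 5)."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(
                w for w in neighbours if w not in STOP_WORDS and w != target
            )
    return counts


# Hypothetical headlines, for illustration only.
headlines = [
    "Refugees welcomed at the border town",
    "Town council debates housing for refugees",
]
common = cooccurring_words(headlines)
```

Here `common.most_common(10)` would give the ten words most often seen near "refugees", which is the kind of list a word chart could be built from.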

Technologies Used

The back-end analysis was written in Python with the libraries TextBlob, NLTK, Articles, Matplotlib, and Gensim. MySQL stores users' votes on articles. For front-end visualization: Node.js, Leaflet.js, D3.js, and Bootstrap.

Project members

Suad Al Darra


