2020 Edition



Gesellschaft für Internationale Zusammenarbeit

This project presents a methodology for modeling crop yield variation in Eastern Zambia (EZ) based on publicly accessible soil and weather data. For this purpose, the team used survey data from GIZ and COMACO to obtain socioeconomic data for EZ and land productivity figures for two value chain crops: groundnuts and soybeans. The team made the data easily accessible through a dashboard. The final data pipeline makes it possible to combine various data sources in order to understand the distribution of soil properties, meteorological variables, or location-specific demographic data.

For more info, see the report; for the code, see GitLab.

Students: Christoph Mony, Kaoru Schwarzenegger, Frederike Lübeck, Vincent Bardenhagen

Mentor: Nima Riahi


A big challenge for a fundraising organisation is the generational change of its donor base. The goal of the project was to improve the NGO’s demographic understanding of its donor base, in particular by imputing the birth years that are missing for two-thirds of the donors in the NGO’s database. This can help in understanding generational shifts across decades and in running targeted fundraising campaigns. The team developed a ready-to-use Python package for the NGO that combines data integration, data cleaning and a machine learning pipeline, supporting accurate prediction of the donors’ birth years.

For more info, see the report; for the code, see GitLab.

Students: Rodrigo González, Stephanie Grimmel, Jinyan Tao, Cecilia Valenzuela

Mentor: Lionel Trebuchon

Impact Initiatives

An efficient and effective distribution of humanitarian aid calls for an accurate assessment of the help needed by affected people in crisis regions. This assessment relies heavily on data collected via household surveys. Due to the limited geographical accessibility of crisis regions and cultural barriers, IMPACT Initiatives often needs to rely on third parties to conduct the surveys. This raises the risk of data falsification by enumerators, which can lead to long data cleaning processes. The team proposed a solution that uses a supervised algorithm from the family of ensemble decision trees to learn the patterns of potentially falsified interviews.

For more info, see the report; for the code, see GitLab.

Students: Siyuan Luo, Romina Jafaryanyazdi, Julie Keisler, Barbara Capl

Mentor: Renato Durrer

Internal Displacement Monitoring Centre

In recent years, social media platforms such as Twitter have become much more important for knowledge transfer and information flow. The team explored Twitter as a data source for internal displacement monitoring by implementing a machine learning pipeline that filters for relevant tweets and extracts important information. The classifiers label 80% of tested tweets correctly, which gives confidence in their performance. Additionally, a custom-trained named entity recognition (NER) algorithm was developed to extract the most important information from the tested tweets.

For more info, see the report; for the code, see GitLab.

Students: Gokberk Ozsoy, Katharina Boersig, Michaela Wenner, Tabea Donauer

Mentor: Jean-Claude Ton


The team analyzed past helicopter missions for the Swiss air-rescue service Rega. The goal was to provide tools that analyze the timing of Rega’s rescue missions and help with helicopter dispatching in future missions. Using flight times from past missions, the team predicted helicopter flight times for future missions so that helicopters can be dispatched more effectively. The dispatching rules are based on the expected mission times from the different Rega bases. The team made their analysis and predictive models available to Rega dispatchers in an interactive web application.

For more info, see the report; for the code, see GitLab.

Students: Philip Jordan, Christoffer Raun, Xiaoyu Sun, Matus Zilinec

Mentor: Yevgeniy Ilyin