Analytics Club at ETH
There have been six editions of Hack4Good in total since its launch in spring 2019.
«NLP for Policy Trend Analysis»
Using natural language processing and data analysis techniques, the team processed PDFs containing reports and statements related to technology policy. The objective was to identify key skills and capabilities linked to emerging technology trends. The project provided visualizations and interpretable analyses, offering insights that help policy makers understand the evolving technological landscape. This work contributes to the understanding of how technology skills are shaped and implemented across industries and countries, fostering innovation and adaptation in the face of rapid technological advancement.
Mentor: David Hofer
“Survey Transcript Processing and Analysis”
IMPACT Initiatives aims to inform humanitarian action by providing timely information about the needs of affected populations, so the information coming from those places needs to be processed quickly. In this project, we received data from various interviews conducted in different parts of the world. From these interviews, we needed to build a concise Data Saturation Grid that shows the resources needed according to each interview and summarizes the overall issues the areas are facing.
Mentor: Pablo
"Improving communication in WWF"
A large NGO such as WWF publishes a large number of texts written by many different departments and individuals. Combined with the time pressure to publish certain statements, this means the communications department is not always able to check every text against its communication standards. We were therefore tasked with developing a tool that can catch communication mistakes before anything is published.
Mentor: Chrysander
«Streamlining information flow in the SET Alliance»
**BASE: Basel Agency for Sustainable Energy**
Utilizing advanced natural language processing and web scraping techniques, the team developed an automated system to aggregate and analyze vast amounts of data related to the global energy crisis and sustainability efforts. The primary goal was to streamline the process of gathering relevant information from a multitude of sources and provide timely, insightful reports for policy makers and energy experts. The system focuses on three core functions: creating a curated archive of energy-related articles, sending bi-weekly newsletters highlighting the most referenced topics, and offering a dynamic dashboard with visual analytics on the frequency of key energy-related terms over time and across different regions.
Mentor: Fredrik
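The dashboard's core computation counts how often key energy terms appear per month and region. The sketch below illustrates that step in Python and is not the team's system. The term list `KEY_TERMS`, the function `term_frequencies` and the article fields `date`, `region` and `text` are hypothetical. The matching is exact, so "subsidies" is not counted as "subsidy"; a real pipeline would likely add stemming or lemmatization.

```python
import re
from collections import Counter, defaultdict

# Hypothetical list of tracked terms; the team's actual term list is not given.
KEY_TERMS = {"solar", "efficiency", "cooling", "subsidy"}


def term_frequencies(articles):
    """Count key-term mentions per (month, region) pair.

    `articles` is an iterable of dicts with the hypothetical keys
    "date" (ISO string such as "2023-05-14"), "region" and "text".
    """
    counts = defaultdict(Counter)
    for article in articles:
        month = article["date"][:7]  # keep "YYYY-MM"
        tokens = re.findall(r"[a-z]+", article["text"].lower())
        counts[(month, article["region"])].update(t for t in tokens if t in KEY_TERMS)
    return counts


articles = [
    {"date": "2023-05-14", "region": "Kenya", "text": "Solar cooling and solar subsidies"},
    {"date": "2023-05-20", "region": "Kenya", "text": "Efficiency of solar pumps"},
    {"date": "2023-06-02", "region": "India", "text": "New cooling subsidy"},
]
print(dict(term_frequencies(articles)))
```

The resulting `(month, region)` counters can be plotted directly as time series or regional comparisons.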
The team worked on the development of a multimodal model to classify crop damage by integrating image data, climate data, and remote sensing information. The model was designed to identify early signs of crop damage, even in the absence of images, leveraging the combined power of multiple data sources. This approach aims to provide timely and accurate predictions of crop health, assisting farmers and policymakers in making informed decisions to mitigate the impact of climate-related risks on agriculture.
Mentor: Pepa
«Anomaly detection with nighttime light data in armed conflicts»
In conflict situations, timely reporting is critical for enabling swift humanitarian responses. This project investigates the potential of using nighttime light (NTL) data to remotely identify signs of conflict-related activities, such as escalations in violence or forced displacement. By analyzing changes in nighttime light patterns, we aim to provide an additional data source to support humanitarian efforts in affected regions.
To achieve this, the team developed a robust data pipeline to automatically download and process NASA’s VNP46A2 Nighttime Light product. Preliminary statistical analysis was then performed to detect anomalies in light patterns, which can serve as early indicators of conflict, particularly in the war-affected regions of Ukraine. This approach offers a remote sensing method that could enhance conflict monitoring and help facilitate timely responses to emerging crises.
Mentor: Jan
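The text does not say which statistical test was used to detect anomalies in the nighttime light series. One simple option is a trailing-window z-score: each day's radiance is compared with the mean and standard deviation of the preceding days. The sketch below implements that idea under several assumptions:

- The function name, the 30-day window and the threshold of 3 are illustrative choices.
- The input is a plain list of daily radiance values for one area, not raw VNP46A2 tiles.

```python
from statistics import mean, stdev


def flag_anomalies(radiance, window=30, threshold=3.0):
    """Return (index, z-score) pairs for days that deviate strongly from the trailing window.

    `radiance` holds daily mean radiance values for one area, oldest first;
    None marks missing observations (e.g. cloud cover).
    """
    anomalies = []
    for i in range(window, len(radiance)):
        value = radiance[i]
        history = [v for v in radiance[i - window:i] if v is not None]
        if value is None or len(history) < 2:
            continue
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: z-score undefined
        z = (value - mean(history)) / sigma
        if abs(z) > threshold:
            anomalies.append((i, round(z, 2)))
    return anomalies


# Synthetic example: stable lighting followed by a sudden drop (e.g. a blackout).
series = [10.0, 10.5] * 15 + [2.0]
print(flag_anomalies(series))
```

A trailing window only uses past data, so the same function can run day by day as new observations arrive.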
«Detection and analysis of town burnings with satellite data»
In Darfur, armed conflicts have led to several war crimes, including the burning of civilian settlements. The project aims to identify these events semi-automatically, using remote-sensed fire detections and settlement data. FIRMS data and an edited GRID3 settlement dataset were used to build a Telegram bot that sends daily alerts to HRW when a potential village burning is detected, and that enables a historical analysis of fires in the region.
Mentor: Chiara
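The core matching step of such an alert bot can be sketched as a distance check between fire detections and settlement locations. In this illustration, the 1 km radius, the tuple layouts and the function names are assumptions, and the village names are invented. Reading the actual FIRMS export, editing the settlement layer and sending Telegram messages are not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def potential_burnings(fires, settlements, max_km=1.0):
    """Return (date, settlement) pairs for fires detected within `max_km` of a settlement.

    fires: list of (lat, lon, date) tuples, e.g. parsed from a FIRMS export.
    settlements: list of (name, lat, lon) tuples from a settlement dataset.
    """
    alerts = []
    for lat, lon, date in fires:
        for name, s_lat, s_lon in settlements:
            if haversine_km(lat, lon, s_lat, s_lon) <= max_km:
                alerts.append((date, name))
    return alerts


settlements = [("Village A", 13.000, 25.000), ("Village B", 13.500, 25.500)]
fires = [(13.005, 25.002, "2023-06-01")]
print(potential_burnings(fires, settlements))  # [('2023-06-01', 'Village A')]
```

The nested loop is fine for a daily batch of detections. Larger volumes would call for a spatial index.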
Using natural language processing techniques, the team analysed Twitter data gathered through the Twitter Research API. The goal was to link countries' STI (Science, Technology and Innovation) strategy priorities and instruments to the public discourse on Twitter, helping policy makers understand the impact of their policies in domains such as innovation for sustainability and the green transition. The participants presented their findings to an international audience of country delegates and policy experts at a meeting hosted by the OECD in Paris in December 2022.
Students: Aurèle Bohbot, Fabian Otto, Mingyang Yuan, Songyi Han
Mentor: Malte Toetzke
The local geography forces the agrarian population of northern Nepal to live in scattered settlements. Over the past decades, Helvetas has contributed to building many trail bridges, which facilitate communication and the movement of goods, services and people. This project measures the impact of trail bridges in Nepal using satellite images, focusing on changes in settlement patterns, and explores the potential of satellite imagery for large-scale, countrywide impact measurement.
Students: Yuchang Jiang, Franz Görlich, Radenko Tanasić, Manolis Vardas
Mentor: Nando Metzger
Nitrogen emissions caused by agriculture and agricultural construction projects damage various sensitive ecosystems such as forests, dry meadows and peatlands. WWF and other environmental NGOs regularly check whether these projects comply with environmental law. Because of the large number of applications, the variety of publication sources and limited personnel resources, a systematic preliminary examination of building applications is not possible. To help WWF look through all applications, the team created an easy-to-use pipeline that identifies building permits with a potentially adverse impact on biodiversity, which therefore require further investigation.
Students: Agustina La Greca, Lluìs Pastor Pérez, Phillip Trummer, Lilian Bonnet
Mentor: Stephan Artmann
Every year, about one third of the food produced for human consumption is wasted. The lack of adequate post-harvest storage facilities plays a key role, especially in developing economies in Africa, Asia and Latin America. BASE has developed a mobile application, Coldtivate, to help local entrepreneurs who offer cooling-room services run their operations, and to give farmers the information they need to make informed decisions about when and where to sell their produce. The team's task was to create a pipeline that evaluates key metrics identified by BASE from the application data and produces a monthly report. Our solution is a Python script that automatically produces monthly reports by executing a list of SQL queries.
Students: Ambarish Prakash, Fredrik Nestaas, Lucien Walewski, Shangen Li
Mentor: Nima Riahi
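A report script of this kind can stay very small: each metric is one SQL query, and the report is the list of query results for a given month. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table `check_ins`, its columns and the two metrics are hypothetical stand-ins, because the Coldtivate schema and BASE's metric definitions are not given here.

```python
import sqlite3

# Hypothetical metrics: one named SQL query per metric, parameterized by month.
METRIC_QUERIES = {
    "check_ins": "SELECT COUNT(*) FROM check_ins WHERE strftime('%Y-%m', date) = :month",
    "active_cooling_rooms": (
        "SELECT COUNT(DISTINCT room_id) FROM check_ins "
        "WHERE strftime('%Y-%m', date) = :month"
    ),
}


def monthly_report(conn, month):
    """Run every metric query for `month` ("YYYY-MM") and return a plain-text report."""
    lines = [f"Monthly report {month}"]
    for name, sql in METRIC_QUERIES.items():
        (value,) = conn.execute(sql, {"month": month}).fetchone()
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


# Usage with a toy in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE check_ins (room_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO check_ins VALUES (?, ?)",
    [(1, "2023-03-02"), (1, "2023-03-15"), (2, "2023-03-20"), (2, "2023-04-01")],
)
print(monthly_report(conn, "2023-03"))
```

Keeping the queries in a dictionary means a new metric can be added without touching the reporting logic.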
IMPACT Initiatives conducts household surveys in crisis regions, collecting data to inform and improve the local distribution of humanitarian aid. In inaccessible areas, the data is often collected by third parties, for example local NGOs with better access to the area, which ultimately requires considerable human intervention to guarantee the accuracy and validity of the data. Should IMPACT Initiatives find a suspicious-looking or unlikely survey response, it has no choice but to delete all corresponding entries to ensure that its results are correct. To improve the situation, the team developed an end-to-end interpretable anomaly detection system that can be used to identify survey falsification across different projects.
Students: Andrew Zehr, Anna Theorin Johansson, Jacob Rothschild
Mentor: Vincent Bardenhagen
The Internal Displacement Monitoring Centre (IDMC) is the world's definitive source of data and analysis on internal displacement. Among other sources and authorities, IDMC relies on media monitoring to gather information on internal displacement; media sources are particularly important when governments lack the capacity or will to collect data. The goal of this project is to help IDMC develop a better media monitoring tool, replacing the current black-box approach with a transparent model whose functionality better reflects the needs of IDMC's monitoring experts.
Students: Jin Zhang, Kirina van der Bijl, Lomàn Vezin, Shih-Chi Yang
Mentor: Marco Mancini
Accurate demand forecasting is crucial for humanitarian equipment suppliers such as Médecins Sans Frontières (MSF) Supply, allowing them to allocate their resources optimally and save lives. This project shows that two models, one based on ARIMA and one on Gaussian process regression, can improve demand forecasts compared with the model currently used by MSF. The project also provides insights into the forecastability of items and into patterns in ordering behavior.
Students: Kathrin Durizzo, Frithiof Ekström, Carlos Garcia Meixide, Jonathan Koch
Mentor: Yevgeniy Ilyin
Helvetas manages around 300 projects per year in 30 different countries. For each project, a standardized report is generated manually and must then be interpreted (in terms of progress, indicators, results, summary, etc.). The team developed a natural language processing algorithm that translates the reports, extracts and consolidates the relevant information, classifies the projects and identifies the main topics, challenges and trends being tackled. This enables Helvetas' management team to better communicate its impact, allocate resources and plan for the upcoming year.
Students: Catalina Dragusin, David Simon Tetruashvili, Jackson Stanhope, Tom Haidinger
Mentor: Marco Mancini
Assessing situational, demographic and infrastructural information in crisis regions is vital for organizing and executing the humanitarian response for local refugee communities. When key informants (KIs) in the region are difficult to reach and information becomes sparse, the reliability of that information becomes paramount. In close collaboration with IMPACT Initiatives, the team analyzed each KI's reliability using data collected during a study of KIs (local community leaders and experts from selected regions in Niger, Uganda, Afghanistan and Jordan). The data consisted of questions about social or infrastructural properties of each KI's community, paired with the KI's answers. After extensive data engineering, the features driving high KI reliability were investigated using tree-based regression models and explainable AI methods.
Students: Samyak Shah, Ayoung Song, Claus Wirnsperger, Feichi Lu
Mentor: Nima Riahi
In Switzerland, people hike a lot: in a normal year, 20,000 people are injured and almost 40 incidents are fatal. That is why Rega, the Swiss air rescue service, constantly strives to make its rescue service faster and more reliable. The team developed a Lasso model, trained on previous missions, that predicts from a patient's location how long each helicopter would take to reach the destination, so that the fastest one can be dispatched.
Students: Colin Kälin, Rajiv Manichand, Elia Saquand, Hugues Sibille
Mentor: Stephan Artmann
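For readers unfamiliar with the Lasso, the sketch below fits it by cyclic coordinate descent with soft-thresholding. It minimizes (1/2n)·‖y − Xw − b‖² + α·‖w‖₁ and then picks the helicopter with the shortest predicted flight time. This is a from-scratch illustration in plain Python, not the team's model. The features (for example distance and altitude difference), the value of `alpha`, the base names and the function names are assumptions. In practice a library implementation such as scikit-learn's `Lasso` would normally be used.

```python
def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha (the proximal step of the L1 penalty)."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def fit_lasso(X, y, alpha=0.1, n_iter=100):
    """Fit (weights, intercept) minimizing (1/2n)||y - Xw - b||^2 + alpha * ||w||_1."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering lets the intercept be recovered afterwards instead of penalized.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = z = 0.0
            for i in range(n):
                # Residual of sample i when feature j is left out of the prediction.
                partial = yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j)
                rho += Xc[i][j] * partial
                z += Xc[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    intercept = y_mean - sum(w[j] * x_means[j] for j in range(p))
    return w, intercept


def fastest_helicopter(candidates, w, intercept):
    """candidates maps a base name to that base's feature vector for the current mission."""
    predicted = {
        base: intercept + sum(wj * xj for wj, xj in zip(w, x))
        for base, x in candidates.items()
    }
    return min(predicted, key=predicted.get), predicted


# Toy data: minutes = 2 * feature_1 + 1; feature_2 carries no signal.
X = [[1, 1], [2, -1], [3, -1], [4, 1]]
y = [3, 5, 7, 9]
w, b = fit_lasso(X, y, alpha=0.1)
print(w, b)
print(fastest_helicopter({"Base A": [3, 1], "Base B": [1, 1]}, w, b)[0])
```

The L1 penalty both shrinks the useful coefficient (here from 2 to 1.92) and sets the uninformative one exactly to zero. That built-in feature selection is a common reason to prefer the Lasso over plain least squares.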
This project aims to support GIZ's mission to increase the adoption of sustainable farming practices in western Kenya. GIZ provided multiple farming datasets, from which the team extracted insights and value. The team developed a modular pipeline for data preprocessing and model training, including enrichment with publicly available geographical data. Furthermore, the data was analyzed where possible, and recommendations on data acquisition were formulated to help avoid the issues the team encountered in future projects.
Students: Antoine Basseto, Oscar Pitcho, Nando Metzger, Afshan Anam Saeed
Mentor: Lionel Trébuchon
Tracking and tracing countries' implementation of Nationally Determined policies is currently time-consuming and not done systematically, which hampers accountability and sustainable progress. GIZ therefore needs a tool that quickly and automatically screens large amounts of documents (legal texts, speeches, tweets, …) to find and contextualize actual implementations of a nation's commitments and policy priorities. The approach developed combines several search tools, including a deep analysis tool and a faster, simple keyword-matching method. This automates the cumbersome task of manually screening dozens of documents, regardless of the policy being traced.
Students: Emily Robitschek, Raphael Sgier, Jonathan Doorn, Paul Türtscher
Mentor: Fran Peric
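The faster of the two search methods, simple keyword matching, can be sketched with Python's `re` module. Each hit is returned with a window of surrounding text so that an analyst can judge the context. The function name, the 60-character context window and the example documents and keywords are illustrative assumptions. The deep analysis tool is not reproduced here.

```python
import re


def keyword_hits(documents, keywords, context_chars=60):
    """Return (doc_id, matched_text, context) for every keyword occurrence.

    documents maps a document id to its text; keywords is a list of
    phrases matched case-insensitively on word boundaries.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE
    )
    hits = []
    for doc_id, text in documents.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - context_chars)
            end = min(len(text), match.end() + context_chars)
            hits.append((doc_id, match.group(0), text[start:end]))
    return hits


documents = {
    "law_1": "The new Renewable Energy Act sets a feed-in tariff for rooftop solar.",
    "speech_7": "We discussed tariffs and trade policy.",
}
for hit in keyword_hits(documents, ["feed-in tariff", "renewable energy"]):
    print(hit)
```

Word boundaries keep "tariff" from matching inside unrelated words. The trade-off is that inflected forms such as "tariffs" are missed unless they are listed explicitly.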
This project presents a methodology for modeling crop yield variation in Eastern Zambia (EZ) based on publicly accessible soil and weather data. For this purpose, the team used survey data from GIZ and COMACO to obtain socioeconomic data for EZ and land productivity for two value chain crops: groundnuts and soybeans. The team made the data easily accessible through a dashboard. The final data pipeline combines various data sources to help understand the distribution of soil properties, meteorological variables or location-specific demographic data.
Students: Christoph Mony, Kaoru Schwarzenegger, Frederike Lübeck, Vincent Bardenhagen
Mentor: Nima Riahi
A big challenge for a fundraising organisation is the generational change in its donor base. The goal of the project was to improve demographic understanding of the NGO's donor base, in particular by imputing the birth years that are missing for two thirds of the donors in the NGO's database. This is useful for understanding generational shifts across decades and for targeted fundraising campaigns. The team developed a ready-to-use Python package for the NGO that combines data integration, data cleaning and a machine learning pipeline, enabling accurate prediction of donors' birth years.
Students: Rodrigo González, Stephanie Grimmel, Jinyan Tao, Cecilia Valenzuela
Mentor: Lionel Trébuchon
An efficient and effective distribution of humanitarian aid calls for an accurate assessment of the help needed by affected people in crisis regions. This assessment relies heavily on data collected through household surveys. Because of the limited geographical accessibility of crisis regions and cultural barriers, IMPACT Initiatives often has to rely on third parties to conduct the surveys. This creates the risk that enumerators falsify data, which can lead to long cleaning processes. The team proposed a solution that uses a supervised algorithm from the family of tree ensembles to learn the patterns of potentially falsified interviews.
Students: Siyuan Luo, Romina Jafaryanyazdi, Julie Keisler, Barbara Capl
Mentor: Renato Durrer
In recent years, social media platforms such as Twitter have become much more important for knowledge transfer and information flow. The team explored Twitter as a data source for internal displacement monitoring by implementing a machine learning pipeline that filters relevant tweets and extracts important information. The classifiers label 80% of the tested tweets correctly, which gives some confidence in their performance. In addition, a custom-trained named entity recognition (NER) model was developed to extract the most important information from the tested tweets.
Students: Gokberk Ozsoy, Katharina Boersig, Michaela Wenner, Tabea Donauer
Mentor: Jean-Claude Ton
The team performed an analysis of past helicopter missions for the Swiss air-rescue Rega. The goal is to provide tools that analyze the timing of Rega’s rescue missions and to help with helicopter dispatching in future missions. The team predicted the helicopter flight times for future missions using flight times from past missions, in order to dispatch helicopters more effectively. The dispatching rules are based on expected mission times from the different Rega bases. The team made their analysis and predictive models available to Rega dispatchers within an interactive web application.
Students: Philip Jordan, Christoffer Raun, Xiaoyu Sun, Matus Zilinec
Mentor: Yevgeniy Ilyin
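A dispatching rule based on expected mission times can be illustrated in three steps:

1. Estimate a typical cruising speed for each base from past missions.
2. Convert the distance to the patient into an expected arrival time.
3. Choose the base with the smallest expected time.

This is a deliberately simple sketch, not the team's predictive model. The median-speed estimate, the fixed `startup_minutes` delay, the base names and the mission data are assumptions.

```python
from statistics import median


def median_speeds(past_missions):
    """Median cruising speed (km per minute) per base from past missions.

    past_missions: list of (base, distance_km, flight_minutes) tuples.
    """
    speeds = {}
    for base, distance_km, minutes in past_missions:
        speeds.setdefault(base, []).append(distance_km / minutes)
    return {base: median(values) for base, values in speeds.items()}


def dispatch(distances_km, speeds, startup_minutes=5.0):
    """Return the base with the lowest expected arrival time, plus all estimates.

    startup_minutes is a hypothetical fixed delay before lift-off.
    """
    expected = {
        base: startup_minutes + distance / speeds[base]
        for base, distance in distances_km.items()
        if base in speeds
    }
    best = min(expected, key=expected.get)
    return best, expected


past = [("Base A", 40, 10), ("Base A", 60, 15), ("Base B", 30, 10), ("Base B", 45, 15)]
speeds = median_speeds(past)
print(dispatch({"Base A": 48, "Base B": 30}, speeds))
```

The median is used because it is less sensitive than the mean to unusual past missions, for example flights delayed by weather.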
Real-Time Displacement Forecast for Natural Hazards: The team developed a complete framework for real-time prediction of displacement caused by storm and flood events. It provides access to, and combines, various data sources reflecting hazard intensity, exposed population, vulnerability and past displacement. These data are used to train a machine learning model that forecasts the number of people displaced in future events.
Students: Janik Baumer, Vincent Bardenhagen, Daniel Benesch, Mian Zhong, Xiang Li, Gaël Perrin, Nathan Rouff, Anastasia Sycheva
Mentors: Lionel Trébuchon, Pelayo Choya
One approach to providing humanitarian aid is cash-based assistance. To distribute funds efficiently, IMPACT Initiatives conducts monthly market research in Syria to compute the price of a Survival Minimum Expenditure Basket (SMEB), which “represents the minimum […] items required to support a 6-person household for a month”. These items vary by season and include food and non-food items. In this work, the team analyzed how external factors influence the price of the SMEB in Syria.
Students: Damian Durrer, Dominique Heyn, Heiko Kromer, Jonathan Mendieta
Mentor: Antonios Garas
Humanitarian aid in the form of cash-based assistance is becoming increasingly popular in regions of conflict. IMPACT Initiatives provides data-driven solutions to better inform humanitarian cash programming in Syria. This assistance relies on the calculation of a Survival Minimum Expenditure Basket (SMEB), corresponding to the minimum amount of cash necessary to purchase items required to support a 6-person household for one month in a specific geographical area. Price data for these items are acquired through informants in the regions of interest. In this project, the team investigated methods to impute missing price values.
Students: Pepa Arán Paredes, Olivier Dietrich, Mariëlle van Kooten, Pierre Winter
Mentor: Mario Tomasello
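A simple baseline for imputing missing prices fills each gap with the median price of the same item in the same month across the other markets. The completed prices can then be combined into a basket cost. This is only a baseline sketch, not one of the methods the team investigated. The dictionary layout, the market names and the basket quantities are hypothetical and do not reflect the actual SMEB composition.

```python
from statistics import median


def impute_prices(prices):
    """Fill missing prices with the cross-market median for the same item and month.

    prices maps (market, month, item) to a price, or None if missing.
    Entries stay None when no other market reported that item in that month.
    """
    filled = dict(prices)
    for (market, month, item), price in prices.items():
        if price is None:
            peers = [
                p for (_, mo, it), p in prices.items()
                if mo == month and it == item and p is not None
            ]
            filled[(market, month, item)] = median(peers) if peers else None
    return filled


# Illustrative quantities only, NOT the real SMEB composition.
BASKET = {"bread_kg": 10, "rice_kg": 2, "sugar_kg": 1}


def basket_cost(prices, market, month, basket=BASKET):
    """Cost of the basket in one market and month (prices must be complete)."""
    return sum(qty * prices[(market, month, item)] for item, qty in basket.items())


prices = {
    ("M1", "2020-01", "bread_kg"): 100,
    ("M2", "2020-01", "bread_kg"): 120,
    ("M3", "2020-01", "bread_kg"): None,
    ("M3", "2020-01", "rice_kg"): 500,
    ("M3", "2020-01", "sugar_kg"): 300,
}
filled = impute_prices(prices)
print(filled[("M3", "2020-01", "bread_kg")], basket_cost(filled, "M3", "2020-01"))
```

More refined methods would also use the same market's earlier prices or geographic neighbours rather than all markets equally.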
Using the commodity price data collected by IMPACT Initiatives' Market Monitoring exercise in Syria, the team developed higher-level indicators that characterize price changes over time. The resulting framework is flexible in which facet of the data it analyzes and generalizes to similar datasets from other crisis-affected countries. The output is kept easy to interpret so that it can clearly help inform humanitarian action.
Students: Peshal Agarwal, Jelena Čuklina, Andrei Kolar, Anna Maddux
Mentor: Amit Gupta
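Two examples of such higher-level indicators are month-over-month percentage change and the volatility of those changes. The sketch below computes both. The choice of indicators and the function names are assumptions for illustration, not the team's framework. Missing months (`None`) are propagated rather than interpolated.

```python
from statistics import pstdev


def month_over_month_change(series):
    """Percentage change per month (oldest first); None where a value is missing."""
    changes = [None]  # no change is defined for the first month
    for prev, cur in zip(series, series[1:]):
        if prev in (None, 0) or cur is None:
            changes.append(None)
        else:
            changes.append(100.0 * (cur - prev) / prev)
    return changes


def volatility(series):
    """Population standard deviation of the available monthly % changes."""
    values = [c for c in month_over_month_change(series) if c is not None]
    return pstdev(values) if len(values) >= 2 else None


prices = [100, 110, 99, None, 120]
print(month_over_month_change(prices))  # [None, 10.0, -10.0, None, None]
print(volatility(prices))
```

Both indicators are unit-free, so they can be compared across items, markets and countries.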
It is becoming increasingly common for aid organizations and NGOs to deliver cash-based assistance to war-torn countries such as Syria. To ensure that the aid delivered is neither too high nor too low, it is critical to forecast the minimum monthly cost of survival (SMEB, “survival minimum expenditure basket”). The goal of this project was to develop a model to forecast the SMEB price. The baseline model, which simply uses the previous month's value as the prediction for the upcoming month, turned out to outperform all other models tested. For forecasts several months into the future, the team presents a variant of an ARIMA model.
Students: Aneesh Dahiya, Jaco Fuchs, Christoph Glanzer, Julia Ortheden
Mentor: Anastasia Pentina
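The naive baseline described above is easy to state in code. So is the evaluation that makes such comparisons fair: forecast each month using only the months before it, then average the absolute errors. The competing `mean_forecast` model and the price series are made up for illustration. The models the team actually tested, including the ARIMA variant, are not reproduced.

```python
def naive_forecast(history):
    """Baseline: next month's price equals the last observed price."""
    return history[-1]


def mean_forecast(history):
    """Hypothetical competitor: the average of all past prices."""
    return sum(history) / len(history)


def one_step_mae(series, forecaster, min_history=3):
    """Mean absolute error of one-step-ahead forecasts on an expanding window.

    Each month t is predicted from series[:t] only, so no future data leaks in.
    """
    errors = [abs(forecaster(series[:t]) - series[t]) for t in range(min_history, len(series))]
    return sum(errors) / len(errors)


smeb = [100, 105, 111, 118, 126, 135]  # made-up monthly SMEB prices
print(one_step_mae(smeb, naive_forecast))  # 8.0
print(one_step_mae(smeb, mean_forecast))
```

On a trending series the last value is a strong predictor, which is one plausible reason why such a simple baseline is hard to beat.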
Data collection and analysis are of great importance for humanitarian aid decisions. This is a challenging problem because obtaining a general yet clear picture of humanitarian needs in crisis regions is labor-intensive and costly. The team explored the idea of using a generative model to reduce survey time. They showed that, with a Bayesian network, questions can be chosen during the interview so as to maximize the information gain. This approach could save time in first-hand data collection or, alternatively, allow more data points to be collected.
Students: Martin Buttenschön, Natallie Baikevich, Luca Pedrelli, Georgios Papadimitriou
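The question-selection idea can be illustrated with the simplest Bayesian network, a naive Bayes structure in which each yes/no answer depends only on a latent need level. At each step, the interviewer asks the unanswered question with the largest expected reduction in the entropy of the need level. The network structure, variable names and probabilities are hypothetical; the team's actual network is not described here.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a {value: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def posterior(prior, likelihoods, answers):
    """P(need | answers) for a naive Bayes network.

    prior: {need_level: P(need_level)}
    likelihoods: {question: {need_level: P(answer is yes | need_level)}}
    answers: {question: True/False} for the questions asked so far
    """
    unnormalized = {}
    for level, prob in prior.items():
        for question, said_yes in answers.items():
            p_yes = likelihoods[question][level]
            prob *= p_yes if said_yes else 1 - p_yes
        unnormalized[level] = prob
    total = sum(unnormalized.values())
    return {level: prob / total for level, prob in unnormalized.items()}


def expected_information_gain(prior, likelihoods, answers, question):
    """Expected entropy reduction of the need level from asking `question` next."""
    current = posterior(prior, likelihoods, answers)
    p_yes = sum(current[level] * likelihoods[question][level] for level in current)
    gain = entropy(current)
    for said_yes, p_answer in ((True, p_yes), (False, 1 - p_yes)):
        if p_answer > 0:
            updated = posterior(prior, likelihoods, {**answers, question: said_yes})
            gain -= p_answer * entropy(updated)
    return gain


def next_question(prior, likelihoods, answers):
    """Pick the unanswered question with the highest expected information gain."""
    remaining = [q for q in likelihoods if q not in answers]
    if not remaining:
        return None
    return max(remaining, key=lambda q: expected_information_gain(prior, likelihoods, answers, q))


prior = {"low": 0.5, "high": 0.5}
likelihoods = {
    "water_access": {"low": 0.9, "high": 0.1},
    "owns_radio": {"low": 0.55, "high": 0.45},
}
print(next_question(prior, likelihoods, {}))  # water_access
```

Questions whose answers barely differ between need levels (like `owns_radio` here) have near-zero expected gain. They are the natural candidates to drop from a shortened interview.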
An important objective of humanitarian aid is to identify households in need. This is usually done by asking people to answer a questionnaire about their living situation. Based on their answers, sectoral People in Need (PiN) indicators are calculated, which is a time-consuming and labor-intensive process. The team analyzed an alternative approach that aims to predict the PiN from demographic variables. They also provided a prototype visualization app to help identify people in need.
Students: Stephan Artmann, Viktoria de La Rochefoucauld, Nico Messikommer, Francesco Saltarelli
To support decision-making in the humanitarian crisis in Nigeria, the team tried to identify undiscovered patterns in the Multi-Sector Needs Assessment dataset collected by REACH. First, they developed a random forest model that could predict the overall level of need of a household, though only to a limited degree. Second, they identified sets of co-occurring sectoral needs. Third, they showed that the current methodology does not allow the reported needs of households to be predicted accurately.
Students: Marco Mancini, Yilmazcan Ozyurt, Ylli Muhadri, Maria R. Cervera
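The paragraph does not say how co-occurring needs were identified. A common way to look for them is to count how often pairs of sectors are flagged together and compare that with what independence would predict (the lift). The sketch below does this; the sector names, the `min_support` threshold and the toy households are assumptions.

```python
from collections import Counter
from itertools import combinations


def co_occurring_needs(households, min_support=0.3):
    """Return {(sector_a, sector_b): (support, lift)} for frequent sector pairs.

    households: list of sets, each holding the sectors a household is in need in.
    support = share of households in need in both sectors;
    lift > 1 means the pair co-occurs more often than independence would predict.
    """
    n = len(households)
    single = Counter(sector for h in households for sector in h)
    pairs = Counter(pair for h in households for pair in combinations(sorted(h), 2))
    result = {}
    for (a, b), count in pairs.items():
        support = count / n
        if support >= min_support:
            lift = support / ((single[a] / n) * (single[b] / n))
            result[(a, b)] = (support, lift)
    return result


households = [
    {"food", "wash"},
    {"food", "wash", "health"},
    {"food"},
    {"shelter"},
    {"food", "wash"},
]
print(co_occurring_needs(households))
```

Pairs are sorted before counting so that ("food", "wash") and ("wash", "food") are treated as the same combination. Larger sets of needs can be found the same way with `combinations(..., 3)` or a frequent-itemset algorithm.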
The traditional Multi-Sector Needs Assessment approach structures questionnaires for assessing people in humanitarian crises into predefined sectors. The team found that about 80% of the information captured by a set of preselected questions can be recovered from only four latent factors. They showed that these factors exhibit a high linear dependence on some of the sectors, stressing the importance of those particular sectors. The exact meaning of these latent variables, however, may go well beyond the traditional sectors and remains an open topic for future research. The team therefore suggests that rethinking the traditional sector approach can lead to more concise data acquisition and analysis.
Students: Shirzart Enwer, Belinda Müller, Swaneet Sahoo
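The 80% figure can be made precise if "information captured" is read as the share of total variance, as in principal component analysis. This reading is an assumption, since the exact factor model is not specified above. Under it, the quantity for the first k factors is:

```latex
\mathrm{EV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0,
```

Here the λ_i are the eigenvalues of the covariance (or correlation) matrix of the p preselected questions. The finding above then corresponds to EV(4) ≈ 0.8.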