Theses
We are always looking for motivated students who are interested in writing about a topic connected to our current research projects!
To ensure effective supervision, please prepare a short exposé (2–4 pages) and submit it to the assigned supervisor. The exposé should outline your chosen thesis topic(s), your motivation, a tentative structure for your work, and your desired submission deadline. Please include your current transcript of records. Also describe your current familiarity with the topic and any preliminary progress you have made (e.g., what literature you have read so far).
Bachelor's (or Master's) Thesis Topic: Developing Teaching Materials on Data Literacy for Church Leadership
This thesis focuses on creating and adapting teaching materials on data literacy and evidence-based decision-making specifically tailored for church leadership. Drawing on established resources such as the Data Literacy and Evidence Building book, the project aims to design practical, accessible materials that equip leaders in church contexts to make informed, data-driven decisions in increasingly complex environments. The project will involve reviewing existing materials and adapting them to the specific needs of church leadership, under the guidance of Prof. Ongono and Prof. Wollbold from the Faculty of Catholic Theology and Dr. Haensch from the Statistics Institute. It will also involve creating synthetic data and a selection of simple R or Python notebooks for training, again under the guidance of Dr. Haensch. This work will contribute to strengthening leadership training by providing a foundation for data-informed strategies while respecting the values and unique challenges of church organizations.
If you are interested, please contact anna-carolina.haensch@stat.uni-muenchen.de.
Exploring Public Discourse on TikTok: Analyzing Social Movements, Election Campaigns and more (Bachelor and Master Theses)
TikTok has become a central platform for public discourse, shaping political campaigns, social movements, and cultural trends. In this thesis, you will analyze TikTok data to understand how content spreads, influences public opinion, or mobilizes communities. The thesis can focus on content analysis (e.g., text and video data), interaction patterns (e.g., likes, comments, shares), or sentiment analysis to understand audience engagement.
Please note that accessing TikTok data requires submitting a data access application, which should be filed one month before the planned start of your thesis.
If interested, please email anna-carolina.haensch@stat.uni-muenchen.de.
Strategies for reducing annotation costs by implementing an LLM annotator (Master Thesis)
Training machine learning (ML) models relies on annotated (also called labeled) training data. Large Language Models (LLMs) offer great potential for data annotation. However, human annotations are likely still needed for difficult or ambiguous instances. A reasonable collaboration setup of LLM and human annotators could assign the easier instances to the LLM and the more complicated ones to humans. This approach could reduce annotation costs and let the human annotators focus on the more ambiguous cases. Best practices for allocating annotation tasks between humans and LLMs, in particular for subjective tasks, are yet to be developed. In this thesis you could develop and test algorithms for allocating tasks between the two annotators and study their impact on quality and cost. One indicator for routing an instance to the human (possibly expert) annotator could, for example, be the LLM's self-assessment of its certainty. If interested, please email your CV and a brief explanation of your interest in the topic to jacob.beck@stat.uni-muenchen.de and CC soda@stat.uni-muenchen.de.
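To make the allocation idea concrete, here is a minimal Python sketch of one possible routing rule: instances whose self-reported LLM certainty falls below a threshold are sent to a human. The `Instance` class, the `llm_confidence` field, the threshold, and the per-annotation costs are all hypothetical placeholders chosen for illustration; they are not part of any existing pipeline.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    text: str
    llm_label: str
    llm_confidence: float  # hypothetical self-reported certainty in [0, 1]


def route(instances, threshold=0.8):
    """Split instances into those kept with the LLM label and those sent to humans."""
    keep_llm, send_human = [], []
    for inst in instances:
        if inst.llm_confidence >= threshold:
            keep_llm.append(inst)
        else:
            send_human.append(inst)
    return keep_llm, send_human


def annotation_cost(n_llm, n_human, cost_llm=0.01, cost_human=0.50):
    """Total cost, assuming the LLM annotates everything first and
    routed instances additionally incur the human cost (made-up prices)."""
    return (n_llm + n_human) * cost_llm + n_human * cost_human


if __name__ == "__main__":
    batch = [
        Instance("example tweet 1", "offensive", 0.95),
        Instance("example tweet 2", "neutral", 0.55),
    ]
    kept, routed = route(batch)
    print(len(kept), len(routed), annotation_cost(len(kept), len(routed)))
```

Varying `threshold` trades cost against the share of instances reviewed by humans; a thesis could compare such a rule with alternative routing signals.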
Task Structure Effects in LLM Annotations
The way in which an annotation task is structured affects the annotations that human annotators provide, a result called annotation sensitivity. For example, the order in which annotations are collected, and the number of screens, can change whether tweets are annotated as containing hate speech or offensive language (https://dl.acm.org/doi/10.1007/978-3-031-21707-4_19; https://aclanthology.org/2024.uncertainlp-1.8/). With the growing use of LLMs as annotators, we wonder whether LLMs also show annotation sensitivity. Since LLMs are built on data produced by humans, it might be that the models inherit similar biases. In this thesis, you could replicate findings from the above studies with LLM annotators. If interested, please email your CV and a brief explanation of interest in the topic to jacob.beck@stat.uni-muenchen.de and CC soda@stat.uni-muenchen.de.
Master's Thesis on Law, Evidence-Based Policy, Sustainable Spatial Planning, and Data Science
This Master's thesis explores the potential of data science for assessing the sustainability of land management and urban planning. The project addresses the implementation of regional or national requirements for climate change adaptation and climate protection in urban planning. The central question is how data science can be used to evaluate policy measures and administrative action. The tasks are:
Available data: a dataset of local land-use plans (Bauleitpläne) containing, among other things, detailed building-level information on environmental conditions, environmental risks, and required measures, as well as precise information on building type, height, and building density. A dataset of regional plans (Regionalpläne) that set requirements for local land-use planning. In addition, flood maps, climate risk maps, and similar sources can be obtained, as can court proceedings and lawsuits relating to the plans. Federal states and regions covered: North Rhine-Westphalia (NRW), Bavaria, and the Rhine-Main-Neckar region.
A research question of your own can be developed together with the research team. The thesis requires the ability to work independently, an interest in interdisciplinary work, basic knowledge of sustainability and climate change, and good German language skills. A student assistant position may be offered in connection with the Master's thesis. If interested, please send an email with your CV and a short cover letter to felicitas.sommer@tum.de and bolei.ma@lmu.de.
GIST: Greenhouse Gas Insights and Sustainability Tracking (Bachelor and Master Theses, you will work with Python)
Financial regulators and central banks are increasingly integrating sustainability aspects into their operations. The Corporate Sustainability Reporting Directive (CSRD) mandates that approximately 50,000 European companies will have to publish sustainability reports in the future, which will be a rich source of data for statistical analysis.
One particular challenge is that companies communicate their sustainability information through unstructured PDF reports that contain both numerical and textual data. To make this information amenable to quantitative research, GIST applies Natural Language Processing (NLP) and Large Language Models (LLMs) for data extraction.
Possible tasks include:
If you are interested, please contact malte.schierholz@stat.uni-muenchen.de.
Harnessing Machine Learning for Early Detection of Cognitive Impairment
Mild cognitive impairment (MCI), affecting over 15% of adults aged 50 and above, often progresses to dementia, underscoring the importance of early detection. This thesis project focuses on developing innovative, machine learning-based diagnostics for MCI using non-invasive data collection methods. Unlike traditional approaches that rely on extensive neuropsychological testing unsuitable for widespread screening, this project proposes the use of machine learning algorithms to analyze computer use behaviors, particularly mouse movement data.
Participants in a large Internet panel, engaging with surveys on various digital devices, will have their mouse movements recorded. This data will serve as the foundation for developing algorithms capable of predicting levels of cognitive functioning and identifying early signs of MCI based on how participants interact with standardized tasks and questionnaires. This project presents a unique opportunity for students to contribute to critical advancements in medical diagnostics, offering a cost-efficient, automated, and unobtrusive method to potentially delay the onset of severe dementia symptoms. If interested, please contact felix.henninger@stat.uni-muenchen.de and CC soda@stat.uni-muenchen.de.
AutoML for Fairness by Abstaining
Automated machine learning (AutoML) and hyperparameter optimization techniques can be used to tune fairness-aware machine learning models that trade off predictive accuracy and a fairness measure (for example, equality of opportunity). However, a recent study challenges the assumption that there is a fairness-accuracy tradeoff, and suggests that only a few noisy samples per dataset are responsible for the perceived unfairness. When learning to abstain from such noisy samples using a bagging-based classifier, the study claims that standard models already produce fair predictions. However, this comes at the cost of leaving several data points unclassified, and ideally, one would like to minimize the number of data points the classifier abstains from. The goal of this thesis is to find out if we can:
Alternatively, one can try to reproduce findings from other studies, such as the ones from Perrone et al. or Cruz and Hardt in light of these findings. If you are interested, please contact christoph.kern@stat.uni-muenchen.de, matthias.feurer@stat.uni-muenchen.de and cc anna-carolina.haensch@stat.uni-muenchen.de.
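As a rough illustration of what abstaining with a bagging-based classifier can look like, the following Python sketch trains simple threshold classifiers on bootstrap resamples and refuses to predict when the ensemble members disagree too much. This is a generic toy construction, not the method of the study mentioned above; the one-dimensional "stump" learner, the function names, and the `min_agreement` parameter are assumptions made for this example.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D threshold classifier (predict 1 if x > t) by minimising training errors."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap resample and return the list of thresholds."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict_or_abstain(thresholds, x, min_agreement=0.9):
    """Return the majority label, or None (abstain) if too few members agree."""
    votes = Counter(int(x > t) for t in thresholds)
    label, count = votes.most_common(1)[0]
    if count / len(thresholds) >= min_agreement:
        return label
    return None
```

Lowering `min_agreement` reduces the number of abstentions but accepts predictions on which the ensemble is less certain, which is the trade-off this topic asks about.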
Multiple Imputation of Partially Observed Covariates in Discrete-Time Survival Analysis (Master Thesis)
We are seeking a motivated Master's student to embark on a methodological thesis project aimed at extending the scope of substantive-model compatible (SMC)-FCS multiple imputation (MI) techniques in discrete-time survival analysis (DTSA) to accommodate time-varying variables. Building on our existing work, which has successfully extended SMC-FCS MI for time-invariant covariates, this project will tackle the additional complexities introduced by time-varying variables. The successful candidate will conduct comprehensive Monte Carlo simulations to evaluate the extended methodology, and contribute to refining the practice of discrete-time survival analysis in the presence of missing data. If you are interested, please contact anna-carolina.haensch@stat.uni-muenchen.de.
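For readers new to discrete-time survival analysis, the sketch below shows the standard person-period data layout in Python and why missing covariates matter there: a covariate that is missing for one subject is missing in every period row of that subject. The field names (`id`, `time`, `event`, `x`) and the helper functions are hypothetical and serve only as an illustration; the sketch does not implement SMC-FCS imputation.

```python
from collections import defaultdict


def to_person_period(subjects):
    """Expand one record per subject into one row per subject and period at risk."""
    rows = []
    for s in subjects:
        for t in range(1, s["time"] + 1):
            rows.append({
                "id": s["id"],
                "period": t,
                # The event indicator is 1 only in the final period of an uncensored subject.
                "event": int(s["event"] == 1 and t == s["time"]),
                "x": s["x"],  # time-invariant covariate, None if missing
            })
    return rows


def empirical_hazard(rows):
    """Share of at-risk subjects experiencing the event in each period."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for r in rows:
        at_risk[r["period"]] += 1
        events[r["period"]] += r["event"]
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}
```

With time-varying covariates, `x` would differ across the rows of one subject and could be missing in some periods only, which is where the additional complexity addressed by this thesis arises.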
Implement MICE in Python (Master Thesis)
Are you a student with medium to strong Python skills looking to enhance your expertise? Consider implementing Multiple Imputation by Chained Equations (MICE) in Python for your master thesis. MICE is a statistical method for handling missing data, critical for reliable data analysis. This project offers the opportunity to gain hands-on experience with advanced data imputation techniques. Prior knowledge of missing data techniques is advantageous but not required; you will have the chance to learn on the job. By participating, you will produce a master thesis with significant practical applications, positioning yourself strongly in the data science field. Please contact anna-carolina.haensch@stat.uni-muenchen.de if you are interested.
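To give a flavour of the chained-equations idea, here is a deliberately simplified Python sketch for two numeric variables using only the standard library. Each variable is imputed in turn from a linear regression on the other, with normally distributed noise added. This is an assumption-laden toy: a full MICE implementation would also draw the regression parameters from their posterior distribution, support more variables and model types, and pool results across imputed datasets. All function names are hypothetical.

```python
import random
import statistics


def _fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    dof = max(len(xs) - 2, 1)
    sd = (sum(r * r for r in resid) / dof) ** 0.5
    return a, b, sd


def chained_imputation(data, n_iter=10, seed=0):
    """Return one completed copy of `data` (list of [x, y] rows, None = missing)."""
    rng = random.Random(seed)
    missing = [[row[j] is None for j in range(2)] for row in data]
    filled = [list(row) for row in data]
    # Initialise missing cells with the mean of the observed values in that column.
    for j in range(2):
        mean_j = statistics.fmean([r[j] for r in data if r[j] is not None])
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = mean_j
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # the other column acts as the predictor
            obs = [i for i in range(len(filled)) if not missing[i][j]]
            a, b, sd = _fit_line([filled[i][k] for i in obs],
                                 [filled[i][j] for i in obs])
            for i, row in enumerate(filled):
                if missing[i][j]:
                    row[j] = a + b * row[k] + rng.gauss(0.0, sd)
    return filled


def multiply_impute(data, m=5, n_iter=10, seed=0):
    """Create m completed datasets with different random seeds."""
    return [chained_imputation(data, n_iter=n_iter, seed=seed + d) for d in range(m)]
```

Calling `multiply_impute(data, m=5)` on a list of `[x, y]` rows with `None` for missing cells returns five completed copies; an analysis would then be run on each copy and the results combined.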
Cross-Cultural Examination of Algorithmic Fidelity: Comparing LLMs and Survey Results (Master Thesis)
In this thesis project, you'll extend the current research on "algorithmic fidelity" (Argyle et al., 2023) in large language models to a new socio-cultural context. Choose a country and an election or a unique survey topic, and compare the model's output to actual survey results.
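One simple way to compare model output with survey results is to contrast the two answer distributions. The Python sketch below computes the total variation distance between LLM-generated and survey-based response shares. The choice of metric and the function names are assumptions made for illustration; this is not necessarily the measure used by Argyle et al., and any numbers you pass in would come from your own data.

```python
def shares(counts):
    """Convert raw answer counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def total_variation(p, q):
    """Total variation distance between two categorical distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Given two dictionaries of answer counts, `total_variation(shares(llm_counts), shares(survey_counts))` returns a value between 0 and 1, where smaller values indicate closer agreement between the model and the survey.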
Your tasks will include:
This project presents an opportunity to make significant contributions to a novel intersection of AI and social science, providing valuable insights into language models.
Please contact anna-carolina.haensch@stat.uni-muenchen.de if you are interested.
Dynamic Fairness and Algorithmic Decision-Making (Master Thesis)
Public agencies are increasingly automating the allocation of scarce public resources by making use of risk prediction models. While a wide range of studies focuses on bias in the application of such models, the long-term fairness implications of algorithmically assisted decisions are not fully understood. Building on the emerging literature of dynamic fairness, this project aims at studying feedback loops and the long-term consequences of algorithmic decision-making in social contexts. If you are interested, please contact christoph.kern@stat.uni-muenchen.de and cc anna-carolina.haensch@stat.uni-muenchen.de.
Policy Learning for Fair and Effective Interventions (Master Thesis)
ML methods are increasingly used in combination with ideas from the causal inference literature to explore heterogeneous treatment effects. Such approaches are useful, for example, for personalizing treatments in medicine or for selecting optimal treatment regimes in the delivery of welfare state measures. While topics such as explainability and transparency have already been studied (see, e.g., policy trees), the connection between the causal learning literature and the fairML literature is still weak. At the same time, it is well known that many biases are present in the data used for developing personalized treatments in medicine or for allocating access to welfare state measures. Therefore, we seek students interested in exploring the connection between causal learning and fairML. If you are interested, please contact christoph.kern@stat.uni-muenchen.de, r.bach@uni-mannheim.de, and cc anna-carolina.haensch@stat.uni-muenchen.de.
We also welcome a thesis topic of your own! Please do not hesitate to contact us.