Theses
We are always looking for motivated students who are interested in writing about a topic connected to our current research projects!
To ensure effective supervision, please prepare a short exposé (2 pages) and submit it to the assigned supervisor. The exposé should outline your chosen thesis topic(s), your motivation, a tentative structure for your work, and your desired submission deadline. Describe your current familiarity with the topic and any preliminary progress you have made (e.g., the literature you have read so far). Please also include your current transcript of records.
Financial regulators and central banks are increasingly integrating sustainability aspects into their operations. The Corporate Sustainability Reporting Directive (CSRD) will require around 50,000 European companies to publish sustainability reports, a rich source of data for statistical analysis.
One particular challenge is that companies communicate their sustainability information through unstructured PDF reports that contain both numerical and textual data. To make this information amenable to quantitative research, GIST applies Natural Language Processing (NLP) and Large Language Models (LLMs) for data extraction.
Possible tasks include:
If you are interested, please contact malte.schierholz@stat.uni-muenchen.de.
Are you interested in how AI can enhance democratic processes? This thesis offers a unique opportunity to explore whether and how Large Language Models (LLMs) can make gathering public opinion more actionable for policymakers without misrepresenting subgroup views.
The project builds on publicly accessible data from the EU’s policy feedback platform to fine-tune LLMs on current issues, and quantitatively assess how well they represent public discourse overall and for population subgroups. You will also explore privacy considerations and the user experience of policymakers working with such a tool.
This project builds on the chair's expertise in trustworthy ML, LLM alignment and LLM-assisted surveys. We seek motivated students with an interest in public policy and qualitative data analysis. Prior experience working with LLMs through APIs is helpful, but can be developed during the project.
Interested? Then reach out with your CV/transcript to mail@marcusnovotny.com & christoph.kern@stat.uni-muenchen.de for a first meeting. Looking forward to hearing from you!
When should government agencies invest in better prediction models versus simply expanding their capacity to screen more people? This fundamental question affects public spending on algorithmic systems for unemployment assistance, poverty targeting, child welfare, and healthcare. Our recent paper, "The Value of Prediction in Identifying the Worst-Off" (ICML 2025), develops a theoretical framework for answering this question through the Prediction-Access Ratio (PAR): a metric that quantifies the relative value of improving prediction accuracy versus expanding screening capacity. The paper demonstrates through mathematical analysis and real-world case studies that conventional wisdom often gets this tradeoff wrong: capacity expansion is frequently more cost-effective than incremental prediction improvements.
We are now seeking a talented master's student to translate this theoretical framework into a practical toolkit that data scientists and policy analysts can use.
Project Goals
You will design and implement an open-source Python package that enables practitioners to:
1) Explore tradeoffs between prediction improvements and alternative policy levers, such as expanding capacity or improving treatments, for their specific problem context and data
2) Simulate interventions such as collecting more data, adding features, and scaling residuals
3) Conduct cost-benefit analyses incorporating realistic cost structures (fixed vs. recurring, amortization)
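To give a feel for the kind of comparison such a toolkit could support, here is a minimal simulation sketch. It does not implement the PAR as defined in the paper. The function name `simulate_recall`, the Gaussian data-generating process, and all parameter values are assumptions made purely for illustration.

```python
import random


def simulate_recall(n, noise_sd, capacity, worst_off_share, seed=0):
    """Share of the truly worst-off units that are among the screened units.

    Hypothetical setup: lower outcomes mean worse off, and the agency
    screens the `capacity` share of units with the lowest predictions.
    """
    rng = random.Random(seed)
    # Latent outcome (an assumed standard-normal welfare index).
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Prediction = truth + noise; a smaller noise_sd stands in for a better model.
    predicted = [y + rng.gauss(0.0, noise_sd) for y in outcome]

    n_target = round(worst_off_share * n)
    n_screened = round(capacity * n)
    target = set(sorted(range(n), key=lambda i: outcome[i])[:n_target])
    screened = set(sorted(range(n), key=lambda i: predicted[i])[:n_screened])
    return len(target & screened) / n_target


# Compare two policy levers against the same baseline.
baseline = simulate_recall(5000, noise_sd=1.0, capacity=0.10, worst_off_share=0.10)
better_model = simulate_recall(5000, noise_sd=0.5, capacity=0.10, worst_off_share=0.10)
more_capacity = simulate_recall(5000, noise_sd=1.0, capacity=0.20, worst_off_share=0.10)

print(f"baseline recall:        {baseline:.3f}")
print(f"halved prediction noise: {better_model:.3f}")
print(f"doubled capacity:        {more_capacity:.3f}")
```

Because the seed is fixed, all three runs share the same simulated population, so the differences reflect only the lever that was changed. A cost-benefit analysis would additionally attach a price to each lever, which this sketch leaves out.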
If interested, please send your CV, transcript, and a brief statement (max 1 page) explaining your motivation and relevant experience to Unai.FischerAbaigar@stat.uni-muenchen.de.
This thesis will explore the potential of Data Science for assessing the sustainability of land management and urban planning. The project deals with the implementation of regional or national guidelines for climate change adaptation and climate protection in urban planning. The central question is how Data Science can be used to evaluate political measures and administrative actions. The tasks are:
Dataset: Planning documents containing detailed information on the environmental condition, environmental risks, and necessary measures at the building level, as well as precise data on building type, height, and building density; and regional plans that outline requirements for urban planning. Flood maps, climate risk maps, and similar sources, as well as court cases and lawsuits related to the plans, can also be accessed. Regions covered: NRW, Bavaria, and the Rhine-Main-Neckar region.
A specific research question can be developed in collaboration with the research team. The work requires an independent approach, an interest in interdisciplinary work, basic knowledge of sustainability and climate change topics, and good knowledge of German. There is the opportunity to offer a student assistant position as part of the Master's thesis. If interested, please send an email with a CV and a short cover letter to felicitas.sommer@tum.de and bolei.ma@lmu.de.
Current benchmarks for evaluating values, opinions and behaviors in LLMs are static, US-centric, and lack generalizability and representativeness. At the same time, social scientists have built high-quality data infrastructures to accurately measure attitudes and values across populations and subgroups. This project seeks to illustrate how social surveys can be used for LLM evaluations. Building on recent efforts such as folktables and folktexts, the goal is to build a data processing pipeline and interface that allows AI researchers to access data distributions from selected social surveys for model evaluation and alignment.
For this thesis project, we are seeking motivated students with strong programming skills and interest in model evaluations and social data science. If interested, please email your CV/transcript to christoph.kern@stat.uni-muenchen.de.
The paper "Valid Survey Simulations with Limited Human Data: The Roles of Prompting, Fine-Tuning, and Rectification" (Krsteski et al., 2025) provides a strong foundation for a Bachelor’s or Master’s thesis that both replicates and extends its core approach. The thesis could reproduce the main experiments comparing prompting, fine-tuning, and rectification for large language model–based survey simulations, while focusing on two extensions: (a) applying the framework to a non-US, non-English context such as German-language survey data (e.g., from ALLBUS or SOEP) to examine linguistic and cultural transferability, and (b) advancing the rectification method by introducing subgroup-specific or adaptively weighted correction terms (e.g., by gender, education, or income). This would allow the thesis to test the robustness of the original findings across different contexts and evaluate whether more granular rectification schemes can better mitigate systematic model bias in synthetic survey data.
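As a rough illustration of what a subgroup-specific correction term in extension (b) might look like, the sketch below applies a simple difference-based bias correction per subgroup. This is a generic illustration and is not taken from the paper; whether it matches the paper's rectification method is an assumption, and the function names, data layout, and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def subgroup_corrections(labeled):
    """Average (human - LLM) gap per subgroup.

    `labeled` holds (group, human_value, llm_value) tuples from the small
    sample for which real human responses are available.
    """
    diffs = defaultdict(list)
    for group, human, llm in labeled:
        diffs[group].append(human - llm)
    return {group: mean(values) for group, values in diffs.items()}


def rectified_mean(simulated, corrections, fallback=0.0):
    """Mean of LLM-simulated responses after adding each subgroup's correction.

    `simulated` holds (group, llm_value) tuples; groups without labeled
    data receive the `fallback` correction.
    """
    return mean([llm + corrections.get(group, fallback) for group, llm in simulated])


# Toy data: the LLM overstates the "low" group and understates the "high" group.
labeled = [("low", 2.0, 3.0), ("low", 1.0, 2.0), ("high", 4.0, 4.0), ("high", 5.0, 4.0)]
simulated = [("low", 3.0), ("high", 4.0)]

corrections = subgroup_corrections(labeled)
naive = mean([llm for _, llm in simulated])
rectified = rectified_mean(simulated, corrections)
print(corrections, naive, rectified)
```

A thesis would need to go beyond this, for example by quantifying the uncertainty of each correction term, which becomes noisy when a subgroup has few human responses.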
Due to Caro Haensch’s availability, supervision of this thesis topic will be possible again from April 2026 onwards. If you are interested, please get in touch in mid-April 2026.
Training machine learning (ML) models relies on annotated (also called labeled) training data. Large Language Models (LLMs) offer great potential for data annotation. However, human annotations are likely still needed for difficult or ambiguous instances. A reasonable collaboration setup between LLM and human annotators could assign the easier instances to the LLM and the more complicated ones to humans. This approach could reduce annotation costs and let the human annotators focus on the more ambiguous cases. Best practices for allocating annotation tasks between humans and LLMs, in particular for subjective tasks, are yet to be developed. In this thesis you could develop and test algorithms for allocating tasks between the two types of annotators and study their impact on quality and cost. One indicator for routing an instance to the human (possibly expert) annotator could, for example, be the LLM's self-assessed certainty. If interested, please email your CV and a brief explanation of your interest in the topic to jacob.beck@stat.uni-muenchen.de and CC soda@stat.uni-muenchen.de.
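The allocation idea described for the human-LLM annotation topic can be sketched as a simple confidence-threshold rule. This is only one possible baseline, not a method prescribed by the topic. The `Instance` fields, the threshold of 0.8, and the per-item costs are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    text: str
    llm_label: str
    llm_confidence: float  # self-assessed certainty in [0, 1] (assumed to be available)


def route(instances, threshold=0.8):
    """Keep confident LLM labels; send everything else to human annotators."""
    llm_queue, human_queue = [], []
    for inst in instances:
        if inst.llm_confidence >= threshold:
            llm_queue.append(inst)
        else:
            human_queue.append(inst)
    return llm_queue, human_queue


def annotation_cost(n_llm, n_human, cost_llm=0.01, cost_human=0.50):
    """Total cost under assumed (made-up) per-item prices."""
    return n_llm * cost_llm + n_human * cost_human


batch = [
    Instance("Great service, would come again.", "positive", 0.95),
    Instance("Well, that was something.", "negative", 0.60),
    Instance("Not bad at all.", "positive", 0.80),
]
llm_queue, human_queue = route(batch, threshold=0.8)
total_cost = annotation_cost(len(llm_queue), len(human_queue))
print(len(llm_queue), len(human_queue), total_cost)
```

Varying the threshold traces out a cost-quality curve: a higher threshold sends more instances to humans, raising cost but presumably also label quality on ambiguous cases. How well self-reported certainty tracks actual correctness is itself an open question the thesis could examine.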
We are seeking a motivated Master's student to embark on a methodological thesis project aimed at extending substantive model compatible fully conditional specification (SMC-FCS) multiple imputation (MI) techniques in discrete-time survival analysis (DTSA) to accommodate time-varying variables. Building on our existing work, which has successfully extended SMC-FCS MI for time-invariant covariates, this project will tackle the additional complexities introduced by time-varying variables. The successful candidate will conduct comprehensive Monte Carlo simulations to evaluate the extended methodology, and contribute to refining the practice of discrete-time survival analysis in the presence of missing data. If you are interested, please contact anna-carolina.haensch@stat.uni-muenchen.de.
Due to Caro Haensch’s availability, supervision of this thesis topic will be possible again from April 2026 onwards. If you are interested, please get in touch in mid-April 2026.
The exponential growth of scientific literature makes comprehensive literature reviews increasingly challenging for individual researchers, and slows the turnover of core ideas (Pan et al. 2018). We aim to address this issue by developing a transparent and reusable LLM pipeline to automatically summarize empirical evidence across fields. The envisioned system will parse full paper texts to systematically extract variables and causal links, and summarize them in a graphical user interface. The contribution lies in both an exemplary review of a field of your choice, and the software artefact for reuse.
This project offers an exciting Master's thesis opportunity on the science of science. A gold standard dataset of ~150 manually coded studies is available for training purposes, and supervisors have deep expertise in the topic. You will gain practical experience in both literature analysis and scientific software development. Ideal candidates have a background in statistics and/or computer science, and are motivated to deepen their experience in developing effective and reproducible prompting strategies.
Interested? Then reach out with your CV/transcript to marcus.novotny@stat.uni-muenchen.de & christoph.kern@stat.uni-muenchen.de for a first meeting. Looking forward to hearing from you!
Public agencies are increasingly automating the allocation of scarce public resources by making use of risk prediction models. While a wide range of studies focuses on bias in the application of such models, the long-term fairness implications of algorithmically assisted decisions are not fully understood. Building on the emerging literature of dynamic fairness, this project aims at studying feedback loops and the long-term consequences of algorithmic decision-making in social contexts. If you are interested, please contact christoph.kern@stat.uni-muenchen.de and cc anna-carolina.haensch@stat.uni-muenchen.de.
Research in algorithmic fairness and responsible AI has proposed numerous technical definitions of model fairness and associated metrics (e.g. Mitchell et al. 2022). These metrics typically encode different normative perspectives and are often in conflict with each other – i.e., the same model may not be able to comply with all metrics at the same time. These incompatibilities raise critical questions regarding which fairness concepts should be prioritized in a given application context.
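A small toy example can make this conflict concrete. The sketch below compares two common criteria, demographic parity (equal selection rates) and equal opportunity (equal true positive rates), on made-up data for two groups with different base rates. The data and the helper names are assumptions chosen for illustration only.

```python
def selection_rate(preds):
    """Share of units receiving a positive prediction."""
    return sum(preds) / len(preds)


def true_positive_rate(labels, preds):
    """Share of truly positive units that receive a positive prediction."""
    hits = [p for y, p in zip(labels, preds) if y == 1]
    return sum(hits) / len(hits)


def gaps(labels, preds):
    """Absolute between-group gaps: (demographic parity, equal opportunity)."""
    dp_gap = abs(selection_rate(preds["A"]) - selection_rate(preds["B"]))
    eo_gap = abs(
        true_positive_rate(labels["A"], preds["A"])
        - true_positive_rate(labels["B"], preds["B"])
    )
    return dp_gap, eo_gap


# Toy labels: group A has a higher base rate (3/4) than group B (1/4).
labels = {"A": [1, 1, 1, 0], "B": [1, 0, 0, 0]}

# Classifier 1 predicts every label correctly.
perfect = {group: list(y) for group, y in labels.items()}
# Classifier 2 selects exactly half of each group.
equal_rates = {"A": [1, 1, 0, 0], "B": [1, 1, 0, 0]}

perfect_gaps = gaps(labels, perfect)
equal_rates_gaps = gaps(labels, equal_rates)
print("perfect classifier:     ", perfect_gaps)
print("equal-selection classifier:", equal_rates_gaps)
```

In this toy setting the perfectly accurate classifier satisfies equal opportunity but has a selection-rate gap of 0.5, while the classifier that equalizes selection rates leaves a true-positive-rate gap of one third. Neither satisfies both criteria, which is the kind of tradeoff stakeholders would have to weigh.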
While prior work, such as Makhlouf et al. (2022), provides guidelines in the form of “Fairness Trees” to help navigate the various proposed metrics, this project aims to use participatory approaches to understand how public stakeholders would choose between different fairness concepts and metrics in practice. We therefore seek students interested in exploring the connections between responsible AI, fairness perceptions and participatory approaches. If you are interested, please contact christoph.kern@stat.uni-muenchen.de and cc anna-carolina.haensch@stat.uni-muenchen.de.
The Federal Statistical Office of Germany (Destatis) offers a wide range of topics for scientific theses at the Bachelor's and Master's level. Please first check directly with Destatis whether the desired topic is still available. Afterwards, submit a short exposé (1–2 pages) on the topic, your CV, and a transcript of records to us (soda@stat.uni-muenchen.de) so that we can try to arrange university supervision.