SODA Lab @ DAGStat 2025: Statistics in Times of AI
Researchers from the SODA Lab at LMU Munich presented at the 7th Joint Statistical Meeting of the Deutsche Arbeitsgemeinschaft Statistik (DAGStat), held March 24–28, 2025, at Humboldt-Universität zu Berlin. DAGStat is a collaborative network of scientific societies and professional associations committed to advancing statistical theory and methodology.
The 2025 conference, held under the theme “Statistics in Times of AI”, brought together over 1,000 statisticians and data scientists to explore innovations across disciplines.
Our researchers delivered talks and posters on a range of cutting-edge topics at the intersection of statistical methodology, machine learning, sustainability, and survey research:
- Ailin Liu presented a poster on “Examining Survey Mouse Movements as Indicators of Individual Cognitive Functioning”, which used deep learning and SHAP values to analyze how mouse movement data might reveal early signs of cognitive decline.
- Anna Steinberg gave a talk on “ClimXtract: An Open-Source Data Extraction Pipeline for Company-Level Greenhouse Gas Emissions”, showcasing a pipeline powered by large language models that extracts emission metrics from corporate reports.
- Lisa Bondo Andersen presented “Enhancing Survey Data Quality with Mouse Tracking and Random Forests for Ordinal Cognitive Assessments”, exploring how paradata such as cursor paths and hovers can be used to detect survey difficulty and cognitive processes.
- Malte Schierholz introduced the initiative “Applied Data Analysis for the Public Sector”, which empowers government agencies through cloud-based, hands-on training in data science and AI applications.
- Laia Domenech Burin contributed to two presentations: she co-presented “Digitizing German Land Use Plans Using Automated Extraction Pipelines” and was a co-author of the “ClimXtract” project.
- Jacob Beck presented “Addressing Data Gaps in Sustainability Reporting: A Benchmark Dataset for Greenhouse Gas Emission Extraction”, focusing on the creation of a gold-standard dataset to validate information extraction models.
- Cornelia Gruber, in collaboration with SODA colleagues, presented work on “Sources of Uncertainty in Supervised Machine Learning” and chaired a session on neural networks and active learning. Her contributions covered both conceptual and applied perspectives.
We thank all our contributors for representing the SODA Lab and advancing the conversation around data, AI, and society.