Fairness Aspects of Machine Learning in Official Statistics

Project Description

In this joint project with the German Federal Statistical Office (Destatis), we explore topics at the intersection of machine learning, official statistics, and algorithmic fairness. The research packages focus on two areas: the reliable and subgroup-sensitive use of machine learning in official statistics, and coverage and representation errors in various forms of (training) data together with their downstream fairness implications. The work is motivated by the increasing use of new forms of data that allow detailed information to be collected cost-efficiently and quickly, but that also suffer from strong selectivity problems because different social groups have different propensities to be included. High-quality microdata from official statistics may make it possible to check data from heterogeneous sources for coverage problems, e.g., by comparing socio-demographic distributions at fine-grained regional scales. On this basis, a systematic data audit is to be carried out to identify and document coverage problems. The project responds to the growing significance and availability of new data sources and to the call, from a Fair ML perspective, for standardized evaluation and documentation of training data quality.
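
As an illustration only, the comparison of socio-demographic distributions between a data source and a reference could be sketched as below. This is a minimal sketch under stated assumptions, not the project's actual auditing procedure: the function names (group_shares, coverage_gaps), the region and age-group labels, and all counts are hypothetical and invented for the example. It computes, for each region, the share of each group in the audited source minus its share in the reference data, so that negative values point to under-coverage.

```python
from collections import defaultdict


def group_shares(counts):
    """Turn {(region, group): count} into {(region, group): share within region}."""
    totals = defaultdict(float)
    for (region, _group), n in counts.items():
        totals[region] += n
    return {
        key: n / totals[key[0]]
        for key, n in counts.items()
        if totals[key[0]] > 0
    }


def coverage_gaps(source_counts, reference_counts):
    """Share in the audited source minus share in the reference, per (region, group).

    Cells that appear in the reference but not in the source are treated as
    having a source share of 0, so they show up as fully under-covered.
    """
    src = group_shares(source_counts)
    ref = group_shares(reference_counts)
    return {key: src.get(key, 0.0) - ref_share for key, ref_share in ref.items()}


# Hypothetical toy counts: reference microdata vs. a new data source.
reference = {
    ("A", "18-39"): 400, ("A", "40-64"): 400, ("A", "65+"): 200,
    ("B", "18-39"): 300, ("B", "40-64"): 300, ("B", "65+"): 400,
}
source = {
    ("A", "18-39"): 60, ("A", "40-64"): 35, ("A", "65+"): 5,
    ("B", "18-39"): 50, ("B", "40-64"): 40, ("B", "65+"): 10,
}

gaps = coverage_gaps(source, reference)

# List cells from most under-covered to most over-covered.
for (region, group), gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"region {region}, group {group}: {gap:+.2f}")
```

In this toy example the oldest age group is under-represented in both regions. A real audit would also have to account for sampling uncertainty and small cell sizes at fine regional scales, which this sketch ignores.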