Improving Inference from Non-Random Data for Social Science Research

Project Description

New types of data from digital traces, along with new forms of access to data from administrative processes, make it possible to observe individual and social behavior, and changes in that behavior, at high frequency and in real time. The caveat is that the quality of these new data is usually unknown. At the same time, traditional survey data collection faces rising costs, and many social science researchers are tempted to replace expensive probability-based surveys with less expensive data collections. These are often cheaper because they rely on volunteer samples drawn from unknown populations, with unknown selection mechanisms and unknown inclusion probabilities. Misrepresentation of societal groups in digital trace data and other alternative data sources can severely limit the utility of such data for deriving valid inferences and accurate predictions for a given target population. The usefulness of alternative data sources therefore depends on how effectively bias mitigation techniques correct for self-selection. This research project combines methodology from social science and computer science to account for misrepresentation in data, and it develops and compares pseudo-weighting and post-processing techniques to improve inference from various data sources.
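
The description does not name a specific estimator. Purely as an illustration, the Python sketch below shows one generic form of pseudo-weighting, assuming a reference probability sample with known design weights is available: a weighted logistic regression separates volunteer units from the reference sample, and the inverse of each volunteer's fitted odds serves as its pseudo-weight. This is a textbook propensity-score approach chosen for illustration, not the boosted kernel weighting or multi-accuracy methods listed under Publications; the function names (fit_weighted_logistic, pseudo_weights), the selection mechanism, and the simulated data are all hypothetical.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson (IRLS)."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (w * (y - p))
        hess = (X1 * (w * p * (1.0 - p))[:, None]).T @ X1
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def pseudo_weights(X_vol, X_ref, d_ref):
    """Hypothetical helper: inverse-propensity pseudo-weights for a volunteer sample.

    The volunteer sample (label 1, weight 1) is stacked with a reference
    probability sample (label 0, design weights d_ref). Because the weighted
    reference sample stands in for the whole population, the fitted odds
    p / (1 - p) approximate each volunteer's inclusion probability.
    """
    X = np.vstack([X_vol, X_ref])
    y = np.concatenate([np.ones(len(X_vol)), np.zeros(len(X_ref))])
    w = np.concatenate([np.ones(len(X_vol)), d_ref])
    beta = fit_weighted_logistic(X, y, w)
    odds = np.exp(np.column_stack([np.ones(len(X_vol)), X_vol]) @ beta)
    return 1.0 / odds


# Toy illustration with simulated data (not real survey data).
rng = np.random.default_rng(0)
N = 100_000
x = rng.normal(size=(N, 1))
y_pop = 2.0 + 3.0 * x[:, 0] + rng.normal(size=N)

# Hypothetical self-selection: units with larger x volunteer more often.
incl_prob = np.minimum(np.exp(-4.5 + x[:, 0]), 1.0)
vol = rng.random(N) < incl_prob

# Simple random sample as the reference survey, design weight N / n.
n_ref = 2_000
ref = rng.choice(N, size=n_ref, replace=False)
d_ref = np.full(n_ref, N / n_ref)

w_vol = pseudo_weights(x[vol], x[ref], d_ref)
naive_mean = y_pop[vol].mean()
weighted_mean = np.sum(w_vol * y_pop[vol]) / np.sum(w_vol)  # Hajek estimator
true_mean = y_pop.mean()
print(f"population {true_mean:.2f}  naive {naive_mean:.2f}  "
      f"pseudo-weighted {weighted_mean:.2f}")
```

The naive volunteer mean is pulled upward because high-x units self-select, while the pseudo-weighted mean moves back toward the population value. The stacked approach approximates inclusion probabilities well only when the volunteer sample is a small fraction of the population and the propensity model is correctly specified; the simulation satisfies both by construction, since the log inclusion probability is linear in x.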

Project Team

Name	Email
Kreuter, Frauke	soda@stat.uni-muenchen.de
Kern, Christoph	christoph.kern@stat.uni-muenchen.de
Haensch, Anna-Carolina	anna-carolina.haensch@stat.uni-muenchen.de
Beck, Jacob	jacob.beck@stat.uni-muenchen.de
Fischer Abaigar, Unai	Unai.FischerAbaigar@stat.uni-muenchen.de

Publications

Kern, C., Kim, M., and Zhou, A. (2024). Multi-Accurate CATE is Robust to Unknown Covariate Shifts. Transactions on Machine Learning Research (TMLR). https://openreview.net/pdf?id=VOGlTb27ob
Kim, M. P., Kern, C., Goldwasser, S., Kreuter, F., and Reingold, O. (2022). Universal Adaptability: Target-Independent Inference that Competes with Propensity Scoring. Proceedings of the National Academy of Sciences of the United States of America (PNAS) 119(4). https://doi.org/10.1073/pnas.2108097119
Kern, C., Li, Y., and Wang, L. (2020). Boosted Kernel Weighting – Using Statistical Learning to Improve Inference From Nonprobability Samples. Journal of Survey Statistics and Methodology. https://doi.org/10.1093/jssam/smaa028
