Machine Learning Approaches to Latent Variable Modeling

Project Description

Multi-item batteries are frequently used in social science surveys to measure latent traits and to scale individuals on a single construct. Individual latent variable scores derived from observed item responses are used in psychopathological diagnosis and in the assessment of abilities and personality in occupational and educational settings. For these latent traits to be used meaningfully in substantive analyses, one must assume measurement invariance, that is, that the items relate to the latent trait in the same way across all subgroups of respondents. However, especially in large-scale surveys, this assumption rarely holds because the survey samples are heterogeneous. Measurement non-invariance is also referred to as differential item functioning (DIF). Data-driven, algorithmic approaches make it possible to detect subgroups with DIF when little theoretical guidance on the relevant subgroups is available. We propose and compare model-based recursive partitioning (MOB) techniques for detecting DIF, with a focus on measurement models with multiple latent variables. Such models may be referred to as multidimensional graded response (MGR) models. Additionally, we propose a method we call latent variable forest (LV Forest) for estimating unbiased latent variable scores. LV Forest can be used for latent variable score estimation, especially if the assumed latent variable model does not fit the data and/or includes parameter estimates that are unstable with respect to construct-irrelevant covariates. Furthermore, we propose a method to efficiently estimate parameter instability in MGR models and thereby drastically reduce the computation time of MOB for MGR models.
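
To make the notion of DIF concrete, the following minimal sketch computes response category probabilities for a single graded item in two subgroups that share the same latent trait value but differ in their item thresholds. It is an illustration only, not the project's method: the logistic link, the parameter values, and all function and variable names are assumptions introduced here, and the sketch covers one item and one latent variable rather than an MGR model or MOB.

```python
import math


def graded_response_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    Assumes P(Y >= k | theta) = logistic(discrimination * (theta - thresholds[k - 1]))
    with strictly increasing thresholds (illustrative parameterization).
    """
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities, padded with P(Y >= 0) = 1 and P(Y >= K + 1) = 0.
    cumulative = [1.0] + [logistic(discrimination * (theta - b)) for b in thresholds] + [0.0]
    # Category probability = difference of adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(probs):
    """Expected item score given category probabilities for categories 0..K."""
    return sum(k * p for k, p in enumerate(probs))


# Same latent trait value in both subgroups (hypothetical values).
theta = 0.0

# Reference subgroup and focal subgroup: identical discrimination,
# but the focal subgroup's thresholds are shifted upward (threshold DIF).
reference = graded_response_probs(theta, 1.5, [-1.0, 0.0, 1.0])
focal = graded_response_probs(theta, 1.5, [-0.5, 0.5, 1.5])

# Under measurement invariance this gap would be zero.
dif_gap = expected_score(reference) - expected_score(focal)
print(f"Expected score gap at equal theta: {dif_gap:.3f}")
```

A nonzero gap at equal trait levels means that observed responses would differ between the subgroups for reasons unrelated to the construct, which is the situation the partitioning methods described above aim to detect.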

Contact Person

Prof. Dr. Christoph Kern


  • Classe, F. L., & Steyer, R. (2023). A probit multistate IRT model with latent item effect variables for graded responses. European Journal of Psychological Assessment. Advance online publication.