Project Description
This line of research is dedicated to understanding what drives and biases the process of human annotation of Machine Learning (ML) training data. Building on literature from survey methodology, social psychology, and computer science, we conduct experimental research to detect sources of bias in the annotation process. Our studies indicate that annotations are sensitive to small changes in the design of the annotation task, to the order in which tasks are presented, and to annotator demographics.
Publications
- Eckman, S., Ma, B., Kern, C., Chew, R., Plank, B., & Kreuter, F. (2025). Correcting Annotator Bias in Training Data: Population-Aligned Instance Replication (PAIR). arXiv:2501.06826. https://arxiv.org/abs/2501.06826
- Beck, J., Eckman, S., Ma, B., Chew, R., & Kreuter, F. (2024). Order Effects in Annotation Tasks: Further Evidence of Annotation Sensitivity. EACL Workshop UncertaiNLP. https://aclanthology.org/2024.uncertainlp-1.8/
- Eckman, S., Plank, B., & Kreuter, F. (2024). Position: Insights from Survey Methodology can Improve Training Data. ICML. https://proceedings.mlr.press/v235/eckman24a.html
- Kern, C., Eckman, S., Beck, J., Chew, R., Ma, B., & Kreuter, F. (2023). Annotation sensitivity: Training data collection methods affect model performance. EMNLP Findings. https://aclanthology.org/2023.findings-emnlp.992/
- Beck, J. (2023). Quality aspects of annotated data: A research synthesis. AStA Wirtschafts- und Sozialstatistisches Archiv, 1-23. https://link.springer.com/article/10.1007/s11943-023-00332-y
- Beck, J., Eckman, S., Chew, R., & Kreuter, F. (2022). Improving Labeling Through Social Science Insights: Results and Research Agenda. In International Conference on Human-Computer Interaction (pp. 245-261). Cham: Springer Nature Switzerland. https://link.springer.com/chapter/10.1007/978-3-031-21707-4_19