Annotated Data Quality

Project Description

This line of research is dedicated to understanding what drives and influences the process of human annotation of Machine Learning (ML) training data. Building on literature from survey methodology, social psychology, and computer science, we conduct experimental research to detect sources of bias in the annotation process. Our studies indicate that annotation is sensitive to small changes in annotation task design, task order, and annotator demographics.
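
To give a loose sense of the kind of experimental manipulation studied here, the sketch below randomly assigns annotators to task-design conditions and independently shuffles the item order each annotator sees, so that differences in the resulting labels can be attributed to the manipulation rather than to particular items or annotators. The condition names and the `assign_experiment` function are hypothetical illustrations, not the actual designs or code used in the publications listed below.

```python
import random

# Hypothetical sketch (not the project's actual pipeline): between-subjects
# assignment of annotators to task-design conditions, with an independent
# random item order per annotator.

# Illustrative placeholder conditions, not the studies' real design variants.
CONDITIONS = ["single_screen", "two_screens", "reversed_question_order"]

def assign_experiment(annotator_ids, items, seed=42):
    """Return one design condition and one item order per annotator."""
    rng = random.Random(seed)
    assignments = {}
    for annotator in annotator_ids:
        assignments[annotator] = {
            "condition": rng.choice(CONDITIONS),     # randomized design condition
            "items": rng.sample(items, len(items)),  # independent random item order
        }
    return assignments

if __name__ == "__main__":
    plan = assign_experiment(["ann_01", "ann_02", "ann_03"],
                             ["text_a", "text_b", "text_c"])
    for annotator, spec in plan.items():
        print(annotator, spec["condition"], spec["items"])
```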

Publications

  • Beck, J., Eckman, S., Chew, R., & Kreuter, F. (2022, June). Improving Labeling Through Social Science Insights: Results and Research Agenda. In International Conference on Human-Computer Interaction (pp. 245–261). Cham: Springer Nature Switzerland. Link
  • Kern, C., Eckman, S., Beck, J., Chew, R., Ma, B., & Kreuter, F. (2023). Annotation Sensitivity: Training Data Collection Methods Affect Model Performance. Findings of the Association for Computational Linguistics: EMNLP 2023. Link
  • Beck, J. (2023). Quality aspects of annotated data: A research synthesis. AStA Wirtschafts- und Sozialstatistisches Archiv, 1–23. Link
  • Beck, J., Eckman, S., Ma, B., Chew, R., & Kreuter, F. (2024). Order Effects in Annotation Tasks: Further Evidence of Annotation Sensitivity. In Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024) (pp. 81–86). Association for Computational Linguistics (ACL). Link