Aligning Generative AI Models to Human Values

Project Description

Generative AI models such as ChatGPT, Stable Diffusion, and Llama are now in widespread use, and their potential for impactful applications has yet to be fully realized. To harness the benefits of future AI advances, it is imperative that these systems remain consistently aligned with human values. Beyond identifying the human values we want these models to embody, the technical challenge lies in finding reliable methods that make models adhere to these specified values consistently. In this project, we study the models' internal representations of moral values and develop methods to reinforce those values so that they are reflected in the generated output.

Contact Person

Sarah Ball

Publications

  • Kaufmann, T., Ball, S., Beck, J., Hüllermeier, E., & Kreuter, F. (2023). On the Challenges and Practices of Reinforcement Learning from Real Human Feedback. In ECML PKDD 2023 Workshop Towards Hybrid Human-Machine Learning and Decision Making.