Aligning Generative AI Models to Human Values

Project Description

Generative AI models such as ChatGPT, Stable Diffusion, and Llama are already in widespread use, yet their potential for impactful applications has not been fully realized. To harness the benefits of future AI advances, these systems must be consistently aligned with human values. Beyond identifying which human values these models should embody, the technical challenge lies in finding reliable methods that make models adhere to the specified values. In this project, we study the models’ internal representations of moral values and develop methods to reinforce those values so that they are realized in the generated output.

Contact Person

Sarah Ball

Project Team

Name	Email
Ball, Sarah	sarah.ball@stat.uni-muenchen.de
Kreuter, Frauke	soda@stat.uni-muenchen.de

Publications

Kaufmann, T., Ball, S., Beck, J., Hüllermeier, E., & Kreuter, F. (2023). On the Challenges and Practices of Reinforcement Learning from Real Human Feedback. In ECML PKDD 2023 Workshop Towards Hybrid Human-Machine Learning and Decision Making.