Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback |
Title: |
Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback |
Authors: |
Dahlgren Lindström, Adam; Methnani, Leila; Krause, Lea; Ericson, Petter; de Rituerto de Troya, Íñigo Martínez; Coelho Mollo, Dimitri; Dobbe, Roel |
Published in: |
Ethics and information technology |
Pagination: |
Volume 27 () no. 2 pages xx |
Year: |
2025-06-04 |
Contents: |
|
Publisher: |
Springer Netherlands, Dordrecht |
Source database: |
Elektronische Wetenschappelijke Tijdschriften |