Building safer dialogue agents

In recent years, large language models (LLMs) have achieved success at a range of tasks such as question answering, summarisation, and dialogue. However, these models can also express inaccurate or invented information, use discriminatory language, and encourage unsafe behaviour.

To create safer dialogue agents, we need to be able to learn from human feedback. Applying reinforcement learning (RL) based on input from research participants, we introduce Sparrow – a conversational AI model that declines to answer questions in contexts where it is appropriate to defer to humans, or where doing so has the potential to deter harmful behaviour. This approach explores what successful communication between people and an artificial dialogue agent should look like, and is part of DeepMind's ongoing efforts to build safer and more useful language models for dialogue.
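To make the idea of "learning from human feedback" concrete, below is a minimal, illustrative sketch of one common building block of RL from human feedback: training a reward model from pairwise preferences, where research participants indicate which of two responses they prefer and the model learns to score the preferred one higher. This is not Sparrow's actual implementation; the tiny encoder, vocabulary size, and random token data are placeholders chosen purely for illustration.

```python
# Hedged sketch: a toy reward model trained on pairwise human preferences.
# All model sizes and data below are illustrative placeholders, not Sparrow's.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a response (toy mean-pooled embedding encoder for illustration)."""
    def __init__(self, vocab_size: int = 1000, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings, then map to a scalar reward.
        pooled = self.embed(token_ids).mean(dim=1)
        return self.score(pooled).squeeze(-1)

def preference_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: the preferred response should score higher.
    return -torch.nn.functional.logsigmoid(r_preferred - r_rejected).mean()

# Toy training step on fabricated token ids standing in for annotated dialogues.
model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred = torch.randint(0, 1000, (8, 16))  # responses participants preferred
rejected = torch.randint(0, 1000, (8, 16))   # responses participants rejected
loss = preference_loss(model(preferred), model(rejected))
loss.backward()
opt.step()
print(f"preference loss: {loss.item():.3f}")
```

A reward model trained this way can then provide the reward signal for an RL step that fine-tunes the dialogue agent; the details of that step (and of Sparrow's rule-based constraints) go beyond this sketch.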
