Deep Reinforcement Learning for Dialogue Generation
Jiwei Li¹, Will Monroe¹, Alan Ritter² and Dan Jurafsky¹
¹Dept of Computer Science, Stanford University
²Dept of Computer Science and Engineering, Ohio State University
{jiweil,wmonroe4,jurafsky}@stanford.edu, ritter.1492@osu.edu
Abstract
Recent neural models of dialogue generation offer great promise for generating responses for conversational agents, but tend to be short-sighted, predicting utterances one at a time while ignoring their influence on future outcomes. Modeling the future direction of a dialogue is crucial to generating coherent, interesting dialogues, a need which led traditional NLP models of dialogue to draw on reinforcement learning. In this paper, we show how to integrate these goals, applying deep reinforcement learning to model future reward in chatbot dialogue. The model simulates dialogues between two virtual agents, using policy gradient methods to reward sequences that display three useful conversational properties: informativity (non-repetitive turns), coherence, and ease of answering (related to forward-looking function). We evaluate our model on diversity and dialogue length, as well as with human judges, showing that the proposed algorithm generates more interactive responses and fosters a more sustained conversation in dialogue simulation. This work marks a first step toward learning a neural conversational model based on the long-term success of dialogues.
1 Introduction
Neural response generation (Li et al., 2015; Vinyals and Le, 2015; Luan et al., 2016; Wen et al., 2015; Shang et al., 2015; Yao et al., 2015; Xu et al., 2016; Wen et al., 2016; Li et al., 2016) is of growing interest. The LSTM sequence-to-sequence (SEQ2SEQ) model (Sutskever et al., 2014) is one type of neural generation model that maximizes the probability of generating a response given the previous dialogue turn. This approach enables the incorporation of rich context when mapping between consecutive dialogue turns (Sordoni et al., 2015) in a way not possible, for example, with MT-based dialogue models (Ritter et al., 2011).
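The training criterion just described — maximize the probability of a response given the previous turn — reduces to summing token-level log-probabilities under the decoder. A minimal sketch follows; the stand-in probability model and all token probabilities here are invented for illustration and are not the authors' implementation:

```python
import math

def toy_next_token_probs(context_tokens):
    # Stand-in for an LSTM decoder's softmax over the vocabulary.
    # Fixed, invented probabilities: generic tokens score highly
    # regardless of the context (a point returned to below).
    return {"i": 0.4, "don't": 0.2, "know": 0.2, "i'm": 0.1, "</s>": 0.1}

def mle_loss(source_turn, target_turn):
    """Negative log-likelihood of the target response, token by token:
    the SEQ2SEQ objective maximizes p(target | source)."""
    context = list(source_turn)
    nll = 0.0
    for token in target_turn:
        probs = toy_next_token_probs(context)
        nll -= math.log(probs.get(token, 1e-10))
        context.append(token)  # teacher forcing: condition on gold tokens
    return nll

loss = mle_loss(["how", "old", "are", "you", "?"], ["i", "don't", "know"])
```

Training drives this loss down over all (context, response) pairs in the corpus, with no term that looks beyond the current turn.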
Despite the success of SEQ2SEQ models in dialogue generation, two problems emerge. First, SEQ2SEQ models are trained by predicting the next dialogue turn in a given conversational context using the maximum-likelihood estimation (MLE) objective function. However, it is not clear how well MLE approximates the real-world goal of chatbot development: teaching a machine to converse with humans, while providing interesting, diverse, and informative feedback that keeps users engaged. One concrete example is that SEQ2SEQ models tend to generate highly generic responses such as "I don't know" regardless of the input (Sordoni et al., 2015; Serban et al., 2015b; Serban et al., 2015c; Li et al., 2015). This can be ascribed to the high frequency of generic responses found in the training set and their compatibility with a diverse range of conversational contexts. Yet "I don't know" is clearly not a good action to take, since it closes the conversation down.
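The frequency-based explanation above can be made concrete with a toy example: when a generic reply is the most common response in many different contexts, greedy decoding under an MLE-trained model will select it in each of them. The training pairs and counts below are invented for this sketch:

```python
from collections import Counter, defaultdict

# Invented (context, response) training pairs: the generic reply
# recurs across contexts, so it dominates each context's counts.
training_pairs = [
    ("how old are you ?", "i don't know"),
    ("how old are you ?", "i don't know"),
    ("how old are you ?", "i'm 16"),
    ("where are you from ?", "i don't know"),
    ("where are you from ?", "i don't know"),
    ("where are you from ?", "england"),
]

by_context = defaultdict(Counter)
for context, response in training_pairs:
    by_context[context][response] += 1

# Greedy decoding under MLE approximates picking the most frequent
# response per context -- the generic reply wins everywhere.
mle_pick = {ctx: counts.most_common(1)[0][0]
            for ctx, counts in by_context.items()}
```

Nothing in the MLE objective penalizes this outcome; a reward that values keeping the conversation going is needed to rule it out.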
Another common problem, illustrated in Table 1 (the example in the bottom left), is when the system becomes stuck in an infinite loop of repetitive responses. This is due to MLE-based SEQ2SEQ models' inability to account for repetition. In example 2, the dialogue falls into an infinite loop after three turns, with both agents generating dull, generic utterances like i don't know what you are talking about and you don't know what you are saying. Looking at the entire conversation, utterance (2) i'm 16 turns out to be a bad action to take. While it is an informative and coherent response to utterance (1) asking about age, it offers no way of continuing the conversation.¹
¹A similar rule is often suggested in improvisational comedy: https://en.wikipedia.org/wiki/Yes,_and...
arXiv:1606.01541v1 [cs.CL] 5 Jun 2016