arXiv:1506.05869v1 [cs.CL] 19 Jun 2015
A Neural Conversational Model
Oriol Vinyals VINYALS@GOOGLE.COM
Google
Quoc V. Le QVL@GOOGLE.COM
Google
Abstract
Conversational modeling is an important task in
natural language understanding and machine in-
telligence. Although previous approaches ex-
ist, they are often restricted to specific domains
(e.g., booking an airline ticket) and require hand-
crafted rules. In this paper, we present a sim-
ple approach for this task which uses the recently
proposed sequence to sequence framework. Our
model converses by predicting the next sentence
given the previous sentence or sentences in a
conversation. The strength of our model is that
it can be trained end-to-end and thus requires
far fewer hand-crafted rules. We find that this
straightforward model can generate simple con-
versations given a large conversational training
dataset. Our preliminary results suggest that, despite op-
timizing the wrong objective function, the model
is able to extract knowledge from both a domain
specific dataset, and from a large, noisy, and gen-
eral domain dataset of movie subtitles. On a
domain-specific IT helpdesk dataset, the model
can find a solution to a technical problem via
conversations. On a noisy open-domain movie
transcript dataset, the model can perform simple
forms of common sense reasoning. As expected,
we also find that the lack of consistency is a com-
mon failure mode of our model.
1. Introduction
Advances in end-to-end training of neural networks have
led to remarkable progress in many domains such as speech
recognition, computer vision, and language processing.
Recent work suggests that neural networks can do more
than mere classification: they can be used to map complicated structures to other complicated structures.

Proceedings of the 31st International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).

An ex-
ample of this is the task of mapping a sequence to another
sequence which has direct applications in natural language
understanding (Sutskever et al., 2014). One of the major
advantages of this framework is that it requires little feature
engineering and domain specificity whilst matching or sur-
passing state-of-the-art results. This advance, in our opin-
ion, allows researchers to work on tasks for which domain
knowledge may not be readily available, or for tasks which
are simply too hard to model.
Conversational modeling can directly benefit from this for-
mulation because it requires mapping between queries and
responses. Due to the complexity of this mapping, conver-
sational modeling has previously been designed to be very
narrow in domain, with a major undertaking on feature en-
gineering. In this work, we experiment with the conversa-
tion modeling task by casting it to a task of predicting the
next sequence given the previous sequence or sequences
using recurrent networks (Sutskever et al., 2014). We find
that this approach can do surprisingly well on generating
fluent and accurate replies to conversations.
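This casting of conversation modeling as next-sequence prediction can be illustrated with a minimal sketch (not the authors' code; the function name, example conversation, and whitespace tokenization are all hypothetical): each utterance is paired with the utterance that follows it, yielding input/output sequences for a sequence-to-sequence learner.

```python
# Turn a conversation transcript into (input, target) training pairs,
# where a model would learn to predict the next utterance given the
# previous one. Illustrative only: real pipelines would also tokenize
# and may condition on several previous turns, not just one.

def conversation_to_pairs(turns):
    """Pair each utterance with the utterance that follows it."""
    return [(turns[i], turns[i + 1]) for i in range(len(turns) - 1)]

conversation = [
    "hi , i have a problem with my machine",
    "what is the error message ?",
    "it says connection refused",
    "try restarting the network service",
]

pairs = conversation_to_pairs(conversation)
for context, response in pairs:
    print(context, "->", response)
```

Each pair then plays the same role as a (source, target) sentence pair in neural machine translation, which is what makes the sequence-to-sequence framework directly applicable.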
We test the model on chat sessions from an IT helpdesk
dataset of conversations, and find that the model can some-
times track the problem and provide a useful answer to
the user. We also experiment with conversations obtained
from a noisy dataset of movie subtitles, and find that the
model can hold a natural conversation and sometimes per-
form simple forms of common sense reasoning. In both
cases, the recurrent nets obtain better perplexity compared
to the n-gram model and capture important long-range cor-
relations. From a qualitative point of view, our model is
sometimes able to produce natural conversations.
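Perplexity, the metric used above to compare the recurrent nets against the n-gram baseline, is the exponentiated average negative log-probability per token: lower values mean the model assigns higher probability to the held-out text. A minimal sketch of the computation (the per-token probabilities below are made up for illustration, not results from this paper):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability per token.
    Lower is better: the model is less 'surprised' by the data."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical probabilities two models assign to the same held-out
# reply, token by token; the recurrent model assigns higher probability
# to each token and therefore achieves lower perplexity.
rnn_probs = [0.4, 0.3, 0.5, 0.25]
ngram_probs = [0.2, 0.1, 0.3, 0.15]

print(perplexity(rnn_probs))
print(perplexity(ngram_probs))
```

As a sanity check, a model that assigns probability 0.5 to every token has perplexity exactly 2, matching the intuition of a uniform two-way choice per token.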
2. Related Work
Our approach is based on recent work which pro-
posed to use neural networks to map sequences to se-
quences (Kalchbrenner & Blunsom, 2013; Sutskever et al.,
2014; Bahdanau et al., 2014). This framework has been
used for neural machine translation and achieves im-