W:  find   action   movies  this    weekend
S:  O      B-genre  O       B-date  I-date
I:  find_movie

Figure 2: An example utterance with annotations of semantic slots in IOB format (S) and intent (I); B-date and I-date denote the date slot.
2 Proposed Framework
The proposed framework² is illustrated in Figure 1. It includes a user simulator (left part) and a neural dialogue system (right part). In the user simulator, an agenda-based user modeling component operating at the dialogue-act level controls the conversation exchange conditioned on the generated user goal, ensuring that the user behaves in a consistent, goal-oriented manner. An NLG module generates natural language texts corresponding to the user dialogue actions. In the neural dialogue system, an input sentence (recognized utterance or text input) passes through an LU module and becomes a corresponding semantic frame; a DM, which includes a state tracker and a policy learner, accumulates the semantics from each utterance, robustly tracks the dialogue states during the conversation, and generates the next system action.
2.1 Neural Dialogue System
Language Understanding (LU): A major task of LU is to automatically classify the domain of a user query along with domain-specific intents and to fill in a set of slots to form a semantic frame. The popular IOB (inside-outside-beginning) format is used for representing the slot tags, as shown in Figure 2.
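The IOB slot tagging above can be illustrated with a small sketch; the token lists and the decoding helper below are illustrative assumptions, not part of the paper's implementation:

```python
# The Figure 2 utterance encoded in IOB format: B-/I- prefixes mark the
# beginning/inside of a slot value, and O marks tokens outside any slot.
words = ["find", "action", "movies", "this", "weekend"]
slots = ["O", "B-genre", "O", "B-date", "I-date"]
intent = "find_movie"

def iob_to_frame(words, slots):
    """Collect IOB slot tags back into a semantic frame (slot -> value)."""
    frame, current = {}, None
    for word, tag in zip(words, slots):
        if tag.startswith("B-"):
            current = tag[2:]
            frame[current] = word
        elif tag.startswith("I-") and current == tag[2:]:
            frame[current] += " " + word
        else:
            current = None
    return frame

print(iob_to_frame(words, slots))  # {'genre': 'action', 'date': 'this weekend'}
```

Decoding the tag sequence this way recovers exactly the slot-value pairs that the DM later consumes as a semantic frame.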
~x = w_1, ..., w_n, <EOS>
~y = s_1, ..., s_n, i_m
where ~x is the input word sequence and ~y contains the associated slot tags, s_k, and the sentence-level intent, i_m. The LU component is implemented with a single LSTM, which performs intent prediction and slot filling simultaneously (Hakkani-Tür et al., 2016; Chen et al., 2016):
~y = LSTM(~x) . (1)
The LU objective is to maximize the conditional probability of the slots and the intent ~y given the word sequence ~x:

p(~y | ~x) = ( ∏_{i=1}^{n} p(s_i | w_1, ..., w_i) ) p(i_m | ~y).

² The source code is available at: https://github.com/MiuLab/TC-Bot
The weights of the LSTM model are trained us-
ing backpropagation to maximize the conditional
likelihood of the training set labels. The predicted
tag set is a concatenated set of IOB-format slot
tags and intent tags; therefore, this model can be
trained using all available dialogue actions and ut-
terance pairs in our labeled dataset in a supervised
manner.
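The factorized LU objective above can be made concrete with a toy computation; the per-token probabilities below are made-up stand-ins for model outputs, not trained values:

```python
import math

# The LU objective factorizes p(y|x) into per-token slot probabilities
# times a sentence-level intent probability. In log space:
#   log p(y|x) = sum_i log p(s_i | w_1..w_i) + log p(i_m | y)
slot_probs = [0.9, 0.8, 0.95, 0.85, 0.9]   # assumed p(s_i | w_1..w_i) per token
intent_prob = 0.7                          # assumed p(i_m | y)

def sequence_log_likelihood(slot_probs, intent_prob):
    """Log-likelihood of one tagged utterance under the factorization above."""
    return sum(math.log(p) for p in slot_probs) + math.log(intent_prob)

print(sequence_log_likelihood(slot_probs, intent_prob))
```

Training maximizes this quantity summed over the labeled dataset, which is what backpropagation through the LSTM optimizes.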
Dialogue Management (DM): The symbolic
LU output is passed to the DM in the dialogue
act form (or semantic frame). The classic DM in-
cludes two stages, dialogue state tracking and pol-
icy learning.
• Dialogue state tracking: Given the LU sym-
bolic output, such as request(moviename;
genre=action; date=this weekend), three
major functions are performed by the state
tracker: a symbolic query is formed to inter-
act with the database to retrieve the available
results; the state tracker will be updated based
on the available results from the database and
the latest user dialogue action; and the state
tracker will prepare the state representation
s_t for policy learning.
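The state tracker's three functions can be sketched as follows; the toy database, slot names, and state fields are assumptions for illustration, not the paper's actual schema:

```python
# A hypothetical movie database the tracker queries.
DATABASE = [
    {"moviename": "Mad Max", "genre": "action", "date": "this weekend"},
    {"moviename": "The Notebook", "genre": "romance", "date": "this weekend"},
]

def query_database(constraints):
    """Form a symbolic query from the user's constraints and return matches."""
    return [row for row in DATABASE
            if all(row.get(slot) == value for slot, value in constraints.items())]

# From request(moviename; genre=action; date=this weekend):
results = query_database({"genre": "action", "date": "this weekend"})

# Update the tracked state with the DB results and the latest user action,
# yielding the state representation s_t handed to the policy learner.
state = {"user_action": "request(moviename)",
         "db_results": results,
         "turn": 1}
```

This mirrors the three functions named above: forming the symbolic query, updating on the retrieved results and the latest user action, and preparing s_t.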
• Policy learning: The state representation
for the policy learning includes the lat-
est user action (e.g., request(moviename;
genre=action; date=this weekend)), the
latest agent action (request(location)), the
available database results, turn information,
and history dialogue turns, etc. Conditioned on the state representation s_t from the state tracker, the policy π generates the next available system action a_t according to π(s_t).
Either supervised learning or reinforcement
learning can be used to optimize π. Details
about RL-based policy learning can be found
in section 3.
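A minimal sketch of π(s_t) as a hand-written rule-based policy, standing in for the supervised or RL-trained policy; the state fields and action strings are assumptions for illustration:

```python
def policy(state):
    """Map the tracked dialogue state s_t to the next system action a_t."""
    if not state["db_results"]:
        # No matching entries: ask the user to relax or change a constraint.
        return "request(genre)"
    if len(state["db_results"]) == 1:
        # Exactly one match: inform the user of the answer.
        return "inform(moviename={})".format(state["db_results"][0]["moviename"])
    # Multiple matches: request another slot to narrow down the results.
    return "request(date)"

state = {"db_results": [{"moviename": "Mad Max"}]}
print(policy(state))  # inform(moviename=Mad Max)
```

A learned policy replaces this hand-written mapping with one optimized from data, as described for the RL case in section 3.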
Prior work used different implementation ap-
proaches summarized below. Dialogue state track-
ing is the process of constantly updating the state
of the dialogue, and Lee (2014) showed that there
is a positive correlation between state tracking per-
formance and dialogue performance. Most pro-
duction systems use manually designed heuris-
tics, often based on rules, to update the dialogue