Graph LSTM with Context-Gated Mechanism for Spoken Language Understanding
Linhao Zhang¹, Dehong Ma¹, Xiaodong Zhang¹, Xiaohui Yan², Houfeng Wang¹
¹MOE Key Lab of Computational Linguistics, Peking University, Beijing, 100871, China
²CBG Intelligence Engineering Dept, Huawei Technologies, China
{zhanglinhao, madehong, zxdcs, wanghf}@pku.edu.cn
yanxiaohui2@huawei.com
Abstract
Much research in recent years has focused on spoken language understanding (SLU), which usually involves two tasks: intent detection and slot filling. Since Yao et al. (2013), almost all SLU systems have been RNN-based, and such models have been shown to suffer from various limitations due to their sequential nature. In this paper, we propose to tackle this task with Graph LSTM, which first converts text into a graph and then utilizes a message-passing mechanism to learn the node representations. Not only does Graph LSTM address the limitations of sequential models, but it also helps to exploit the semantic correlation between slot and intent. We further propose a context-gated mechanism to make better use of context information for slot filling. Our extensive evaluation shows that the proposed model outperforms state-of-the-art results by a large margin.
Introduction
Spoken language understanding (SLU) is an essential part of dialog systems. It usually involves two tasks: intent detection (ID) and slot filling (SF). Typically, ID is regarded as a semantic utterance classification problem, to which different classification methods can be applied (Haffner, Tur, and Wright 2003; Tür et al. 2011; Deng et al. 2012). Meanwhile, SF is usually treated as a sequence labeling problem. Popular approaches to SF include support vector machines (SVMs) and conditional random fields (CRFs) (Lafferty, McCallum, and Pereira 2001).
Yao et al. (2013) adapted RNN language models to perform SLU, outperforming previous CRF-based models by a large margin. RNN-based methods (including LSTM and GRU) have since defined the state of the art in SLU research (Mesnil et al. 2015; Liu and Lane 2016; Zhang and Wang 2016; Goo et al. 2018; Niu et al. 2019).
Despite their success, these RNN-based models have been shown to suffer from various limitations. Firstly, their inherently sequential nature precludes parallelization within training examples (Vaswani et al. 2017). Secondly, local n-grams are not fully exploited by these models. In SLU, slots are determined not only by the associated items, but also by the local context.
Figure 1: An example of an SLU utterance with its intent and annotated slots using the IOB scheme. The B- prefix before a tag indicates that the tag is the beginning of a slot, and an I- prefix indicates that the tag is inside a slot. An O tag indicates that a token belongs to no slot.
As shown in Figure 1, the corresponding slot label for Seattle is B-fromloc, but it could also be B-toloc if the utterance were transformed into show flights from San Diego to Seattle (see the illustration below). Thirdly, the sequential nature of RNN-based methods makes them weaker at capturing long-range dependencies, which accounts for a large portion of SF errors (Tür, Hakkani-Tür, and Heck 2010).
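To make the context-dependence of slot labels concrete, the following minimal sketch spells out IOB annotations for two utterances; the exact wording of the Figure 1 utterance is an assumption here, chosen only to match the from/to example discussed above.

# Hypothetical IOB annotations; the precise Figure 1 utterance may differ.
utterance_1 = ["show", "flights", "from", "Seattle", "to", "San", "Diego"]
slots_1 = ["O", "O", "O", "B-fromloc", "O", "B-toloc", "I-toloc"]

# Swapping the two cities flips Seattle's label from B-fromloc to B-toloc,
# even though the word itself is unchanged: the label is decided by the
# local context ("from ... to ...").
utterance_2 = ["show", "flights", "from", "San", "Diego", "to", "Seattle"]
slots_2 = ["O", "O", "O", "B-fromloc", "I-fromloc", "O", "B-toloc"]

for token, tag in zip(utterance_1, slots_1):
    print(f"{token}\t{tag}")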
In this paper, we propose to use Graph LSTM to tackle
these problems. There are many variants of Graph LSTM
(Liang et al. 2016; Peng et al. 2017; Zayats and Ostendorf
2018; Song et al. 2018; Zhang, Liu, and Song 2018). In this
paper, we choose the S-LSTM (Zhang, Liu, and Song 2018)
because it is ideally suited for this task.
The main idea of the S-LSTM is to model the hidden states of all words simultaneously rather than sequentially, which solves the non-parallelization problem. Specifically, the S-LSTM views the whole sentence as a single graph, which consists of word-level nodes and a sentence-level node. These nodes are updated simultaneously through a message-passing mechanism. Since message passing is conducted between consecutive word-level nodes, and between the sentence-level node and each word-level node, both local n-grams and long-range dependencies are better captured.
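To illustrate the message-passing view, the sketch below implements one round of updates in which every word node reads from its left and right neighbours and from the sentence-level node, while the sentence node reads from all word nodes. It is a simplified single-gate variant with hypothetical names (SimpleSLSTMLayer), not the exact multi-gate S-LSTM equations of Zhang, Liu, and Song (2018).

import torch
import torch.nn as nn

class SimpleSLSTMLayer(nn.Module):
    """One round of message passing over the sentence graph
    (simplified sketch: a single interpolation gate, no cell states)."""

    def __init__(self, dim):
        super().__init__()
        # word-node update: [left, self, right, sentence] -> candidate and gate
        self.word_cand = nn.Linear(4 * dim, dim)
        self.word_gate = nn.Linear(4 * dim, dim)
        # sentence-node update: [sentence, mean of words] -> candidate and gate
        self.sent_cand = nn.Linear(2 * dim, dim)
        self.sent_gate = nn.Linear(2 * dim, dim)

    def forward(self, h, g):
        # h: (batch, seq_len, dim) word-level node states
        # g: (batch, dim)          sentence-level node state
        left = torch.roll(h, shifts=1, dims=1)    # left neighbour (wrap-around is a simplification)
        right = torch.roll(h, shifts=-1, dims=1)  # right neighbour
        g_exp = g.unsqueeze(1).expand_as(h)       # broadcast sentence node to every word

        # all word nodes are updated in parallel from their local context
        word_in = torch.cat([left, h, right, g_exp], dim=-1)
        cand = torch.tanh(self.word_cand(word_in))
        gate = torch.sigmoid(self.word_gate(word_in))
        h_new = gate * cand + (1.0 - gate) * h    # gated interpolation with old state

        # the sentence node aggregates over all word nodes
        sent_in = torch.cat([g, h_new.mean(dim=1)], dim=-1)
        s_cand = torch.tanh(self.sent_cand(sent_in))
        s_gate = torch.sigmoid(self.sent_gate(sent_in))
        g_new = s_gate * s_cand + (1.0 - s_gate) * g
        return h_new, g_new

# usage: a few rounds of message passing; more rounds widen the receptive field
layer = SimpleSLSTMLayer(dim=64)
h = torch.randn(2, 7, 64)   # 2 utterances, 7 tokens each
g = torch.randn(2, 64)
for _ in range(3):
    h, g = layer(h, g)

Because every word node exchanges messages with its neighbours (local n-grams) and with the shared sentence node (global, long-range information) at every round, the two limitations discussed above are addressed within a single update scheme.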
Compared to other variants of Graph LSTM, the S-LSTM has a special sentence-level node, making it ideally suited to exploit the semantic correlation between slot and intent. We note that intent and slots are not independent but intrinsically correlated. As the example in Figure 1 shows, an utterance is more likely to contain departure and arrival cities if its intent is to find a flight, and vice versa. For joint ID and SF, we use the final word-level nodes of S-LSTM for slots