A Survey on Spoken Language Understanding: Recent Advances and New
Frontiers
Libo Qin, Tianbao Xie, Wanxiang Che∗, Ting Liu
Research Center for Social Computing and Information Retrieval
Harbin Institute of Technology, China
{lbqin, tianbaoxie, car, tliu}@ir.hit.edu.cn
Abstract
Spoken Language Understanding (SLU) aims to extract the semantic frame of user queries and is a core component in task-oriented dialog systems. With the burst of deep neural networks and the evolution of pre-trained language models, research on SLU has achieved significant breakthroughs. However, there remains a lack of a comprehensive survey summarizing existing approaches and recent trends, which motivated the work presented in this article. In this paper, we survey recent advances and new frontiers in SLU. Specifically, we give a thorough review of this research field, covering different aspects including: (1) new taxonomy: we provide a new perspective on the SLU field, covering single model vs. joint model, implicit vs. explicit joint modeling within joint models, and non-pre-trained vs. pre-trained paradigms; (2) new frontiers: some emerging areas in complex SLU as well as the corresponding challenges; (3) abundant open-source resources: to help the community, we have collected and organized the related papers, baseline projects and leaderboards on a public website¹, where SLU researchers can directly access recent progress. We hope that this survey can shed light on future research in the SLU field.
1 Introduction
Spoken Language Understanding (SLU) is a core component in task-oriented dialog systems, which aims to capture the semantics of user queries. It typically consists of two tasks: intent detection and slot filling [Tur and De Mori, 2011]. Taking the utterance "I like to watch action movie" in Figure 1 as an example, the outputs include an intent class label (i.e., WatchMovie) and a slot label sequence (i.e., O, O, O, B-movie-type, I-movie-type, I-movie-type).
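The semantic frame described above pairs a sentence-level intent label with a token-aligned BIO slot tag sequence. The following minimal Python sketch (the `semantic_frame` helper is a hypothetical illustration, not from the paper) shows how the two outputs fit together:

```python
# Minimal illustration of an SLU semantic frame: one intent label for the
# whole utterance plus one BIO slot tag per token (B- = begin, I- = inside,
# O = outside any slot), following the running example from Figure 1.

def semantic_frame(tokens, intent, slot_tags):
    """Bundle the two SLU outputs; slot tags must align one-to-one with tokens."""
    assert len(tokens) == len(slot_tags), "BIO tags must align with tokens"
    return {"intent": intent, "slots": list(zip(tokens, slot_tags))}

utterance = "I like to watch action movie".split()
tags = ["O", "O", "O", "B-movie-type", "I-movie-type", "I-movie-type"]
frame = semantic_frame(utterance, "WatchMovie", tags)
print(frame["intent"])   # WatchMovie
print(frame["slots"])    # [('I', 'O'), ..., ('movie', 'I-movie-type')]
```

The one-to-one alignment between tokens and tags is what makes slot filling a sequence labeling problem, while the single intent label makes intent detection a sentence classification problem.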
∗ Corresponding Author
1 https://github.com/yizhen20133868/Awesome-SLU-Survey

Figure 1: An example with intent and slot annotation (BIO format). m-type denotes movie-type.

Intent detection can be defined as a sentence classification problem. In recent years, many neural-network-based
classification methods, such as convolutional neural networks (CNN) [Xu and Sarikaya, 2013] and recurrent neural networks (RNN) [Ravuri and Stolcke, 2015], have been investigated. Slot filling can be formulated as a sequence labeling task, and popular sequence labeling methods such as conditional random fields (CRF) [Raymond and Riccardi, 2007], RNN-based models [Xu and Sarikaya, 2013] and Long Short-Term Memory networks (LSTM) [Ravuri and Stolcke, 2015] have been explored.
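As a concrete illustration of the sequence-labeling view of slot filling, a standard post-processing step is to decode the predicted BIO tag sequence back into typed slot spans. A minimal sketch follows (the `bio_decode` helper is a generic illustration, not a specific system from this survey):

```python
# Decode a BIO tag sequence into (slot_type, tokens_in_span) pairs.
# "B-x" opens a span of type x, "I-x" extends an open span of the same
# type, and "O" (or an inconsistent "I-") closes any open span.

def bio_decode(tokens, tags):
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:                      # close the previous span
                spans.append((current_type, current_tokens))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)          # extend the open span
        else:                                     # "O" or inconsistent tag
            if current_type:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = None, []
    if current_type:                              # close a span at the end
        spans.append((current_type, current_tokens))
    return spans

tokens = "I like to watch action movie".split()
tags = ["O", "O", "O", "B-movie-type", "I-movie-type", "I-movie-type"]
print(bio_decode(tokens, tags))
# [('movie-type', ['watch', 'action', 'movie'])]
```

Whatever model produces the tags (CRF, RNN, or LSTM), this decoding step is what turns per-token labels into the slot values of the semantic frame.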
Traditional approaches treat slot filling and intent detection as two separate tasks, ignoring the shared knowledge across them. Intuitively, intent detection and slot filling are not independent but highly tied. For example, if the intent of a user query is WatchMovie, the query is more likely to contain the slot movie name rather than the slot music name. Thus, it is promising to model the interaction between the two tasks. To this end, dominant models in the literature adopt joint models to leverage shared knowledge across the two tasks, such as vanilla multi-task learning [Zhang and Wang, 2016], slot-gated mechanisms [Goo et al., 2018; Li et al., 2018], stack-propagation [Qin et al., 2019] and bi-directional interaction [E et al., 2019; Qin et al., 2021b]. With the popularity of deep learning and the emergence of pre-trained language models, the SLU field has made significant progress in recent years. As shown in Figure 2, on the slot filling and intent detection tasks, performance has surpassed 97.0% and 98.0% on ATIS [Hemphill et al., 1990], and 97% and 99% on SNIPS [Coucke et al., 2018], the two most widely used datasets in the SLU community. This leaves us with a question: have we already solved SLU?
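The joint-modeling idea discussed above, a shared encoder feeding both an intent head and a slot head, can be illustrated with a deliberately simplified toy. The rule-based stand-ins below are hypothetical and carry none of the learning machinery of the cited models; they only show how one shared representation serves two task heads:

```python
# Toy sketch of joint modeling: the encoder runs once and its output is
# consumed by both the intent head (sentence-level) and the slot head
# (token-level), so the two tasks work from shared features.

def shared_encoder(tokens):
    """Hypothetical encoder: one trivial feature dict per token."""
    return [{"token": t, "lower": t.lower()} for t in tokens]

def intent_head(encodings, rules):
    """Sentence-level head: pick an intent from keyword rules."""
    words = {e["lower"] for e in encodings}
    for keyword, intent in rules.items():
        if keyword in words:
            return intent
    return "Unknown"

def slot_head(encodings, lexicon):
    """Token-level head: tag each token via a toy lexicon lookup."""
    return [lexicon.get(e["lower"], "O") for e in encodings]

tokens = "I like to watch action movie".split()
enc = shared_encoder(tokens)      # computed once, used by both heads
intent = intent_head(enc, {"watch": "WatchMovie"})
slots = slot_head(enc, {"watch": "B-movie-type",
                        "action": "I-movie-type",
                        "movie": "I-movie-type"})
print(intent, slots)
# WatchMovie ['O', 'O', 'O', 'B-movie-type', 'I-movie-type', 'I-movie-type']
```

In the neural joint models surveyed here, the shared component is a learned encoder and the interaction between the two heads (gating, stack-propagation, bi-directional flow) is itself trainable, which is precisely what the toy omits.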
In this paper, we introduce a survey to answer the above question, including: 1) a comprehensive summary of recent progress in the SLU field; 2) a conclusion of research challenges and frontiers for complex SLU tasks. Our survey observes that mainstream work remains in a simple setting: single do-

arXiv:2103.03095v1 [cs.CL] 4 Mar 2021