A Survey on Spoken Language Understanding: Recent Advances and New
Frontiers
Libo Qin, Tianbao Xie, Wanxiang Che∗, Ting Liu
Research Center for Social Computing and Information Retrieval
Harbin Institute of Technology, China
{lbqin, tianbaoxie, car, tliu}@ir.hit.edu.cn
Abstract
Spoken Language Understanding (SLU) aims to extract the semantic frame of user queries and is a core component in task-oriented dialog systems. With the burst of deep neural networks and the evolution of pre-trained language models, research on SLU has achieved significant breakthroughs. However, there remains a lack of a comprehensive survey summarizing existing approaches and recent trends, which motivated the work presented in this article. In this paper, we survey recent advances and new frontiers in SLU. Specifically, we give a thorough review of this research field, covering different aspects including: (1) new taxonomy: we provide a new perspective on the SLU field, covering single model vs. joint model, implicit vs. explicit joint modeling within joint models, and non-pre-trained vs. pre-trained paradigms; (2) new frontiers: some emerging areas in complex SLU as well as the corresponding challenges; (3) abundant open-source resources: to help the community, we have collected and organized the related papers, baseline projects and leaderboards on a public website¹, where SLU researchers can directly access recent progress. We hope that this survey can shed light on future research in the SLU field.
1 Introduction
Spoken Language Understanding (SLU) is a core component in task-oriented dialog systems, which aims to capture the semantics of user queries. It typically consists of two tasks: intent detection and slot filling [Tur and De Mori, 2011]. Taking the utterance "I like to watch action movie" in Figure 1 as an example, the outputs include an intent class label (i.e., WatchMovie) and a slot label sequence (i.e., O, O, O, B-movie-type, I-movie-type, I-movie-type).
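The semantic frame described above pairs a sentence-level intent label with a token-aligned BIO slot tag sequence. The following minimal Python sketch (the `semantic_frame` helper is a hypothetical illustration, not from the paper) shows how the two outputs fit together:

```python
# Minimal illustration of an SLU semantic frame: one intent label for the
# whole utterance plus one BIO slot tag per token (B- = begin, I- = inside,
# O = outside any slot), following the running example from Figure 1.

def semantic_frame(tokens, intent, slot_tags):
    """Bundle the two SLU outputs; slot tags must align one-to-one with tokens."""
    assert len(tokens) == len(slot_tags), "BIO tags must align with tokens"
    return {"intent": intent, "slots": list(zip(tokens, slot_tags))}

utterance = "I like to watch action movie".split()
tags = ["O", "O", "O", "B-movie-type", "I-movie-type", "I-movie-type"]
frame = semantic_frame(utterance, "WatchMovie", tags)
print(frame["intent"])   # WatchMovie
print(frame["slots"])    # [('I', 'O'), ..., ('movie', 'I-movie-type')]
```

The one-to-one alignment between tokens and tags is what makes slot filling a sequence labeling problem, while the single intent label makes intent detection a sentence classification problem.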
∗ Corresponding Author
1 https://github.com/yizhen20133868/Awesome-SLU-Survey

Figure 1: An example with intent and slot annotation (BIO format). m-type denotes movie-type.

Intent detection can be defined as a sentence classification problem. In recent years, many neural-network-based
classification methods, such as convolutional neural networks (CNN) [Xu and Sarikaya, 2013] and recurrent neural networks (RNN) [Ravuri and Stolcke, 2015], have been investigated. Slot filling can be formulated as a sequence labeling task, and popular sequence labeling methods such as conditional random fields (CRF) [Raymond and Riccardi, 2007], RNN-based models [Xu and Sarikaya, 2013] and Long Short-Term Memory networks (LSTM) [Ravuri and Stolcke, 2015] have been explored.
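As a concrete illustration of the sequence-labeling view of slot filling, a standard post-processing step is to decode the predicted BIO tag sequence back into typed slot spans. A minimal sketch follows (the `bio_decode` helper is a generic illustration, not a specific system from this survey):

```python
# Decode a BIO tag sequence into (slot_type, tokens_in_span) pairs.
# "B-x" opens a span of type x, "I-x" extends an open span of the same
# type, and "O" (or an inconsistent "I-") closes any open span.

def bio_decode(tokens, tags):
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:                      # close the previous span
                spans.append((current_type, current_tokens))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)          # extend the open span
        else:                                     # "O" or inconsistent tag
            if current_type:
                spans.append((current_type, current_tokens))
            current_type, current_tokens = None, []
    if current_type:                              # close a span at the end
        spans.append((current_type, current_tokens))
    return spans

tokens = "I like to watch action movie".split()
tags = ["O", "O", "O", "B-movie-type", "I-movie-type", "I-movie-type"]
print(bio_decode(tokens, tags))
# [('movie-type', ['watch', 'action', 'movie'])]
```

Whatever model produces the tags (CRF, RNN, or LSTM), this decoding step is what turns per-token labels into the slot values of the semantic frame.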
Traditional approaches treat slot filling and intent detection as two separate tasks, ignoring the shared knowledge across them. Intuitively, intent detection and slot filling are not independent but highly tied. For example, if the intent of a user query is WatchMovie, the query is more likely to contain the slot movie name rather than the slot music name. Thus, it is promising to model the interaction between the two tasks. To this end, dominant models in the literature adopt joint models to leverage shared knowledge across the two tasks, such as vanilla multi-task learning [Zhang and Wang, 2016], slot-gated mechanisms [Goo et al., 2018; Li et al., 2018], stack-propagation [Qin et al., 2019] and bi-directional interaction [E et al., 2019; Qin et al., 2021b]. With the popularity of deep learning and the emergence of pre-trained language models, the SLU field has made significant progress in recent years. As shown in Figure 2, on the slot filling and intent detection tasks, performance has surpassed 97.0% and 98.0% on ATIS [Hemphill et al., 1990], and 97% and 99% on SNIPS [Coucke et al., 2018], the two most widely used datasets in the SLU community. This leaves us with a question: have we already solved SLU?
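The joint-modeling idea discussed above, a shared encoder feeding both an intent head and a slot head, can be illustrated with a deliberately simplified toy. The rule-based stand-ins below are hypothetical and carry none of the learning machinery of the cited models; they only show how one shared representation serves two task heads:

```python
# Toy sketch of joint modeling: the encoder runs once and its output is
# consumed by both the intent head (sentence-level) and the slot head
# (token-level), so the two tasks work from shared features.

def shared_encoder(tokens):
    """Hypothetical encoder: one trivial feature dict per token."""
    return [{"token": t, "lower": t.lower()} for t in tokens]

def intent_head(encodings, rules):
    """Sentence-level head: pick an intent from keyword rules."""
    words = {e["lower"] for e in encodings}
    for keyword, intent in rules.items():
        if keyword in words:
            return intent
    return "Unknown"

def slot_head(encodings, lexicon):
    """Token-level head: tag each token via a toy lexicon lookup."""
    return [lexicon.get(e["lower"], "O") for e in encodings]

tokens = "I like to watch action movie".split()
enc = shared_encoder(tokens)      # computed once, used by both heads
intent = intent_head(enc, {"watch": "WatchMovie"})
slots = slot_head(enc, {"watch": "B-movie-type",
                        "action": "I-movie-type",
                        "movie": "I-movie-type"})
print(intent, slots)
# WatchMovie ['O', 'O', 'O', 'B-movie-type', 'I-movie-type', 'I-movie-type']
```

In the neural joint models surveyed here, the shared component is a learned encoder and the interaction between the two heads (gating, stack-propagation, bi-directional flow) is itself trainable, which is precisely what the toy omits.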
In this paper, we introduce a survey to answer the above question, including: 1) a comprehensive summary of recent progress in the SLU field; 2) a conclusion of research challenges and frontiers for complex SLU tasks. Our survey observes that mainstream work remains in a simple setting: single do-

arXiv:2103.03095v1 [cs.CL] 4 Mar 2021