Multi-person Speech Telescience Interaction with Speech Separation
Taotao Fu 1,2
1 Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences
2 University of Chinese Academy of Sciences
Beijing, China
E-mail: 472567528@qq.com
Ge Yu 1, Lili Guo 1, Ji Liang 1
1 Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences
E-mail: yuge@csu.ac.cn; guolili@csu.ac.cn; liangji@csu.ac.cn
Abstract—In this paper, we propose a speech interaction method that incorporates speech separation to support telescience interaction. First, a speech separation method based on Deep Clustering with local optimization is proposed to achieve better local separation and to reduce speech distortion. Then, a telescience interaction system is constructed by combining speech recognition, semantic understanding, and speech synthesis. The results show that the proposed method makes multi-person speech telescience interaction possible through speech separation.
Keywords-Telescience; Semantic understanding; Speech
separation; Speech recognition; Speech synthesis
I. INTRODUCTION
Tele-science implies the ability to conduct remote
operations (in space) by making rapid adjustments to
instrumental parameters and experiment procedures in
order to optimize performance and obtain the best possible
data [1]. A number of experiments in exploratory environments that involve frequent control, such as those in space physics and materials science, are preferably operated by scientists on the ground because of the complexity of the experiments and the high workload of the crew in space. There are two essential processes in space tele-science. One is to display experiment information by receiving telemetry data, and the other is to control the onboard facilities by uploading checked tele-commands from the ground.
Nowadays, most scientists still rely on numbers or pictures on a screen to evaluate the progress of their ongoing experiments, and set the target parameters to be uploaded using a keyboard or a mouse, click by click [2], which is sluggish and inefficient.
Therefore, it is necessary to explore an intuitive human-computer interface that incorporates speech interaction, allowing experiments to become more engaging and efficient through the direct involvement of scientists. A more natural, hands-free interaction with devices in the virtual experiment environment can also provide users with a home-institute-like space for planning, scheduling, operations, and correlative analysis.
Language has been the most natural and convenient means of human communication since ancient times. Correspondingly, speech interaction is one of the most direct and natural modes of human-computer interaction. In the coming era of artificial intelligence (AI), speech interaction will free our hands entirely. Speech interaction mainly comprises speech recognition, natural language processing, and speech synthesis, fields that are maturing rapidly with the development of AI. Speech interaction has many advantages, such as immediate reaction to spoken commands, ease of operation, extensive usage scenarios, and the ability to infer discourse meaning. Bruno [3] adopted speech interaction in an adaptive training simulation for navy officers. Ishihara [4] proposed a method that enables manufactured objects, such as anime figures, to exhibit highly realistic behavioral expressions, improving speech interaction between a user and an object. Robert [5] combined speech and deictic gestures to instruct a car about desired interventions, including spatial references to the current environment.
Speech interaction could be integrated into remote scientific experiments to improve the naturalness and flexibility of interaction and to enhance the quality and efficiency of experiments. However, existing speech interaction systems suffer from noise and the "cocktail party" problem [6], which cause low recognition rates and low accuracy. It is therefore necessary to separate each speaker's speech from the mixed signal, thereby enhancing the efficiency of remote scientific interaction.
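The clustering idea behind deep-clustering-style separation can be illustrated with a toy sketch: each time-frequency bin of the mixed spectrogram carries an embedding (in deep clustering these come from a trained network; here they are supplied by the caller), the embeddings are grouped into two clusters with plain k-means, and binary masks select each speaker's bins. This is a simplified illustration under those assumptions, not the paper's implementation.

```python
import numpy as np

def separate_two_speakers(mag_spec, embed, n_iter=20):
    """Cluster per-bin embeddings into two groups and mask the mixture.

    mag_spec : (T, F) magnitude spectrogram of the mixed speech
    embed    : (T, F, D) embedding for each time-frequency bin
    Returns two masked magnitude spectrograms, one per speaker.
    """
    T, F, D = embed.shape
    points = embed.reshape(-1, D).astype(float)

    # Deterministic k-means init: the first point and the point farthest from it.
    far = np.argmax(np.linalg.norm(points - points[0], axis=1))
    centers = np.stack([points[0], points[far]])
    for _ in range(n_iter):
        # Assign each bin to its nearest center, then recompute the centers.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)

    # Binary masks: each bin belongs entirely to one of the two speakers.
    mask = (labels == 0).reshape(T, F)
    return mag_spec * mask, mag_spec * ~mask

# Tiny synthetic check: two groups of bins with orthogonal embeddings.
T, F, D = 4, 8, 3
embed = np.zeros((T, F, D))
embed[:, :4, 0] = 1.0   # bins dominated by speaker A
embed[:, 4:, 1] = 1.0   # bins dominated by speaker B
mix = np.ones((T, F))
spec_a, spec_b = separate_two_speakers(mix, embed)
```

In the full deep clustering method the embeddings are learned so that bins of the same speaker lie close together; the toy above only shows how masks follow from the clustering step.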
In this paper, a speech interaction method that incorporates speech separation is proposed to address telescience interaction. The rest of this article is organized as follows. First, a method based on deep clustering (DC) with local optimization is proposed to solve the problem of recovering each voice from a mixed signal. Secondly, we construct a speech telescience interaction system by combining speech recognition, semantic understanding, and speech synthesis. Finally, several experiments are performed to validate the proposed method.
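As a concrete illustration of the rule-matching stage in such a pipeline, the sketch below maps recognized text to telecommands. All patterns and command names here are hypothetical examples, not commands from the actual system.

```python
import re

# Hypothetical command rules mapping recognized text to telecommands;
# the patterns and command names are illustrative only.
RULES = [
    (re.compile(r"set (?:the )?temperature to (\d+)"),
     lambda m: ("SET_TEMP", int(m.group(1)))),
    (re.compile(r"start (?:the )?experiment"),
     lambda m: ("START_EXP", None)),
    (re.compile(r"stop (?:the )?experiment"),
     lambda m: ("STOP_EXP", None)),
]

def understand(text):
    """Rule-matching semantic understanding: return (command, arg) or None."""
    for pattern, build in RULES:
        m = pattern.search(text.lower())
        if m:
            return build(m)
    return None
```

A real system would add slot validation and confirmation via speech synthesis before any checked telecommand is uploaded.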
II. METHODS
In this paper, the speech recognition module based on IFLYTEK is used to collect sound continuously and obtain the corresponding text. The recognition results are then processed by rule matching, semantic