Interactively Picking Real-World Objects with
Unconstrained Spoken Language Instructions
Jun Hatori∗, Yuta Kikuchi∗, Sosuke Kobayashi∗, Kuniyuki Takahashi∗,
Yuta Tsuboi∗, Yuya Unno∗, Wilson Ko, Jethro Tan†
Abstract— Comprehension of spoken natural language is an essential skill for robots to communicate with humans effectively. However, handling unconstrained spoken instructions is challenging due to (1) the complex structures and wide variety of expressions used in spoken language, and (2) the inherent ambiguity of human instructions. In this paper, we propose the first comprehensive system for controlling robots with unconstrained spoken language, which is able to effectively resolve ambiguity in spoken instructions. Specifically, we integrate deep learning-based object detection with natural language processing technologies to handle unconstrained spoken instructions, and propose a method for robots to resolve instruction ambiguity through dialogue. Through experiments in both a simulated environment and on a physical industrial robot arm, we demonstrate that our system can understand natural instructions from human operators effectively, and show how higher success rates on the object picking task can be achieved through an interactive clarification process.1
I. INTRODUCTION
As robots become more ubiquitous, there is an increasing need for humans to interact with them in a convenient and intuitive way. For many real-world tasks, spoken language instructions are more intuitive than programming, and more versatile than alternative communication methods such as touch panel user interfaces [1] or gestures [2], since they allow reference to abstract concepts and the use of high-level instructions. Hence, natural language is a desirable means of interaction between humans and robots.
However, there are two major challenges in realizing robots that interpret language and act accordingly. First, spoken language instructions as used in our daily lives have neither a predefined structure nor a limited vocabulary, and often include uncommon and informal expressions, e.g., “Hey man, grab that brown fluffy thing” (see Figure 1). Second, there is inherent ambiguity in interpreting spoken language, since humans do not always put effort into making their instructions clear. For example, there might be multiple “fluffy” objects present in the environment, as in Figure 1, in which case the robot needs to ask for clarification, e.g., “Which one?”. Although proper handling of such diverse and ambiguous expressions is a critical factor in building domestic or service robots, little effort has been made to date to address these challenges, especially in the context of human–robot interaction.
∗ The starred authors contributed equally and are ordered alphabetically.
† All authors are affiliated with Preferred Networks, Inc. {hatori, kikuchi, sosk, takahashi, tsuboi, unno, wko, jettan}@preferred.jp
1 Accompanying videos are available at the following links: https://youtu.be/_Uyv1XIUqhk (the system submitted to ICRA-2018) and http://youtu.be/DGJazkyw0Ws (with improvements after the ICRA-2018 submission).

Fig. 1: An illustration of object picking via human–robot interaction. Our robot asks for clarification if the given instruction is ambiguous.
In this paper, we tackle these two challenges in spoken human–robot communication and develop a robotic system that a human operator can communicate with using unconstrained spoken language instructions. To handle complex structures and cope with the diversity of unconstrained language, we combine and modify existing state-of-the-art models for object detection [3], [4] and object-referring expressions [5], [6] into an integrated system that can handle a wide variety of spoken expressions and map them to miscellaneous objects in a real-world environment. This modification makes it possible to train the network without explicit object class information, and to realize zero-shot recognition of unseen objects. To handle the inherent ambiguity of spoken instructions, our system also focuses on the process of interactive clarification, in which ambiguity in a given instruction is resolved through dialogue. Moreover, our system combines verbal and visual feedback, as shown in Figure 1, so that the human operator can provide additional explanations to narrow down the object of interest, much as humans do when communicating with each other. We show that spoken language instructions are indeed effective in improving the end-to-end accuracy of real-world object picking.
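To make the pipeline concrete, below is a minimal sketch of how class-agnostic detection, referring-expression grounding, and interactive clarification can fit together. This is an illustration under our own assumptions, not the system's actual implementation: the scoring function, the confidence margin, and all function names are hypothetical placeholders.

```python
# A minimal, self-contained sketch (not the paper's implementation) of a
# detect -> ground -> clarify loop: candidate boxes are scored against the
# spoken instruction, and the robot asks for more detail whenever the top
# two grounding scores are too close to call. score, ask_user, and margin
# are hypothetical placeholders.
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) of a detected candidate

def pick_with_clarification(
    boxes: List[Box],
    score: Callable[[Box, str], float],  # grounding score of a box given text
    instruction: str,
    ask_user: Callable[[str], str],      # asks the operator, returns a reply
    margin: float = 0.1,
) -> Box:
    """Return the box to pick, clarifying while the instruction is ambiguous."""
    if len(boxes) == 1:
        return boxes[0]
    scores = [score(b, instruction) for b in boxes]
    while True:
        ranked = sorted(zip(scores, boxes), reverse=True)
        (s1, best), (s2, _) = ranked[0], ranked[1]
        if s1 - s2 >= margin:  # confident: instruction singles out one object
            return best
        # Ambiguous: ask the operator to narrow down the target, then fold
        # the additional utterance into the scores and re-rank.
        extra = ask_user("Which one? Could you describe it in more detail?")
        scores = [s + score(b, extra) for s, b in zip(scores, boxes)]

# Toy usage: two "fluffy" objects that a word-overlap scorer cannot tell
# apart until the operator mentions the color.
objects = {(0, 0, 10, 10): "brown fluffy bear",
           (20, 0, 10, 10): "white fluffy towel"}
word_score = lambda b, t: sum(w in objects[b].split()
                              for w in t.lower().split()) / 5.0
print(pick_with_clarification(list(objects), word_score,
                              "grab that fluffy thing",
                              ask_user=lambda q: "the brown one"))
```

The one design choice this sketch tries to mirror is that clarification replies are treated as additional referring expressions and folded into the grounding scores, rather than restarting the interaction from scratch.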
Although the use of natural language instructions has
received attention in the field of robotics [7]–[10], our work
is the first to propose a comprehensive system integrating the
process of interactive clarification while supporting uncon-
strained spoken instructions through human–robot dialogue.
To evaluate our system in a complex, realistic environment,