Fig. 3. Multi-functional gripper with a retractable mechanism that enables
quick and automatic switching between suction (pink) and grasping (blue).
The “known” objects are provided to the
system at training time, both as physical objects and as
representative product images (images of objects available
on the web), while the “novel” objects are provided only at
test time in the form of representative product images.
Overall approach. The system follows a grasp-first-then-
recognize workflow. For each pick-and-place operation, it
first uses FCNs to infer the pixel-wise affordances of four
different grasping primitive actions, ranging from suction
to parallel-jaw grasps (Section IV). It then selects the
grasping primitive action with the highest affordance, picks
up one object, isolates it from the clutter, holds it up in
front of the cameras, recognizes its category, and places it
in the appropriate bin.
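In pseudocode, one pick-and-place cycle might be organized as
in the following sketch; the helper names (infer_affordances,
execute_primitive, recognize, place_in_bin) are hypothetical
placeholders for the components described above, not the
system's actual API:

    import numpy as np

    PRIMITIVES = ["suction-down", "suction-side", "grasp-down", "flush-grasp"]

    def pick_and_place_once(rgbd_views, infer_affordances, execute_primitive,
                            recognize, place_in_bin):
        # One dense affordance map (H x W) per grasping primitive.
        maps = {p: infer_affordances(rgbd_views, p) for p in PRIMITIVES}

        # Select the primitive and pixel with the globally highest affordance.
        primitive = max(PRIMITIVES, key=lambda p: maps[p].max())
        pixel = np.unravel_index(maps[primitive].argmax(),
                                 maps[primitive].shape)

        # Grasp at that pixel, isolate the object from the clutter, hold it
        # up to the recognition cameras, then classify and place it.
        if execute_primitive(primitive, pixel):
            category = recognize()
            place_in_bin(category)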
Although the object recognition algorithm is trained only on
known objects, it is able to recognize novel objects through
a learned cross-domain image matching embedding between
observed images of held objects and product images (Section
V).
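A minimal sketch of this matching step, assuming the learned
embedding networks have already mapped both image domains into
a shared feature space (the function and variable names here
are illustrative, not the paper's implementation):

    import numpy as np

    def recognize_by_matching(observed_feat, product_feats, labels):
        """Nearest-neighbor category recognition in a shared embedding space.

        observed_feat: (D,) embedding of the image of the held object.
        product_feats: (N, D) embeddings of the product images.
        labels:        N category labels, one per product image.
        """
        # Cosine similarity between the held object and every product image.
        a = observed_feat / np.linalg.norm(observed_feat)
        b = product_feats / np.linalg.norm(product_feats, axis=1, keepdims=True)
        similarities = b @ a
        return labels[int(np.argmax(similarities))]

Because product images of novel objects only need to be embedded
at test time, no re-training is required when the object set
changes.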
Advantages. This system design has several advantages.
First, the affordance-based grasping algorithm is model-free,
agnostic to object identities, and generalizes to novel
objects without re-training. Second, the category recognition
algorithm works without task-specific data collection or re-
training for novel objects, which makes it scalable for appli-
cations in warehouse automation and service robots, where
the range of observed object categories is large and dynamic.
Third, our grasping framework supports multiple grasping
modes with a multi-functional gripper and thus handles a
wide variety of objects. Finally, the entire processing pipeline
requires only a few forward passes through deep networks
and thus executes quickly (Table II).
System setup. Our system features a 6DOF ABB IRB
1600id robot arm next to four picking work-cells. The robot
arm’s end-effector is a multi-functional gripper with two
fingers for parallel-jaw grasps and a retractable suction cup
(Fig. 3). This gripper was designed to function in cluttered
environments: finger and suction cup length are specifically
chosen such that the bulk of the gripper body does not
need to enter the cluttered space. Each work-cell has a
storage bin and four statically mounted RealSense SR300
RGB-D cameras (Fig. 2): two cameras overlooking the
storage bins are used to infer grasp affordances, while the
other two pointing towards the robot gripper are used to
recognize objects in the gripper. Although our experiments
were performed with this setup, the system was designed to
Fig. 4. Multiple motion primitives for suction and grasping to ensure
successful picking for a wide variety of objects in any orientation.
(Panels: suction down, suction side, grasp down, flush grasp.)
be flexible for picking and placing between any number of
reachable work-cells and camera locations. Furthermore, all
manipulation and recognition algorithms in this paper were
designed to be easily adapted to other system setups.
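As a rough illustration of how the camera roles in one work-cell
could be organized (a hypothetical configuration sketch, not the
system's actual naming or calibration data):

    # Hypothetical layout of one picking work-cell: four statically
    # mounted RGB-D cameras, split between the two roles described above.
    WORKCELL = {
        "storage_bin": "bin-0",
        "cameras": {
            "bin-view-0":     "affordance",   # overlooks the storage bin
            "bin-view-1":     "affordance",   # overlooks the storage bin
            "gripper-view-0": "recognition",  # points at the robot gripper
            "gripper-view-1": "recognition",  # points at the robot gripper
        },
    }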
IV. MULTI-AFFORDANCE GRASPING
The goal of the first step in our system is to robustly
grasp objects from a cluttered scene without relying on their
object identities or poses. To this end, we define a set of
four grasping primitive actions that are complementary to
each other in terms of utility across different object types and
scenarios – empirically maximizing the variety of objects and
orientations that can be picked with at least one primitive.
Given RGB-D images of the cluttered scene at test time, we
infer the dense pixel-wise affordances for all four primitives.
A task planner then selects and executes the primitive with
the highest affordance (more details of this planner can be
found in the Appendix).
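As a toy illustration of this step, the sketch below runs one
small fully convolutional network per primitive over a dummy
RGB-D input and lets the planner pick the highest-scoring
primitive; the 6-channel input encoding and the tiny architecture
are stand-ins, not the networks used in the paper:

    import torch
    import torch.nn as nn

    class AffordanceFCN(nn.Module):
        """Minimal FCN: maps an RGB-D image to a dense affordance map."""
        def __init__(self, in_channels=6):  # e.g. RGB + 3-channel depth encoding
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 1),  # one affordance logit per pixel
            )

        def forward(self, x):
            return torch.sigmoid(self.net(x))  # affordances in [0, 1]

    # One forward pass per primitive yields four dense affordance maps.
    primitives = ["suction-down", "suction-side", "grasp-down", "flush-grasp"]
    models = {p: AffordanceFCN().eval() for p in primitives}
    rgbd = torch.rand(1, 6, 480, 640)  # dummy RGB-D input
    with torch.no_grad():
        maps = {p: m(rgbd)[0, 0] for p, m in models.items()}
    best = max(primitives, key=lambda p: maps[p].max().item())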
A. Grasping Primitives
We define four grasping primitives to achieve robust
picking for typical household objects. Fig. 4 shows example
motions for each primitive. Each is implemented
as a set of guarded moves, with collision avoidance and
quick success or failure feedback mechanisms: for suction,
this comes from flow sensors; for grasping, this comes from
contact detection via force feedback from sensors below
the work-cell. Robot arm motion planning is automatically
executed within each primitive with stable IK solves [26].
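A sketch of what one such guarded move could look like for
suction, with hypothetical move_arm and read_flow_sensor
interfaces standing in for the motion and sensing layers (the
grasping primitives would poll force feedback instead):

    import time

    def guarded_suction_move(move_arm, read_flow_sensor,
                             timeout=5.0, seal_threshold=0.5):
        """Approach with the suction cup and report success via airflow.

        move_arm:         starts the collision-checked approach motion.
        read_flow_sensor: returns normalized airflow in [0, 1]; airflow
                          drops once a suction seal forms on the object.
        """
        move_arm()
        deadline = time.time() + timeout
        while time.time() < deadline:
            if read_flow_sensor() < seal_threshold:
                return True   # seal detected: quick success feedback
            time.sleep(0.01)
        return False          # no seal within timeout: quick failure feedback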
These primitives are as follows:
Suction down grasps objects vertically with a vacuum
gripper. This primitive is particularly robust for objects
with large and flat suctionable surfaces (e.g. boxes, books,
wrapped objects), and performs well in heavy clutter.
Suction side grasps objects from the side by approaching
with a vacuum gripper tilted at an angle. This primitive is
well suited to thin and flat objects resting against walls, which
may not have suctionable surfaces from the top.
Grasp down grasps objects vertically using the two-finger
parallel-jaw gripper. This primitive is complementary to
the suction primitives in that it is able to pick up objects
with smaller, irregular surfaces (e.g. small tools, deformable
objects), or objects made of semi-porous materials that prevent a
good suction seal (e.g. cloth).