Fig. 2: Detecting and executing grasps: From left to right: Our system obtains an RGB-D image from a Kinect mounted on the robot,
and searches over a large space of possible grasps, for which some candidates are shown. For each of these, it extracts a set of raw features
corresponding to the color and depth images and surface normals, then uses these as inputs to a deep network which scores each rectangle.
Finally, the top-ranked rectangle is selected and the corresponding grasp is executed using the parameters of the detected rectangle and the
surface normal at its center. Red and green lines correspond to the gripper plates; blue regions in the RGB-D features indicate masked-out pixels.
may use different subsets of the modalities. In this work, we will present a structured regularization method that guides the learning algorithm to select such subsets without imposing hard constraints on network structure.
Structured Learning and Structured Regularization: Several approaches have been proposed that use a specially designed regularization function to impose structure on a set of learned parameters without directly enforcing it.
Jalali et al. [26] used a group regularization function in the
multitask learning setting, where one set of features is used for
multiple tasks. This function applies high-order regularization
separately to particular groups of parameters. Their function
regularized the number of features used for each task in a set of
multi-class classification tasks solved by softmax regression.
Intuitively, this encodes the belief that only some subset of
the input features will be useful for each task, but this set of
useful features might vary between tasks.
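As a concrete illustration (our notation; a representative group penalty, not the exact function of [26] nor the one we introduce later), such a regularizer can be written as

\[
R(W) = \lambda \sum_{g \in \mathcal{G}} \max_{i \in g} |w_i|,
\]

where each group $g \in \mathcal{G}$ collects the parameters tying one input feature to all tasks. The max charges a group once as soon as any of its members becomes nonzero, so most features are switched off entirely; combining such a block term with an element-wise $\ell_1$ penalty, as in [26], additionally allows the set of active features to differ from task to task.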
A few works have also explored the use of structured regularization in deep learning. The Topographic ICA algorithm [24] is a feature-learning approach that applies a similar
penalty term to feature activations, but not to the weights
themselves. Coates and Ng [8] investigate the problem of
selecting receptive fields, i.e., subsets of the input features
to be used together in a higher-level feature. The structure
of the network is learned first, then fixed before learning the
parameters of the network.
III. DEEP LEARNING FOR GRASP DETECTION:
SYSTEM AND MODEL
In this work, we will present an algorithm for robotic grasp
detection from a single RGB-D view. Our approach is based on machine learning, but distinguishes itself from previous approaches by learning not only the weights used to rank prospective grasps, but also the features used to rank them, which were previously hand-engineered.
We will do this using deep learning methods, learning a
set of RGB-D features which will be extracted from each
candidate grasp, then used to score that grasp. Our approach
will include a structured multimodal regularization method
which improves the quality of the features learned from
RGB-D data without constraining network structure.
In our system for robotic grasping, as shown in Fig. 2, the
robot first obtains an RGB-D image of the scene containing
objects to be grasped. A small deep network is used to score
potential grasps in this image, and a small candidate set of the
top-ranked grasps is provided to a larger deep network, which
yields a single best-ranked grasp.
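A minimal sketch of this two-pass cascade (illustrative Python; `score_small`, `score_large`, and the feature matrix stand in for the two trained networks and the extracted rectangle features, and are not the paper's implementation):

```python
import numpy as np

def detect_grasp(score_small, score_large, features, k=100):
    """Two-pass detection: a small, fast scorer prunes the candidate
    rectangles; a larger, more accurate scorer re-ranks the survivors.
    score_small/score_large map an (N, D) feature matrix to N scores;
    features holds one row per candidate rectangle. Returns the index
    of the best-ranked candidate."""
    coarse = score_small(features)       # cheap scores for all candidates
    top_k = np.argsort(coarse)[-k:]      # keep only the k best
    fine = score_large(features[top_k])  # expensive re-ranking
    return int(top_k[np.argmax(fine)])

# Toy usage: random linear scorers stand in for the two networks.
rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 24))
w_s, w_l = rng.standard_normal(24), rng.standard_normal(24)
best = detect_grasp(lambda X: X @ w_s, lambda X: X @ w_l, feats)
```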
In this work, we will represent potential grasps using
oriented rectangles in the image plane as seen on the left in
Fig. 2, with one pair of parallel edges corresponding to the
robotic gripper [28]. Each rectangle is thus parameterized by
the X and Y coordinates of its upper-left corner, its width,
height, and orientation in the image plane, giving a five-dimensional search space for potential grasps. Grasps will be
ranked based on features extracted from the RGB-D image
region contained inside their corresponding rectangle, aligned
to the gripper plates, as seen in the center of Fig. 2.
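For concreteness, the five rectangle parameters can be packaged as follows (a hypothetical helper with our own field names, not code from the system):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GraspRectangle:
    """Oriented grasp rectangle in the image plane: (x, y) is the
    upper-left corner in pixels, width spans the gripper opening
    between the plates, height spans the plate extent, and theta is
    the in-plane rotation in radians."""
    x: float
    y: float
    width: float
    height: float
    theta: float

    def corners(self) -> np.ndarray:
        """Four corners, rotating about the upper-left corner."""
        c, s = np.cos(self.theta), np.sin(self.theta)
        R = np.array([[c, -s], [s, c]])
        local = np.array([[0.0, 0.0], [self.width, 0.0],
                          [self.width, self.height], [0.0, self.height]])
        return local @ R.T + np.array([self.x, self.y])
```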
To translate a rectangle such as that shown on the right in Fig. 2 into a gripper pose for grasping, we find the point with
the minimum depth inside the central third (horizontally) of
the rectangle. We then use the averaged surface normal around
this point to determine the approach vector for the gripper.
The orientation of the detected rectangle is translated to a
rotation around this vector to orient the gripper. We use the
X-Y coordinates of the rectangle center along with the depth
of the closest point to determine a grasping point in the robot’s
coordinate frame. We compute a pre-grasp position by shifting
10 cm back from the grasping point along this approach vector
and position the gripper at this point. We then approach the
object along the approach vector and grasp it.
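The following sketch mirrors these steps under simplifying assumptions: the depth image has already been registered to per-pixel 3-D points and outward-pointing unit normals in the robot frame, and `central_mask` marks the central third of the detected rectangle (all names are ours, not the paper's code):

```python
import numpy as np

def rect_to_grasp(depth, points, normals, central_mask, center_px,
                  pregrasp_offset=0.10, nbhd=5):
    """depth: (H, W); points, normals: (H, W, 3) in the robot frame;
    central_mask: (H, W) bool; center_px: (row, col) rectangle center."""
    # 1. Closest point: minimum depth inside the rectangle's central third.
    d = np.where(central_mask, depth, np.inf)
    r, c = np.unravel_index(int(np.argmin(d)), d.shape)

    # 2. Approach vector: average the surface normals in a small window
    #    around that point, then re-normalize.
    win = normals[max(r - nbhd, 0):r + nbhd + 1,
                  max(c - nbhd, 0):c + nbhd + 1]
    approach = win.reshape(-1, 3).mean(axis=0)
    approach /= np.linalg.norm(approach)

    # 3. Grasp point: X-Y of the rectangle center, depth of the closest
    #    point. (The rectangle's orientation is separately mapped to a
    #    rotation about the approach vector; omitted here.)
    grasp = points[center_px].copy()
    grasp[2] = points[r, c][2]

    # 4. Pre-grasp: back off 10 cm along the outward approach vector.
    pregrasp = grasp + pregrasp_offset * approach
    return grasp, pregrasp, approach
```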
Using a standard feature learning approach such as a sparse auto-encoder [21], a deep network can be trained for the problem of grasping-rectangle recognition (i.e., does a given rectangle in image space correspond to a valid robotic grasp?).
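For reference, a one-layer sparse auto-encoder objective in a common generic form (we use an L1 activation penalty and ReLU units for brevity; [21] gives the original formulation, and all names and hyperparameters here are illustrative):

```python
import numpy as np

def sparse_ae_loss(W_enc, b_enc, W_dec, b_dec, X, lam=1e-3):
    """Squared reconstruction error plus an L1 penalty on the hidden
    activations, driving most of them to zero (sparse features).
    X: (N, D) input patches; W_enc: (D, H); W_dec: (H, D)."""
    H = np.maximum(X @ W_enc + b_enc, 0.0)   # hidden activations
    X_hat = H @ W_dec + b_dec                # linear reconstruction
    recon = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    return recon + lam * np.mean(np.sum(np.abs(H), axis=1))

# Toy evaluation with random weights and data.
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 49))
W_enc, W_dec = rng.standard_normal((49, 16)), rng.standard_normal((16, 49))
loss = sparse_ae_loss(W_enc, np.zeros(16), W_dec, np.zeros(49), X)
```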