Object-oriented methods
Object-oriented methods, in their canonical form, try to superimpose a predefined rigid 3D model of the target object onto its matching geometry in the perceived scene. The match reveals the pose of the target object in the scene, which is then leveraged to derive corresponding grasp poses. Numerous studies utilizing distinct sensory data address the matching with various techniques for defining the 3D models and performing the registration [6].
Sun et al. [7] matched the segmented 3D point cloud against a primitive geometric model of the target to derive the registration. After a rough pose matching with RANSAC [8], the method refines the pose with the iterative closest point (ICP) algorithm. Its accuracy depends substantially on the matching quality of RANSAC and is limited by the representativeness of primitive models. To tackle the inability of RANSAC-based methods to scale to large databases, a shape completion framework was proposed in [9] and simplified in [10] to enable grasp estimation; if the shape and texture of the perceived object are complete, object-oriented methods can be more accurate. A 3D convolutional neural network (CNN) was trained on a dataset of over 440,000 3D exemplars to learn to complete a segmented point cloud. The completion generalizes to new objects, allowing previously unseen items to be grasped. Yet it still performs grasp planning with GraspIt! [11] in an out-of-context manner, making it an object-oriented method.
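To make the coarse-to-fine registration idea concrete, the sketch below shows only the ICP refinement stage, assuming a coarse pose (e.g. from a RANSAC-based matcher, not implemented here) is already available; the function names and parameters are illustrative and not taken from [7].

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch/SVD)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, c_dst - R @ c_src

def icp_refine(model_pts, scene_pts, init_T=np.eye(4), iters=30, tol=1e-6):
    """Refine an initial pose init_T (assumed to come from a coarse RANSAC match)
    by iterative closest point against the segmented scene cloud."""
    T = init_T.copy()
    tree = cKDTree(scene_pts)
    prev_err = np.inf
    for _ in range(iters):
        moved = model_pts @ T[:3, :3].T + T[:3, 3]
        dists, idx = tree.query(moved)            # closest scene point per model point
        R, t = best_rigid_transform(moved, scene_pts[idx])
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T                              # compose the incremental correction
        err = dists.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return T
```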
Another line of research completes the perceived geometry and estimates its pose via multi-view fusion [12]. Methods of this kind can alleviate factors that degrade perception, such as poor lighting conditions, clutter, and occlusions. However, precise estimation generally requires an accurate computer-aided design (CAD) model [13,14] of the target object.
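As a rough illustration of the fusion step only, the sketch below merges partial point clouds captured from several known camera poses into one frame; real pipelines such as [12] typically add voxel/TSDF fusion and outlier filtering, which are omitted, and the helper name and arguments are hypothetical.

```python
import numpy as np

def fuse_views(partial_clouds, camera_poses):
    """Fuse partial point clouds (each Nx3, in its own camera frame) into the world
    frame, given 4x4 camera-to-world poses, e.g. from robot kinematics or calibration."""
    fused = []
    for pts, T in zip(partial_clouds, camera_poses):
        world_pts = pts @ T[:3, :3].T + T[:3, 3]   # rigid transform into world frame
        fused.append(world_pts)
    return np.concatenate(fused, axis=0)
```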
Scene-oriented methods
Scene-oriented approaches pursue an understanding of the
whole scene [16]. Methods of this kind generalize to new objects and environments and react dynamically to changes in the environment [17–20].
Grasping new objects in unknown (complex) scenes is a
challenging problem in the field of robotics [21]. In recent
years, end-to-end grasp estimation methods addressing this problem have thrived. These methods deal with objects in context (the scene) and can thus be described as scene-oriented grasp estimation. They take images or point clouds as input and
produce viable grasp poses as output. This idea originated
in the work of Saxena et al. [22], which enables the robot to
grasp objects it has never seen before. The algorithm neither requires nor tries to build or complete a 3D model of the object. Instead, given two (or more) images of an object, it uses a model trained with supervised learning to identify a few grasp points at which to place the gripper. This sparse set of points is then triangulated to obtain a 3D location at which to attempt a grasp.
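The triangulation step can be illustrated with standard linear (DLT) triangulation from two calibrated views; the grasp point detector itself and the probabilistic multi-view reasoning of [22] are not shown, and the projection matrices are assumed to come from camera calibration.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one grasp point from its pixel coordinates
    x1, x2 in two views with known 3x4 projection matrices P1, P2 (assumed calibrated)."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]     # homogeneous -> Euclidean 3D grasp location
```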
Subsequently, Zeng et al. [23] proposed using multi-view RGB-D data with self-supervised, data-driven learning to obtain the grasping poses of objects. The system estimates an object's 6-DOF grasping poses reliably in a variety of scenes and adapts to the scene. Zapata-Impata et al. [24] proposed an optimal grasp estimation method for 3D point clouds based on a partial, single-view observation of unknown objects. This approach is flexible and stable when working with objects in ever-changing scenes, but is limited to non-cluttered environments. Mousavian et al. [25] introduced 6-DOF GraspNet for generating diverse grasps for unknown objects. The method leverages a trained variational auto-encoder (VAE) to sample multiple grasps for an object, and also presents a refinement scheme that moves the gripper closer to a successful grasp pose. Wang et al. [26] proposed a method for robotic grasping of both rigid and soft objects; it generates the grasping pose directly along the object's central axis without relying on a CAD model. An ambidextrous grasping framework was proposed in [4] as a significant extension of previous versions of the Dex-Net research. The approach learns grasping policies by training on a set of grippers with a domain-randomized dataset and geometric analysis models. Wu et al. [27] proposed an end-to-end Grasp Proposal Network (GPNet) that predicts a diverse set of 6-DOF grasps for an unseen object observed from a single, unknown camera view. GPNet builds on a key design in its grasp proposal module that defines anchors of grasp centres at discrete but regular 3D grid corners, making it flexible enough to support either precise or diverse grasp predictions.
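A minimal sketch of the grid-anchor idea follows, assuming anchors are placed at the corners of a regular grid over the observed point cloud's bounding box; the actual resolution and anchor parameterization in [27] may differ.

```python
import numpy as np

def grid_anchors(cloud, cells_per_axis=10):
    """Place candidate grasp-centre anchors at the corners of a regular 3D grid
    spanning the observed point cloud, in the spirit of GPNet's proposal module.
    The grid resolution is a free parameter; the value used in [27] may differ."""
    lo, hi = cloud.min(axis=0), cloud.max(axis=0)
    axes = [np.linspace(lo[d], hi[d], cells_per_axis + 1) for d in range(3)]
    xs, ys, zs = np.meshgrid(*axes, indexing="ij")
    return np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)   # (N, 3) anchor centres
```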
Chu et al. [28] presented a grasping detection system to
predict grasp candidates for novel objects in RGB-D images.
Tests on the Cornell grasping dataset and a self-collected multi-object, multi-grasp dataset demonstrated the effectiveness of the design. Ten Pas et al. [29] generated grasp
hypotheses that do not require a precise segmentation of the
object. They proposed incorporating prior knowledge about
object categories to increase grasp classification accuracy.
Since the algorithm does not segment the objects, it can detect
grasps that treat multiple objects as a single atomic object.
Liang et al. [30] proposed an end-to-end grasp evaluation
model (PointNetGPD) to address the challenging problem of
localizing robot grasp configurations directly from the point
cloud. It is lightweight and directly processes the 3D points located within the gripper closing region for grasp evaluation. In [31,32], the Generative Grasping Convolutional Neural Network (GG-CNN) was presented as a grasp synthesis model that directly generates grasp poses from a depth image on a pixel-wise basis, instead of sampling and classifying individual grasp candidates as other deep learning techniques do.
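The pixel-wise read-out can be sketched as follows, assuming the network has already produced per-pixel quality, angle, and width maps; the released GG-CNN implementations also smooth these maps and rescale the predicted width, which is omitted here, and the function name and intrinsics arguments are illustrative.

```python
import numpy as np

def best_grasp_from_maps(quality, angle, width, depth, fx, fy, cx, cy):
    """Read out one grasp from GG-CNN-style pixel-wise maps: take the pixel with the
    highest predicted quality, look up the angle and width there, and back-project the
    pixel into 3D with the depth image and pinhole intrinsics (a simplified sketch)."""
    v, u = np.unravel_index(np.argmax(quality), quality.shape)
    z = depth[v, u]
    x = (u - cx) * z / fx          # pinhole back-projection to camera coordinates
    y = (v - cy) * z / fy
    return {"position": (x, y, z),
            "rotation": angle[v, u],     # in-plane gripper rotation (radians)
            "width": width[v, u]}        # predicted gripper opening
```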