4.4 Dex-Net 2.0 Architecture. (Center) The Grasp Quality Convolutional Neural
Network (GQ-CNN) is trained offline to predict the robustness of candidate grasps
from depth images using a dataset of 6.7 million synthetic point clouds, grasps,
and associated robust grasp metrics computed with Dex-Net 1.0. (Left) When
an object is presented to the robot, a depth camera returns a 3D point cloud,
where pairs of antipodal points identify a set of several hundred grasp candidates.
(Right) The GQ-CNN rapidly determines the most robust grasp candidate, which
is executed with the ABB YuMi robot. . . . . . . . . . . . . . . . . . . . . . . 56
4.5 Grasp robustness predicted by a Grasp Quality Convolutional Neural Network
(GQ-CNN) trained with Dex-Net 2.0 over the space of depth images and grasps
for a single point cloud collected with a Primesense Carmine. (Left) As the center
of the gripper moves from the top to the bottom of the image, the GQ-CNN
prediction stays near zero and spikes at the most robust grasp (Right), for which
the gripper fits into a small opening on the object surface. This suggests that the
GQ-CNN has learned a detailed representation of the collision space between the
object and gripper. Furthermore, the sharp spike suggests that it may be difficult
to plan robust grasps by randomly sampling grasps in image space. We consider
planning the most robust grasp using the cross-entropy method on the GQ-CNN
response. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.6 Example input color images and maps of the grasp robustness estimated by the
GQ-CNN over grasp centers for a constant grasp axis angle in image space and height
above the table, with the grasp planned by our CEM-based robust grasping policy
shown in black. CEM is able to find precise robust grasping locations encoded by
the GQ-CNN that are very close to the global maximum for the given grasp axis
and height. The GQ-CNN also appears to assign non-zero robustness to several
grasps that completely miss the object. This is likely because no such grasps are
in the training set, and future work could augment the training dataset to avoid
these grasps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.7 Experimental setup for benchmarking grasping with the ABB YuMi. (Left) In
each trial a human operator sampled an object pose by shaking the object in a
box and placing it upside down in the workspace. Then an RGB-D image was taken
with a Primesense Carmine 1.08, the image was processed using inpainting [70],
and the object was segmented using color background subtraction. The grasp
planner under evaluation then planned a gripper pose and the YuMi executed
the grasp. Grasps were considered successful if the gripper held the object after
lifting, transporting, and shaking. (Top-Right) The training set of 8 objects with
adversarial geometric features, such as smooth curved surfaces and narrow openings,
used to benchmark grasping of known objects. (Bottom-Right) The test set of
10 household objects not seen during training. The dataset was selected to test
performance on challenging objects of varying material, geometry, and surface
reflectance properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61