4.4 Dex-Net 2.0 Architecture. (Center) The Grasp Quality Convolutional Neural
Network (GQ-CNN) is trained offline to predict the robustness of candidate grasps
from depth images using a dataset of 6.7 million synthetic point clouds, grasps,
and associated robust grasp metrics computed with Dex-Net 1.0. (Left) When
an object is presented to the robot, a depth camera returns a 3D point cloud,
where pairs of antipodal points identify a set of several hundred grasp candidates.
(Right) The GQ-CNN rapidly determines the most robust grasp candidate, which
is executed with the ABB YuMi robot. . . . . . . . . . . . . . . . . . . . . . . 56
4.5 Grasp robustness predicted by a Grasp Quality Convolutional Neural Network
(GQ-CNN) trained with Dex-Net 2.0 over the space of depth images and grasps
for a single point cloud collected with a Primesense Carmine. (Left) As the center
of the gripper moves from the top to the bottom of the image, the GQ-CNN
prediction stays near zero and spikes at the most robust grasp (Right), for which
the gripper fits into a small opening on the object surface. This suggests that the
GQ-CNN has learned a detailed representation of the collision space between the
object and gripper. Furthermore, the sharp spike suggests that it may be difficult
to plan robust grasps by randomly sampling grasps in image space. We consider
planning the most robust grasp using the cross-entropy method on the GQ-CNN
response. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.6 Example input color images and maps of the grasp robustness estimated by the
GQ-CNN over grasp centers for a constant grasp axis angle in image space and height
above the table, with the grasp planned by our CEM-based robust grasping policy
shown in black. CEM is able to find precise robust grasping locations encoded by
the GQ-CNN that are very close to the global maximum for the given grasp axis
and height. The GQ-CNN also appears to assign non-zero robustness to several
grasps that completely miss the object. This is likely because no such grasps are
in the training set, and future work could augment the training dataset to avoid
these grasps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.7 Experimental setup for benchmarking grasping with the ABB YuMi. (Left) In
each trial a human operator sampled an object pose by shaking the object in a
box and placing it upside down in the workspace. Then an RGB-D image was taken
with a Primesense Carmine 1.08, the image was processed using inpainting [70],
and the object was segmented using color background subtraction. The grasp
planner under evaluation then planned a gripper pose and the YuMi executed
the grasp. Grasps were considered successful if the gripper held the object after
lifting, transporting, and shaking. (Top-Right) The training set of 8 objects with
adversarial geometric features, such as smooth curved surfaces and narrow openings,
used to benchmark grasping of known objects. (Bottom-Right) The test set of
10 household objects not seen during training. The dataset was selected to test
performance on challenging objects of varying material, geometry, and surface
reflectance properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61