Real-Time Grasp Detection Using Convolutional Neural Networks
Joseph Redmon¹, Anelia Angelova²
Abstract— We present an accurate, real-time approach to
robotic grasp detection based on convolutional neural networks.
Our network performs single-stage regression to graspable
bounding boxes without using standard sliding window or
region proposal techniques. The model outperforms state-of-
the-art approaches by 14 percentage points and runs at 13
frames per second on a GPU. Our network can simultaneously
perform classification so that in a single step it recognizes the
object and finds a good grasp rectangle. A modification to this
model predicts multiple grasps per object by using a locally
constrained prediction mechanism. The locally constrained
model performs significantly better, especially on objects that
can be grasped in a variety of ways.
I. INTRODUCTION
Perception—using the senses (or sensors if you are a
robot) to understand your environment—is hard. Visual
perception involves mapping pixel values and light information
onto a model of the universe to infer your surroundings.
General scene understanding requires complex visual tasks such
as segmenting a scene into component parts, recognizing
what those parts are, and disambiguating between visually
similar objects. Due to these complexities, visual perception
is a large bottleneck in real robotic systems.
General purpose robots need the ability to interact with
and manipulate objects in the physical world. Humans see
novel objects and know immediately, almost instinctively,
how they would grab them to pick them up. Robotic grasp
detection lags far behind human performance. We focus on
the problem of finding a good grasp given an RGB-D view
of the object.
We evaluate on the Cornell Grasp Detection Dataset, an
extensive dataset with numerous objects and ground-truth
labelled grasps (see Figure 1). Recent work on this dataset
runs at 13.5 seconds per frame with an accuracy of 75 percent
[1] [2]. This translates to a 13.5 second delay between a robot
viewing a scene and finding where to move its grasper.
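For concreteness, each labelled grasp in Figure 1 is an oriented rectangle
in the image plane. The sketch below shows one way such a grasp could be
represented in code; the class and field names are ours, chosen for
illustration, and are not the dataset's own storage format.

    from dataclasses import dataclass
    import math

    @dataclass
    class GraspRectangle:
        """Oriented 2-D grasp rectangle (illustrative field names).

        x, y   : center of the rectangle in image coordinates (pixels)
        theta  : orientation relative to the image x-axis (radians)
        width  : gripper opening along the grasp axis (pixels)
        height : size of the gripper plates (pixels)
        """
        x: float
        y: float
        theta: float
        width: float
        height: float

        def corners(self):
            """Four corner points, e.g. for drawing the rectangle or
            comparing a prediction against a ground-truth grasp."""
            c, s = math.cos(self.theta), math.sin(self.theta)
            dx, dy = self.width / 2.0, self.height / 2.0
            return [(self.x + c * px - s * py, self.y + s * px + c * py)
                    for px, py in ((-dx, -dy), (dx, -dy), (dx, dy), (-dx, dy))]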
The most common approach to grasp detection is a sliding
window detection framework. The sliding window approach
uses a classifier to determine whether small patches of an
image constitute good grasps for an object in that image. This
type of system requires applying the classifier to numerous
places on the image. Patches that score highly are considered
good potential grasps.
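To make the contrast with our method concrete, the following sketch shows
the structure of such a sliding-window detector. The patch size, stride,
and scoring function are placeholders of our choosing, not the specific
classifiers of [1] [2]; the point is that the classifier must be evaluated
once per window.

    def sliding_window_grasps(image, score_patch, patch=64, stride=16, top_k=5):
        """Illustrative sliding-window grasp detector.

        image       : H x W x C array (e.g. a NumPy RGB-D image)
        score_patch : any classifier mapping a (patch, patch, C) crop to a
                      graspability score; it is applied to every window,
                      which is why detectors of this style are slow.
        """
        h, w = image.shape[:2]
        candidates = []
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                crop = image[y:y + patch, x:x + patch]
                candidates.append((float(score_patch(crop)), x, y))
        # The highest-scoring windows are kept as potential grasps.
        candidates.sort(key=lambda c: c[0], reverse=True)
        return candidates[:top_k]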
We take a different approach; we apply a single network
once to an image and predict grasp coordinates directly. Our
network is comparatively large but because we only apply
it once to an image we get a massive performance boost.
¹University of Washington    ²Google Research
Fig. 1. The Cornell Grasping Dataset contains a variety of objects, each
with multiple labelled grasps. Grasps are given as oriented rectangles in
2-D.
Instead of looking only at local patches our network uses
global information in the image to inform its grasp
predictions, making it significantly more accurate. Our network
achieves 88 percent accuracy and runs at real-time speeds
(13 frames per second). This redefines the state-of-the-art
for RGB-D grasp detection.
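As a rough sketch of this single-pass idea, a convolutional network can
regress grasp parameters directly from the whole image in one forward
pass. The small PyTorch model below is purely illustrative; it is not the
architecture used in this paper, and the output parameterization
(x, y, theta, height, width) is just one plausible choice.

    import torch
    import torch.nn as nn

    class DirectGraspRegressor(nn.Module):
        """Minimal single-pass grasp regressor (illustrative only).

        One forward pass over the full image yields grasp parameters
        directly, with no sliding window and no region proposals.
        """
        def __init__(self, out_dim=5):   # e.g. (x, y, theta, height, width)
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.regressor = nn.Linear(128, out_dim)

        def forward(self, images):                    # images: (N, 3, H, W)
            feats = self.features(images).flatten(1)  # global image features
            return self.regressor(feats)              # (N, out_dim) grasp params

    # A single forward pass per image is all that is needed at test time.
    net = DirectGraspRegressor()
    grasp = net(torch.randn(1, 3, 224, 224))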
II. RELATED WORK
Significant past work uses 3-D simulations to find good
grasps [3] [4] [5] [6] [7]. These approaches are powerful but
rely on a full 3-D model and other physical information about
an object to find an appropriate grasp. Full object models are
often not known a priori. General purpose robots may need
to grasp novel objects without first building complex 3-D
models of the object.
Robotic systems increasingly leverage RGB-D sensors and
data for tasks like object recognition [8], detection [9] [10],
and mapping [11] [12]. RGB-D sensors like the Kinect are
cheap, and the extra depth information is invaluable for
robots that interact with a 3-D environment.
Recent work on grasp detection focusses on the problem