Fast R-CNN
Ross Girshick
Microsoft Research
rbg@microsoft.com
Abstract
This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9× faster than R-CNN, is 213× faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.
1. Introduction
Recently, deep ConvNets [14, 16] have significantly improved image classification [14] and object detection [9, 19] accuracy. Compared to image classification, object detection is a more challenging task that requires more complex methods to solve. Due to this complexity, current approaches (e.g., [9, 11, 19, 25]) train models in multi-stage pipelines that are slow and inelegant.
Complexity arises because detection requires the accurate localization of objects, creating two primary challenges. First, numerous candidate object locations (often called “proposals”) must be processed. Second, these candidates provide only rough localization that must be refined to achieve precise localization. Solutions to these problems often compromise speed, accuracy, or simplicity.
In this paper, we streamline the training process for state-of-the-art ConvNet-based object detectors [9, 11]. We propose a single-stage training algorithm that jointly learns to classify object proposals and refine their spatial locations.
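To preview what “jointly learns” means here: the network is trained with a single multi-task objective that sums a per-proposal classification term and a box-regression term (the paper later instantiates these as log loss and a smooth L1 penalty, with the regression term active only for non-background proposals). The NumPy sketch below illustrates the idea for one proposal; the function and argument names are mine, not from the released code.

import numpy as np

def smooth_l1(x):
    # Robust penalty used for box regression: quadratic near zero,
    # linear for large errors.
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def multi_task_loss(cls_scores, true_class, box_deltas, target_deltas, lam=1.0):
    # Classification term: log loss of the softmax probability of the
    # proposal's true class.
    probs = np.exp(cls_scores - cls_scores.max())
    probs /= probs.sum()
    loss_cls = -np.log(probs[true_class])
    # Localization term: smooth L1 over the 4 predicted box offsets,
    # skipped for background proposals (class index 0, as in the paper).
    loss_loc = smooth_l1(box_deltas - target_deltas).sum() if true_class > 0 else 0.0
    # lam balances the two tasks; the paper uses lam = 1.
    return loss_cls + lam * loss_loc

Because both terms are differentiable functions of the network outputs, one backward pass updates the classifier and the box regressor together, replacing R-CNN's separate training stages.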
The resulting method can train a very deep detection network (VGG16 [20]) 9× faster than R-CNN [9] and 3× faster than SPPnet [11]. At runtime, the detection network processes images in 0.3s (excluding object proposal time)¹ while achieving top accuracy on PASCAL VOC 2012 [7] with a mAP of 66% (vs. 62% for R-CNN).
1.1. R-CNN and SPPnet
The Region-based Convolutional Network method (R-CNN) [9] achieves excellent object detection accuracy by using a deep ConvNet to classify object proposals. R-CNN, however, has notable drawbacks:
1. Training is a multi-stage pipeline. R-CNN first fine-tunes a ConvNet on object proposals using log loss. Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learned by fine-tuning. In the third training stage, bounding-box regressors are learned.
2. Training is expensive in space and time. For SVM and bounding-box regressor training, features are extracted from each object proposal in each image and written to disk. With very deep networks, such as VGG16, this process takes 2.5 GPU-days for the 5k images of the VOC07 trainval set. These features require hundreds of gigabytes of storage.
3. Object detection is slow. At test-time, features are extracted from each object proposal in each test image. Detection with VGG16 takes 47s/image (on a GPU).
R-CNN is slow because it performs a ConvNet forward pass for each object proposal, without sharing computation. Spatial pyramid pooling networks (SPPnets) [11] were proposed to speed up R-CNN by sharing computation. The SPPnet method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map. Features are extracted for a proposal by max-pooling the portion of the feature map inside the proposal into a fixed-size output (e.g., 6 × 6). Multiple output sizes are pooled and then concatenated as in spatial pyramid pooling [15]. SPPnet accelerates R-CNN by 10 to 100× at test time. Training time is also reduced by 3× due to faster proposal feature extraction.
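For concreteness, the following NumPy sketch shows this SPP-style feature extraction for a single proposal: the region of the shared feature map under the proposal is divided into fixed grids, max-pooled per channel, and the grid outputs are concatenated. Names and the coordinate convention are illustrative, not taken from the SPPnet implementation.

import numpy as np

def spp_pool(feature_map, box, grid_sizes=(6, 3, 2, 1)):
    # feature_map: (C, H, W) conv features for the whole image.
    # box: (x0, y0, x1, y1) proposal in feature-map coordinates, inclusive.
    x0, y0, x1, y1 = box
    region = feature_map[:, y0:y1 + 1, x0:x1 + 1]
    C, h, w = region.shape
    parts = []
    for n in grid_sizes:
        # Divide the region into an n x n grid of roughly equal bins.
        ys = np.linspace(0, h, n + 1).astype(int)
        xs = np.linspace(0, w, n + 1).astype(int)
        out = np.empty((C, n, n), dtype=feature_map.dtype)
        for i in range(n):
            for j in range(n):
                # Guard against empty bins when the region is smaller
                # than the grid.
                bin_ = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                                 xs[j]:max(xs[j + 1], xs[j] + 1)]
                out[:, i, j] = bin_.max(axis=(1, 2))
        parts.append(out.reshape(C, -1))
    # Concatenating several grid sizes yields the spatial pyramid of [15];
    # a single 6 x 6 grid corresponds to the fixed-size output above.
    return np.concatenate(parts, axis=1).ravel()

Because the convolutional feature map is computed once per image, this per-proposal pooling is cheap, which is the source of SPPnet's speedup over per-proposal forward passes.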
¹All timings use one Nvidia K40 GPU overclocked to 875 MHz.