R-CNN: A New Breakthrough in Deep Learning Object Detection


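"R-CNN-孙超1": This technical report introduces R-CNN (Region-based Convolutional Neural Networks), a deep learning method for accurate object detection and semantic segmentation. R-CNN was proposed by Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik at the University of California, Berkeley, and it substantially improved object detection performance on the PASCAL VOC dataset.

Although object detection had advanced considerably, performance on the standard PASCAL VOC benchmark had plateaued in the years before R-CNN. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. R-CNN proposed a simple and scalable detection algorithm that improved mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, reaching a mAP of 53.3%.

R-CNN rests on two key insights:

1. High-capacity convolutional neural networks (CNNs) can be applied to bottom-up region proposals in order to localize and segment objects. This exploits the strong feature-extraction ability of CNNs to identify and separate objects more accurately.
2. When labeled training data is scarce, supervised pre-training on an auxiliary task followed by domain-specific fine-tuning yields a significant performance gain. This is a form of transfer learning: the model is first pre-trained on a large dataset (such as ImageNet) and then adapted to the object detection task.

Because R-CNN combines region proposals with CNN features, its name stands for "Regions with CNN features". The method first generates candidate regions that may contain objects, using a technique such as selective search, and then applies the pre-trained, fine-tuned CNN to each region to extract features. The features are passed to a support vector machine (SVM) or another classifier, which decides whether the region contains an object and, if so, its class. Finally, bounding-box regression refines the object's location.

R-CNN marked a major breakthrough for deep learning in object detection and laid the groundwork for later detection frameworks such as Fast R-CNN, Faster R-CNN, and YOLO. It also has limitations, chiefly low computational efficiency, because the CNN must be run independently on every region proposal. Later work addressed this by sharing convolutional layers across proposals to reduce computation, and by introducing the Region Proposal Network, which generates region proposals inside the CNN itself and so greatly increases detection speed.

R-CNN is a milestone for deep learning in computer vision: it demonstrated the potential of CNNs for object detection and semantic segmentation and drove continued progress in the field.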
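The four stages described above (region proposals, per-region CNN features, classification, box refinement) can be sketched as a short control-flow skeleton. This is a minimal sketch, not the authors' implementation: every function name is hypothetical, the `toy_*` stand-ins replace selective search, the CNN, the SVMs, and the regressor, and only the highest-scoring class per region is kept (the published system scores each class separately and also removes duplicate detections, which is omitted here). The box update assumes the commonly cited R-CNN parameterization, in which center offsets are scaled by the proposal size and width and height are adjusted in log space.

```python
import math


def apply_box_deltas(box, deltas):
    """Shift and rescale a (center_x, center_y, width, height) box."""
    px, py, pw, ph = box
    dx, dy, dw, dh = deltas
    # Center offsets are scaled by the proposal size; width and height
    # are adjusted in log space, so dw = dh = 0 leaves the size unchanged.
    return (pw * dx + px, ph * dy + py, pw * math.exp(dw), ph * math.exp(dh))


def rcnn_detect(image, propose, extract_features, score_classes,
                predict_deltas, threshold=0.0):
    """Run the four stages once per proposal and collect detections."""
    detections = []
    for box in propose(image):                        # 1. region proposals
        features = extract_features(image, box)       # 2. one CNN pass per region
        scores = score_classes(features)              # 3. per-class scores
        label = max(scores, key=scores.get)           # simplification: top class only
        if scores[label] > threshold:
            deltas = predict_deltas(features, label)  # 4. box refinement
            detections.append((label, scores[label],
                               apply_box_deltas(box, deltas)))
    return detections


# Toy stand-ins (assumptions for illustration only; not real models).
cnn_calls = []


def toy_propose(image):
    return [(100.0, 80.0, 50.0, 40.0), (30.0, 30.0, 20.0, 20.0)]


def toy_features(image, box):
    cnn_calls.append(box)  # record each run of the stand-in "CNN"
    return box


def toy_scores(features):
    wide = features[2] > 30
    return {"cat": 1.2, "dog": -0.4} if wide else {"cat": -1.0, "dog": -0.7}


def toy_deltas(features, label):
    return (0.1, -0.05, 0.0, 0.0)


detections = rcnn_detect(None, toy_propose, toy_features, toy_scores, toy_deltas)
print(detections)
print("feature extractor calls:", len(cnn_calls))
```

The feature extractor runs once for every proposal, whether or not the region is kept, which is the efficiency limitation noted above.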

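| VOC 2010 test | aero | bike | bird | boat | bottle | bus | car | cat | chair | cow | table | dog | horse | mbike | person | plant | sheep | sofa | train | tv | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DPM v5 [20]† | 49.2 | 53.8 | 13.1 | 15.3 | 35.5 | 53.4 | 49.7 | 27.0 | 17.2 | 28.8 | 14.7 | 17.8 | 46.4 | 51.2 | 47.7 | 10.8 | 34.2 | 20.7 | 43.8 | 38.3 | 33.4 |
| UVA [39] | 56.2 | 42.4 | 15.3 | 12.6 | 21.8 | 49.3 | 36.8 | 46.1 | 12.9 | 32.1 | 30.0 | 36.5 | 43.5 | 52.9 | 32.9 | 15.3 | 41.1 | 31.8 | 47.0 | 44.8 | 35.1 |
| Regionlets [41] | 65.0 | 48.9 | 25.9 | 24.6 | 24.5 | 56.1 | 54.5 | 51.2 | 17.0 | 28.9 | 30.2 | 35.8 | 40.2 | 55.7 | 43.5 | 14.3 | 43.9 | 32.6 | 54.0 | 45.9 | 39.7 |
| SegDPM [18]† | 61.4 | 53.4 | 25.6 | 25.2 | 35.5 | 51.7 | 50.6 | 50.8 | 19.3 | 33.8 | 26.8 | 40.4 | 48.3 | 54.4 | 47.1 | 14.8 | 38.7 | 35.0 | 52.8 | 43.1 | 40.4 |
| R-CNN | 67.1 | 64.1 | 46.7 | 32.0 | 30.5 | 56.4 | 57.2 | 65.9 | 27.0 | 47.3 | 40.9 | 66.6 | 57.8 | 65.9 | 53.6 | 26.7 | 56.5 | 38.1 | 52.8 | 50.2 | 50.2 |
| R-CNN BB | 71.8 | 65.8 | 53.0 | 36.8 | 35.9 | 59.7 | 60.0 | 69.9 | 27.9 | 50.6 | 41.4 | 70.0 | 62.0 | 69.0 | 58.1 | 29.5 | 59.4 | 39.3 | 61.2 | 52.4 | 53.7 |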

Table 1: Detection average precision (%) on VOC 2010 test. R-CNN is most directly comparable to UVA and Regionlets since all methods use selective search region proposals. Bounding-box regression (BB) is described in Section C. At publication time, SegDPM was the top-performer on the PASCAL VOC leaderboard.

† DPM and SegDPM use context rescoring not used by the other methods.
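As a quick check on how the mAP column relates to the rest of Table 1, the sketch below recomputes mAP as the arithmetic mean of the 20 per-class AP values and then compares R-CNN BB with SegDPM in both absolute and relative terms. The numbers are copied from the table; the variable and function names are illustrative only.

```python
# Per-class AP (%) copied from Table 1 (VOC 2010 test), in column order.
rcnn_bb = [71.8, 65.8, 53.0, 36.8, 35.9, 59.7, 60.0, 69.9, 27.9, 50.6,
           41.4, 70.0, 62.0, 69.0, 58.1, 29.5, 59.4, 39.3, 61.2, 52.4]
segdpm = [61.4, 53.4, 25.6, 25.2, 35.5, 51.7, 50.6, 50.8, 19.3, 33.8,
          26.8, 40.4, 48.3, 54.4, 47.1, 14.8, 38.7, 35.0, 52.8, 43.1]


def mean_ap(per_class_ap):
    """mAP is the unweighted mean of the per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap)


rcnn_bb_map = round(mean_ap(rcnn_bb), 1)   # matches the table's mAP column
segdpm_map = round(mean_ap(segdpm), 1)

absolute_gain = rcnn_bb_map - segdpm_map       # in percentage points
relative_gain = absolute_gain / segdpm_map     # as a fraction of the baseline

print(f"R-CNN BB mAP: {rcnn_bb_map}, SegDPM mAP: {segdpm_map}")
print(f"absolute gain: {absolute_gain:.1f} points, relative gain: {relative_gain:.1%}")
```

On VOC 2010 the gap is about 13 percentage points, which corresponds to a relative improvement of roughly one third over SegDPM. A gain reported as "more than 30%" is therefore best read as relative to the baseline rather than as percentage points.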

[Figure 3, left panel: bar chart "ILSVRC2013 detection test set mAP"; axis: mean average precision (mAP) in %, 0 to 100; legend: competition result, post competition result. Bars: UIUC-IFP 1.0%, Delta 6.1%, GPU_UCLA 9.8%, SYSU_Vision 10.5%, Toronto A 11.5%, *OverFeat (1) 19.4%, *NEC-MU 20.9%, UvA-Euvision 22.6%, *OverFeat (2) 24.3%, *R-CNN BB 31.4%.]

[Figure 3, right panel: "ILSVRC2013 detection test set class AP box plots"; axis: average precision (AP) in %; methods: *R-CNN BB, UvA-Euvision, *NEC-MU, *OverFeat (1), Toronto A, SYSU_Vision, GPU_UCLA, Delta, UIUC-IFP.]

Figure 3: (Left) Mean average precision on the ILSVRC2013 detection test set. Methods preceded by * use outside training data (images and labels from the ILSVRC classification dataset in all cases). (Right) Box plots for the 200 average precision values per method. A box plot for the post-competition OverFeat result is not shown because per-class APs are not yet available (per-class APs for R-CNN are in Table 8 and also included in the tech report source uploaded to arXiv.org; see R-CNN-ILSVRC2013-APs.txt). The red line marks the median AP; the box bottom and top are the 25th and 75th percentiles. The whiskers extend to the min and max AP of each method. Each AP is plotted as a green dot over the whiskers (best viewed digitally with zoom).
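The box-plot elements named in the Figure 3 caption (median, 25th and 75th percentiles, whiskers at the minimum and maximum) can be computed from a list of per-class APs as in the sketch below. Two assumptions apply: the ILSVRC2013 per-class APs are not reproduced in this excerpt, so the 20 R-CNN BB values from Table 1 (VOC 2010) are used as stand-in input, and the quartiles use the "inclusive" interpolation of Python's `statistics.quantiles`, which may differ slightly from the convention of the tool that drew the original figure.

```python
import statistics

# Stand-in input: R-CNN BB per-class AP (%) from Table 1 (VOC 2010 test).
per_class_ap = [71.8, 65.8, 53.0, 36.8, 35.9, 59.7, 60.0, 69.9, 27.9, 50.6,
                41.4, 70.0, 62.0, 69.0, 58.1, 29.5, 59.4, 39.3, 61.2, 52.4]


def box_plot_summary(values):
    """Return the five numbers a min/max-whisker box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),   # lower whisker
        "q1": q1,             # box bottom (25th percentile)
        "median": median,     # line inside the box
        "q3": q3,             # box top (75th percentile)
        "max": max(values),   # upper whisker
    }


summary = box_plot_summary(per_class_ap)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```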

[Figure 4 images omitted; the activation values overlaid on the 16 top regions of each of the six units range from 1.0 down to 0.6.]

Figure 4: Top regions for six pool5 units. Receptive fields and activation values are drawn in white. Some units are aligned to concepts, such as people (row 1) or text (4). Other units capture texture and material properties, such as dot arrays (2) and specular reflections (6).
