6 Li Liu et al.
The research community has started moving towards the challenging goal of building general purpose object detection systems whose ability to detect many object categories matches that of humans. This is a major challenge: according to cognitive scientists, human beings can identify around 3,000 entry-level categories and 30,000 visual categories overall, and the number of categories distinguishable with domain expertise may be on the order of 10^5 [14]. Despite the remarkable progress of the past years, designing an accurate, robust, efficient detection and recognition system that approaches human-level performance on 10^4–10^5 categories is undoubtedly an open problem.
3 Frameworks
There has been steady progress in object feature representations
and classifiers for recognition, as evidenced by the dramatic change
from handcrafted features [213, 42, 55, 76, 212] to learned DCNN
features [65, 160, 64, 175, 40].
In contrast, the basic “sliding window” strategy [42, 56, 55] for localization remains mainstream, despite some efforts in [113, 209]. However, the number of windows is large and grows quadratically with the number of pixels, and the need to search over multiple scales and aspect ratios further increases the search space. This huge search space results in high computational complexity; therefore, the design of an efficient and effective detection framework plays a key role. Commonly adopted strategies include cascading, sharing feature computation, and reducing per-window computation.
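To make the quadratic growth concrete, the number of candidate windows for a single image can be roughly estimated as follows. This is a toy calculation; the base size, stride, scales, and aspect ratios are illustrative assumptions, not values taken from any particular detector:

```python
# Toy estimate of the sliding-window search space for one image.
# base size, stride, scales, and aspect ratios are illustrative assumptions.

def count_windows(img_w, img_h, base=64, stride=8,
                  scales=(1.0, 1.5, 2.0), ratios=(0.5, 1.0, 2.0)):
    total = 0
    for s in scales:
        for r in ratios:
            # window width/height at this scale and aspect ratio
            w = int(base * s * r ** 0.5)
            h = int(base * s / r ** 0.5)
            if w > img_w or h > img_h:
                continue
            nx = (img_w - w) // stride + 1   # horizontal placements
            ny = (img_h - h) // stride + 1   # vertical placements
            total += nx * ny
    return total

small = count_windows(320, 240)
large = count_windows(640, 480)   # 4x the pixels
# window count grows roughly with pixel count (quadratically in image side)
print(small, large)
```

Quadrupling the pixel count multiplies the window count by roughly four (slightly more here, because border effects matter relatively less in the larger image), which is why per-window cost and window pruning dominate detector design.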
In this section, we review the milestone detection frameworks proposed for generic object detection since deep learning entered the field, as listed in Fig. 6 and summarized in Table 10. Nearly all detectors proposed over the last several years are based on one of these milestone detectors, attempting to improve on one or more aspects. Broadly, these detectors can be organized into two main categories:
A. Two-stage detection frameworks, which include a pre-processing step for region proposal, making the overall pipeline two-stage.
B. One-stage detection frameworks, or region-proposal-free frameworks, which do not have a separate region proposal step, making the overall pipeline single-stage.
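The two families can be contrasted schematically as follows. All function bodies are illustrative stubs introduced here for the sketch, not components of any actual detector:

```python
# Schematic contrast of the two detector families; all stubs are illustrative.

def propose_regions(image):
    # stand-in for a category-independent region proposal method
    return [(0, 0, 32, 32), (16, 16, 64, 64)]

def classify_and_refine(image, box):
    # stand-in for per-region classification plus box refinement
    return ("object", box, 0.9)

def dense_predictions(image):
    # stand-in for a single dense pass predicting classes and boxes directly
    return [("object", (0, 0, 32, 32), 0.8)]

def two_stage_detect(image):
    # A: separate proposal step, then per-region classification
    return [classify_and_refine(image, p) for p in propose_regions(image)]

def one_stage_detect(image):
    # B: no proposal step; one pass over the image yields the detections
    return dense_predictions(image)
```

The structural difference is simply whether detection is conditioned on an explicit, separately computed set of candidate regions.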
Section 4 will build on this review by discussing in greater detail the fundamental subproblems involved in the detection framework, including DCNN features, detection proposals, context modeling, bounding box regression, and class imbalance handling.
3.1 Region Based (Two Stage Framework)
In a region based framework, category-independent region propos-
als are generated from an image, CNN [109] features are extracted
from these regions, and then category-specific classifiers are used
to determine the category labels of the proposals. As can be ob-
served from Fig. 6, DetectorNet [198], OverFeat [183], MultiBox
[52] and RCNN [65] independently and almost simultaneously
proposed using CNNs for generic object detection.
RCNN: Inspired by the breakthrough image classification re-
sults obtained by CNN and the success of selective search in re-
gion proposal for hand-crafted features [209], Girshick et al. were
among the first to explore CNN for generic object detection and
developed RCNN [65, 67], which integrates AlexNet [109] with
the region proposal method selective search [209]. As illustrated in Fig. 7, training in the RCNN framework is a multistage pipeline:
1. Class-agnostic region proposals, which are candidate regions that might contain objects, are obtained via selective search [209];
2. Region proposals, which are cropped from the image and warped to the same size, are used as the input for finetuning a CNN model pre-trained on a large-scale dataset such as ImageNet;
3. A set of class-specific linear SVM classifiers are trained on fixed-length features extracted with the CNN, replacing the softmax classifier learned by finetuning;
4. Bounding box regression is learned for each object class with the CNN features.
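At test time, the components trained in the steps above are applied per proposal. The following sketch shows that flow with stand-in parts; `selective_search`, the feature extractor, the per-class scorers, and the regressor here are placeholders, not the actual RCNN code:

```python
# Schematic RCNN test-time pipeline; every component is a stand-in stub.
import random

random.seed(0)

def selective_search(image):
    """Stand-in for class-agnostic region proposals (step 1)."""
    w, h = image["size"]
    return [(random.randint(0, w // 2), random.randint(0, h // 2), 64, 64)
            for _ in range(5)]

def warp(image, box, size=(224, 224)):
    """Crop each proposal and warp it to a fixed input size (step 2)."""
    return {"box": box, "input_size": size}

def cnn_features(patch):
    """Stand-in for the finetuned CNN producing a fixed-length feature."""
    return [random.random() for _ in range(10)]

def svm_score(features, cls):
    """Stand-in for a per-class linear SVM (step 3)."""
    return sum(features) / len(features)

def bbox_regress(features, box):
    """Stand-in for class-specific bounding box regression (step 4)."""
    x, y, w, h = box
    return (x + 1, y + 1, w, h)   # nudges the box; real RCNN predicts offsets

def rcnn_detect(image, classes, threshold=0.4):
    detections = []
    for box in selective_search(image):
        feats = cnn_features(warp(image, box))   # one CNN pass PER proposal
        for cls in classes:
            score = svm_score(feats, cls)
            if score > threshold:
                detections.append((cls, bbox_regress(feats, box), score))
    return detections

dets = rcnn_detect({"size": (640, 480)}, classes=["person", "car"])
```

Note that `cnn_features` is called once per proposal; this per-proposal feature extraction is exactly the cost that SPPnet and Fast RCNN later remove.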
In spite of achieving high object detection quality, RCNN has no-
table drawbacks [64]:
1. Training is a multistage complex pipeline, which is inelegant,
slow and hard to optimize because each individual stage must
be trained separately.
2. Numerous region proposals, which provide only rough localization, need to be externally generated.
3. Training the SVM classifiers and bounding box regressors is expensive in both disk space and time, since CNN features are extracted independently from each region proposal in each image, posing great challenges for large-scale detection, especially with deep networks such as AlexNet [109] and VGG [191].
4. Testing is slow, since CNN features are extracted per object
proposal in each testing image.
SPPNet: During testing, CNN feature extraction is the main bottleneck of the RCNN detection pipeline, which requires extracting CNN features from thousands of warped region proposals per image. Noticing these obvious disadvantages, He et al. [77] introduced traditional spatial pyramid pooling (SPP) [68, 114] into CNN architectures. Since convolutional layers accept inputs of arbitrary size, the requirement of fixed-size input images in CNNs comes only from the Fully Connected (FC) layers. Based on this observation, He et al. added an SPP layer on top of the last convolutional (CONV) layer to obtain fixed-length features for the FC layers. With this SPPnet, RCNN obtains a significant speedup without sacrificing any detection quality, because the convolutional layers need to be run only once on the entire test image to generate fixed-length features for region proposals of arbitrary size. While SPPnet accelerates RCNN evaluation by orders of magnitude, it does not result in a comparable speedup of detector training. Moreover, finetuning in SPPnet [77] is unable to update the convolutional layers before the SPP layer, which limits the accuracy of very deep networks.
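The key idea of the SPP layer, pooling an arbitrary-sized CONV feature map into a fixed-length vector, can be sketched as follows. This is a minimal single-channel version under assumed pyramid levels {1, 2, 4}; the real layer applies the same binning per channel:

```python
# Minimal sketch of spatial pyramid pooling for one feature-map channel.
# Pyramid levels (1, 2, 4) are an assumed configuration for illustration.

def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool an HxW map into a fixed-length vector, for any H and W."""
    H = len(feature_map)
    W = len(feature_map[0])
    pooled = []
    for n in levels:                      # an n x n grid of bins per level
        for i in range(n):
            for j in range(n):
                # bin boundaries: cover the whole map, never an empty bin
                r0, r1 = (i * H) // n, max((i + 1) * H // n, i * H // n + 1)
                c0, c1 = (j * W) // n, max((j + 1) * W // n, j * W // n + 1)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled   # length = sum(n * n for n in levels) = 21 here

# Feature maps of different sizes yield the same output length:
small = [[float(r * c) for c in range(5)] for r in range(7)]
large = [[float(r + c) for c in range(13)] for r in range(9)]
assert len(spp(small)) == len(spp(large)) == 1 + 4 + 16
```

Because the output length depends only on the pyramid levels, the FC layers see a fixed-size input regardless of the region's original size, which is what lets the CONV layers run once per image instead of once per proposal.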
Fast RCNN: Girshick [64] proposed Fast RCNN that addresses
some of the disadvantages of RCNN and SPPnet, while improv-
ing on their detection speed and quality. As illustrated in Fig. 8,
Fast RCNN enables end-to-end detector training (when ignoring