YOLOv3: Small Changes, Big Gains in Object Detection


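This is a research paper on object detection, "YOLOv3: An Incremental Improvement", written by Joseph Redmon and Ali Farhadi at the University of Washington. YOLOv3 updates the previous YOLO model to improve detection accuracy while staying fast. The authors note that the paper contains little new research; instead, a series of small changes raises accuracy while keeping inference efficient.

1. **Updates and improvements**: YOLOv3 makes a number of small design changes and trains a new network that is a little bigger than the previous one but more accurate. It stays real-time: at 320 × 320 it runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. On the older .5 IOU mAP detection metric it reaches 57.9 AP50 in 51 ms on a Titan X, compared with 57.5 AP50 in 198 ms for RetinaNet, which is similar accuracy at about 3.8× the speed.
2. **Speed and accuracy**: The paper stresses that the accuracy gains do not come at the cost of speed, which makes the detector practical for applications that need both fast response and good accuracy.
3. **Technical background**: The authors did not pursue new theory. They built on momentum left over from the previous year's work ([12] and [1]) and acknowledge that none of the changes is revolutionary; the gain comes from many small improvements combined.
4. **Code sharing**: As with earlier versions, all the code is available online at https://pjreddie.com/yolo/, so others can reproduce and extend the work.
5. **Hardware**: The reported inference times were measured on a Titan X (the Figure 1 caption says an M40 or a Titan X), which gives other researchers a reference point for comparison.

In short, the YOLOv3 paper is about incrementally improving an existing object detector so that it becomes more accurate while remaining fast, with the code released openly.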

YOLOv3: An Incremental Improvement

Joseph Redmon, Ali Farhadi

University of Washington

Abstract

We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP$_{50}$ in 51 ms on a Titan X, compared to 57.5 AP$_{50}$ in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at https://pjreddie.com/yolo/.

1. Introduction

Sometimes you just kinda phone it in for a year, you know? I didn’t do a whole lot of research this year. Spent a lot of time on Twitter. Played around with GANs a little. I had a little momentum left over from last year [12] [1]; I managed to make some improvements to YOLO. But, honestly, nothing like super interesting, just a bunch of small changes that make it better. I also helped out with other people’s research a little.

Actually, that’s what brings us here today. We have a camera-ready deadline [4] and we need to cite some of the random updates I made to YOLO but we don’t have a source. So get ready for a TECH REPORT!

The great thing about tech reports is that they don’t need intros, y’all know why we’re here. So the end of this introduction will signpost for the rest of the paper. First we’ll tell you what the deal is with YOLOv3. Then we’ll tell you how we do. We’ll also tell you about some things we tried that didn’t work. Finally we’ll contemplate what this all means.

2. The Deal

So here’s the deal with YOLOv3: We mostly took good ideas from other people. We also trained a new classifier network that’s better than the other ones. We’ll just take you through the whole system from scratch so you can understand it all.

[Figure 1 plot: COCO AP versus inference time (ms), with curves for RetinaNet-50, RetinaNet-101, and YOLOv3. Inset table columns: Method, mAP, time. Methods: [B] SSD321, [C] DSSD321, [D] R-FCN, [E] SSD513, [F] DSSD513, [G] FPN FRCN, RetinaNet-50-500, RetinaNet-101-500, RetinaNet-101-800, YOLOv3-320, YOLOv3-416, YOLOv3-608. Surviving mAP values: 28.0, 29.9, 31.2, 33.2, 36.2, 32.5, 34.4, 37.8, 28.2, 31.0, 33.0. Surviving times (ms): 125, 156, 172, 198.]

Figure 1. We adapt this figure from the Focal Loss paper [9]. YOLOv3 runs significantly faster than other detection methods with comparable performance. Times from either an M40 or Titan X, they are basically the same GPU.

2.1. Bounding Box Prediction

Following YOLO9000 our system predicts bounding boxes using dimension clusters as anchor boxes [15]. The network predicts 4 coordinates for each bounding box, $t_x$, $t_y$, $t_w$, $t_h$. If the cell is offset from the top left corner of the image by $(c_x, c_y)$ and the bounding box prior has width and height $p_w$, $p_h$, then the predictions correspond to:

$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w e^{t_w} \\
b_h &= p_h e^{t_h}
\end{aligned}
$$
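The box parameterization above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the YOLO9000-style form in which the center is a sigmoid offset inside the cell ($b_x = \sigma(t_x) + c_x$, $b_y = \sigma(t_y) + c_y$) and the size scales the prior exponentially ($b_w = p_w e^{t_w}$, $b_h = p_h e^{t_h}$). The function names `sigmoid` and `decode_box` are hypothetical, and all quantities are taken to be in grid-cell units.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written sigma(.) in the equations."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Map raw network outputs (t_*) to a box (b_x, b_y, b_w, b_h).

    (c_x, c_y) is the cell offset from the top left corner of the image
    and (p_w, p_h) is the bounding box prior. Hypothetical helper.
    """
    b_x = sigmoid(t_x) + c_x    # center stays inside the predicting cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)   # prior width scaled by e^{t_w}
    b_h = p_h * math.exp(t_h)   # prior height scaled by e^{t_h}
    return b_x, b_y, b_w, b_h


# All-zero outputs give a box centered in the cell with the prior's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=5.0))
```

Because the sigmoid output lies strictly between 0 and 1, the predicted center cannot leave the cell that predicts it, which is the practical reason for this form.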

During training we use sum of squared error loss. If the ground truth for some coordinate prediction is $\hat{t}_*$ our gradient is the ground truth value (computed from the ground truth box) minus our prediction: $\hat{t}_* - t_*$. This ground truth value can be easily computed by inverting the equations above.
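As a rough sketch of what inverting the equations might look like, the Python below recovers target values from a ground truth box and forms the difference described above. It assumes the same parameterization (sigmoid offset for the center, exponential scaling of the prior for the size). The names `logit`, `encode_box`, and `coordinate_gradient`, and the `eps` clamp that keeps the logit finite, are illustrative assumptions and do not come from the paper.

```python
import math


def logit(p: float) -> float:
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def encode_box(b_x, b_y, b_w, b_h, c_x, c_y, p_w, p_h, eps=1e-6):
    """Invert the box equations to get ground truth targets.

    The offset inside the cell is clamped to (eps, 1 - eps) so the logit
    stays finite; this clamp is an assumption made for this sketch.
    """
    o_x = min(max(b_x - c_x, eps), 1.0 - eps)
    o_y = min(max(b_y - c_y, eps), 1.0 - eps)
    t_x = logit(o_x)            # inverts b_x = sigmoid(t_x) + c_x
    t_y = logit(o_y)            # inverts b_y = sigmoid(t_y) + c_y
    t_w = math.log(b_w / p_w)   # inverts b_w = p_w * exp(t_w)
    t_h = math.log(b_h / p_h)   # inverts b_h = p_h * exp(t_h)
    return t_x, t_y, t_w, t_h


def coordinate_gradient(t_hat: float, t: float) -> float:
    """Ground truth value minus prediction, as stated in the text."""
    return t_hat - t


# A box centered in cell (3, 4) with exactly the prior's size.
targets = encode_box(3.5, 4.5, 2.0, 5.0, c_x=3, c_y=4, p_w=2.0, p_h=5.0)
print(targets)
print(coordinate_gradient(targets[0], 0.2))
```

Whether an implementation compares values before or after the sigmoid is a design choice that the text does not settle; the sketch compares them in the space of the raw outputs.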

YOLOv3 predicts an objectness score for each bounding box using logistic regression. This should be 1 if the bounding box prior overlaps a ground truth object by more than any other bounding box prior. If the bounding box prior

arXiv:1804.02767v1 [cs.CV] 8 Apr 2018
