YOLO: A New Approach to Real-Time Object Detection


"You Only Look Once: Unified, Real-Time Object Detection" YOLO，全称为"You Only Look Once"，是一种创新的实时目标检测算法，由Joseph Redmon、Santosh Divvala、Ross Girshick和Ali Farhadi共同提出。这个算法在2016年的论文中首次发表，它彻底改变了对象检测领域，因为它将传统的分类器转化为一个实时的目标检测框架。在传统的对象检测方法中，通常先进行特征提取，然后通过滑动窗口或区域建议网络（如R-CNN）来识别可能包含对象的区域。然而，YOLO则采取了一种完全不同的方法。它将目标检测视为一个回归问题，直接预测出图像中每个对象的边界框及其所属类别的概率。这一过程在一个单一的神经网络中完成，这个网络一次评估就能预测全图的所有边界框和类别概率，因此能实现端到端的优化，以提升检测性能。 YOLO模型的设计极其高效。基础版的YOLO模型可以在45帧每秒的速度下处理图像，这在实时应用中具有显著优势。为了进一步提高速度，作者还设计了一个更小的版本——Fast YOLO，它能在保持高效率的同时，达到155帧每秒的处理速度，而且其平均精度（mAP）是其他实时检测器的两倍。尽管YOLO相比其他最先进的检测系统在定位误差上可能更高，但它显著降低了误报（假阳性）的发生率。这意味着YOLO在避免误判背景为对象方面表现得更好，从而提高了检测的准确性。这种平衡使得YOLO在实时应用中尤其受欢迎，例如自动驾驶、视频监控和机器人导航等领域。 YOLO的主要贡献在于它的统一架构和实时性能。通过将整个检测流程封装在一个网络中，YOLO可以快速且有效地进行训练和部署，同时保持了良好的检测精度。这一突破性的进展推动了深度学习在目标检测领域的应用，并启发了后续的许多改进和变体，如YOLOv2、YOLOv3和YOLOv4等，它们持续优化了检测速度和精度，进一步推动了计算机视觉技术的发展。

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon∗, Santosh Divvala∗†, Ross Girshick, Ali Farhadi∗†

University of Washington∗, Allen Institute for AI†, Facebook AI Research

http://pjreddie.com/yolo/

Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

1. Introduction

Humans glance at an image and instantly know what objects are in the image, where they are, and how they interact. The human visual system is fast and accurate, allowing us to perform complex tasks like driving with little conscious thought. Fast, accurate algorithms for object detection would allow computers to drive cars without specialized sensors, enable assistive devices to convey real-time scene information to human users, and unlock the potential for general purpose, responsive robotic systems.

Current detection systems repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image [10].
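To make the cost of the sliding window approach concrete, the sketch below enumerates windows at evenly spaced locations for a few scales. The image size, window sizes, and stride are illustrative assumptions, not values taken from DPM or from this paper.

```python
def sliding_windows(img_w, img_h, window_sizes, stride):
    """Yield (x, y, w, h) windows at evenly spaced locations for each scale."""
    for w, h in window_sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


# Hypothetical settings chosen only for illustration.
window_sizes = [(64, 64), (128, 128), (256, 256)]
windows = list(sliding_windows(448, 448, window_sizes, stride=16))

# Each window would need its own classifier evaluation.
print(len(windows))
```

Even with these modest settings, the classifier would be evaluated once per window, which is the repeated work that a single-evaluation detector avoids.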

More recent approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene [13]. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.

[Figure 1 panels: 1. Resize image. 2. Run convolutional network. 3. Non-max suppression. Example detections: Dog: 0.30, Person: 0.64, Horse: 0.28.]

Figure 1: The YOLO Detection System. Processing images with YOLO is simple and straightforward. Our system (1) resizes the input image to 448 × 448, (2) runs a single convolutional network on the image, and (3) thresholds the resulting detections by the model’s confidence.
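The last step of Figure 1 (confidence thresholding and non-max suppression) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the threshold values, the greedy per-class suppression rule, and the box coordinates are assumptions; only the labels and scores echo the figure.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(detections, conf_threshold=0.25, iou_threshold=0.5):
    """Drop low-confidence detections, then apply greedy per-class NMS.

    detections: list of (label, score, (x1, y1, x2, y2)).
    Both thresholds are assumed values, not taken from the paper.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box unless a higher-scoring box of the same class overlaps it heavily.
        if all(det[0] != k[0] or iou(det[2], k[2]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections: scores follow Figure 1, boxes are made up.
dets = [
    ("Person", 0.64, (150, 60, 260, 330)),
    ("Person", 0.40, (155, 65, 265, 335)),  # near-duplicate of the box above
    ("Horse", 0.28, (100, 150, 380, 400)),
    ("Dog", 0.30, (40, 300, 140, 420)),
    ("Dog", 0.05, (300, 20, 340, 60)),      # below the confidence threshold
]
for label, score, box in postprocess(dets):
    print(label, score, box)
```

The duplicate Person box and the low-confidence Dog box are discarded, leaving one detection per object.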

We reframe object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. Using our system, you only look once (YOLO) at an image to predict what objects are present and where they are.

YOLO is refreshingly simple: see Figure 1. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. This unified model has several benefits over traditional methods of object detection.
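One way to picture a single network output that carries both boxes and class probabilities is to decode a flat vector of regressed numbers. The layout used here (a fixed number of boxes with five values each, followed by one shared set of class probabilities) and the scoring rule are hypothetical simplifications for illustration; this excerpt does not specify the network's actual output encoding.

```python
NUM_BOXES = 2                         # hypothetical number of boxes per output
CLASSES = ["dog", "person", "horse"]  # hypothetical class list


def decode(output):
    """Split a flat regression output into scored detections.

    Assumed layout: NUM_BOXES groups of (x, y, w, h, confidence),
    followed by len(CLASSES) class probabilities shared by those boxes.
    """
    expected = NUM_BOXES * 5 + len(CLASSES)
    if len(output) != expected:
        raise ValueError(f"expected {expected} values, got {len(output)}")

    class_probs = output[NUM_BOXES * 5:]
    best = max(range(len(CLASSES)), key=lambda i: class_probs[i])

    detections = []
    for b in range(NUM_BOXES):
        x, y, w, h, conf = output[b * 5:(b + 1) * 5]
        # Assumed scoring rule: box confidence times the top class probability.
        detections.append((CLASSES[best], conf * class_probs[best], (x, y, w, h)))
    return detections


# A made-up network output: two boxes, then three class probabilities.
raw = [0.5, 0.5, 0.2, 0.4, 0.9,
       0.3, 0.6, 0.1, 0.1, 0.2,
       0.1, 0.8, 0.1]
for label, score, box in decode(raw):
    print(label, round(score, 2), box)
```

The point of the sketch is only that every quantity a detector needs comes out of one forward pass, so there is no separate proposal or classification stage to train.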

First, YOLO is extremely fast. Since we frame detection as a regression problem we don’t need a complex pipeline. We simply run our neural network on a new image at test time to predict detections. Our base network runs at 45 frames per second with no batch processing on a Titan X GPU and a fast version runs at more than 150 fps. This means we can process streaming video in real-time with less than 25 milliseconds of latency. Furthermore, YOLO achieves more than twice the mean average precision of other real-time systems. For a demo of our system running in real-time on a webcam please see our project webpage: http://pjreddie.com/yolo/.
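The link between frame rate and the quoted latency can be checked with a short calculation. This is a back-of-the-envelope sketch that treats per-frame processing time as the only source of latency, which is an assumption; the 155 fps figure for Fast YOLO comes from the abstract.

```python
def frame_time_ms(fps):
    """Average processing time per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


for name, fps in [("YOLO", 45), ("Fast YOLO", 155)]:
    print(f"{name}: {frame_time_ms(fps):.1f} ms per frame")
```

At 45 frames per second each frame takes about 22 ms with no batching, which is consistent with the stated bound of less than 25 milliseconds.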

Second, YOLO reasons globally about the image when
