YOLO: A New Breakthrough in Unified Real-Time Object Detection

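"You Only Look Once" (YOLO) is an influential paper that proposed a new approach to object detection. Earlier detection systems repurposed classifiers to perform detection. YOLO instead frames detection as a regression problem, predicting multiple spatially separated bounding boxes and their associated class probabilities directly from the full image.

The core idea is to fold the entire detection pipeline into a single neural network that localizes and classifies every object in an image in one forward pass. The main benefit of this unified architecture is speed: the base YOLO model runs at 45 frames per second, fast enough for real-time video streams. A smaller version, Fast YOLO, reaches 155 frames per second while still achieving double the mean average precision (mAP) of other real-time detectors. Compared with the state-of-the-art detection systems of the time, YOLO makes more localization errors but is less likely to predict false positives on background regions. Because the whole pipeline is a single network, it can be optimized end-to-end directly on detection performance through backpropagation. This removes the separately trained stages of multi-stage pipelines such as R-CNN, making detection simpler and faster.

YOLO marked a major advance in object detection: it delivered real-time performance and changed how the detection task is framed and implemented. It laid groundwork for real-time computer vision applications such as autonomous driving, drone surveillance, and video analytics, and later versions such as YOLOv2 and YOLOv3 continued to advance the field.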

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon∗, Santosh Divvala∗†, Ross Girshick, Ali Farhadi∗†

University of Washington∗, Allen Institute for AI†, Facebook AI Research

http://pjreddie.com/yolo/

Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

Introduction

Humans glance at an image and instantly know what objects are in the image, where they are, and how they interact. The human visual system is fast and accurate, allowing us to perform complex tasks like driving with little conscious thought. Fast, accurate algorithms for object detection would allow computers to drive cars without specialized sensors, enable assistive devices to convey real-time scene information to human users, and unlock the potential for general purpose, responsive robotic systems.

Current detection systems repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image [10].
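To give a feel for why sliding-window detection is costly, the following is a minimal Python sketch that enumerates the windows a classifier would have to score. It is not DPM itself: the window size, stride, and scale factors are illustrative assumptions, and `sliding_windows` is a hypothetical helper name.

```python
def sliding_windows(image_w, image_h, window, stride, scales=(1.0,)):
    """Yield (x, y, w, h) boxes at evenly spaced locations for each scale.

    All parameter values used below are illustrative assumptions,
    not settings taken from the paper.
    """
    for scale in scales:
        w = h = int(round(window * scale))
        if w > image_w or h > image_h:
            continue  # this scale does not fit inside the image
        for y in range(0, image_h - h + 1, stride):
            for x in range(0, image_w - w + 1, stride):
                yield (x, y, w, h)


# Each yielded box would need its own classifier evaluation.
boxes = list(sliding_windows(448, 448, window=64, stride=32, scales=(1.0, 2.0)))
print(len(boxes), "classifier evaluations for one image")
```

Even with only two scales and a coarse stride, a single image requires hundreds of separate classifier evaluations, which is the cost a single-pass network avoids.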

More recent approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene [13]. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.

Figure 1: The YOLO Detection System. Processing images with YOLO is simple and straightforward. Our system (1) resizes the input image to 448 × 448, (2) runs a single convolutional network on the image, and (3) thresholds the resulting detections by the model’s confidence.
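Step (3) of the Figure 1 pipeline thresholds detections by confidence, and the figure labels also mention non-max suppression for removing duplicates. The sketch below shows one common way to do both in plain Python. It is an assumption-laden illustration rather than the authors' implementation: the threshold values, the per-class greedy suppression rule, the box coordinates, and the function names `iou` and `filter_detections` are all hypothetical. Only the class names and three of the scores echo the figure labels.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_threshold=0.25, iou_threshold=0.5):
    """Keep confident detections, then greedily drop same-class overlaps.

    detections: list of (label, score, box) tuples.
    Both thresholds are assumed values chosen for illustration.
    """
    # Drop low-confidence detections and visit the rest from highest score down.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for label, score, box in candidates:
        # Keep a box unless it heavily overlaps a stronger box of the same class.
        if all(k[0] != label or iou(k[2], box) < iou_threshold for k in kept):
            kept.append((label, score, box))
    return kept


# Made-up boxes for illustration only.
detections = [
    ("person", 0.64, (100, 50, 200, 300)),
    ("person", 0.40, (105, 55, 205, 305)),  # near-duplicate of the box above
    ("horse", 0.28, (150, 150, 400, 400)),
    ("dog", 0.30, (20, 300, 120, 420)),
    ("dog", 0.05, (300, 10, 340, 40)),      # below the confidence threshold
]
kept = filter_detections(detections)
for label, score, box in kept:
    print(f"{label}: {score:.2f} {box}")
```

In this example the low-confidence dog box is removed by the threshold and the second person box is suppressed as a duplicate, leaving one detection per object.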

We reframe object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. Using our system, you only look once (YOLO) at an image to predict what objects are present and where they are.

YOLO is refreshingly simple: see Figure 1. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. This unified model has several benefits over traditional methods of object detection.

First, YOLO is extremely fast. Since we frame detection as a regression problem we don’t need a complex pipeline. We simply run our neural network on a new image at test time to predict detections. Our base network runs at 45 frames per second with no batch processing on a Titan X GPU and a fast version runs at more than 150 fps. This means we can process streaming video in real-time with less than 25 milliseconds of latency. Furthermore, YOLO achieves more than twice the mean average precision of other real-time systems. For a demo of our system running in real-time on a webcam please see our project webpage: http://pjreddie.com/yolo/.
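The latency figure follows from the frame rates by simple arithmetic. The short Python sketch below converts the quoted frame rates into per-frame processing time. It assumes each frame is processed on its own with no batching, and it covers network processing time only, not capture or display delays. `frame_time_ms` is a hypothetical helper name.

```python
def frame_time_ms(fps):
    """Per-frame processing time in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Frame rates quoted in the abstract for the two models.
for name, fps in [("YOLO", 45), ("Fast YOLO", 155)]:
    print(f"{name}: {frame_time_ms(fps):.1f} ms per frame")
```

At 45 frames per second each frame takes about 22.2 ms, which is consistent with the stated bound of less than 25 milliseconds.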

Second, YOLO reasons globally about the image when

[Figure 1 panel labels: Resize image. Run convolutional network. Non-max suppression. Example detections: Person: 0.64, Horse: 0.28, Dog: 0.30]

arXiv:1506.02640v5 [cs.CV] 9 May 2016
