YOLO: A New Approach to Real-Time Object Detection


"YOLO（You Only Look Once）是一种实时物体检测算法，通过将物体检测问题转化为回归问题，使用一个卷积神经网络（CNN）直接从输入图像预测边界框（bounding box）和类别概率。YOLO算法具有速度快、全局信息利用、泛化能力强和准确性高等优点。在Titan X GPU上，基础版YOLO可以达到45 fps的处理速度，而快速版Fast YOLO则可达到155 fps，同时在误检测方面表现优于其他实时检测系统。YOLO模型作为一个整体，能够端到端地优化，直接针对检测性能进行训练。" YOLO算法的核心在于其创新的设计，它摒弃了传统的将分类器用于检测的方法，而是提出了一种回归问题的解决方式。这个单一的神经网络不仅预测边界框的位置，还估计每个框内物体的类别概率。由于整个检测流程在一个网络中完成，YOLO可以被优化以直接提升检测性能。 YOLO算法的实时性得益于其高效的计算效率。基础版的YOLO模型可以在保持高帧率的同时完成物体检测，这对于实时应用如自动驾驶、视频监控等场景至关重要。而Fast YOLO则是通过减少网络规模进一步提升了速度，尽管牺牲了一些精度，但其检测速率和误检率的平衡仍然优于其他同类实时检测器。 YOLO的一个显著特点是考虑了图像的全局信息，这与基于滑动窗口或区域提议的方法不同。这种全局视角有助于减少误检测，比如将背景误判为物体的情况。此外，YOLO在学习物体的通用表示（generalizable representations）方面表现出色，这意味着它能在未见过的场景中保持良好的检测效果，即具有强大的泛化能力。在性能比较方面，YOLO虽然可能会出现更多的定位错误，但相对于其他最先进的检测系统，它更少预测假阳性（false positives），这是非常关键的一点，因为假阳性的减少意味着减少了不必要的警报和误报。 YOLO算法通过其独特的设计和强大的性能，为实时物体检测提供了一个高效的解决方案，它不仅在速度上领先，而且在准确性、泛化能力和对全局信息的利用上都有出色的表现。这使得YOLO成为了深度学习领域中物体检测研究和应用的重要里程碑。

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon∗, Santosh Divvala∗†, Ross Girshick, Ali Farhadi∗†

University of Washington∗, Allen Institute for AI†, Facebook AI Research

http://pjreddie.com/yolo/

Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
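The distinction between localization errors and background false positives can be made concrete with intersection over union (IoU). The sketch below labels one detection against a set of ground-truth boxes. The category names and the 0.5 / 0.1 IoU cut-offs are assumptions modeled on the error breakdown in the full paper's later analysis; "wrong_class" lumps together what that analysis splits into similar and other classes, and the function names and (x1, y1, x2, y2) box format are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classify_detection(pred_box: Box, pred_cls: str,
                       truths: List[Tuple[Box, str]]) -> str:
    """Label a detection as correct, localization, background, or wrong_class.

    Assumed thresholds: IoU > 0.5 with a same-class object is correct,
    0.1 < IoU <= 0.5 is a localization error, IoU < 0.1 with every
    object is a background false positive.
    """
    best_same = max((iou(pred_box, b) for b, c in truths if c == pred_cls),
                    default=0.0)
    best_any = max((iou(pred_box, b) for b, _ in truths), default=0.0)
    if best_same > 0.5:
        return "correct"
    if best_same > 0.1:
        return "localization"  # right class, poorly placed box
    if best_any < 0.1:
        return "background"  # fired where there is no object at all
    return "wrong_class"  # overlaps an object of a different class
```

For a ground-truth dog at (0, 0, 10, 10), a "dog" box at (0, 0, 10, 4) has IoU 0.4 and counts as a localization error, while a "dog" box at (50, 50, 60, 60) counts as a background false positive. The abstract's claim is that YOLO produces relatively more of the first kind and fewer of the second.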

1. Introduction

Humans glance at an image and instantly know what objects are in the image, where they are, and how they interact. The human visual system is fast and accurate, allowing us to perform complex tasks like driving with little conscious thought. Fast, accurate algorithms for object detection would allow computers to drive cars without specialized sensors, enable assistive devices to convey real-time scene information to human users, and unlock the potential for general purpose, responsive robotic systems.

Current detection systems repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image [10].
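To make the cost of the sliding-window approach concrete, the sketch below enumerates the square windows a classifier would be evaluated on at a few scales. It is a simplification, not DPM itself (DPM scores part-based filters over a feature pyramid), and the window size, stride, and scales are hypothetical.

```python
from typing import Iterator, Sequence, Tuple


def sliding_windows(img_w: int, img_h: int, win: int, stride: int,
                    scales: Sequence[float]) -> Iterator[Tuple[int, int, int]]:
    """Yield (x, y, size) for a square window at every stride step and scale."""
    for s in scales:
        size = int(win * s)
        if size > img_w or size > img_h:
            continue  # window does not fit in the image at this scale
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield x, y, size


def count_classifier_calls(img_w: int, img_h: int, win: int, stride: int,
                           scales: Sequence[float]) -> int:
    """Number of classifier evaluations a plain sliding-window detector needs."""
    return sum(1 for _ in sliding_windows(img_w, img_h, win, stride, scales))
```

With a 448 × 448 image, a 64-pixel window, stride 16, and scales 1, 2, and 4, `count_classifier_calls(448, 448, 64, 16, (1.0, 2.0, 4.0))` returns 1235: the classifier runs 1,235 times per image, whereas YOLO evaluates its network once.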

More recent approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene [13]. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.

[Figure 1 image not included. Panels: 1. Resize image. 2. Run convolutional network. 3. Non-max suppression. Example detections: Dog: 0.30, Person: 0.64, Horse: 0.28.]

Figure 1: The YOLO Detection System. Processing images with YOLO is simple and straightforward. Our system (1) resizes the input image to 448 × 448, (2) runs a single convolutional network on the image, and (3) thresholds the resulting detections by the model’s confidence.
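The final step of Figure 1 (keeping confident detections and removing duplicates) can be sketched as confidence thresholding followed by greedy non-max suppression. This is a generic sketch, not the authors' code: the 0.2 confidence and 0.5 IoU thresholds, the per-class suppression rule, and the (x1, y1, x2, y2) box format are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical format: (x1, y1, x2, y2)
Detection = Tuple[Box, float, str]  # (box, confidence, class label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def threshold_and_nms(dets: List[Detection], conf_thresh: float = 0.2,
                      iou_thresh: float = 0.5) -> List[Detection]:
    """Drop low-confidence detections, then apply greedy per-class NMS."""
    kept: List[Detection] = []
    # Visit the most confident detections first so each survivor
    # suppresses weaker, heavily overlapping boxes of the same class.
    candidates = sorted((d for d in dets if d[1] >= conf_thresh),
                        key=lambda d: d[1], reverse=True)
    for det in candidates:
        box, _, label = det
        if all(k[2] != label or iou(box, k[0]) < iou_thresh for k in kept):
            kept.append(det)
    return kept
```

Given two "dog" boxes at (0, 0, 10, 10) and (1, 1, 11, 11) with confidences 0.9 and 0.6 (IoU about 0.68), a "person" at 0.64, and a "horse" at 0.1, the function keeps only the 0.9 dog and the person: the weaker dog is suppressed as a duplicate and the horse falls below the confidence threshold.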

We reframe object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. Using our system, you only look once (YOLO) at an image to predict what objects are present and where they are.
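One way to picture a single regression "straight from image pixels to bounding box coordinates and class probabilities" is to decode a grid-shaped network output into boxes. The sketch assumes the output layout the full paper describes in its later sections: an S × S grid where each cell predicts B boxes (x, y, w, h, confidence) plus one set of C class probabilities (for PASCAL VOC the paper uses S = 7, B = 2, C = 20, a 7 × 7 × 30 output). The per-cell ordering of values, taking only the best class per cell, treating w and h as plain fractions of the image, and the name `decode_grid` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[Tuple[float, float, float, float], float, int]


def decode_grid(pred: Sequence[Sequence[Sequence[float]]], B: int, C: int,
                img_w: float, img_h: float) -> List[Detection]:
    """Turn an S x S x (B*5 + C) prediction grid into pixel-space detections.

    Assumed (hypothetical) per-cell layout: B groups of (x, y, w, h, conf),
    then C class probabilities shared by all boxes in the cell. x and y are
    offsets inside the cell in [0, 1]; w and h are fractions of the image.
    """
    S = len(pred)
    dets: List[Detection] = []
    for row in range(S):
        for col in range(S):
            cell = pred[row][col]
            class_probs = cell[B * 5:B * 5 + C]
            best_cls = max(range(C), key=lambda c: class_probs[c])
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:b * 5 + 5]
                # Cell-relative centre -> image-relative centre -> pixels.
                cx = (col + x) / S * img_w
                cy = (row + y) / S * img_h
                bw, bh = w * img_w, h * img_h
                box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
                # Class-specific score: class probability times box confidence.
                dets.append((box, class_probs[best_cls] * conf, best_cls))
    return dets
```

For example, on a 2 × 2 grid with B = 1 and C = 2, a cell at row 0, column 1 holding `[0.5, 0.5, 0.5, 0.5, 0.5, 0.25, 0.75]` decodes, for a 448 × 448 image, to the box (224, 0, 448, 224) with class 1 and score 0.375. Every cell emits B boxes, so thresholding and non-max suppression are still needed afterwards.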

YOLO is refreshingly simple: see Figure 1. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. This unified model has several benefits over traditional methods of object detection.

First, YOLO is extremely fast. Since we frame detection as a regression problem we don’t need a complex pipeline. We simply run our neural network on a new image at test time to predict detections. Our base network runs at 45 frames per second with no batch processing on a Titan X GPU and a fast version runs at more than 150 fps. This means we can process streaming video in real-time with less than 25 milliseconds of latency. Furthermore, YOLO achieves more than twice the mean average precision of other real-time systems. For a demo of our system running in real-time on a webcam please see our project webpage: http://pjreddie.com/yolo/.
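The latency figure follows from the frame rate. Assuming per-frame latency is roughly the network's processing time for one image (no batching, as stated), the frame rates quoted in the abstract give:

```latex
t_{\text{YOLO}} = \frac{1000\ \text{ms}}{45} \approx 22.2\ \text{ms} < 25\ \text{ms},
\qquad
t_{\text{Fast YOLO}} = \frac{1000\ \text{ms}}{155} \approx 6.5\ \text{ms}
```

This estimate covers only network time; camera capture and display would add to end-to-end latency.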

Second, YOLO reasons globally about the image when
