YOLO-Z: Improving YOLOv5 for Small Object Detection in Autonomous Driving


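"YOLO-Z is an improved version of the YOLOv5 algorithm, optimized for small object detection in autonomous driving and proposed by Aduen Benjumea et al. It addresses the need of autonomous driving systems to detect small objects quickly and accurately by modifying the YOLOv5 backbone architecture and adopting specific strategies to improve detection."

YOLO (You Only Look Once) is a real-time object detection system that has attracted wide attention for its efficiency. YOLOv5, the newest version of the YOLO family when YOLO-Z was proposed, strikes a good balance between speed and accuracy. However, it still has limitations when detecting small objects in autonomous driving, and YOLO-Z is proposed to address exactly this problem.

First, one of YOLO-Z's core improvements is a "split the original image" strategy. The input image is divided into several smaller tiles so that the model can focus on detail, which improves its ability to recognize small objects. Because each tile is processed at a higher effective resolution, small objects occupy more pixels, which reduces missed detections caused by their small size.

Second, YOLO-Z refines the Non-Maximum Suppression (NMS) strategy. NMS is a common technique for removing overlapping bounding boxes produced during object detection. In YOLO-Z this strategy is tuned so that the model handles small, mutually occluding objects more accurately, avoiding false and missed detections.

Building on the YOLOv5 network structure, YOLO-Z adjusts and optimizes some of the model's structural elements, connections and parameters. These changes may involve components such as convolutional layers, pooling layers and activation functions, adapted to the particular demands of small object detection. The aim is to raise detection accuracy on small-scale objects while preserving as much of the original computational efficiency as possible.

Experimental results, reported with the COCO evaluation metrics, show that YOLO-Z improves mean Average Precision (mAP) over the YOLOv5 baseline, with the largest gains on small objects. This matters for autonomous driving, where systems must identify objects of all sizes promptly and accurately in complex environments, especially small obstacles that may affect safety.

YOLO-Z's contribution lies in building on YOLOv5's strong foundation while specifically targeting the difficulty of small object detection. The approach has practical significance for advancing autonomous driving, particularly in demanding settings such as autonomous racing. Through its study and optimization of the network structure, YOLO-Z points to a direction for future object detectors and could further improve machine performance on small objects.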
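The summary does not say how YOLO-Z changes NMS. As a point of reference, here is a minimal sketch of standard greedy NMS, the baseline such a change would start from. The function names and the 0.45 IoU threshold are illustrative assumptions. This is not the YOLO-Z variant.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float = 0.45) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop every remaining box that
    overlaps it by more than iou_thresh, and repeat on what is left."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


# Two near-duplicate boxes and one separate box: the lower-scoring duplicate is suppressed.
kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])
```

For small, partly occluding objects the threshold is the sensitive knob. Two distinct neighbours that overlap more than `iou_thresh` are merged into one detection, while a threshold set too high lets duplicates through.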

complex systems for helmet detection [10] also make effective use of the contextual information around small objects to isolate them and facilitate their detection. However, their approach is not universally applicable and comes at the cost of introducing a two-step process.

Typical adjustments to the internal structure of the model are surface-level. In a recent apple detection system [32], the backbone of YOLOv5 is slightly modified to simplify it. Such a change shows that the backbone can be adapted to a system's requirements, and it opens the way for further changes: if a single backbone element can be modified, more drastic changes can be applied for greater effect.

2.4 Small object detection

Some effort has been put into developing systems that direct processing towards certain areas of the input image [29, 28, 27]. This makes it possible to adjust the resolution of those areas and therefore bypass the limitation of having few pixels defining an object. This approach, however, is better suited to systems that are not time-sensitive, as it requires multiple passes through a network at different scales. The idea of paying more attention to specific scales can nevertheless inspire the way we treat certain feature maps.
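The resolution trade-off can be made concrete with simple arithmetic. Suppose a 1920×1080 frame is resized to a 640-pixel network input. It is downscaled by a factor of 3, so an object 30 pixels wide shrinks to 10. Cutting the frame into 640×640 tiles keeps native resolution, but it costs one forward pass per tile, which is the time-sensitivity issue described above.

Below is a minimal tiling sketch. The 640-pixel tile size, the 64-pixel overlap and all function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that tiles of size `tile` cover
    [0, length) and neighbouring tiles share at least `overlap` pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the image border
    return starts


def make_tiles(width: int, height: int, tile: int = 640, overlap: int = 64) -> List[Tuple[int, int, int, int]]:
    """Tile rectangles (x0, y0, x1, y1), each cropped at native resolution."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


def to_full_frame(box: Box, tile_xy: Tuple[int, int]) -> Box:
    """Shift a detection made inside a tile back into full-frame coordinates."""
    x0, y0 = tile_xy
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)


tiles = make_tiles(1920, 1080)  # 4 columns x 2 rows = 8 forward passes instead of 1
```

Detections that fall in the overlap strips will appear in more than one tile. They would typically be merged afterwards, for example with an NMS pass over the full-frame boxes.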

Additionally, much can be learned from how feature maps are treated, rather than only modifying the backbone. Different types of feature pyramid network (FPN) [13, 30, 15] aggregate feature maps in different ways to enhance a backbone, and such techniques have proven rather effective.
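Here is a minimal NumPy sketch of the top-down aggregation at the core of a standard FPN. It shows how coarse, semantically strong maps are upsampled and added to finer maps, which is where small objects are detected. Several details are assumptions or simplifications: the function names, the channel counts, the use of 1×1 lateral convolutions written as plain channel-mixing matrices, and the omission of the 3×3 smoothing convolution of the original FPN. Variants such as PANet or BiFPN add further bottom-up or weighted connections on top of this path.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution as channel mixing: w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def fpn_top_down(features, lateral_weights):
    """features: backbone maps ordered fine to coarse, each half the spatial size
    of the previous one. Returns pyramid maps (fine to coarse) that all share
    the same channel count."""
    laterals = [conv1x1(f, w) for f, w in zip(features, lateral_weights)]
    outputs = [laterals[-1]]  # the coarsest level starts the top-down pass
    for lat in reversed(laterals[:-1]):
        # add upsampled coarse semantics to the finer lateral map
        outputs.append(lat + upsample2x(outputs[-1]))
    return outputs[::-1]


rng = np.random.default_rng(0)
# hypothetical backbone outputs at strides 8, 16, 32 for a 640x640 input
feats = [rng.standard_normal((c, s, s)) for c, s in [(128, 80), (256, 40), (512, 20)]]
weights = [rng.standard_normal((64, c)) for c in (128, 256, 512)]
pyramid = fpn_top_down(feats, weights)
```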

2.5 Autonomous vehicles

Within autonomous driving, object detection can provide valuable contextual information about the vehicle's surroundings and heavily inform its decision-making process [17, 4]. In this setting, smaller objects usually correspond to objects further away, so detecting them gives the system a more complete context to make use of. These systems focus heavily on inference time, sacrificing detection accuracy if needed, but work can be done to improve accuracy at minimal cost. Performance in this field is critical, as a small improvement in the detector can greatly impact the entire vehicle. A common requirement in this area is for detectors to be single-stage [31], for the simple reason that fewer steps, and fewer transitions between them, often translate into fewer resources needed.
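The link between distance and object size follows from the pinhole camera model. An object's height in the image, in pixels, is the focal length in pixels times its real height divided by its distance. Doubling the distance therefore halves its pixel height.

The sketch below uses a hypothetical 1000-pixel focal length and a 0.3 m tall object. It compares the result against 32 pixels, the side length behind COCO's "small" area boundary of 32², as a rough reference point.

```python
def projected_height_px(focal_px: float, object_height_m: float, distance_m: float) -> float:
    """Pinhole model: image height in pixels = focal length (px) * real height / distance."""
    return focal_px * object_height_m / distance_m


def max_distance_for_height(focal_px: float, object_height_m: float, min_height_px: float) -> float:
    """Distance beyond which the object spans fewer than `min_height_px` pixels."""
    return focal_px * object_height_m / min_height_px


# Hypothetical camera and object: 1000 px focal length, 0.3 m tall object.
for d in (5, 10, 20, 40):
    print(f"{d:>2} m -> {projected_height_px(1000, 0.3, d):.1f} px")
print(f"below 32 px beyond {max_distance_for_height(1000, 0.3, 32):.3f} m")
```

Under these assumed numbers the object is already under 32 pixels tall at about 9.4 m. This is why better small-object detection effectively extends how far ahead the vehicle can perceive.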

3 Methodology

YOLOv5 provides its model at four different scales, S, M, L and X, which stand for Small, Medium, Large and Xlarge, respectively. Each scale applies a different multiplier to the depth and width of the model, so the overall structure of the model remains constant while the size and complexity of each model are scaled. In our experiments, we apply each structural change individually across all the scales and treat each scaled version as a different model for the purposes of evaluating its effect.
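The sketch below shows how a single depth multiplier and a single width multiplier rescale one layer definition. It is modeled on the rule in the YOLOv5 repository's model parser: block repeats are scaled by the depth multiplier and rounded (never below 1), and channel counts are scaled by the width multiplier and rounded up to a multiple of 8. The multiplier values are those listed in the YOLOv5 model YAML files for the four scales. Details may differ between repository versions, and the function names here are hypothetical.

```python
import math
from typing import Tuple

# (depth_multiple, width_multiple) per scale, as in the YOLOv5 model YAMLs
SCALES = {"s": (0.33, 0.50), "m": (0.67, 0.75), "l": (1.0, 1.0), "x": (1.33, 1.25)}


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(repeats: int, channels: int, scale: str) -> Tuple[int, int]:
    """Scale one layer's block repeats (depth) and output channels (width)."""
    depth_mult, width_mult = SCALES[scale]
    n = max(round(repeats * depth_mult), 1) if repeats > 1 else repeats
    c = make_divisible(channels * width_mult, 8)
    return n, c


# The same layer definition (9 repeats, 1024 channels) at every scale:
for s in SCALES:
    print(s, scale_layer(9, 1024, s))
```

Because only these two multipliers change, a structural edit such as a new neck or an extra detection layer carries over unchanged to all four scales. That is what allows each scaled variant to be evaluated as a separate model.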

To set a baseline, we trained and tested the unmodified versions of the four scales of YOLOv5. We then tested changes to these networks individually in order to observe their impact separately against our baseline results. Techniques and structures that did not appear to improve accuracy or inference time were filtered out before moving to the next phase. We then tried combinations of the selected techniques. This process was repeated, observing whether certain techniques complemented or diminished each other and progressively adding more complex combinations.
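The staged procedure above can be summarized as a small search loop, sketched here as a simplification. The authors' process involved judgment at each stage. In the sketch, `evaluate`, the change names and the "complementary" test are all hypothetical stand-ins: a combination counts as complementary if its mAP beats every member applied alone.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Result = Tuple[float, float]  # (mAP, inference time in ms)
Config = FrozenSet[str]       # set of applied changes; empty = unmodified baseline


def helps(result: Result, baseline: Result) -> bool:
    """A change survives if it improves accuracy or reduces inference time."""
    return result[0] > baseline[0] or result[1] < baseline[1]


def staged_search(changes: Sequence[str], evaluate: Callable[[Config], Result], max_combo: int = 3):
    baseline = evaluate(frozenset())
    # Phase 1: each change on its own, compared against the baseline.
    single = {c: evaluate(frozenset([c])) for c in changes}
    kept = [c for c in changes if helps(single[c], baseline)]
    # Later phases: progressively larger combinations of the surviving changes.
    combos: Dict[Config, Result] = {}
    complementary: List[Config] = []
    for k in range(2, max_combo + 1):
        for combo in combinations(kept, k):
            cfg = frozenset(combo)
            combos[cfg] = evaluate(cfg)
            if combos[cfg][0] > max(single[c][0] for c in combo):
                complementary.append(cfg)
    return kept, combos, complementary


# Toy results table standing in for real training runs (made-up numbers).
table = {
    frozenset(): (0.50, 10.0),
    frozenset({"A"}): (0.53, 11.0),
    frozenset({"B"}): (0.52, 10.5),
    frozenset({"C"}): (0.49, 12.0),
    frozenset({"A", "B"}): (0.56, 11.5),
}
kept, combos, complementary = staged_search(["A", "B", "C"], table.__getitem__)
```

Filtering before combining keeps the number of expensive training runs manageable, since the number of possible combinations grows combinatorially with the number of candidate changes.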

We first discuss the appropriate evaluation metric for our work (Section 3.1) and the dataset used for our investigation (Section 3.2). We then describe how we apply a number of model changes under controlled conditions (Section 3.3), logging results and adjusting as we move through the different stages.

3.1 Evaluation metric

The original implementation of YOLOv5 provides compatibility with the Microsoft Common Objects in Context (COCO) API [14] metrics at three different object scales (bounding box areas) and at several Intersection over Union (IOU) thresholds, which proves useful for the purpose of this study. The way values at specific scales are calculated can give us a good indication
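The COCO-style scale and IOU conventions mentioned above can be sketched in a few lines:

- Objects with area below 32² pixels count as small, those below 96² as medium, and the rest as large.
- The headline AP averages over IOU thresholds from 0.50 to 0.95 in steps of 0.05.

This sketch computes area from the bounding box, as the text describes. Note that the official pycocotools evaluator instead reads each annotation's stored `area` field. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

SMALL_MAX = 32 ** 2   # area boundary between "small" and "medium"
MEDIUM_MAX = 96 ** 2  # area boundary between "medium" and "large"

# IOU thresholds averaged by COCO AP: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def box_area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def area_range(b: Box) -> str:
    """Assign a box to the small / medium / large range used for AP_S, AP_M, AP_L."""
    a = box_area(b)
    if a < SMALL_MAX:
        return "small"
    if a < MEDIUM_MAX:
        return "medium"
    return "large"


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def matches(pred: Box, gt: Box, threshold: float) -> bool:
    """A prediction matches a ground-truth box at a given IOU threshold."""
    return iou(pred, gt) >= threshold
```

Small objects are the hardest case for the stricter thresholds. A localization error of a few pixels barely changes the IOU of a large box, but it can push a 20×20 box below 0.75.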
