Huawei Gold-YOLO: A New Breakthrough in Real-Time Object Detection via a Gather-and-Distribute Mechanism

Updated 2024-06-17, PDF (33.41 MB)

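Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism

In real-time object detection, the YOLO series has become the industry's leading family of models thanks to its efficiency and accuracy. In recent years, many researchers have raised the baseline by improving the architecture, augmenting the data, and designing new loss functions. Even so, existing models still suffer from an information fusion problem. The Feature Pyramid Network (FPN) and the Path Aggregation Network (PANet) partly alleviate it, but the Gold-YOLO authors found that there was still room for improvement.

To overcome this information fusion bottleneck, the work proposes an advanced Gather-and-Distribute (GD) mechanism. GD combines convolution and self-attention operations to strengthen multi-scale feature fusion. Gold-YOLO is built on this mechanism and strikes a good balance between latency and accuracy while maintaining high precision, performing particularly well across different model scales. Notably, Gold-YOLO also introduces MAE-style pretraining, a novel pretraining strategy that further improves the model's baseline performance, so that the model shows better generalization and adaptability from the early stages of training. As a result, Gold-YOLO surpasses previous YOLO-series models in speed and accuracy. It is also more robust in practical applications and adapts better to complex scenes.

On the technical side, the GD mechanism may gather low-resolution features to capture global context and then use self-attention to distribute this information to the feature maps at each scale, improving the feature fusion process. This can help reduce false positives and missed detections and improve localization accuracy while keeping inference real-time. In summary, Gold-YOLO marks an important milestone in real-time object detection. It improves model performance, and its new approach to information fusion offers a direction for future real-time detection work. With the GD mechanism, Gold-YOLO is expected to have a far-reaching impact on both commercial applications and research.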
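The summary describes distributing gathered global context to each scale through attention. That idea can be illustrated with plain scaled dot-product attention. This is a minimal sketch with several assumptions:

- A handful of global tokens have already been gathered.
- There are no learned query/key/value projections, so the global tokens serve as both keys and values.
- The result is injected with a residual addition.

It is not Gold-YOLO's actual injection module, and `distribute_by_attention` is a hypothetical name.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def distribute_by_attention(local_feats, global_tokens):
    """Add attention-weighted global context to every local position.

    local_feats:   list of C-dim vectors, one per position of one pyramid level
    global_tokens: list of C-dim vectors gathered from a low-resolution map
    (Hypothetical helper: no learned projections, keys double as values.)
    """
    dim = len(global_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in local_feats:
        # Scaled dot-product score between this position and each global token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in global_tokens]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of the global tokens.
        context = [sum(w * tok[c] for w, tok in zip(weights, global_tokens))
                   for c in range(dim)]
        # Residual injection: keep the local feature and add its global context.
        out.append([x + y for x, y in zip(q, context)])
    return out


local = [[1.0, 0.0], [0.0, 1.0]]
global_tokens = [[2.0, 0.0], [0.0, 2.0]]
out = distribute_by_attention(local, global_tokens)
print(out)
```

Each local position is pulled mostly toward the global token it is most similar to. Because the attention weights sum to one, every position still receives some context from every global token.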

Figure 3: (a) Traditional neck structure: example diagram of the information fusion structure of a traditional neck, in which level-1, level-2, and level-3 are combined by fuse modules. (b) Traditional neck and (c) our proposed neck: AblationCAM [38] visualizations.

state-of-the-art performance with single-level features. SFNet [ ] aligns features from different levels with semantic flow to improve FPN performance. SAFNet [ ] introduced Adaptive Feature Fusion and Self-Enhanced Modules. [ ] presented a parallel FPN structure for object detection with bi-directional fusion. However, because of the excessive number of paths and the indirect interaction methods in the network, previous FPN-based fusion structures still suffer from low speed, limited cross-level information exchange, and information loss.


3 Method

3.1 Preliminaries

The neck of the YOLO series, as depicted in Fig. 3, employs a traditional FPN structure with multiple branches for multi-scale feature fusion. However, it fully fuses features only from neighboring levels. Information from other levels can be obtained only indirectly, in a "recursive" manner. Fig. 3 shows the information fusion structure of the conventional FPN, in which levels 1, 2, and 3 are arranged from top to bottom and FPN performs the fusion between levels. There are two distinct scenarios when level-1 obtains information from the other two levels:

If level-1 seeks to use information from level-2, it can directly access and fuse that information.

If level-1 wants to use level-3 information, it must recursively call the information fusion module of the adjacent level. Specifically, the level-2 and level-3 information must be fused first. Level-1 can then obtain level-3 information only indirectly, by combining it with the fused level-2 information.
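A small path-tracing sketch makes the two scenarios concrete. It lists the levels that sit between a source and a destination, each of which must re-fuse the information. The sketch assumes that levels are numbered from top to bottom as in Fig. 3. It also assumes that each fusion module in a traditional neck links only adjacent levels. It models connectivity only, not the learned fusion itself. The function names and the `"fpn"`/`"gd"` labels are hypothetical.

```python
def fpn_path(src, dst):
    """Levels visited when information travels from level `src` to level `dst`
    through fusion modules that connect adjacent levels only."""
    step = 1 if dst > src else -1
    return list(range(src, dst + step, step))


def intermediate_levels(src, dst, scheme):
    """Levels whose fusion step must pass the information along (and may
    therefore drop part of it) before it reaches `dst`."""
    if src == dst:
        return []
    if scheme == "fpn":
        # Recursive path: every level strictly between the endpoints re-fuses it.
        return fpn_path(src, dst)[1:-1]
    if scheme == "gd":
        # One shared module gathers all levels and distributes to each directly.
        return []
    raise ValueError(f"unknown scheme: {scheme!r}")


for src in (2, 3):
    print(f"level-{src} -> level-1:",
          "fpn", intermediate_levels(src, 1, "fpn"),
          "gd", intermediate_levels(src, 1, "gd"))
```

`intermediate_levels(2, 1, "fpn")` is empty, which is the direct first scenario. `intermediate_levels(3, 1, "fpn")` returns `[2]`, which is the recursive second scenario. The gap grows with pyramid depth: a level-5 to level-1 path would pass through levels 4, 3, and 2.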

This transfer mode can cause a significant loss of information during computation. Interactions between levels can exchange only the information that the intermediate levels select, and unselected information is discarded during transmission. Consequently, information at a given level can adequately assist only its neighboring levels, while its contribution to more distant levels is weakened. As a result, the overall effectiveness of information fusion may be limited.
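A toy model illustrates the selection effect. In it, each level keeps only the items it scores highest. A set of named items stands in for feature content, and a top-k choice stands in for the learned selection a fusion module performs. The item names, relevance scores, and the `select` helper are all hypothetical.

```python
def select(items, relevance, keep):
    """Keep the `keep` items that the selecting level scores highest;
    everything else is discarded (toy stand-in for a fusion module)."""
    ranked = sorted(items, key=relevance.__getitem__, reverse=True)
    return set(ranked[:keep])


# Hypothetical level-3 content and per-level relevance scores.
level3_items = {"feat_a", "feat_b", "feat_c"}
level2_relevance = {"feat_a": 0.9, "feat_b": 0.7, "feat_c": 0.1}
level1_relevance = {"feat_a": 0.3, "feat_b": 0.2, "feat_c": 0.8}

# Recursive path: level-2 decides what to forward, level-1 picks from that.
forwarded_by_level2 = select(level3_items, level2_relevance, keep=2)
fpn_level1 = select(forwarded_by_level2, level1_relevance, keep=1)

# Direct access: level-1 chooses from all level-3 items by its own scores.
direct_level1 = select(level3_items, level1_relevance, keep=1)

print(forwarded_by_level2, fpn_level1, direct_level1)
```

Level-1 would prefer `feat_c`, but level-2 discarded it before level-1 could see it. The recursive path therefore delivers `feat_a`, while direct access delivers `feat_c`.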

To avoid the information loss in the transmission process of traditional FPN structures, we abandon the original recursive approach and construct a novel gather-and-distribute (GD) mechanism. A unified module gathers and fuses information from all levels and then distributes it to each level. This avoids the information loss inherent in the traditional FPN structure and also enhances the neck's partial information fusion capabilities without significantly increasing latency. Our approach therefore makes more effective use of the features extracted by the backbone and can be easily integrated into any existing backbone-neck-head structure.
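The gather-then-distribute flow can be sketched on toy 1-D feature maps of different lengths. This is a minimal illustration with three assumptions:

- Nearest-neighbour resizing aligns the levels.
- An element-wise mean stands in for the learned fusion module.
- A residual addition stands in for the learned injection step.

Gold-YOLO's actual modules are learned blocks built from convolution and self-attention. The function names here are hypothetical.

```python
def resize(feat, size):
    """Nearest-neighbour resize of a 1-D feature map to `size` positions."""
    n = len(feat)
    return [feat[i * n // size] for i in range(size)]


def gather(levels, size):
    """Gather: align every level to one shared resolution, then fuse.
    The element-wise mean is a stand-in for a learned fusion block."""
    aligned = [resize(f, size) for f in levels]
    return [sum(col) / len(col) for col in zip(*aligned)]


def distribute(levels, global_feat):
    """Distribute: resize the fused global feature back to each level's own
    resolution and inject it with a residual addition."""
    return [[x + g for x, g in zip(f, resize(global_feat, len(f)))]
            for f in levels]


def gather_and_distribute(levels, size):
    return distribute(levels, gather(levels, size))


# Three toy levels at decreasing resolution.
levels = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0], [100.0]]
print(gather_and_distribute(levels, size=2))
```

No level acts as an intermediary here: every output, including level-1's, receives level-3's contribution through the shared global feature rather than through level-2's fusion step.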

