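Resolving Feature Entanglement in One-Stage Object Detection: An Enhanced Disentanglement Module and a Response Alignment Strategy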


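This document summarizes "Feature disentanglement in one-stage object detection", published in Pattern Recognition, volume 145 (2024), article 109878, and available online since 18 August 2023. The study targets feature misalignment in deep-learning object detectors, in particular one-stage detectors based on convolutional neural networks (CNNs). CNN-based object detectors usually handle classification and regression together, and the two tasks conflict in a way that is inherently hard to reconcile. As a result, the features each task needs are inconsistent, which is what the paper calls feature misalignment.

To address this, the authors propose an enhanced disentanglement module (EDM) that operates at the feature pyramid network (FPN) stage, the neck of the architecture. FPN is a key component of modern detectors that improves performance by fusing multi-scale features. The core of the method is to disentangle features at this stage, separating the representations dedicated to classification from those dedicated to regression so that they interfere less with each other. This is expected to improve accuracy and robustness, because classification and localization can each optimize for their own objective.

The paper also introduces a response alignment strategy which, according to the abstract, reduces inconsistent responses between the two tasks and suppresses inferior predictions. Soft sampling is listed among the keywords, so the strategy likely relies on some form of soft sampling or adaptive weighting; it may also involve recalibrating feature maps or using attention to emphasize key features, although the excerpt below does not confirm this.

By analyzing and addressing feature misalignment, the paper offers later researchers an effective way to improve one-stage detectors, particularly where real-time speed must be balanced against accuracy. It is a useful reference for readers interested in deep learning, computer vision, and object detection.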

Pattern Recognition 145 (2024) 109878

Available online 18 August 2023

Contents lists available at ScienceDirect

Pattern Recognition

journal homepage: www.elsevier.com/locate/pr

Feature disentanglement in one-stage object detection

Wenjie Lin, Jun Chu ∗, Lu Leng, Jun Miao, Lingfeng Wang

Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063, China

College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

ARTICLE INFO

Keywords: Object detection; Feature misalignment; Response alignment; Feature disentanglement; Soft sampling

ABSTRACT

In this paper, an enhanced disentanglement module is proposed to address feature misalignment caused by inherently irreconcilable conflicts between classification and regression tasks in Convolutional Neural Network-based object detectors. The proposed method disentangles features in the feature pyramid network (FPN) at the neck of the architecture. In addition, a response alignment strategy is proposed to reduce inconsistent responses and suppress inferior predictions. Extensive experiments are performed on the MS COCO and PASCAL VOC datasets with different backbones, confirming that the proposed method improves performance significantly. The proposed method exhibits two main advantages over existing solutions: features are disentangled at the neck instead of at the head, enabling comprehensive resolution of feature misalignment, and independent outputs of the two tasks after feature disentanglement are avoided, thereby preventing response inconsistencies.

1. Introduction

Recently, convolutional neural networks (CNNs) [1] have been widely adopted for object detection, with satisfactory performance. CNN-based object detection models can be classified as two-stage [2,3] and one-stage detectors [4,5]. Two-stage detectors with region proposal mechanisms typically exhibit better accuracy. On the other hand, one-stage detectors balance speed and accuracy; therefore, they are commonly used in practical applications.

Unfortunately, an inherently irreconcilable conflict called feature misalignment occurs between the classification and regression tasks in object detection architectures, which degrades detection accuracy. To address this conflict, most modern detectors use two task-specific, parameter-independent branches (separate head) instead of a parameter-sharing branch (shared head) to infer object categories and bounding boxes. For instance, RetinaNet [4] introduced a separate head consisting of two lightweight fully convolutional networks. In [6], Wu et al. studied the effects of separate and shared heads on performance and concluded that a separate head comprising a fully connected network for classification and a convolutional network for regression yielded the best performance. However, even in architectures using a separate head, features are produced based on the same proposal generated by the region proposal network (RPN); therefore, the conflict persists. Song et al. [7] employed deformable pooling in the separate head to construct a task-aware spatial disentanglement head based on Faster R-CNN [2], which encodes the different features of the two tasks based on the same proposal in the spatial dimension. However, features are entangled in the feature pyramid network (FPN) of the neck before being transmitted into the head; therefore, this conflict cannot be completely overcome using the aforementioned methods.

∗ Corresponding author.
E-mail addresses: chuj@nchu.edu.cn (J. Chu), leng@nchu.edu.cn (L. Leng).

In this study, we extend the feature-disentanglement operation from the head to the neck to address this conflict. To this end, an enhanced disentanglement module (EDM) is proposed to replace the conventional FPN. As depicted in Fig. 1, compared to FPN, EDM exhibits richer semantic features for classification and more distinct edge features around the boundary, facilitating regression. Although feature disentanglement is typically conducted on two-stage detectors, the RPN proposal is uncorrelated with the category and prefers a regression task. Therefore, disentanglement of the RPN proposal lacks semantic information. Fully Convolutional One-Stage Object Detection (FCOS) [5], a popular and representative one-stage detector, exhibits satisfactory performance and is easily modified [8]; therefore, it is selected as the baseline for the evaluation of EDM.
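To make the entanglement at the neck concrete, the following is a minimal sketch of the top-down merge in a conventional FPN, the component that EDM is said to replace. It is not the EDM itself, whose internals are not described in this excerpt. The function names, the single-channel nested-list feature maps, and the toy values are all assumptions made for illustration, and the 1x1 lateral and 3x3 output convolutions of a real FPN are deliberately omitted.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, stored as rows of values


def upsample_nearest_2x(fmap: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(laterals: List[Grid]) -> List[Grid]:
    """Merge lateral maps ordered fine -> coarse, as in a conventional FPN.

    The lateral and output convolutions are omitted (illustrative assumption).
    """
    merged = [laterals[-1]]  # the coarsest level passes through unchanged
    for lateral in reversed(laterals[:-1]):
        merged.append(add_maps(lateral, upsample_nearest_2x(merged[-1])))
    merged.reverse()  # restore fine -> coarse order
    return merged


# Toy backbone outputs (hypothetical values): 4x4, 2x2, and 1x1 maps.
c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[10.0] * 2 for _ in range(2)]
c5 = [[100.0]]
p3, p4, p5 = fpn_top_down([c3, c4, c5])
print(p3[0][0], p4[0][0], p5[0][0])
```

Each pyramid level comes out as a single fused map, and in a standard detector that same map is handed to both the classification and the regression branch. A separate head can only start diverging from this shared input, which is the motivation the text gives for moving disentanglement into the neck.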

Existing feature disentanglement methods output separate responses of good quality for classification and regression, but the features of the two tasks are independent after disentanglement. Therefore, the responses of two tasks at the same location are typically inconsistent. As a result, some inferior prediction results exhibit high classification confidence (score), but low regression accuracy (intersection-over-union, IoU) at the same spatial point. To resolve the problem of inconsistent responses, examples that are good at both classification and regression should be leveraged sufficiently. To ensure joint representation [9,

https://doi.org/10.1016/j.patcog.2023.109878

Received 30 August 2021; Received in revised form 3 August 2023; Accepted 8 August 2023
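The response inconsistency described in the introduction (high classification score but low IoU at the same location) can be illustrated with a small sketch. This only detects such predictions; it is not the paper's response alignment strategy, which is not described in this excerpt. The helper name `inconsistent`, the 0.5 thresholds, and the example boxes are assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def inconsistent(
    preds: List[Tuple[float, Box]],
    gt: Box,
    score_thr: float = 0.5,  # assumed threshold, not from the paper
    iou_thr: float = 0.5,    # assumed threshold, not from the paper
) -> List[int]:
    """Indices of predictions that are confident but poorly localised."""
    return [
        i
        for i, (score, box) in enumerate(preds)
        if score >= score_thr and iou(box, gt) < iou_thr
    ]


gt = (0.0, 0.0, 10.0, 10.0)
preds = [
    (0.90, (5.0, 5.0, 15.0, 15.0)),    # high score, shifted box
    (0.60, (0.0, 0.0, 10.0, 8.0)),     # lower score, tight box
    (0.20, (20.0, 20.0, 30.0, 30.0)),  # low score, no overlap
]
for score, box in preds:
    print(f"score={score:.2f} iou={iou(box, gt):.3f}")
print("inconsistent:", inconsistent(preds, gt))
```

In this toy case the prediction with the highest score has the worst overlap among the confident ones, so ranking by score alone would favor the inferior box over the well-localised one.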
