
Object detection via inner-inter relational reasoning network
He Liu, Xiuting You, Tao Wang⁎, Yidong Li
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Article info
Article history:
Received 28 September 2022
Accepted 17 December 2022
Available online 29 December 2022
Keywords:
Object detection
Relational reasoning
Attention model
Abstract
Exploiting relationships between objects and (or) labels under a graph message passing mechanism to facilitate object detection has been widely investigated in recent years. However, these methods heavily rely on hand-crafted graph structures, which may introduce unreliable relationships and in turn hurt object detection performance. To address this issue, we propose a novel object detection framework that fully explores the relational representations of objects and labels under a full attention architecture. Specifically, we directly regard the extracted proposals and candidate labels as two independent sets in the visual feature space and the label embedding space, respectively. We design a self-attention module to discover the inner-relationships within the visual feature space or the label embedding space, and develop a cross-attention module to explore the inter-relationships between the two spaces. Both the inner-relationships and inter-relationships are then utilized to enhance the object features and label embedding representations to facilitate object detection. To validate the proposed framework in improving object detection performance, we embed it into several state-of-the-art baselines and perform extensive experiments on two public benchmarks (Pascal VOC and COCO 2017). The experimental results demonstrate the effectiveness and flexibility of the proposed framework.
© 2023 Elsevier B.V. All rights reserved.
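The following is a minimal sketch, not the authors' released implementation, of the inner/inter relational reasoning described in the abstract. It assumes PyTorch, a shared embedding dimension for proposal features and label embeddings, and residual fusion as the enhancement step; the module names, head count, and fusion scheme are illustrative assumptions.

import torch
import torch.nn as nn

class RelationalReasoning(nn.Module):
    def __init__(self, d_model=256, num_heads=8):
        super().__init__()
        # Inner-relationships: self-attention within each set (proposals, labels).
        self.vis_self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.lab_self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Inter-relationships: cross-attention between the two spaces.
        self.vis_cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.lab_cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, proposals, labels):
        # proposals: (B, N, d_model) region-proposal features from the detector
        # labels:    (B, C, d_model) candidate label (category) embeddings
        vis, _ = self.vis_self_attn(proposals, proposals, proposals)  # inner (visual)
        lab, _ = self.lab_self_attn(labels, labels, labels)           # inner (label)
        vis_x, _ = self.vis_cross_attn(vis, lab, lab)                 # proposals attend to labels (inter)
        lab_x, _ = self.lab_cross_attn(lab, vis, vis)                 # labels attend to proposals (inter)
        # Residual fusion is an assumption here, not necessarily the paper's exact scheme.
        return proposals + vis + vis_x, labels + lab + lab_x

The enhanced proposal features would then be fed to the detection heads of the host detector, which is consistent with the claim that the framework can be embedded into existing baselines.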
1. Introduction
As a fundamental problem in the image recognition community, object detection aims to localize and classify candidate bounding boxes extracted from a given image, and has been widely used in many realistic tasks such as visual surveillance [1] and automated driving [2]. In general, object detection methods can be divided into two groups: regression-based methods and region-based methods. Given an image, regression-based detection methods take it as input and directly predict the locations and categories of the objects. In contrast, region-based detection methods generally extract a series of region proposals from a Region Proposal Network (RPN) to indicate the coarse locations of candidate objects, and then pass the region proposals into follow-up learnable modules to predict more precise locations and categories.
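As a concrete illustration of this two-stage, region-based pipeline, the short example below runs a pretrained Faster R-CNN from torchvision; this is a generic usage sketch, not the method proposed in this paper, and the use of torchvision and the "DEFAULT" weights (torchvision >= 0.13) are assumptions of the example.

import torch
import torchvision

# Region-based (two-stage) detection: an internal RPN proposes coarse regions,
# then RoI heads refine their locations and predict categories.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)   # dummy RGB image with values in [0, 1]
with torch.no_grad():
    outputs = model([image])      # one dict per image: boxes, labels, scores

print(outputs[0]["boxes"].shape)  # refined bounding boxes, shape (num_detections, 4)
print(outputs[0]["scores"][:5])   # classification confidences for the top detections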
Previous classic methods, including Faster R-CNN [3], Mask R-CNN [4], YOLO [5] and SSD [6], deal with location regression and classification on the extracted proposals individually and pay little attention to the relationships between them, which limits their representation ability and leads to unsatisfactory performance.
Recently, several works have attempted to introduce relations between instances into object detection via graph message passing mechanisms. For example, Liu et al. [7] explore the relationship between the global scene context and individual objects, and enhance the region features using a recurrent neural network (RNN). Li et al. [8] establish relationships between image feature maps in feature pyramid networks (FPN), and propose a dynamic feature fusion method based on graph convolutional networks (GCNs) to enrich the representation of image feature maps. In addition, several works [9–11] establish spatial position relationships among the region proposals extracted from the RPN, and enhance the features of region proposals via GCNs. Similarly, Li et al. [12] introduce global scene features into a region-based relation graph, which makes the region proposals learn both local and global features and enhances the feature representation of regions. Different from the above methods that mainly focus on exploring relationships within the visual feature space, several works [13,14] establish relationships between category labels on a constructed label graph, and enhance the feature representation of regions by fusing information from neighbors to improve the detection performance of the detector.
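To make the graph message passing mechanism referenced above concrete, the sketch below shows one GCN-style layer operating on a hand-crafted relation graph over region proposals, in the spirit of [9–11]; the class name, normalization, and adjacency construction are illustrative assumptions rather than any specific cited implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProposalGCNLayer(nn.Module):
    # One round of message passing that enhances each proposal's features
    # with information aggregated from related proposals.
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x:   (N, dim) features of N region proposals
        # adj: (N, N) hand-crafted relation graph, e.g. built from spatial overlap
        adj = adj + torch.eye(adj.size(0))                          # add self-loops
        deg_inv = adj.sum(dim=1, keepdim=True).clamp(min=1e-6).reciprocal()
        msg = deg_inv * (adj @ x)                                   # normalized neighbor aggregation
        return F.relu(self.linear(msg))                             # enhanced proposal features

Because the adjacency matrix is constructed heuristically, any noise in it propagates directly into the enhanced features, which is precisely the limitation discussed next.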
Although the above methods have effectively improved the detec-
tion performance, they heavily rely on the heuristically generated
graph structure, which may impose noisy relationships in the graph
⁎ Corresponding author.
E-mail addresses: liuhe1996@bjtu.edu.cn (H. Liu), yxting@bjtu.edu.cn (X. You),
twang@bjtu.edu.cn (T. Wang), ydli@bjtu.edu.cn (Y. Li).
https://doi.org/10.1016/j.imavis.2022.104615