Deep-Learning-Driven Multi-Label Recognition: Label Graph Fusion and Label Correlation Modeling


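"Multi-Label Classification with Label Graph Superimposing is a paper on multi-label image and video recognition. Rapid progress in deep learning has brought significant gains in multi-label recognition, but two questions remain open: how best to model the correlations between labels, and how to improve feature learning by making it aware of the label system. The paper's main contribution is a label graph superimposing framework that improves existing multi-label recognition methods combining a graph convolutional network (GCN) with a convolutional neural network (CNN). First, the authors build a label graph from statistical co-occurrence information and superimpose it onto a graph constructed from knowledge priors of the labels, then apply multi-layer graph convolutions to the superimposed graph to obtain label embeddings that capture the relationships between categories. Second, they inject label-system information into the CNN backbone during feature learning through lateral connections between the GCN and the CNN, so that the learned features reflect label dependencies and co-occurrence patterns. Experiments on image recognition (MS-COCO) and video action recognition (Charades) show that the method improves on previous approaches and reaches state-of-the-art results. The paper thus offers a new perspective on multi-label classification: superimposing label graphs addresses both label correlation modeling and label-aware feature learning."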

Multi-Label Classification with Label Graph Superimposing

Ya Wang$∗, Dongliang He‡∗, Fu Li‡, Xiang Long‡, Zhichao Zhou‡, Jinwen Ma$†, Shilei Wen‡

$ School of Mathematical Sciences and LMAM, Peking University, China
‡ Department of Computer Vision Technology (VIS), Baidu Inc., Beijing, China

{wangyachn@, jwma@math}.pku.edu.cn
{hedongliang01, lifu, longxiang, zhouzhichao01, wenshilei}@baidu.com

Abstract

Images and videos typically contain multiple objects or actions. Thanks to the rapid development of deep learning technologies, multi-label recognition now achieves strong performance. Recently, the graph convolution network (GCN) has been leveraged to boost the performance of multi-label recognition. However, the best way to model label correlations, and how feature learning can be improved with label system awareness, are still unclear. In this paper, we propose a label graph superimposing framework that improves the conventional GCN+CNN framework developed for multi-label recognition in the following two aspects. Firstly, we model the label correlations by superimposing a label graph built from statistical co-occurrence information onto the graph constructed from knowledge priors of labels, and then apply multi-layer graph convolutions to the final superimposed graph for label embedding abstraction. Secondly, we propose to leverage the embedding of the whole label system for better representation learning. In detail, lateral connections between the GCN and the CNN are added at shallow, middle and deep layers to inject information about the label system into the backbone CNN, making the feature learning process label-aware. Extensive experiments on the MS-COCO and Charades datasets show that our proposed solution greatly improves recognition performance and achieves new state-of-the-art results.
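The first step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption: superimposing is written as a convex combination with a hypothetical weight `lam`, the graph convolution uses the common symmetric normalization with self-loops, and the adjacency values, embeddings and weights are toy numbers. The lateral connections to the CNN are not shown.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def superimpose(a_stat, a_prior, lam=0.5):
    """Blend the statistical graph into the prior graph.
    The convex combination is an assumption, not the paper's stated formula."""
    return [[lam * s + (1.0 - lam) * p for s, p in zip(row_s, row_p)]
            for row_s, row_p in zip(a_stat, a_prior)]

def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]

def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]

# Toy graphs over three labels; all numbers are invented for illustration.
a_stat = [[0.0, 0.4, 0.1],   # statistical co-occurrence graph
          [0.4, 0.0, 0.0],
          [0.1, 0.0, 0.0]]
a_prior = [[0.0, 1.0, 0.0],  # knowledge-prior graph
           [1.0, 0.0, 1.0],
           [0.0, 1.0, 0.0]]

a_super = superimpose(a_stat, a_prior, lam=0.5)
a_norm = normalize(a_super)

h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # initial label embeddings
w1 = [[0.5, -0.2], [0.1, 0.3]]             # layer-1 weights (toy)
w2 = [[0.3, 0.1], [-0.4, 0.6]]             # layer-2 weights (toy)

h1 = gcn_layer(a_norm, h0, w1)
h2 = gcn_layer(a_norm, h1, w2)             # one embedding row per label
```

In a trained model the weights would be learned and the label embeddings would feed the classifier; here they only show how information propagates along the superimposed graph.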

Introduction

Multi-label is a natural property of images and videos: an image or video usually contains multiple objects or actions. In the computer vision community, multi-label recognition is a fundamental and practical task, and has attracted increasing research efforts. Given the great success of single-label image/video classification brought by deep convolutional networks (He et al. 2015; Carreira and Zisserman 2017; He et al. 2016a; Feichtenhofer et al. 2018; Wu et al. 2019), multi-label recognition can achieve good performance by naively treating each label as independent and applying multiple binary classifications to predict whether each label is present or not. However, we argue that the following two aspects should be taken into consideration for such a task.

∗ Equal contribution. This work was done when Ya Wang was a full-time research intern at Baidu.
† Corresponding author.
2020, Association for the Advancement of Artificial

[Figure 1 panels: (a) Examples on MS-COCO, labeled “Sports Ball” and “Sports Ball, Tennis Racket”, value 0.42; (b) Examples on Charades, labeled “Sitting on Couch” and “Sitting on Couch, Watching Television”, value 0.20.]

Figure 1: Examples of label relationships in multi-label datasets. (a) illustrates the co-occurrence of “Sports Ball” and “Tennis Racket” on the MS-COCO dataset; the frequency with which “Tennis Racket” co-occurs with “Sports Ball” is as high as 0.42. Similarly, (b) showcases an example of “Sitting on Couch” and “Watching Television” from the Charades dataset.

First of all, labels co-occur in images or videos according to priors. As illustrated in Figure 1, “Sports Ball” very often appears together with “Tennis Racket”, and a person “Sitting on Couch” is often “Watching Television” at the same time.
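A co-occurrence frequency like the one quoted for Figure 1 can be estimated directly from training annotations. The sketch below assumes the frequency is the conditional estimate P(label j present | label i present); the text does not spell out the exact definition, so that is an assumption, and the function name and toy annotations are hypothetical rather than MS-COCO or Charades statistics.

```python
from collections import Counter
from itertools import permutations

def cooccurrence_frequency(samples):
    """Return {(i, j): estimated P(j present | i present)} from label sets."""
    label_counts = Counter()
    pair_counts = Counter()
    for labels in samples:
        unique = set(labels)
        label_counts.update(unique)
        # Count every ordered pair of distinct labels in the same sample.
        pair_counts.update(permutations(sorted(unique), 2))
    return {(i, j): n / label_counts[i] for (i, j), n in pair_counts.items()}

# Toy annotations, invented for illustration only.
toy_samples = [
    {"sports ball", "tennis racket", "person"},
    {"sports ball", "person"},
    {"tennis racket", "person"},
    {"sports ball", "tennis racket"},
    {"person"},
]

freq = cooccurrence_frequency(toy_samples)
# "sports ball" appears in 3 samples, 2 of which also contain "tennis racket".
print(freq[("sports ball", "tennis racket")])
```

Because the estimate is conditional, it is generally asymmetric: `freq[(i, j)]` and `freq[(j, i)]` differ whenever the two labels have different overall counts.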

Then, a question is naturally raised, how to model the re-

arXiv:1911.09243v1 [cs.CV] 21 Nov 2019
