Deep Semantic Structural Constraints for Zero-Shot Learning
Yan Li*1,2, Zhen Jia*1,2, Junge Zhang1,2, Kaiqi Huang1,2,3, Tieniu Tan1,2,3
1CRIPAC & NLPR, Institute of Automation, Chinese Academy of Sciences
2University of Chinese Academy of Sciences
3CAS Center for Excellence in Brain Science and Intelligence Technology
yan.li@cripac.ia.ac.cn, {zhen.jia, jgzhang, kqhuang, tnt}@nlpr.ia.ac.cn
Abstract
Zero-shot learning aims to classify unseen image categories by learning a visual-semantic embedding space. In most cases, traditional methods adopt a separate two-step pipeline: image features are first extracted from pre-trained CNN models, and these fixed features are then used to learn the embedding space. As a result, the image features lack the structural semantic information specific to the zero-shot learning task. In this paper, we propose an end-to-end trainable Deep Semantic Structural Constraints model to address this issue. The proposed model contains an Image Feature Structure constraint and a Semantic Embedding Structure constraint, which aim to learn structure-preserving image features and to endow the learned embedding space with stronger generalization ability, respectively. With the assistance of semantic structural information, the model gains more auxiliary clues for zero-shot learning. State-of-the-art performance demonstrates the effectiveness of the proposed method.
Introduction
As one of the most fundamental problems in computer vision, image classification has made huge progress in recent years with the impressive development of deep learning. Although ResNet (He et al. 2016), an outstanding representative of Convolutional Neural Network (CNN) classification models, achieves a top-5 error rate as low as 3.57% on the ImageNet classification task, its classification ability is still limited to the image categories in the training dataset. This limitation, that models can only classify image categories within the training set, prevents them from becoming as intelligent as human beings. As a simple example, humans are able to recognize different kinds of animals by merely reading their descriptions rather than seeing them. More and more researchers try to break through this limitation by introducing Zero-Shot Learning (ZSL) into image classification (Lampert, Nickisch, and Harmeling 2009; Frome et al. 2013; Norouzi et al. 2013; Socher et al. 2013; Fu et al. 2015; Akata et al. 2015; Romera-Paredes and Torr 2015; Bucher, Herbin, and Jurie 2016; Akata et al. 2016; Huang, Loy, and Tang 2016; Changpinyo et al. 2016; Xian et al. 2017; Morgado and Vasconcelos 2017).
*The first two authors contributed equally to this work.
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Zero-shot learning seeks to make image classification models able to classify image categories that never appear in the training dataset. In the zero-shot learning task, we refer to the image categories in the training set as seen classes and those in the test set as unseen classes. The category characteristics of unseen classes are learned from side information, i.e., the semantic features of the images. Commonly used side information includes human-annotated attribute features of images (Lampert, Nickisch, and Harmeling 2009; Akata et al. 2016), text descriptions of the image categories (Reed et al. 2016), word vectors of the category labels (Frome et al. 2013; Norouzi et al. 2013), and so on.
A large number of previous state-of-the-art methods focus on building a common space in which image features and semantic features are embedded (Frome et al. 2013; Socher et al. 2013; Akata et al. 2015; Romera-Paredes and Torr 2015; Akata et al. 2016). The embedding space is built on the correspondence between the seen images and their semantic features. At test time, unseen image features are mapped into the embedding space, where a classification method such as nearest-neighbour (NN) search can be applied easily. Most of these methods adopt a separate two-step pipeline, i.e., extracting image features from pre-trained CNN models and then using the fixed image features to learn the embedding space.
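To make this pipeline concrete, the following is a minimal sketch of the two-step approach, using a ridge-regression embedding and cosine nearest-neighbour search as one representative instantiation. The function names, feature dimensions, and random stand-in matrices are hypothetical placeholders rather than the interface of any particular published method.

```python
import numpy as np

def learn_embedding(X_seen, S_seen, lam=1.0):
    """Step 2 of the two-step pipeline: fit a linear map W that
    projects fixed CNN features X (n x d) onto the semantic
    vectors S (n x k) of each image's class via ridge regression."""
    d = X_seen.shape[1]
    # Closed-form solution: W = (X^T X + lam*I)^{-1} X^T S
    return np.linalg.solve(X_seen.T @ X_seen + lam * np.eye(d),
                           X_seen.T @ S_seen)

def classify_unseen(X_test, W, S_unseen):
    """Map test features into the semantic space and label each
    image with the nearest unseen-class vector (cosine similarity)."""
    P = X_test @ W
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    S = S_unseen / np.linalg.norm(S_unseen, axis=1, keepdims=True)
    return np.argmax(P @ S.T, axis=1)

# Hypothetical shapes: 2048-d pre-extracted CNN features, 85-d attributes.
X_seen = np.random.randn(1000, 2048)   # stand-in for fixed CNN features
S_seen = np.random.randn(1000, 85)     # semantic vector of each image's class
W = learn_embedding(X_seen, S_seen)
labels = classify_unseen(np.random.randn(10, 2048), W, np.random.randn(5, 85))
```

Note that the CNN never appears in this code: its weights are frozen, and only the pre-extracted feature matrices enter the optimization, which is exactly the separation the next paragraph criticizes.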
However, we argue that separating image feature extraction from embedding space construction severely harms ZSL models. The separation means that the model cannot adapt the image features to the specific ZSL task during training. Moreover, image features extracted from a fixed pre-trained CNN model cannot capture the rich semantic information contained in the side information. The semantic information of human-annotated attributes, text descriptions, or word vectors constitutes the semantic structure of a specific category. We believe that combining the learning of image features and the embedding space in an end-to-end manner, while incorporating this structural information into the whole learning process, would lead to much better zero-shot performance.
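In contrast to the snippet above, an end-to-end formulation backpropagates the embedding loss through the CNN itself, so the features adapt to the ZSL task. The sketch below is a hypothetical illustration of this general idea with a PyTorch backbone and a plain regression loss; it is not the DSSC model, whose structural constraints are defined in the following sections, and the dimensions and dummy batch are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class JointEmbeddingNet(nn.Module):
    """CNN backbone + linear projection trained jointly, so the
    image features themselves are shaped by the embedding loss."""
    def __init__(self, sem_dim=85):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()          # keep the 2048-d pooled features
        self.backbone = backbone
        self.project = nn.Linear(2048, sem_dim)

    def forward(self, images):
        return self.project(self.backbone(images))

# One hypothetical training step: pull each image's projection toward
# its class semantic vector (a simple MSE loss, for illustration only).
model = JointEmbeddingNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
images = torch.randn(4, 3, 224, 224)         # dummy image batch
targets = torch.randn(4, 85)                 # semantic vectors of the labels
loss = nn.functional.mse_loss(model(images), targets)
loss.backward()                              # gradients reach the CNN weights
optimizer.step()
```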
In this paper, we propose a new Deep Semantic Structural Constraints (DSSC) model for zero-shot learning, aiming to train the model in an end-to-end manner and to use the semantic structural information to supervise