ENCYCLOPEDIA ENHANCED SEMANTIC EMBEDDING FOR ZERO-SHOT LEARNING
Zhen Jia^{1,2}, Junge Zhang^{1,2}, Kaiqi Huang^{1,2,3}, Tieniu Tan^{1,2,3}
^1 CRIPAC & NLPR, Institute of Automation, Chinese Academy of Sciences
^2 University of Chinese Academy of Sciences
^3 CAS Center for Excellence in Brain Science and Intelligence Technology
{zhen.jia, jgzhang, kqhuang, tnt}@nlpr.ia.ac.cn
ABSTRACT
The real world contains far more object categories than those covered by image
datasets. Zero-shot learning aims to recognize image categories that are unseen
in the training set. Many previous zero-shot learning models directly use the
word vectors of class labels as category prototypes in the semantic embedding
space. However, word vectors alone cannot sufficiently capture the global
knowledge of an image category. In this paper, we propose an encyclopedia-enhanced
semantic embedding model that improves the discriminative capability of
word-vector prototypes with the global knowledge of each image category. The
proposed model extracts TF-IDF keywords from encyclopedia articles to acquire
the global knowledge of each category, and a convex combination of the keywords'
word vectors serves as the prototype of each object category. The prototypes of
seen and unseen classes build the embedding space, in which nearest-neighbour
search is performed to recognize unseen images. Experiments show that the
proposed method achieves state-of-the-art performance on the challenging
ImageNet Fall 2011 1k2hop dataset.
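The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tokenization, the simple TF-IDF formula, and the uniform convex weights are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, k=3):
    """Return the top-k TF-IDF keywords of one tokenized document.

    doc_tokens: token list of one category's encyclopedia article.
    corpus: list of tokenized documents (one per category), used for IDF.
    """
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)          # document frequency
        idf = math.log(n_docs / (1 + df)) + 1.0           # smoothed IDF
        scores[word] = (count / len(doc_tokens)) * idf    # TF * IDF
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

def prototype(keywords, word_vecs, weights=None):
    """Convex combination of the keywords' word vectors.

    With weights omitted, a uniform convex combination (i.e. the mean of
    the keyword vectors) is used; the paper may weight keywords differently.
    """
    if weights is None:
        weights = [1.0 / len(keywords)] * len(keywords)
    # convex combination: non-negative weights summing to one
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    dim = len(next(iter(word_vecs.values())))
    proto = [0.0] * dim
    for w, kw in zip(weights, keywords):
        proto = [p + w * v for p, v in zip(proto, word_vecs[kw])]
    return proto
```

A category's prototype is then just `prototype(tfidf_keywords(article, corpus), word_vecs)`, placing the class in the same semantic space as the word vectors themselves.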
Index Terms— zero-shot learning, image classification
1. INTRODUCTION
Image classification has made great progress in recent years, owing to
impressive advances in deep learning methods such as convolutional neural
networks (CNNs) [1, 2, 3, 4] and to large-scale datasets [5]. Some CNN-based
image classification methods [6] even surpass human performance on the
ImageNet classification task. Meanwhile, almost all of the successful image
classification methods mentioned above are supervised models, which require
large-scale labeled image data to converge. Early research on human cognition
[7] shows that humans can recognize more than 30,000 object categories,
including objects with components removed or under non-rigid deformation.
Moreover, humans can recognize objects they have never seen before. For
instance, humans can easily tell different cat breeds apart just by reading
their text descriptions, and a child can recognize a zebra at first sight
after having seen a horse and learned that a zebra looks like a horse with
black and white stripes. We hope that machine image classification systems
gain a similar ability to transfer knowledge from other modalities to the
visual domain, i.e., to recognize image categories that do not appear in the
training set.
Zero-shot learning (ZSL) addresses the image classification setting in which
the test categories have no overlap with the training categories. The topic
has drawn increasing attention from computer vision researchers, and many
computer vision and machine learning methods, including probabilistic models
[8, 9, 10], canonical correlation analysis [11, 12], metric learning [13, 14]
and graphical models [15], have been exploited to solve the ZSL problem. To
classify unseen images, the first step is to build a semantic embedding space
in which every image class is represented by its prototype. Attribute
features, word vectors and textual descriptions of the categories are the
typical side information used to form the embedding space. C. Lampert et
al. [8, 9] propose probabilistic models, the direct and indirect attribute
prediction models (DAP and IAP), which classify unseen images using their
attribute features as prototypes. The deep visual-semantic embedding model
(DeViSE) [16] maps CNN image features into the word-vector embedding space,
exploiting the semantic and syntactic properties of word vectors shown
in [17]. Recently, Z. Akata et al. [18, 19] utilize image descriptions as
side information to build the embedding space. Among these three kinds of
side information, word vectors have an advantage over attribute features and
image descriptions for materializing prototypes, because they require no
human annotation, which is expensive and time-consuming. Word vectors are
therefore well suited to large-scale ZSL.
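Once every class, seen or unseen, has a prototype in the semantic space, classifying an unseen image reduces to nearest-neighbour search among the prototypes. A minimal sketch of this step follows; the cosine similarity metric and the toy prototypes are assumptions for illustration, and `image_embedding` stands in for the output of a learned visual-to-semantic mapping (e.g. a linear layer on CNN features).

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_classify(image_embedding, prototypes):
    """Assign the class whose prototype is nearest to the image's embedding.

    prototypes: dict mapping class label -> prototype vector in the
    semantic space; it may include classes never seen during training.
    """
    return max(prototypes, key=lambda c: cosine(image_embedding, prototypes[c]))
```

Because the unseen classes' prototypes live in the same space as the seen ones, no retraining is needed to add a new category: one simply inserts its prototype into the dictionary.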
Many proposed ZSL methods [11, 12, 16, 20, 21, 22] directly use the word
vectors of the class labels as classification prototypes, which hurts
zero-shot classification. Word-vector learning algorithms, such as the
skip-gram method [17], usually use a small training window, so the resulting
vectors cannot capture the global knowledge of a category in the corpus. The global
knowledge is the more comprehensive and scientific repre-
978-1-5090-2175-8/17/$31.00 ©2017 IEEE    ICIP 2017