Zero-Shot Learning Through Cross-Modal Transfer
Richard Socher, Milind Ganjoo, Christopher D. Manning, Andrew Y. Ng
Computer Science Department, Stanford University, Stanford, CA 94305, USA
richard@socher.org, {mganjoo, manning}@stanford.edu, ang@cs.stanford.edu
Abstract
This work introduces a model that can recognize objects in images even if no
training data is available for the object class. The only necessary knowledge about
unseen visual categories comes from unsupervised text corpora. Unlike previous
zero-shot learning models, which can only differentiate between unseen classes,
our model can operate on a mixture of seen and unseen classes, simultaneously obtaining state-of-the-art performance on classes with thousands of training images and reasonable performance on unseen classes. This is achieved by viewing the distributions of words in texts as a semantic space for understanding what objects look like. Our deep learning model does not require any manually defined semantic or visual features for either words or images. Images are mapped to be close to the semantic word vectors corresponding to their classes, and the resulting image embeddings can be used to distinguish whether an image is of a seen or unseen class. We then use novelty detection methods to differentiate unseen classes from seen classes. We demonstrate two novelty detection strategies: the first gives high accuracy on unseen classes, while the second is conservative in its prediction of novelty and keeps the seen classes' accuracy high.
1 Introduction
The ability to classify instances of an unseen visual class, called zero-shot learning, is useful in several situations. Many species and products lack labeled data, and new visual categories, such as the latest gadgets or car models, are introduced frequently. In this work, we show how
to make use of the vast amount of knowledge about the visual world available in natural language
to classify unseen objects. We attempt to model people's ability to identify unseen objects even when their only knowledge about those objects comes from reading about them. For instance, after reading the
description of a two-wheeled self-balancing electric vehicle, controlled by a stick, with which you
can move around while standing on top of it, many would be able to identify a Segway, possibly after
being briefly perplexed because the new object looks different from previously observed classes.
We introduce a zero-shot model that can predict both seen and unseen classes. For instance, without
ever seeing a cat image, it can determine whether an image shows a cat or a known category from
the training set such as a dog or a horse. The model is based on two main ideas.
Fig. 1 illustrates the model. First, images are mapped into a semantic space of words that is learned by a neural network model [15]. Word vectors capture distributional similarities from a large, unsupervised text corpus. By learning an image mapping into this space, the word vectors get implicitly grounded by the visual modality, allowing us to give prototypical instances for various words.
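To make this first idea concrete, the following is a minimal sketch, not the authors' exact architecture: it fits a single linear map from image feature vectors into a pre-trained word-vector space so that each training image lands near the embedding of its class label. The dimensionalities, the toy word vectors, and the plain squared-error SGD update are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, WORD_DIM = 128, 50                     # assumed dimensionalities
word_vectors = {                                # stand-ins for pre-trained word vectors (e.g. from [15])
    "dog": rng.normal(size=WORD_DIM),
    "horse": rng.normal(size=WORD_DIM),
}

def train_image_to_word_map(feats, labels, lr=0.005, epochs=100):
    """Fit W so that W @ x lies close to the word vector of x's class (squared error)."""
    W = rng.normal(scale=0.01, size=(WORD_DIM, IMG_DIM))
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            residual = W @ x - word_vectors[y]
            W -= lr * np.outer(residual, x)     # gradient of 0.5 * ||W x - w_y||^2
    return W

# Toy training data: random "image features" labeled with seen classes.
feats = [rng.normal(size=IMG_DIM) for _ in range(20)]
labels = ["dog" if i % 2 == 0 else "horse" for i in range(20)]
W = train_image_to_word_map(feats, labels)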
Second, because classifiers prefer to assign test images to classes for which they have seen training examples, the model incorporates novelty detection, which determines whether a new image lies on the manifold of known categories. If the image is of a known category, a standard classifier can be used. Otherwise, images are assigned to a class based on the likelihood of being an unseen category. We explore two strategies for novelty detection, both of which are based on ideas from outlier detection methods. The first strategy prefers high accuracy for unseen classes, the second for seen classes.
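The following sketch illustrates this two-stage decision under simplifying assumptions: a fixed distance threshold stands in for the outlier-detection strategies developed later in the paper, and the class vectors and threshold value are invented for illustration. An image that has already been mapped into the word-vector space is first compared against the seen classes and, if it looks novel, is assigned to the nearest unseen-class word vector.

import numpy as np

rng = np.random.default_rng(1)
WORD_DIM = 50
seen_word_vectors = {"dog": rng.normal(size=WORD_DIM), "horse": rng.normal(size=WORD_DIM)}
unseen_word_vectors = {"cat": rng.normal(size=WORD_DIM)}        # zero-shot classes

def classify(z, threshold=5.0):
    """z: an image already mapped into the word-vector space (see the previous sketch)."""
    seen_dists = {c: np.linalg.norm(z - v) for c, v in seen_word_vectors.items()}
    if min(seen_dists.values()) < threshold:
        return min(seen_dists, key=seen_dists.get)              # known category: standard decision
    unseen_dists = {c: np.linalg.norm(z - v) for c, v in unseen_word_vectors.items()}
    return min(unseen_dists, key=unseen_dists.get)              # novel: nearest unseen class

# An embedding close to "dog" stays within the seen classes; a far-away embedding
# is treated as novel and falls back to the unseen class.
print(classify(seen_word_vectors["dog"] + 0.1 * rng.normal(size=WORD_DIM)))   # -> dog
print(classify(10.0 * rng.normal(size=WORD_DIM)))                             # -> cat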
Unlike previous work on zero-shot learning, which can only predict intermediate features or differentiate between various zero-shot classes [21, 27], our joint model achieves both state-of-the-art accuracy on known classes and reasonable performance on unseen classes. Furthermore, compared to related work on knowledge transfer [21, 28], we do not require manually defined semantic