Modeling continuous visual features for semantic image annotation and retrieval
Zhixin Li a,b,*, Zhiping Shi a, Xi Liu a, Zhongzhi Shi a

a Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
b College of Computer Science and Information Technology, Guangxi Normal University, Guilin 541004, China
Article history: Received 13 October 2009; available online 17 November 2010. Communicated by H.H.S. Ip.

Keywords: Automatic image annotation; Continuous PLSA; Latent aspect model; Semantic gap; Image retrieval

Abstract
Automatic image annotation has become an important and challenging problem due to the existence of the semantic gap. In this paper, we first extend probabilistic latent semantic analysis (PLSA) to model continuous quantities, and derive the corresponding Expectation–Maximization (EM) algorithm to determine the model parameters. Furthermore, in order to handle data of different modalities according to their characteristics, we present a semantic annotation model which employs continuous PLSA and standard PLSA to model visual features and textual words, respectively. The model learns the correlation between these two modalities through an asymmetric learning approach and can then predict semantic annotations precisely for unseen images. Finally, we compare our approach with several state-of-the-art approaches on the Corel5k and Corel30k datasets. The experimental results show that our approach performs more effectively and accurately.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Content-based image retrieval (CBIR) has been studied and explored for decades. Its performance, however, is far from satisfactory due to the notorious semantic gap (Smeulders et al., 2000). CBIR retrieves images in terms of their visual features, while users often prefer intuitive text-based image searching. Since manual image annotation is expensive and difficult to extend to large image databases, automatic image annotation has emerged as a crucial and challenging problem (Datta et al., 2008).
The state-of-the-art techniques for automatic image annotation can be roughly categorized into two different schools of thought. The first defines auto-annotation as a traditional supervised classification problem (Chang et al., 2003; Li and Wang, 2003; Cusano et al., 2004; Carneiro et al., 2007): it treats each word (or semantic concept) as an independent class and builds a separate classifier for every word. This approach computes similarity at the visual level and annotates a new image by propagating the corresponding words. The second perspective takes a different stand and treats images and texts as equivalent data. It attempts to discover the correlation between visual features and textual words on an unsupervised basis by estimating the joint distribution of features and words, thus posing annotation as statistical inference in a graphical model. Under this perspective, images are treated as bags of words and features, each of which is assumed to be generated by a hidden variable. Various approaches differ in the definition of the states of the hidden variable: some associate them with images in the database (Jeon et al., 2003; Lavrenko et al., 2003; Feng et al., 2004), while others associate them with image clusters (Duygulu et al., 2002; Barnard et al., 2003) or latent aspects (topics) (Blei and Jordan, 2003; Monay and Gatica-Perez, 2007; Zhang et al., 2005).
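To make the shared structure of these generative models concrete, the joint distribution they estimate can be written schematically as (the notation is ours and is introduced only for illustration; the cited approaches differ in how the hidden states and conditional distributions are actually defined)

$$P(w, x) \;=\; \sum_{z} P(z)\, P(w \mid z)\, P(x \mid z),$$

where $w$ denotes a textual word, $x$ a visual feature, and $z$ the hidden variable whose states are identified with training images, image clusters, or latent aspects, depending on the approach.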
As latent aspect models, PLSA (Hofmann, 2001) and latent Dirichlet allocation (LDA) (Blei et al., 2003) have been successfully applied to annotate and retrieve images. PLSA-WORDS (Monay and Gatica-Perez, 2007) is a representative approach, which achieves the annotation task by constraining the latent space to ensure its consistency in words. However, since standard PLSA can only handle discrete quantities (such as textual words), this approach quantizes feature vectors into discrete visual words for PLSA modeling. Consequently, its annotation performance is sensitive to the clustering granularity. In the field of automatic image annotation, it is generally believed that using continuous feature vectors leads to better performance (Lavrenko et al., 2003; Blei and Jordan, 2003; Zhang et al., 2005; Li et al., 2010). To model image data precisely, PLSA must therefore be extended to deal with continuous quantities.
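As a rough sketch in our own notation (the precise formulation is developed later in the paper), standard PLSA factorizes the word occurrences of a document $d$ as

$$P(w \mid d) \;=\; \sum_{z} P(z \mid d)\, P(w \mid z),$$

with a multinomial aspect-conditional distribution $P(w \mid z)$, whereas a continuous extension replaces this per-aspect multinomial by a Gaussian density over feature vectors $x$,

$$p(x \mid d) \;=\; \sum_{z} P(z \mid d)\, \mathcal{N}(x;\, \mu_z, \Sigma_z),$$

so that every image is effectively modeled as an aspect-weighted mixture of Gaussians.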
This paper proposes continuous PLSA, which assumes that the feature vectors in an image are governed by a Gaussian distribution under a given latent aspect, rather than by a multinomial one. In addition, the corresponding EM algorithm is derived to estimate the parameters. Under this model, each image is then treated as a mixture of Gaussians. Furthermore, based on the continuous PLSA and the standard PLSA, we present a semantic
* Corresponding author at: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China. Tel.: +86 10 62600506; fax: +86 10 82610254.
E-mail addresses: lizx@ics.ict.ac.cn, lizx@gxnu.edu.cn (Z. Li), shizp@ics.ict.ac.cn (Z. Shi), liux@ics.ict.ac.cn (X. Liu), shizz@ics.ict.ac.cn (Z. Shi).