A More Effective Method For Image Representation: Topic Model Based on Latent Dirichlet Allocation

Zongmin Li*, Weiwei Tian*, Yante Li*, Zhenzhong Kuang† and Yujie Liu*

* College of Computer and Communication Engineering, China University of Petroleum, Qingdao, China
Email: tianwei210@gmail.com
† School of Geosciences, China University of Petroleum, Qingdao, China
Abstract—The Bag-of-Words (BoW) representation is widely used in state-of-the-art image retrieval systems. However, with the rapid growth in the number of images, the dimension of the dictionary increases substantially, which leads to high storage and CPU costs. Moreover, local features convey no semantic information, which is very important in image retrieval. In this paper, we propose to use "topics" instead of "visual words" as the image representation, obtained by a topic model, in order to reduce the feature dimension and mine higher-level semantic information. We call this representation Bag-of-Topics (BoT); the underlying topic model is a statistical model for discovering abstract "topics" from words. We extract the topics with Latent Dirichlet Allocation (LDA) and compute the similarity between images directly on the BoT representation instead of BoW. The results show that the dimension of the image representation is reduced significantly while the retrieval performance is improved.
Keywords—image retrieval; Bag-of-Words; Bag-of-Topics; topics; Latent Dirichlet Allocation
I. INTRODUCTION
The Bag-of-Words model was first used in document retrieval, where the distribution of words describes a document and similarity is measured on the basis of that description. More recently, the Bag-of-Words model has also been applied to image retrieval [1] and has proved to be one of the most effective methods to date [2], [3], [4], [10], [17]; many state-of-the-art methods build on it. In the BoW model, local features such as SIFT descriptors [16] are extracted from the image and quantized into visual words using a pre-trained codebook, as sketched in the example below. Matching images on the basis of visual words avoids the computational cost of matching the original feature points one by one, so the BoW model is suitable for large-scale settings.
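As an illustration only, the following sketch shows how such a BoW histogram can be computed with off-the-shelf tools; the choice of OpenCV SIFT, MiniBatchKMeans, and the 20,000-word codebook size are assumptions made for this example, not details prescribed by the pipeline above.

# Sketch: building a Bag-of-Words histogram for one image.
# Assumptions: OpenCV for SIFT, scikit-learn k-means for the codebook,
# and a 20,000-word dictionary as in the baseline discussed below.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def train_codebook(descriptor_matrix, num_words=20000):
    # descriptor_matrix: all SIFT descriptors stacked from the training images
    kmeans = MiniBatchKMeans(n_clusters=num_words, batch_size=10000)
    kmeans.fit(descriptor_matrix)
    return kmeans

def bow_histogram(image_path, codebook, num_words=20000):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(img, None)
    words = codebook.predict(descriptors.astype(np.float32))  # quantize to visual words
    hist = np.bincount(words, minlength=num_words).astype(np.float32)
    return hist / (hist.sum() + 1e-8)                         # L1-normalized frequency vector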
Figure 1: Examples of image retrieval from the Holidays database. For each query (left), the first row lists the results obtained by the baseline (BoW), while the second row lists the results obtained by our method. The dictionary dimension of the baseline is 20K, whereas only 128 topics are learnt in our method.

However, as the data expand sharply, a low-dimensional dictionary no longer performs well. In [4], dictionaries of different dimensions, such as 10k, 20k and 30k, were trained on Flickr60k for the same experiment. The results show that retrieval performance improves as the dictionary dimension increases. At the same time, however, retrieval becomes slower, because every image is represented by a high-dimensional frequency vector, even though an inverted index is employed.
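For reference, a minimal sketch of such an inverted index is given here; the posting-list layout and the dot-product scoring are illustrative assumptions, not the exact data structure used in [4].

# Sketch: a minimal inverted index over sparse BoW vectors.
# Each visual word maps to a posting list of (image_id, term_frequency);
# only the words that actually occur in an image are stored.
from collections import defaultdict
import numpy as np

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(list)  # word_id -> [(image_id, tf), ...]

    def add(self, image_id, bow_hist):
        for word_id in np.nonzero(bow_hist)[0]:
            self.postings[word_id].append((image_id, bow_hist[word_id]))

    def score(self, query_hist):
        # Accumulate dot-product scores over the query's non-zero words only.
        scores = defaultdict(float)
        for word_id in np.nonzero(query_hist)[0]:
            for image_id, tf in self.postings[word_id]:
                scores[image_id] += query_hist[word_id] * tf
        return sorted(scores.items(), key=lambda x: -x[1])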
The topic model is a type of statistical model first proposed in the field of natural language processing for discovering abstract "topic" distributions by analyzing the correlations between documents and words in a corpus. In this paper, we propose to address this problem in image retrieval by drawing on the notion of topics and using Latent Dirichlet Allocation (LDA) [8] to extract the image topics.
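As a rough illustration of this idea, topic distributions can be obtained from BoW count vectors as sketched below; the use of scikit-learn's LDA, 128 topics, and cosine similarity for ranking are assumptions made for the example, not implementation details taken from this paper.

# Sketch: learning a Bag-of-Topics (BoT) representation from BoW count vectors.
# Assumptions for this example: scikit-learn LDA, 128 topics, cosine similarity.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

def learn_bot(bow_counts, num_topics=128):
    # bow_counts: (num_images, dictionary_size) matrix of visual-word counts
    lda = LatentDirichletAllocation(n_components=num_topics, learning_method='online')
    bot = lda.fit_transform(bow_counts)  # each row is a topic distribution (the BoT vector)
    return lda, bot

def retrieve(query_bot, database_bot, top_k=10):
    # Rank database images by the similarity of their topic distributions to the query.
    sims = cosine_similarity(query_bot.reshape(1, -1), database_bot)[0]
    return np.argsort(-sims)[:top_k]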