Improving Image Retrieval with LDA: Fusing Spatial and Semantic Information


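This paper examines an image retrieval method based on latent Dirichlet allocation (LDA). In computer vision, the traditional Bag-of-Visual-Words (BoVW) model is widely used because it is simple, but it has two main drawbacks: it ignores the spatial information in an image, and it does not adequately account for the semantic relationships between visual words. To overcome these limitations, the authors propose an image representation that combines the LDA topic model with the visual language model (VLM).

LDA is a probabilistic graphical model that identifies latent topics in text data and assigns each document a distribution over topics. In image retrieval, LDA can capture latent semantic relationships among visual words, which improves the understanding of image content. Using the LDA model alone, however, may degrade performance, so the authors combine the visual language model with the LDA model linearly, producing an image representation that carries both spatial information and semantic relationships.

The experiments were run on a custom dataset and compare the method against state-of-the-art techniques: BoVW, LLC (Locality-constrained Linear Coding), SPM (Spatial Pyramid Matching), and the original VLM. The results show that retrieval based on the fused LDA and VLM representation significantly outperforms these methods, which indicates that it preserves spatial information while making effective use of the semantic relationships between visual words, improving retrieval accuracy and efficiency.

Keywords: image retrieval, latent Dirichlet allocation, visual language model, query likelihood model, smoothing techniques.

The study offers a promising strategy for improving image retrieval, and its advantages are expected to be more pronounced on large-scale image data and complex scenes. By combining the statistical modeling power of LDA with the intuitive representation of the VLM, the authors open a new route to semantic understanding and spatial information preservation in image retrieval.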

By using the visual language model, each image can be represented as a probability distribution. In Sect. 3, the value of the model's parameter is set to 1000, chosen manually as the optimal value.

2.2 LDA-Based Topic Model of Image

Latent Dirichlet allocation (LDA) is a topic model proposed in [13]. LDA was originally applied to text, where it represents each document's topics as a probability distribution. The probability of a word in a document is calculated by the following formula.







$$P_{lda}(w \mid d, \hat{\theta}, \hat{\phi}) = \sum_{z=1}^{Z} P(w \mid z, \hat{\phi}) \cdot P(z \mid \hat{\theta}, d) \tag{3}$$

Here z is a topic drawn from the topic distribution θ, which has a Dirichlet prior, and Z is the total number of topics. $\hat{\theta}$ and $\hat{\phi}$ are the posterior estimates of θ, a multinomial distribution over topics for each document, and φ, a multinomial distribution over words for each topic, respectively. The LDA model is difficult to compute exactly because of its complexity, so an approximate method is used: we obtain $\hat{\phi}$ and $\hat{\theta}$ directly by Gibbs sampling [14]. Their formulas are defined as follows.



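$$\hat{\phi} = \frac{n_j^{(w)} + \beta}{\sum_{v=1}^{V} n_j^{(v)} + V \cdot \beta}, \qquad \hat{\theta} = \frac{n_j^{(d)} + \alpha}{\sum_{t=1}^{T} n_t^{(d)} + T \cdot \alpha} \tag{4}$$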

Here V and T are the number of words and the number of topics, respectively (T is the same quantity as Z in Eq. (3)). $n_j^{(w)}$ is the number of times word w is assigned to topic j, and $n_j^{(d)}$ is the number of words in document d assigned to topic j. α and β are hyper-parameters. Therefore, $\sum_{v=1}^{V} n_j^{(v)}$ is the total number of words assigned to topic j and $\sum_{t=1}^{T} n_t^{(d)}$ is the total number of words in document d.
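As an illustration of Eqs. (3) and (4), the following is a minimal sketch that assumes the Gibbs sampler has already produced two count matrices. The names `n_wt`, `n_dt`, `estimate_phi_theta`, and `p_lda`, as well as the toy counts and hyper-parameter values, are hypothetical and do not come from the paper.

```python
import numpy as np

def estimate_phi_theta(n_wt, n_dt, alpha, beta):
    """Smoothed estimates in the form of Eq. (4).

    n_wt[j, w]: times word w is assigned to topic j        (T x V)
    n_dt[d, j]: words in document d assigned to topic j    (D x T)
    """
    n_wt = np.asarray(n_wt, dtype=float)
    n_dt = np.asarray(n_dt, dtype=float)
    T, V = n_wt.shape
    # Each row of phi is a distribution over the V words of one topic.
    phi = (n_wt + beta) / (n_wt.sum(axis=1, keepdims=True) + V * beta)
    # Each row of theta is a distribution over the T topics of one document.
    theta = (n_dt + alpha) / (n_dt.sum(axis=1, keepdims=True) + T * alpha)
    return phi, theta

def p_lda(phi, theta):
    """Eq. (3) for every (document, word) pair: sum over topics z."""
    return theta @ phi  # (D x T) @ (T x V) -> (D x V)

# Toy counts (made up): 2 topics, 3 visual words, 2 images.
n_wt = [[3, 1, 0], [0, 2, 4]]
n_dt = [[3, 2], [1, 4]]
phi, theta = estimate_phi_theta(n_wt, n_dt, alpha=0.5, beta=0.1)
P = p_lda(phi, theta)
```

Each row of `P` sums to 1, so every image ends up with a proper distribution over the visual vocabulary.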

When (3) is applied to images, w denotes a visual word and d denotes an image. The visual language model and the LDA-based topic model are combined linearly as follows.

$$P(w \mid d) = \lambda \cdot P_{vlm}(w \mid d) + (1 - \lambda) \cdot \sum_{z=1}^{Z} P(w \mid z, \hat{\phi}) \cdot P(z \mid \hat{\theta}, d) \tag{5}$$

Here λ (0 < λ < 1) is the parameter controlling the proportion of the linear combination. From Eq. (5), the larger λ is, the greater the share of the visual language model in the representation.
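The linear combination in Eq. (5) can be sketched as a convex mixture of two probability vectors over the visual vocabulary. The function name `combine` and the numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def combine(p_vlm, p_lda, lam):
    """Mix two distributions: lam * p_vlm + (1 - lam) * p_lda."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    p_vlm = np.asarray(p_vlm, dtype=float)
    p_lda = np.asarray(p_lda, dtype=float)
    return lam * p_vlm + (1.0 - lam) * p_lda

# Made-up distributions for one image over a 3-word visual vocabulary.
p_vlm = np.array([0.7, 0.2, 0.1])
p_lda = np.array([0.4, 0.4, 0.2])
mixed = combine(p_vlm, p_lda, lam=0.75)
# mixed is [0.625, 0.25, 0.125]; it still sums to 1.
```

Because both inputs sum to 1 and the weights sum to 1, the result is again a valid distribution, and a larger `lam` pulls it toward the visual language model term.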

2.3 Retrieval Scheme

After the aforementioned processing, each image is represented as a probability histogram. The retrieval scheme uses the query likelihood model (QLM). The formula for ranking images for a given query is defined as follows.
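A minimal sketch of ranking by query likelihood, assuming the common form in which query visual words are treated as independent and each image is scored by the sum of log probabilities of the query words. This is an assumption and may differ from the paper's exact formula, for example in its smoothing. The function name `rank_by_query_likelihood` and the data are hypothetical.

```python
import numpy as np

def rank_by_query_likelihood(query_words, p_w_given_d):
    """Rank images by log P(query | image) = sum of log P(w | image).

    query_words: indices of the visual words in the query (repeats allowed)
    p_w_given_d: (D x V) matrix; row d is image d's word distribution,
                 assumed strictly positive (i.e. already smoothed)
    """
    p = np.asarray(p_w_given_d, dtype=float)
    scores = np.log(p[:, query_words]).sum(axis=1)
    order = np.argsort(-scores)  # best-scoring image first
    return order, scores

# Made-up distributions for two images over three visual words.
p_w_given_d = np.array([[0.625, 0.25, 0.125],
                        [0.2, 0.3, 0.5]])
query = [0, 0, 1]  # visual word 0 occurs twice, word 1 once
order, scores = rank_by_query_likelihood(query, p_w_given_d)
```

Working in log space avoids numerical underflow when a query contains many visual words.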

Latent Dirichlet Allocation Based Image Retrieval 213

