PL-ranking: A Novel Ranking Method for Cross-Modal
Retrieval
Liang Zhang¹,³, Bingpeng Ma¹,²,³∗, Guorong Li¹,³, Qingming Huang¹,²,³, Qi Tian⁴
¹University of Chinese Academy of Sciences, China
²Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS, China
³Key Laboratory of Big Data Mining and Knowledge Management, CAS, China
⁴Department of Computer Science, University of Texas at San Antonio, TX, 78249, USA
zhangliang14@mails.ucas.ac.cn, {bpma, liguorong, qmhuang}@ucas.ac.cn,
qitian@cs.utsa.edu
ABSTRACT
This paper proposes a novel method for cross-modal retrieval
named Pairwise-Listwise ranking (PL-ranking) based on
the low-rank optimization framework. Motivated by the fact
that optimizing the top of the ranking is more applicable in
practice, we focus on improving the precision at the top of
the ranked list for a given sample and learning a low-dimensional
common subspace for multi-modal data. Concretely, there
are three constraints in PL-ranking. First, we use a pair-
wise ranking loss constraint to optimize the top of ranking.
Then, considering that the pairwise ranking loss constraint
ignores class information, we further adopt a listwise con-
straint to minimize the intra-neighbors variance and max-
imize the inter-neighbors separability. In this way, class
information is preserved while the number of iterations is
reduced. Finally, low-rank based regularization is applied to
exploit the correlations between features and labels so that
the relevance between the different modalities can be en-
hanced after mapping them into the common subspace. We
design an efficient low-rank stochastic subgradient descent
method to solve the proposed optimization problem. The
experimental results show that the average MAP scores of
PL-ranking exceed those of the state-of-the-art methods by
5.1%, 9.2%, 4.7% and 4.8% on the Wiki, Flickr,
Pascal and NUS-WIDE datasets, respectively.
Keywords
Multi-modal analysis; Cross-modal retrieval; Subspace learn-
ing; Learning to rank
1. INTRODUCTION
With the rapid growth of multi-modal data, including im-
age, text, video and audio, cross-modal retrieval has been
∗Corresponding author.
MM ’16, October 15-19, 2016, Amsterdam, Netherlands
© 2016 ACM. ISBN 978-1-4503-3603-1/16/10 ... $15.00
DOI: http://dx.doi.org/10.1145/2964284.2964336
widely studied in recent years [7, 9, 13, 15, 17, 18, 20, 21, 22,
26, 27]. The key problem in cross-modal matching is how to
push relevant samples from another modality to the top of
the ranked list given a query sample from one modality.
This has made learning-to-rank techniques, which can exploit
the correlations shared by different modalities, increasingly
popular. These methods optimize the top of the ranking by maximizing
a criterion (e.g., MAP or NDCG) related to the ultimate
retrieval performance.
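For concreteness, the MAP criterion mentioned above can be computed
from binary relevance flags ordered by retrieval score; the following
is a minimal illustrative sketch in Python, not the evaluation code
used in this paper:

    def average_precision(ranked_relevance):
        # ranked_relevance: 0/1 relevance flags, sorted by descending score
        hits, precision_sum = 0, 0.0
        for i, rel in enumerate(ranked_relevance, start=1):
            if rel:
                hits += 1
                precision_sum += hits / i  # precision at this cut-off
        return precision_sum / hits if hits else 0.0

    def mean_average_precision(per_query_relevance):
        # mean of the per-query average precision scores
        aps = [average_precision(r) for r in per_query_relevance]
        return sum(aps) / len(aps)

Because each relevant hit contributes the precision at its own rank,
mistakes near the top of the list are penalized most, which is why such
criteria reward top-of-ranking accuracy.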
The most successful ranking method in cross-modal retrieval
may be the bi-directional cross-media semantic representation
model (Bi-CMSRM), which optimizes ranking performance
directly [27]. Bi-CMSRM is based on the structural SVM, is
optimized using the 1-Slack cutting plane algorithm, and has
shown good performance in cross-modal retrieval. However,
despite using an efficient convex method to solve the dual
problem, Bi-CMSRM scales poorly to large, high-dimensional
datasets [16]. Besides, Bi-CMSRM focuses only on learning
optimal mappings but ignores the structure of the mappings,
so it cannot further exploit the label relevance between the
different modalities.
In this paper, we propose an efficient ranking method for
cross-modal retrieval named PL-ranking. PL-ranking in-
tegrates the weighted approximate rank pairwise (WARP)
loss¹, listwise loss² and a low-rank constraint into a generic
minimization formulation, which is then optimized by extending
the recently proposed FAST-SSGD [8]. In this way,
PL-ranking not only optimizes the top of the ranking, but also
effectively captures the label correlations and scales to large,
high-dimensional datasets. Thus, we can effectively retrieve
relevant samples by searching in a small neighborhood of
the query sample. Specifically, there are
three important components contained in PL-ranking.
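Schematically, such a formulation can be written as follows; the
projection matrices U and V (one per modality), the trade-off
weights λ1 and λ2, and the use of the nuclear norm as the low-rank
regularizer are illustrative assumptions, not the paper's exact
notation:

    \min_{U,V} \; \mathcal{L}_{\mathrm{WARP}}(U,V)
        + \lambda_1\, \mathcal{L}_{\mathrm{list}}(U,V)
        + \lambda_2 \left( \|U\|_{*} + \|V\|_{*} \right)

Here L_WARP is the bi-directional pairwise ranking term, L_list
encodes the intra-/inter-neighbor listwise constraint, and the
nuclear norm is the standard convex surrogate for matrix rank.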
We first extend WARP to bi-directional WARP (bWARP)
such that the learned model can be applied to image-query-
texts and text-query-images simultaneously. Since both
directions of retrieval are optimized during training,
bWARP ensures that the different modalities are projected
¹The pairwise ranking method takes sample pairs as training
instances and formulates ranking as the task of learning a
classification or regression model from the collection of
pairwise instances of samples.
²The listwise information reflects the class relation of multiple
samples, e.g., intra-class and inter-class relations.
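To make footnote 1 concrete, the following sketches the standard
WARP sampling step for one query; the scoring function score(), the
margin, and the helper names are hypothetical placeholders rather
than this paper's implementation:

    import random

    def warp_weight(est_rank):
        # L(k) = sum_{i=1..k} 1/i: weights violations near the top more
        return sum(1.0 / i for i in range(1, est_rank + 1))

    def sample_warp_violation(query, positive, negatives, score, margin=1.0):
        # draw negatives until one violates the margin; the number of
        # draws needed gives an estimate of the positive sample's rank
        pos_score = score(query, positive)
        for trials in range(1, len(negatives) + 1):
            neg = random.choice(negatives)
            if score(query, neg) + margin > pos_score:
                est_rank = max(1, len(negatives) // trials)
                return neg, warp_weight(est_rank)
        return None, 0.0  # no violating negative found

The returned weight scales the update for the sampled pair, so pairs
whose positive sits far down the list receive larger corrections.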