boosting algorithm that optimizes an exponential loss,
which upper bounds the metrics of MAP and NDCG.
(2) The second stream defines several listwise loss
functions, which take the list of retrieved documents
for the same query as a sample. ListNet [4] defines
a loss function based on the KL-divergence between
two permutation probability distributions. ListMLE
[18] defines another listwise likelihood loss function
based on the Luce Model [17].
Another line of work related to this paper is sparse learning, which has been widely applied in computer vision, signal processing, and bioinformatics. Many learning algorithms have been proposed for sparse classification/regression, such as decomposition algorithms [28], [27] and algorithms for $\ell_1$-constrained optimization problems [31], [30], [29]; interested readers may refer to [27] for further discussion of sparse classification/regression algorithms.
Since the pairwise approach reduces the ranking
problem to a classification problem on document
pairs, in principle, many algorithms for sparse clas-
sification can be applied to obtain sparse ranking
models. However, few efforts have been made to
tackle the problem of learning a sparse solution for
ranking. Recently, Sun et al. [10] proposed a framework that reduces ranking to importance-weighted pairwise classification and then used an $\ell_1$-regularized algorithm to learn a sparse ranking predictor. Despite its success, that work does not isolate the individual contributions of its two components, the reduction framework and the sparse learning algorithm. Sparse learning for ranking is a relatively new topic that calls for further exploration.
3 NOTATIONS
We introduce the notation used throughout this paper. In the learning-to-rank problem, there is a labeled training set $S=\{(q_k, X_k, Y_k)\}_{k=1}^{n}$ and a test set $T=\{(q_k, X_k)\}_{k=n+1}^{n+u}$. Here $q_k$ denotes a query, $X_k=\{X_{k,i}\}_{i=1}^{n(q_k)}$ denotes the list of retrieved objects (i.e., documents) for $q_k$, and $Y_k=\{y_{k,i}\}_{i=1}^{n(q_k)}$ is the list of corresponding relevance labels provided by humans, where $y_{k,i}\in\{0,1,2,3,4\}$, $n(q_k)$ is the number of objects in the retrieved list for query $q_k$, and $X_{k,i}$ is the $i$-th object in the retrieved list for query $q_k$. Each $X_{k,i}\in\mathbb{R}^{m}$ is an $m$-dimensional feature vector, and each attribute of $X_{k,i}$ is scaled to the range $[0,1]$.
We define a set $P$ of comparable object pairs as follows: $(k,i,j)\in P$ if and only if $X_{k,i}$ and $X_{k,j}$ belong to the same query $q_k$ and $y_{k,i}\neq y_{k,j}$. We use $p$ to denote the number of pairs in $P$. In addition, we define an object pairwise comparison error matrix $K\in\mathbb{R}^{p\times m}$ as follows: each pair in $P$ corresponds to a row of $K$. Denoting the $l$-th pair in $P$ by $\{k_l, i_l, j_l\}$ and the $l$-th row of $K$ by $K_l$, we define $K_l = y_{k_l,i_l,j_l}\,(X_{k_l,i_l} - X_{k_l,j_l})$,
TABLE 1
List of notations

Notation                               Meaning
$S=\{(q_i, X_i, Y_i)\}_{i=1}^{n}$      training set
$m$                                    dimension of the data
$p$                                    number of pairs in the set $P$
$r$                                    radius of the $\ell_1$-ball: $\|w\|_1 \le r$
$K$                                    matrix in $\mathbb{R}^{p\times m}$ that contains the pairwise information
$I_C(w)$                               $I_C(w)=0$ if condition $C$ is satisfied, otherwise $I_C(w)=\infty$
where $y_{k_l,i_l,j_l}=1$ if $y_{k_l,i_l} > y_{k_l,j_l}$, and $y_{k_l,i_l,j_l}=-1$ otherwise. Since $X_{k,i}\in[0,1]^{m}$ for all $k, i$, we have $K_l\in[-1,1]^{m}$ for all $l$.
We use $\langle x, y\rangle$ to denote the inner product of two vectors $x$ and $y$. Let $r$ denote the radius of an $\ell_1$-ball: $\|w\|_1 \le r$. We introduce an indicator function $I_C(w)$: for a given vector $w$, $I_C(w)=0$ if and only if condition $C$ is satisfied; otherwise $I_C(w)=+\infty$. The above notation is summarized in Table 1.
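As a small illustration (the specific use of $I_C(w)$ sketched here is our reading of the definition, stated only for clarity), the indicator function lets an $\ell_1$-constrained problem be written as an unconstrained one:

$$\min_{\|w\|_1 \le r} f(w) \;=\; \min_{w}\; f(w) + I_{\|w\|_1 \le r}(w),$$

since the added term is $0$ on the feasible set and $+\infty$ outside it.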
4 PROBLEM STATEMENT
The learning-to-rank problem has a wide range of applications in information retrieval systems. We are given a labeled training set $S=\{(q_k, X_k, Y_k)\}_{k=1}^{n}$ and a test set $T=\{(q_k, X_k)\}_{k=n+1}^{n+u}$. The task of learning to rank is to construct a ranking predictor from the training data and then sort the examples in the test set using that predictor.
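For concreteness, here is a minimal sketch of this prediction step with a linear scoring function of the form introduced in the next paragraph; the function and variable names (rank_documents, X_test) are illustrative assumptions.

```python
import numpy as np

def rank_documents(w, X):
    """Score documents X (n_docs, m) with a linear predictor f(x) = <w, x>
    and return their indices sorted from most to least relevant."""
    scores = X @ w
    return np.argsort(-scores)

# Example: rank 3 test documents for one query with a learned weight vector w.
w = np.array([0.7, 0.0, 0.3, 0.0])           # a sparse weight vector
X_test = np.array([[0.1, 0.9, 0.2, 0.3],
                   [0.8, 0.2, 0.6, 0.1],
                   [0.4, 0.4, 0.4, 0.4]])
order = rank_documents(w, X_test)            # array([1, 2, 0])
```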
Following the common practice in learning to rank, in this paper we focus only on learning a linear ranking predictor $f(x)=\langle w, x\rangle$. Many existing learning-to-rank algorithms use this setting. SVM methods, such as the recently proposed RankSVM-Struct [19] and RankSVM-Primal [16], are notable algorithms for learning linear ranking predictors and achieve state-of-the-art performance on several benchmark datasets. These methods learn ranking models by minimizing regularized pairwise loss functions of the following form:
$$\min_{w}\;\; \frac{1}{2}\|w\|_2^2 \;+\; C \sum_{(k,i,j)\in P} \ell\!\left(y_{k,i,j}\, w^{T}(X_{k,i}-X_{k,j})\right) \qquad (1)$$
where $\ell(x)$ can be the hinge loss $\ell(x)=\max(0, 1-x)$ or the squared hinge loss $\ell(x)=\max(0, 1-x)^2$, and $C$ is a parameter that controls the trade-off between the training error and the model complexity. Existing work [34] in learning to rank revealed that the classification-based pairwise loss function (i.e., the hinge loss) is an upper bound of both $(1-\mathrm{NDCG})$ and $(1-\mathrm{MAP})$. When we take $\ell(x)=\max(0, 1-x)$, the objective given by (1) is that of Ranking SVM; several algorithms, such as quadratic programming [26] or the cutting-plane algorithm [19], minimize this objective function. If $\ell(x)=\max(0, 1-x)^2$, the function given by (1) becomes the objective of RankSVM-Primal [16].
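The following sketch evaluates objective (1) directly for a candidate $w$ using the pair matrix $K$ from Section 3. The function name ranksvm_objective and the direct evaluation are illustrative assumptions; in practice the objective is minimized with solvers such as the quadratic programming or cutting-plane methods cited above.

```python
import numpy as np

def ranksvm_objective(w, K, C, squared=False):
    """Evaluate objective (1): 0.5 * ||w||_2^2 + C * sum_l loss(<K_l, w>).

    K is the p x m pairwise comparison matrix, so each margin
    <K_l, w> equals y_{k,i,j} * w^T (X_{k,i} - X_{k,j}).
    """
    margins = K @ w
    hinge = np.maximum(0.0, 1.0 - margins)
    # squared=True gives the squared hinge loss used by RankSVM-Primal
    losses = hinge ** 2 if squared else hinge
    return 0.5 * np.dot(w, w) + C * losses.sum()

# Example: evaluate the objective on a toy pair matrix.
K = np.array([[0.3, -0.2, 0.7, -0.4],
              [0.1, 0.4, -0.3, 0.2]])
w = np.array([0.5, 0.0, 0.2, 0.0])
value = ranksvm_objective(w, K, C=1.0)
```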