Foundations and Trends® in Information Retrieval
Vol. xx, No xx (2008) 1–112
© 2008 xxxxxxxxx
DOI: xxxxxx
Learning to Rank for Information Retrieval
Tie-Yan Liu¹
¹ Microsoft Research Asia, Sigma Center, No. 49, Zhichun Road, Haidian District, Beijing, 100190, P. R. China, Tie-Yan.Liu@microsoft.com
Abstract
Learning to rank for information retrieval (IR) is the task of automatically constructing a ranking model from training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can potentially be enhanced by learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and problems of each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then empirical evaluations of typical learning-to-rank methods are presented, with the LETOR collection as a benchmark dataset; the results suggest that the listwise approach is the most effective one among the three. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms and be used to analyze their query-level generalization abilities. At the end of the tutorial, we give a summary and discuss potential future work on learning to rank.

Contents
1 Introduction
1.1 Ranking in Information Retrieval
1.2 Learning to Rank
1.3 About this Tutorial
2 The Pointwise Approach
2.1 Regression based Algorithms
2.2 Classification based Algorithms
2.3 Ordinal Regression based Algorithms
2.4 Discussions
3 The Pairwise Approach
3.1 Example Algorithms
3.2 Discussions
4 The Listwise Approach
4.1 Direct Optimization of IR Evaluation Measures
4.2 Minimization of Listwise Ranking Losses
4.3 Discussions
5 Analysis of the Approaches
5.1 The Pointwise Approach
5.2 The Pairwise Approach
5.3 The Listwise Approach
5.4 Discussions
6 Benchmarking Learning-to-Rank Algorithms
6.1 The LETOR Collection
6.2 Experimental Results on LETOR
7 Statistical Ranking Theory
7.1 Conventional Generalization Analyses on Ranking
7.2 A Query-level Ranking Framework
7.3 Query-level Generalization Analysis
7.4 Discussions
8 Summary and Outlook
References
Acknowledgements

1 Introduction
With the fast development of the Web, every one of us is experiencing a flood of information. It was estimated that there were about 25 billion pages on the Web as of October 2008 (see http://www.worldwidewebsize.com/), which makes it generally impossible for common users to locate their desired information by browsing the Web. As a consequence, efficient and effective information retrieval (IR) has become more important than ever, and search engines (or IR systems) have become an essential tool for many people.
Ranking is a central problem in IR. Many IR problems are by nature ranking problems, such as document retrieval, collaborative filtering [58], key term extraction [30], definition finding [130], important email routing [23], sentiment analysis [94], product rating [36], and anti Web spam [56]. In this tutorial, we will mainly take document retrieval as an example. Note that document retrieval is not a narrow task: web pages, emails, academic papers, books, and news stories are just a few of the many kinds of documents. There are also many different ranking scenarios of interest in document retrieval.
Scenario 1: Rank the documents purely according to their relevance with regard to the query.
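To make this scenario concrete, here is a minimal sketch in Python of the score-and-sort pattern that underlies document retrieval: every document receives a real-valued score for the query, and documents are presented in descending order of score. The toy term-overlap scorer is hypothetical and merely stands in for a real ranking model.

    from typing import Callable, List, Tuple

    def rank_documents(query: str,
                       documents: List[str],
                       score: Callable[[str, str], float]) -> List[Tuple[str, float]]:
        """Score each document against the query and sort by descending score.

        `score` is a stand-in for any ranking model, from a heuristic
        such as BM25 to a learned ranking function.
        """
        scored = [(doc, score(query, doc)) for doc in documents]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

    def overlap_score(query: str, doc: str) -> float:
        """Toy relevance score: fraction of query terms appearing in the document."""
        q_terms, d_terms = set(query.lower().split()), set(doc.lower().split())
        return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

    docs = ["learning to rank tutorial", "web spam detection", "rank aggregation methods"]
    print(rank_documents("learning to rank", docs, overlap_score))
    # [('learning to rank tutorial', 1.0), ('rank aggregation methods', 0.33...), ...]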
Scenario 2: Consider the relationships of similarity [118], website
structure [35], and diversity [139] between documents in the ranking
process. This is also referred to as relational ranking [102].
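One classic heuristic in this spirit (not part of this tutorial; named here only for illustration) is maximal marginal relevance (MMR), which greedily trades a document's relevance off against its similarity to documents ranked so far. Below is a minimal sketch with made-up relevance and similarity functions.

    from typing import Callable, List

    def mmr_rerank(docs: List[str],
                   relevance: Callable[[str], float],
                   similarity: Callable[[str, str], float],
                   lam: float = 0.5) -> List[str]:
        """Greedy MMR re-ranking: at each step, pick the document maximizing
        lam * relevance minus (1 - lam) * max similarity to already-selected docs."""
        selected: List[str] = []
        remaining = list(docs)
        while remaining:
            best = max(remaining, key=lambda d: lam * relevance(d)
                       - (1 - lam) * max((similarity(d, s) for s in selected), default=0.0))
            selected.append(best)
            remaining.remove(best)
        return selected

    # Toy data: d1 and d2 are both relevant but near-duplicates of each other.
    rel = {"d1": 0.9, "d2": 0.85, "d3": 0.3}.get
    sim = lambda a, b: 0.95 if {a, b} == {"d1", "d2"} else 0.1
    print(mmr_rerank(["d1", "d2", "d3"], rel, sim))  # ['d1', 'd3', 'd2']

Note how the redundant d2 is demoted below the less relevant but novel d3, which is exactly the diversity effect relational ranking aims to model in a principled, learned way.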
Scenario 3: Aggregate several candidate ranked lists to get a better ranked list. This scenario is also referred to as meta search [7]. The candidate ranked lists may come from different index servers or different vertical search engines, and the target ranked list is the final result presented to users.
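As one concrete illustration of this scenario, the sketch below implements the Borda count, a simple rank-aggregation heuristic (chosen here for illustration; it is not necessarily the method used by any particular meta-search engine). Each candidate list awards a document points according to its position, and the aggregated list orders documents by total points.

    from collections import defaultdict
    from typing import Dict, List

    def borda_aggregate(ranked_lists: List[List[str]]) -> List[str]:
        """Aggregate candidate ranked lists with the Borda count heuristic.

        A document at position i in a list of length n receives n - i points;
        documents absent from a list receive no points from it.
        """
        points: Dict[str, int] = defaultdict(int)
        for ranking in ranked_lists:
            n = len(ranking)
            for i, doc in enumerate(ranking):
                points[doc] += n - i
        return sorted(points, key=lambda doc: points[doc], reverse=True)

    lists = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d1", "d4", "d2"]]
    print(borda_aggregate(lists))  # ['d1', 'd2', 'd4', 'd3']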
Scenario 4: Find whether, and to what degree, a property of a webpage influences the ranking result. This is referred to as “reverse engineering” in search engine optimization (SEO) (see http://www.search-marketing.info/newsletter/reverse-engineering.htm).
To tackle the problem of document retrieval, many heuristic ranking models have been proposed and used in the IR literature. Recently, given the amount of potential training data available, it has become possible to leverage machine learning (ML) technologies to build effective ranking models. Specifically, we call those methods that learn how to combine predefined features for ranking by means of discriminative learning “learning-to-rank” methods.
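To make this definition concrete, the following sketch (a minimal, hypothetical illustration, not any specific published algorithm) learns a linear combination of predefined features from preference pairs by gradient descent on a logistic loss over score differences, in the spirit of the pairwise approach reviewed later. The two toy features stand in for typical ranking features such as a BM25 score and a PageRank value.

    import math
    from typing import List, Tuple

    def score(w: List[float], x: List[float]) -> float:
        """Linear scoring function f(x) = w . x over predefined features."""
        return sum(wi * xi for wi, xi in zip(w, x))

    def train_pairwise(pairs: List[Tuple[List[float], List[float]]],
                       dim: int, lr: float = 0.1, epochs: int = 100) -> List[float]:
        """Learn w from preference pairs (x_pos should outrank x_neg) by
        gradient descent on log(1 + exp(-(f(x_pos) - f(x_neg))))."""
        w = [0.0] * dim
        for _ in range(epochs):
            for x_pos, x_neg in pairs:
                margin = score(w, x_pos) - score(w, x_neg)
                g = -1.0 / (1.0 + math.exp(margin))  # d(loss)/d(margin)
                for i in range(dim):
                    w[i] -= lr * g * (x_pos[i] - x_neg[i])
        return w

    # Toy features per query-document pair: [BM25-like, PageRank-like] (made up).
    pairs = [([0.9, 0.2], [0.3, 0.1]), ([0.7, 0.8], [0.4, 0.3])]
    w = train_pairwise(pairs, dim=2)
    print(score(w, [0.9, 0.2]) > score(w, [0.3, 0.1]))  # True: preferred doc scores higher

In practice one would use query-normalized features and richer models, but the pattern of combining predefined features through discriminative learning is the same.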
In recent years, learning to rank has become a very hot research
direction in IR, and a large number of learning-to-rank algorithms have
been proposed, such as [49] [73] [33] [90] [78] [34] [59] [114] [26] [9] [29]
[14] [122] [47] [62] [97] [16] [117] [136] [134] [13] [104] [99] [17] [129]. We
can foresee that learning to rank will have an even bigger impact on
IR in the future.
When a research area comes to this stage, the following questions naturally arise.
• In what respects are these learning-to-rank algorithms similar, and in which aspects do they differ? What are the strengths and weaknesses of each algorithm?
• Empirically, which of these many learning-to-rank algorithms perform the best? What kind of datasets can be used to make fair comparisons among different algorithms?