Learning to Rank for Information Retrieval: An In-Depth Look at Tie-Yan Liu's Tutorial

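"Learning to Rank for Information Retrieval, written by Tie-Yan Liu, is a classic tutorial on ranking algorithms in information retrieval. It examines how ranking methods can be optimized through learning, covers the three main approaches (pointwise, pairwise, and listwise), and analyzes and experimentally validates a range of algorithms."

In information retrieval (IR), ranking is a key problem: search results must be ordered from most to least relevant so that users can quickly find the documents they need. Liu's tutorial gives an accessible yet thorough introduction to the concepts and techniques of learning to rank (LTR).

The tutorial first introduces the pointwise approach, which treats each document as an independent instance and predicts its relevance score directly with a regression or classification algorithm. This includes regression-based algorithms (such as linear regression) and classification-based algorithms (such as logistic regression). These methods are simple and intuitive, but they may ignore the relationships between documents.

The pairwise approach then brings in comparisons between pairs of documents and improves overall ranking quality by optimizing the relative order within each pair. Ranking SVM (RankSVM) and pairwise algorithms trained by gradient descent are representative examples. The pairwise approach captures the relative nature of ranking better, but its computational cost is comparatively high.

The listwise approach goes one step further and optimizes the entire result list directly, taking all documents into account, for example by minimizing a list-level loss function in order to improve IR evaluation measures. This includes directly optimizing IR evaluation measures (such as NDCG and MAP) and designing new loss functions (such as the one used by ListNet). The listwise approach is closer to the actual evaluation criteria, but it is more complex to implement.

The tutorial also analyzes the three approaches in depth, discussing their respective strengths and limitations and how they perform in practice. In addition, Liu provides a benchmarking part that describes the LETOR datasets and their role in validating the performance of LTR algorithms. The part on statistical ranking theory discusses how conventional generalization analysis applies to ranking and proposes a query-level ranking framework that gives a deeper understanding of the statistical properties of ranking models. This helps researchers and practitioners understand those properties when they design and evaluate new ranking algorithms.

Learning to Rank for Information Retrieval is a comprehensive and authoritative treatment of ranking algorithms in information retrieval and a valuable reference for understanding and improving search engine performance. Researchers and engineers alike can benefit from it.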

238 Introduction

We call those ranking methods that have the following two properties learning-to-rank methods.

Feature based: All the documents under investigation are represented by feature vectors, reflecting the relevance of the documents to the query. That is, for a given query $q$, its associated document $d$ can be represented by a vector $x = \Phi(d, q)$, where $\Phi$ is a feature extractor. Typical features used in learning to rank include the frequencies of the query terms in the document, the BM25 and PageRank scores, and the relationship between this document and other documents. For more on widely used features, please refer to Tables 6.2 and 6.3 in Section 6.
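As a rough illustration of what a feature extractor $\Phi(d, q)$ might look like, the sketch below maps a document and a query to a small feature vector. The function name `extract_features` and the three features it computes are assumptions made for this example; they are much simpler than the BM25 or PageRank features mentioned above.

```python
import math
from collections import Counter
from typing import List


def extract_features(document: str, query: str) -> List[float]:
    """Hypothetical feature extractor Phi(d, q) -> x (illustration only)."""
    doc_terms = document.lower().split()
    query_terms = query.lower().split()
    counts = Counter(doc_terms)

    # Feature 1: total frequency of the query terms in the document.
    term_frequency = float(sum(counts[t] for t in query_terms))
    # Feature 2: fraction of distinct query terms that occur in the document.
    distinct = set(query_terms)
    coverage = sum(1 for t in distinct if counts[t] > 0) / len(distinct) if distinct else 0.0
    # Feature 3: log-scaled document length, a query-independent feature.
    log_length = math.log(1 + len(doc_terms))

    return [term_frequency, coverage, log_length]


x = extract_features("learning to rank for information retrieval", "rank retrieval")
```

A new signal, such as the output of another retrieval model, would simply become one more entry in the returned list.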

Even if a feature is the output of an existing retrieval model, in the context of learning to rank, one assumes that the parameters of the model are fixed, and only the optimal way of combining these features is learned. In this sense, the previous work on automatically tuning the parameters of existing models [60, 120] is not categorized as "learning-to-rank" methods.

The capability of combining a large number of features is a very important advantage of learning to rank. It is easy to incorporate any new progress on the retrieval model by including the output of the model as one dimension of the features. Such a capability is highly desirable for real search engines, since it is almost impossible to use only a few factors to satisfy the complex information needs of Web users.

Discriminative training: The learning process can be well described by the four components of discriminative learning as mentioned in the previous subsection. That is, a learning-to-rank method has its specific input space, output space, hypothesis space, and loss function.

In ML literature, discriminative methods have been widely used to combine different kinds of features, without the necessity of defining a probabilistic framework to represent the objects and the correctness of prediction. In this sense, previous works that train generative ranking

Note that, hereafter in this tutorial, when we refer to a document, we will not use $d$ any longer. Instead, we will directly use its feature representation $x$. Furthermore, since our discussions will focus more on the learning process, we will always assume the features are pre-specified, and will not purposely discuss how to extract them.


the creation of the test set for evaluation. For example, a typical training set consists of $n$ training queries $q_i$ ($i = 1, \ldots, n$), their associated documents represented by feature vectors $\mathbf{x}^{(i)} = \{x_j^{(i)}\}_{j=1}^{m^{(i)}}$ (where $m^{(i)}$ is the number of documents associated with query $q_i$), and the corresponding relevance judgments. Then a specific learning algorithm is employed to learn the ranking model (i.e., the way of combining the features), such that the output of the ranking model can predict the ground truth label in the training set as accurately as possible, in terms of a loss function. In the test phase, when a new query comes in, the model learned in the training phase is applied to sort the documents according to their relevance to the query, and return the corresponding ranked list to the user as the response to her/his query.
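The training and test phases described above can be sketched in a few lines. This is only a minimal sketch under stated assumptions: the ranking model is a linear combination of features, it is fitted by stochastic gradient descent on a squared loss, and the toy data, the names `train` and `rank`, and the hyperparameters are all made up for illustration.

```python
from typing import List, Tuple

# One training query: (feature vectors x_j, relevance judgments y_j).
Query = Tuple[List[List[float]], List[float]]


def score(w: List[float], x: List[float]) -> float:
    """Linear ranking model: a weighted combination of the features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(queries: List[Query], epochs: int = 200, lr: float = 0.05) -> List[float]:
    """Fit w by stochastic gradient descent on a squared loss (an assumed choice)."""
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for docs, labels in queries:
            for x, y in zip(docs, labels):
                error = score(w, x) - y
                # Gradient of 0.5 * error**2 with respect to w is error * x.
                w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w


def rank(w: List[float], docs: List[List[float]]) -> List[int]:
    """Test phase: return document indices sorted by descending score."""
    return sorted(range(len(docs)), key=lambda j: score(w, docs[j]), reverse=True)


# Toy training set with n = 2 queries; the numbers are made up.
training_set = [
    ([[1.0, 0.2], [0.4, 0.9], [0.1, 0.1]], [2.0, 1.0, 0.0]),
    ([[0.9, 0.5], [0.2, 0.3]], [2.0, 0.0]),
]
w = train(training_set)
ranking = rank(w, [[0.1, 0.2], [0.8, 0.4], [0.5, 0.6]])
```

Other learning algorithms would replace only the body of `train`; the test phase stays a sort by model output.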

1.2.3 Approaches to Learning to Rank

Many learning-to-rank algorithms can fit into the above framework. In order to better understand them, we categorize these algorithms. In particular, we group the algorithms, according to the four pillars of ML, into three approaches: the pointwise approach, the pairwise approach, and the listwise approach. Note that different approaches model the process of learning to rank in different ways. That is, they define different input and output spaces, use different hypotheses, and employ different loss functions. Note that the output space is used to facilitate the learning process, which can be different from the relevance judgments on the documents. That is, even if provided with the same format of judgments, one can derive different ground truth labels from it, and use them for different approaches.

The pointwise approach

The input space of the pointwise approach contains the feature vector of each single document.

Please distinguish between the judgment for evaluation and the judgment for constructing the training set, although the processes of obtaining them may be very similar.

Hereafter, when we mention the ground truth labels in the remainder of the tutorial, we will mainly refer to the ground truth labels in the training set, although we assume every document has its intrinsic label no matter whether it is judged or not.


The output space contains the relevance degree of each single document. The ground truth label in the output space is usually defined in the following way. If the judgment is directly given as relevance degree $l_j$, the ground truth label for document $x_j$ is defined as $y_j = l_j$. If the judgment is given as total order $\pi$, one can get the ground truth label by using a mapping function. However, if the judgment is given as pairwise preference $l_{u,v}$, it is not straightforward to make use of it to generate the ground truth label.
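The two straightforward cases can be sketched as follows. The concrete mapping from a total order to labels (number of documents minus the position) is an assumption chosen for this example; the text only says that some mapping function is used. The function names are hypothetical as well.

```python
from typing import Dict, List


def labels_from_degrees(degrees: List[int]) -> List[int]:
    """Relevance degrees l_j are used directly: y_j = l_j."""
    return list(degrees)


def labels_from_total_order(position: Dict[str, int]) -> Dict[str, int]:
    """Map a total order to labels with an assumed rule: y_j = m - position of j.

    `position` gives the 1-based rank position of each document, so the
    top-ranked document receives the largest label.
    """
    m = len(position)
    return {doc: m - pos for doc, pos in position.items()}


y_from_degrees = labels_from_degrees([2, 0, 1])
y_from_order = labels_from_total_order({"d1": 2, "d2": 1, "d3": 3})
```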

The hypothesis space contains functions that take the feature vector of a document as the input and predict the relevance degree of the document. We usually call such a function $f$ the scoring function. Note that, based on the scoring function, one can sort all the documents and produce the final ranked list.

The loss function examines the accurate prediction of the ground truth label for each single document. In different pointwise ranking algorithms, ranking is modeled as regression, classification, or ordinal regression (see Section 2). Therefore the corresponding regression loss, classification loss, and ordinal regression loss are used as the loss function. Note that the pointwise approach does not consider the inter-dependency among documents, and thus the position of a document in the final ranked list is invisible to its loss function. Furthermore, the approach does not make use of the fact that some documents are actually associated with the same query. Considering that most IR evaluation measures are query-level and position-based, intuitively speaking, the pointwise approach has its limitations.
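A small numeric example may make the limitation concrete. The sketch below assumes a squared regression loss and made-up scores; it shows two predictions with the same pointwise loss, one of which preserves the ideal order while the other swaps the two most relevant documents.

```python
from typing import List


def squared_loss(scores: List[float], labels: List[float]) -> float:
    """Pointwise regression loss: sum of per-document squared errors."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels))


def ranking(scores: List[float]) -> List[int]:
    """Document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


labels = [2.0, 1.0, 0.0]      # ground truth: document 0 is the most relevant
scores_a = [3.0, 1.0, -1.0]   # errors of 1, 0, 1; order preserved
scores_b = [1.0, 2.0, 0.0]    # errors of 1, 1, 0; top two documents swapped

loss_a = squared_loss(scores_a, labels)
loss_b = squared_loss(scores_b, labels)
```

Both losses are equal here, so the loss cannot tell the correct ranking from the one with an error at the top position.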

Example algorithms belonging to the pointwise approach include [24, 25, 26, 31, 33, 34, 49, 53, 73, 78, 90, 114]. We will introduce some of them in Section 2.

The pairwise approach

The input space of the pairwise approach contains a pair of documents, both represented as feature vectors.

For example, the position of the document in $\pi$ can be used to define the relevance degree.

The output space contains the pairwise preference (which takes values from $\{1, -1\}$) between each pair of documents. The ground truth label in the output space is usually defined in the following way. If the judgment is given as relevance degree $l_j$, then the order for document pair $(x_u, x_v)$ can be defined as $y_{u,v} = 2 \cdot I_{\{l_u \succ l_v\}} - 1$. Here $I_{\{A\}}$ is an indicator function, which is defined to be 1 if predicate $A$ holds and 0 otherwise. If the judgment is given directly as pairwise preference $l_{u,v}$, then it is straightforward to set $y_{u,v} = l_{u,v}$. If the judgment is given as total order $\pi$, one can define $y_{u,v} = 2 \cdot I_{\{\pi(u) < \pi(v)\}} - 1$.
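The two indicator-based definitions of the pairwise label can be sketched as below. Two assumptions are made for illustration: a larger relevance degree is taken to mean "preferred", and pairs with equal degrees are skipped. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def pairs_from_degrees(degrees: List[int]) -> Dict[Tuple[int, int], int]:
    """y_{u,v} = 2 * I{l_u > l_v} - 1 for every ordered pair with l_u != l_v.

    Skipping tied pairs is an assumption of this sketch.
    """
    labels: Dict[Tuple[int, int], int] = {}
    for u, lu in enumerate(degrees):
        for v, lv in enumerate(degrees):
            if u != v and lu != lv:
                labels[(u, v)] = 2 * int(lu > lv) - 1
    return labels


def pairs_from_total_order(position: List[int]) -> Dict[Tuple[int, int], int]:
    """y_{u,v} = 2 * I{pi(u) < pi(v)} - 1, where pi(j) is the position of document j."""
    m = len(position)
    return {
        (u, v): 2 * int(position[u] < position[v]) - 1
        for u in range(m)
        for v in range(m)
        if u != v
    }


y_degrees = pairs_from_degrees([2, 0, 2])
y_order = pairs_from_total_order([2, 1, 3])
```

Note that the same relevance degrees that served directly as pointwise labels yield a different set of ground truth labels here, one per document pair.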

The hypothesis space contains bi-variate functions $h$ that take a pair of documents as the input and output the relative order between them. Some pairwise ranking algorithms directly define their hypotheses as such [29]; however, in more algorithms, the hypothesis is still defined with a scoring function $f$ for simplicity, i.e., $h(x_u, x_v) = 2 \cdot I_{\{f(x_u) > f(x_v)\}} - 1$.
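A pairwise hypothesis built on a scoring function can be written directly from this definition. The linear form of `f` and the weight vector `w` are assumptions made for this sketch; any scoring function could be substituted.

```python
from typing import List


def f(x: List[float], w: List[float]) -> float:
    """Scoring function; a linear form is assumed here for illustration."""
    return sum(wi * xi for wi, xi in zip(w, x))


def h(x_u: List[float], x_v: List[float], w: List[float]) -> int:
    """Pairwise hypothesis h(x_u, x_v) = 2 * I{f(x_u) > f(x_v)} - 1."""
    return 2 * int(f(x_u, w) > f(x_v, w)) - 1


w = [1.0, 0.5]
order = h([0.9, 0.2], [0.3, 0.4], w)
```

With the strict inequality, tied scores yield -1 in both directions, so a practical implementation would need its own tie-handling rule.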

The loss function measures the inconsistency between $h(x_u, x_v)$ and the ground truth label $y_{u,v}$. For example, in some algorithms, ranking is modeled as a pairwise classification, and the corresponding classification loss on a pair of documents is used as the loss function. Note that the loss function used in the pairwise approach only considers the relative order between two documents. When one looks at only a pair of documents, however, the position of the documents in the final ranked list can hardly be derived. Furthermore, the approach ignores the fact that some pairs are generated from the documents associated with the same query. Considering that most IR evaluation measures are query-level and position-based, intuitively speaking, there is still a gap between this approach and ranking for IR.
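To illustrate a pairwise classification loss, the sketch below computes a 0-1 loss on the predicted order and a hinge loss on the score difference. The hinge loss is one assumed choice of surrogate, as used by SVM-style classifiers; the scores and preference labels are made up.

```python
from typing import List


def zero_one_loss(score_u: float, score_v: float, y_uv: int) -> float:
    """1 if the predicted order h disagrees with y_uv, else 0."""
    h_uv = 2 * int(score_u > score_v) - 1
    return float(h_uv != y_uv)


def hinge_loss(score_u: float, score_v: float, y_uv: int) -> float:
    """Hinge surrogate on the score difference (an assumed choice of classification loss)."""
    return max(0.0, 1.0 - y_uv * (score_u - score_v))


# Made-up scores f(x_j) for three documents and two preference labels y_{u,v}.
scores = [2.0, 0.5, 1.0]
preferences = {(0, 1): 1, (1, 2): 1}

total_zero_one = sum(zero_one_loss(scores[u], scores[v], y) for (u, v), y in preferences.items())
total_hinge = sum(hinge_loss(scores[u], scores[v], y) for (u, v), y in preferences.items())
```

Each term depends only on the two scores in the pair, so neither loss knows where the pair ends up in the final ranked list.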

Example algorithms belonging to the pairwise approach include [9, 14, 16, 29, 47, 63, 97, 122]. We will introduce some of them in Section 3.

The listwise approach

The input space of the listwise approach contains the entire group of documents associated with query $q$, e.g., $\mathbf{x} = \{x_j\}_{j=1}^{m}$.

There are two types of output spaces used in the listwise approach. For some listwise ranking algorithms, the output space contains the relevance degrees of all the documents associated with a query. In this case, the ground truth label $\mathbf{y} = \{y_j\}_{j=1}^{m}$ can be derived from the judgment
