Ranking Measures and Loss Functions
in Learning to Rank
Wei Chen∗
Chinese Academy of Sciences
chenwei@amss.ac.cn
Tie-Yan Liu
Microsoft Research Asia
tyliu@microsoft.com
Yanyan Lan
Chinese Academy of Sciences
lanyanyan@amss.ac.cn
Zhiming Ma
Chinese Academy of Sciences
mazm@amt.ac.cn
Hang Li
Microsoft Research Asia
hangli@microsoft.com
Abstract
Learning to rank has become an important research topic in machine learning.
While most learning-to-rank methods learn the ranking functions by minimizing
loss functions, it is the ranking measures (such as NDCG and MAP) that are used
to evaluate the performance of the learned ranking functions. In this work, we
reveal the relationship between ranking measures and loss functions in learning-
to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We
show that the loss functions of these methods are upper bounds of the measure-
based ranking errors. As a result, the minimization of these loss functions will lead
to the maximization of the ranking measures. The key to obtaining this result is to
model ranking as a sequence of classification tasks, and define a so-called essen-
tial loss for ranking as the weighted sum of the classification errors of individual
tasks in the sequence. We have proved that the essential loss is both an upper
bound of the measure-based ranking errors, and a lower bound of the loss func-
tions in the aforementioned methods. Our proof technique also suggests a way to
modify existing loss functions to make them tighter bounds of the measure-based
ranking errors. Experimental results on benchmark datasets show that the modifi-
cations lead to better ranking performance, validating our theoretical analysis.
1 Introduction
Learning to rank has become an important research topic in many fields, such as machine learning
and information retrieval. The process of learning to rank is as follows. In training, a number of
sets are given, each set consisting of objects and labels representing their rankings (e.g., in terms of
multi-level ratings¹). Then a ranking function is constructed by minimizing a certain loss function
on the training data. In testing, given a new set of objects, the ranking function is applied to produce
a ranked list of the objects.
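The training/testing workflow above can be sketched as follows. This is only an illustrative toy example, not the paper's method or any specific algorithm it analyzes: it fits a linear scoring function with a simple pointwise squared loss (by stochastic gradient descent) on sets of labeled objects, then sorts a new set by the learned scores. All function names and data here are hypothetical.

```python
# Illustrative learning-to-rank workflow (hypothetical, pointwise):
# each training set holds (feature vector, relevance label) pairs;
# we learn w so that score(x) = w . x approximates the label,
# then rank a new set of objects by descending score.

def train_linear_scorer(training_sets, lr=0.01, epochs=200):
    """Fit w by SGD on the squared loss (score(x) - label)^2."""
    dim = len(training_sets[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for objects in training_sets:
            for x, label in objects:
                err = sum(wi * xi for wi, xi in zip(w, x)) - label
                for i in range(dim):
                    w[i] -= lr * err * x[i]  # gradient step per object
    return w

def rank(w, objects):
    """Return the objects sorted by descending learned score."""
    return sorted(objects, key=lambda x: -sum(wi * xi for wi, xi in zip(w, x)))

# Toy data: two "queries", each a set of (features, relevance-label) pairs.
train = [
    [([1.0, 0.0], 2), ([0.0, 1.0], 0), ([0.5, 0.5], 1)],
    [([0.9, 0.1], 2), ([0.1, 0.9], 0)],
]
w = train_linear_scorer(train)
ranked = rank(w, [[0.2, 0.8], [0.8, 0.2], [0.5, 0.5]])
```

A pointwise squared loss is used here purely for brevity; the paper's analysis concerns pairwise and listwise losses (Ranking SVM, RankBoost, RankNet, ListMLE) and their relationship to measure-based ranking errors.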
Many learning-to-rank methods have been proposed in the literature, with different motivations and
formulations. In general, these methods can be divided into three categories [3]. The pointwise
approach, such as subset regression [5] and McRank [10], views each single object as the learn-
ing instance. The pairwise approach, such as Ranking SVM [7], RankBoost [6], and RankNet [2],
regards a pair of objects as the learning instance. The listwise approach, such as ListNet [3] and
∗ The work was performed when the first and the third authors were interns at Microsoft Research Asia.
¹ In information retrieval, such a label represents the relevance of a document to the given query.