Learning to Rank with Partially-Labeled Data
Kevin Duh∗
Dept. of Electrical Engineering
University of Washington
Seattle, WA, USA
kevinduh@u.washington.edu
Katrin Kirchhoff
Dept. of Electrical Engineering
University of Washington
Seattle, WA, USA
katrin@ee.washington.edu
ABSTRACT
Ranking algorithms, whose goal is to appropriately order a set of objects/documents, are an important component of information retrieval systems. Previous work on ranking algorithms has focused on cases where only labeled data is available for training (i.e. supervised learning). In this paper, we consider the question whether unlabeled (test) data can be exploited to improve ranking performance. We present a framework for transductive learning of ranking functions and show that the answer is affirmative. Our framework is based on generating better features from the test data (via KernelPCA) and incorporating such features via Boosting, thus learning different ranking functions adapted to the individual test queries. We evaluate this method on the LETOR (TREC, OHSUMED) dataset and demonstrate significant improvements.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.4.m [Information Systems Applications]: Miscellaneous—machine learning
General Terms
Algorithms, Experimentation
Keywords
Information retrieval, Learning to rank, Transductive learning, Boosting, Kernel principal components analysis
1. INTRODUCTION
Ranking algorithms, whose goal is to appropriately order a set of objects/documents, are an important component of information retrieval (IR) systems. In applications such as web search, accurately presenting the most relevant documents to satisfy an information need is of utmost importance: a suboptimal ranking of search results may spoil the entire user experience.
∗Work supported by an NSF Graduate Research Fellowship.
While many methods have been proposed to tackle the (document) ranking problem, a recent and promising research direction is to apply machine learning techniques. In this approach, a training set consisting of user queries with the corresponding lists of retrieved documents and relevance judgments is provided to the machine learning algorithm. The relevance judgments may be provided by a human annotator or by an implicit feedback mechanism (e.g. query logs) [13]. The algorithm then learns a "ranking function" that predicts relevance judgments close to those of the training set. Thus far, much research has focused on (a) different learning algorithms [8, 7, 5, 11, 35, 17], and (b) the interplay between optimization objectives and IR evaluation measures [20, 28, 32, 4, 33].
We explore an orthogonal research direction here: We ask the question, "Can additional unlabeled data (i.e. query-document pairs without relevance judgments) be exploited to improve ranking performance?" In particular, we consider the case known as transductive learning, where such unlabeled data is the test data to be evaluated.
To be precise, let $q$ denote a query, $d$ the corresponding list of retrieved documents, and $y$ the list of relevance judgments. Let $S = \{(q^l, d^l, y^l)\}_{l=1..L}$ be the training set consisting of $L$ query-document-label tuples. The traditional task of "supervised learning" is to learn a ranking function using $S$; the ranker is then evaluated on a previously unseen and unlabeled test set $E = \{(q^u, d^u)\}_{u=1..U}$. In transductive learning, both $S$ and $E$ are available when building the ranking function, which is then evaluated on $E$. This has the potential to outperform supervised learning since (1) it has more data, and (2) it can adapt to the test set.
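To make the two settings concrete, the following minimal Python sketch contrasts what a supervised learner sees with what a transductive learner sees. The data structures and names here are our illustrative assumptions, not the paper's implementation.

from dataclasses import dataclass
from typing import List, Optional

FeatureVector = List[float]

@dataclass
class RankingInstance:
    query: str                          # q: the query
    docs: List[FeatureVector]           # d: one feature vector per retrieved document
    labels: Optional[List[int]] = None  # y: relevance judgments; None for test queries

# S: L labeled (q, d, y) tuples; E: U unlabeled (q, d) tuples.
S = [RankingInstance("q1", [[0.2, 0.7], [0.9, 0.1]], labels=[1, 0])]
E = [RankingInstance("q2", [[0.4, 0.3], [0.8, 0.5]])]

# A supervised ranker is trained on S alone; a transductive ranker may
# also inspect the document lists in E (never their labels) before
# producing the ranking that is evaluated on E.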
Due to its promise, transductive learning (and more generally, semi-supervised learning¹) is an active area of research in machine learning. Previous work has mostly focused on classification problems; work on semi-supervised ranking is only beginning to emerge.
In this paper, we demonstrate that "learning to rank with both labeled and unlabeled data" is a research direction worth exploring. Our contribution is a flexible transductive framework that plugs in and improves upon existing supervised rankers. We demonstrate promising results on TREC and OHSUMED and discuss a variety of future research directions.
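As a rough illustration of the recipe summarized in the abstract (query-specific features extracted from the unlabeled test documents via KernelPCA, then combined with the original features by a booster), consider the sketch below. It assumes scikit-learn's KernelPCA, and uses a pointwise GradientBoostingRegressor as a stand-in for a pairwise boosted ranker; the actual method and its details are given in the sections that follow.

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import GradientBoostingRegressor

def rank_test_query(X_train, y_train, X_test_query, n_components=5):
    """Score the documents of one test query (higher = ranked higher).

    X_train:      (n_train_docs, n_features) labeled training documents
    y_train:      (n_train_docs,) relevance labels
    X_test_query: (n_test_docs, n_features) this query's unlabeled documents
                  (requires n_test_docs >= n_components)
    """
    # Step 1: learn query-specific features from the unlabeled test documents.
    kpca = KernelPCA(n_components=n_components, kernel="rbf")
    kpca.fit(X_test_query)

    # Step 2: augment both sets with projections onto the new components.
    X_tr = np.hstack([X_train, kpca.transform(X_train)])
    X_te = np.hstack([X_test_query, kpca.transform(X_test_query)])

    # Step 3: train a boosted model on the augmented training data and
    # score the test documents; a separate ranker is fit per test query.
    booster = GradientBoostingRegressor(n_estimators=100)
    booster.fit(X_tr, y_train)
    return booster.predict(X_te)

Fitting KernelPCA on each test query's own documents is what adapts the learned ranking function to that query, at the cost of retraining the booster once per test query.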
The paper is organized as follows: Section 2 outlines the
¹Semi-supervised (inductive) learning is more general in that the unlabeled data E need not be the test set; the learned model can make predictions on, and be evaluated against, other previously unseen and unlabeled data. See [37] for a good review.