Polynomial Semantic Indexing
Bing Bai (1), Jason Weston (1)(2), David Grangier (1), Ronan Collobert (1), Kunihiko Sadamasa (1), Yanjun Qi (1), Corinna Cortes (2), Mehryar Mohri (2)(3)

(1) NEC Labs America, Princeton, NJ
{bbai, dgrangier, collober, kunihiko, yanjun}@nec-labs.com
(2) Google Research, New York, NY
{jweston, corinna, mohri}@google.com
(3) NYU Courant Institute, New York, NY
mohri@cs.nyu.edu
Abstract
We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods.
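To make the idea in the abstract concrete, below is a minimal sketch of a low-rank, diagonal-preserving degree-2 scoring function, assuming a parameterization of the form f(q, d) = q^T (U^T V + I) d. The dimensions, the random initialization, and this exact parameterization are illustrative assumptions; the paper's actual models and training procedure are defined in the body of the paper.

```python
import numpy as np

# Illustrative assumptions: vocabulary size D, low-rank dimension N.
D, N = 30000, 100
rng = np.random.default_rng(0)

# Low-rank factors; in a trained model these would be learned
# discriminatively from query-document (or document-document) pairs.
U = rng.normal(scale=0.01, size=(N, D))
V = rng.normal(scale=0.01, size=(N, D))

def score(q, d):
    """Degree-2 score f(q, d) = q^T (U^T V + I) d, computed without
    ever forming the D x D matrix W = U^T V + I explicitly."""
    low_rank_part = (U @ q) @ (V @ d)  # q^T U^T V d via two N-dim embeddings
    diagonal_part = q @ d              # the "diagonal preserving" identity term
    return low_rank_part + diagonal_part

# Toy tf-idf-style sparse vectors for a query and a document.
q = np.zeros(D); q[[10, 200, 5000]] = [0.5, 1.2, 0.3]
d = np.zeros(D); d[[10, 5000, 7000]] = [0.8, 0.4, 0.9]
print(score(q, d))
```

Storing U and V requires O(ND) memory rather than the O(D^2) needed for a full word-pair matrix W, which is what makes polynomial models over word features feasible in practice.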
1 Introduction
Ranking text documents given a text-based query is one of the key tasks in information retrieval.
A typical solution is to: (i) embed the problem in a feature space, e.g. model queries and target
documents using a vector representation; and then (ii) choose (or learn) a similarity metric that
operates in this vector space. Ranking is then performed by sorting the documents based on their
similarity score with the query.
A classical vector space model, see e.g. [24], uses weighted word counts (e.g. via tf-idf) as the
feature space, and the cosine similarity for ranking. In this case, the model is chosen by hand and no
machine learning is involved. This type of model often performs remarkably well, but suffers from
the fact that only exact matches of words between query and target texts contribute to the similarity
score. That is, words are considered to be independent, which is clearly a false assumption.
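As a concrete illustration of steps (i) and (ii) above, the following sketch embeds a toy corpus and a query as tf-idf vectors and ranks the documents by cosine similarity. The corpus, the query, and the use of scikit-learn are assumptions made only for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus and query (made up for illustration).
documents = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stock markets fell sharply today",
]
query = "cat on a mat"

# Step (i): embed query and documents as tf-idf weighted word-count vectors.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

# Step (ii): rank documents by cosine similarity with the query.
scores = cosine_similarity(query_vector, doc_vectors).ravel()
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(rank, round(scores[idx], 3), documents[idx])
```

Note that the second document mentions "cats" yet receives no credit for the query term "cat" under the default tokenization, which is exactly the exact-match limitation described above.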
Latent Semantic Indexing [8], and related methods such as pLSA and LDA [18, 2], are unsupervised
methods that choose a low dimensional feature representation of “latent concepts” where words
are no longer independent. They are trained with reconstruction objectives, either based on mean
squared error (LSI) or likelihood (pLSA, LDA). These models, being unsupervised, are still agnostic
to the particular task of interest.
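As one concrete example of such a model, LSI can be sketched as a truncated SVD of the tf-idf term-document matrix: queries and documents are projected into a small number of latent dimensions before similarities are computed. The toy corpus, the two latent dimensions, and the use of scikit-learn's TruncatedSVD are illustrative assumptions, not the experimental setup of this paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stock markets fell sharply today",
]
query = "cat on a mat"

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

# LSI: a low-dimensional "latent concept" representation obtained by
# truncated SVD, i.e. a mean-squared-error reconstruction of X.
lsi = TruncatedSVD(n_components=2, random_state=0)
doc_latent = lsi.fit_transform(X)
query_latent = lsi.transform(vectorizer.transform([query]))

# Similarities are now computed in the latent space, where words are
# no longer treated as independent dimensions.
print(cosine_similarity(query_latent, doc_latent))
```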
More recently, supervised models for ranking texts have been proposed that can be trained on a
supervised signal (i.e., labeled data) to provide a ranking of a database of documents given a query.
For example, if one has click-through data yielding query-target relationships, one can use this to
train these models to perform well on this task. Or, if one is interested in finding documents related
to a given query document, one can use known hyperlinks to learn a model that performs well on this
task. These models have typically relied on optimizing over only a few hand-constructed features, e.g. scores from existing vector space models such as tf-idf, together with the title, URL, PageRank, and other information [20, 5]. In this work, we investigate an orthogonal research direction, as we
analyze supervised methods that are based on words only. Such models are both more flexible, e.g.
can be used for tasks such as cross-language retrieval, and can still be used in conjunction with