which reveals interesting heuristics that can be used to guide the exploitation of different signals to
develop effective ranking features. Finally, we also discuss several interesting future research directions.
Chapter 6 is about entity ranking, which is a recent paradigm that refers to retrieving and ranking related
objects and entities from different structured sources in various scenarios. Entities typically have
associated categories and relationships with other entities. In this chapter, we introduce how to build a
Web-scale entity ranking system based on machine-learned ranking models. Specifically, the entity
ranking system usually takes advantage of structured knowledge bases, entity relationship graphs, and
user data to derive useful features for facilitating semantic search with entities directly within the
learning-to-rank framework. Similar to generic Web search ranking, entity pairwise preference can be
leveraged to form the objective function of entity ranking. Beyond that, this chapter introduces ways to
incorporate the categorization information and preference of related entities into the objective function for
learning. This chapter further discusses how entity ranking is different from regular Web search in terms
of presentation bias and the interaction of categories of query entities and result facets.
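As a rough illustration of how pairwise preferences can form a ranking objective, the sketch below computes a logistic loss over preference pairs. The choice of logistic loss, the entity names, and the scores are assumptions made for illustration only; the chapter's actual objective, including its category and related-entity terms, may differ.

```python
import math


def pairwise_logistic_loss(scores, preferences):
    """Mean logistic loss over (preferred, other) entity pairs.

    scores: dict mapping a hypothetical entity id to its model score.
    preferences: list of (preferred_id, other_id) tuples.
    """
    total = 0.0
    for preferred, other in preferences:
        # The loss shrinks as the preferred entity outscores the other one.
        margin = scores[preferred] - scores[other]
        total += math.log1p(math.exp(-margin))
    return total / len(preferences)


# Hypothetical scores and preferences, not taken from the book.
scores = {"entity_a": 2.0, "entity_b": 0.5, "entity_c": -1.0}
preferences = [("entity_a", "entity_b"), ("entity_b", "entity_c")]
loss = pairwise_logistic_loss(scores, preferences)
```

A model whose scores agree with the stated preferences yields a lower loss than one whose scores contradict them, which is what makes the quantity usable as a training objective.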
Chapter 7 presents learning to rank with multiaspect relevance for vertical searches. Many vertical
searches, such as local search, focus on specific domains. The meaning of relevance in these verticals is
domain-specific and usually consists of multiple well-defined aspects. For example, in local search, text
matching and distance are two important aspects to assess relevance. Usually, the overall relevance
between a query and a document is a tradeoff among multiple aspect relevancies. Given a single vertical,
such a tradeoff can vary for different types of queries or in different contexts. In this chapter, we explore
these vertical-specific aspects in the learning-to-rank setting. We propose a novel formulation in which
the relevance between a query and a document is assessed with respect to each aspect, forming the
multiaspect relevance. To compute a ranking function, we study two types of learning-based approaches
to estimate the tradeoff among these aspect relevancies: a label aggregation method and a model
aggregation method. Since there are only a few aspects, a minimal amount of training data is needed to
learn the tradeoff. We conduct both offline and online bucket-test experiments on a local vertical search
engine, and the experimental results show that our proposed multiaspect relevance formulation is very
promising. The two types of aggregation methods perform more effectively than a set of baseline
methods including a conventional learning-to-rank method.
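To make the idea of a tradeoff among aspect relevancies concrete, here is a minimal sketch that combines per-aspect labels into one overall score with a weighted sum. The weights, aspect values, and document names are hypothetical and fixed by hand; the aggregation methods studied in the chapter learn the tradeoff from data, and this sketch does not reproduce them.

```python
def aggregate_labels(aspect_labels, weights):
    """Combine per-aspect relevance values into one overall score."""
    return sum(weights[aspect] * value for aspect, value in aspect_labels.items())


def rank(documents, weights):
    """Return document ids sorted by aggregated relevance, best first."""
    return sorted(
        documents,
        key=lambda doc_id: aggregate_labels(documents[doc_id], weights),
        reverse=True,
    )


# Hypothetical local-search results with two aspects, each scaled to [0, 1].
documents = {
    "cafe_near": {"text_matching": 0.6, "distance": 0.9},
    "cafe_far": {"text_matching": 0.9, "distance": 0.2},
}

# Two assumed tradeoffs, e.g. for different query types or contexts.
distance_heavy = {"text_matching": 0.3, "distance": 0.7}
matching_heavy = {"text_matching": 0.8, "distance": 0.2}

near_first = rank(documents, distance_heavy)
far_first = rank(documents, matching_heavy)
```

The same two documents swap positions when the weights change, which mirrors the observation that the tradeoff can vary across query types and contexts within a single vertical.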
Chapter 8 focuses on aggregated vertical search. Commercial information access providers increasingly
incorporate content from a large number of specialized services created for particular information-
seeking tasks. For example, an aggregated Web search page may include results from image databases
and news collections in addition to the traditional Web search results; a news provider may dynamically
arrange related articles, photos, comments, or videos on a given article page. These auxiliary services,
known as verticals, include search engines that focus on a particular domain (e.g., news, travel, sports),
search engines that focus on a particular type of media (e.g., images, video, audio), and application
programming interfaces (APIs) to highly targeted information (e.g., weather forecasts, map directions, or
stock prices). The goal of aggregated search is to provide integrated access to all verticals within a single
information context. Although aggregated search is related to classic work in distributed information
retrieval, it has unique signals, techniques, and evaluation methods in the context of the Web and other
production information access systems. In this chapter, we present the core problems associated with
aggregated search, which include sources of predictive evidence, relevance modeling, and evaluation.
Chapter 9 presents recent advances in cross-vertical ranking. A traditional Web search engine conducts
ranking mainly in a single domain, i.e., it focuses on one type of data source, and effective modeling relies
on a sufficiently large number of labeled examples, which require an expensive and time-consuming
labeling process. On the other hand, it is very common for a vertical search engine to conduct ranking
tasks in various verticals, which presents a more challenging ranking problem, that of cross-domain
ranking. Although in this book our focus is on cross-vertical ranking, the proposed approaches can be
applied to more general cases, such as cross-language ranking. Therefore, we use a more general term,
cross-domain ranking, in this book. For cross-domain ranking, in some domains we may have a relatively
large amount of training data, whereas in other domains we can only collect very little. Therefore,
finding a way to leverage labeled information from related heterogeneous domains to improve ranking in
a target domain has become a problem of great interest. In this chapter, we propose a novel probabilistic
model, the pairwise cross-domain factor (PCDF) model, to address this problem. The proposed model learns
latent factors (features) for multidomain data in partially overlapped heterogeneous feature spaces. It is