Learning-to-Rank Techniques for Personalized Vertical Search Engines

PDF (5.11 MB), updated 2024-07-23

"Relevance Ranking for Vertical Search Engines" addresses relevance ranking, the learning-to-rank component at the core of vertical search. A vertical search engine is built for a specific domain or industry. Because it can exploit the needs and characteristics of that domain, it aims to return more precise and relevant results than a general-purpose Web search engine. Its ranking component orders the retrieved results so that users can find the information they need more easily.

Learning to rank determines the order of search results from the user's query and the relevance of the candidate results. Two common formulations are pointwise and pairwise ranking. Pointwise methods predict a relevance score for each result independently and sort by it. Pairwise methods instead learn from the relative preference between pairs of results, i.e., which of two results should be ranked higher for a given query.

The book covers several techniques:
- learning to rank, which uses machine learning algorithms to learn the ranking model;
- clickthrough analysis, which mines users' click behavior as evidence of relevance;
- news clustering, which groups news articles by topic or story to better reflect what users are looking for.

It also covers several application areas: news search ranking, medical domain search ranking, visual search ranking (ranking by visual features), and mobile search ranking (ranking results for users of mobile devices). Its goal is to give readers a comprehensive understanding of the principles and techniques of relevance ranking for vertical search engines and to help them apply these techniques.
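A small sketch can make the pointwise/pairwise distinction concrete. It assumes a linear scoring function over toy feature vectors. The pointwise loss is squared error against each document's grade. The pairwise loss is a logistic (RankNet-style) loss over preference pairs. The function names and data are hypothetical and are not taken from the book.

```python
import math
from itertools import combinations

def score(weights, features):
    """Linear scoring function: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))

def pointwise_loss(weights, docs, labels):
    """Pointwise: regress each document's score onto its own relevance grade."""
    return sum((score(weights, d) - y) ** 2 for d, y in zip(docs, labels)) / len(docs)

def logistic(margin):
    """Numerically stable log(1 + exp(-margin))."""
    return max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))

def pairwise_loss(weights, docs, labels):
    """Pairwise: for each pair with different grades, penalize the model
    when the more relevant document is not scored above the less relevant one."""
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(docs)), 2):
        if labels[i] == labels[j]:
            continue  # ties express no preference
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        total += logistic(score(weights, docs[hi]) - score(weights, docs[lo]))
        pairs += 1
    return total / pairs if pairs else 0.0

docs = [[1.0, 0.2], [0.5, 0.9], [0.1, 0.1]]
labels = [2, 1, 0]
print(pointwise_loss([1.0, 0.5], docs, labels), pairwise_loss([1.0, 0.5], docs, labels))
```

With these weights the three documents are already in the correct order, yet the pointwise loss is still nonzero. The pairwise loss only looks at relative order, which is closer to what a ranked result list needs.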

1.1 Defining the Area

In the past decade, the impact of general Web search capabilities has been stunning. However, the information on the Internet keeps growing exponentially, and it has become increasingly difficult for a general Web search engine to address the particular information and research needs of niche users. In response to the strong demand for deeper, more specific, and more relevant search results, vertical search engines have emerged in various domains. By leveraging domain knowledge and focusing on specific user tasks, vertical search has great potential to serve users highly relevant results from specific domains.

The core component of vertical search is relevance ranking, which has attracted increasing attention from both industry and academia during the past few years. This book aims to present a systematic study of practices and theories for vertical search ranking. The studies in this book fall into two major classes.

The first class is single-domain-related ranking, which focuses on ranking for a specific vertical, such as news search ranking and medical domain search ranking. Note that in this book the term vertical has a more general meaning than topic. It refers to specific topics such as news and medical information, specific result types such as entities, and specific search interfaces such as mobile search.

The second class is multidomain-related ranking, which focuses on ranking involving multiple verticals, such as multiaspect ranking, aggregated vertical search ranking, and cross-vertical ranking.

1.2 The Content and Organization of This Book

This book aims to present an in-depth and systematic study of practices and theories related to vertical search ranking. The organization of this book is as follows.

Chapter 2 covers news vertical search ranking. Reading news is one of the most important online activities of Internet users. For a commercial news search engine, it is critical to provide users with the most relevant and freshest ranking results. Related news articles also need to be grouped, so that users can browse search results by news story rather than by individual article.

This chapter describes several algorithms for news search engines, covering both ranking and clustering. For ranking, the main challenge is striking an appropriate balance between topical relevance and freshness. For clustering, the main challenge is grouping related news articles into clusters in a scalable manner. Chapter 2 introduces several news search ranking approaches, including a learning-to-rank approach and a joint learning approach from clickthroughs. The chapter then describes a scalable clustering approach for grouping news search results.
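One simple way to picture the relevance/freshness balance is to blend a topical relevance score with a freshness score that decays with article age. The sketch below assumes exponential decay with a hypothetical half-life and a hand-set mixing weight `alpha`. The chapter's learning-to-rank and clickthrough-based approaches learn this tradeoff from data rather than fixing it by hand.

```python
import math
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    relevance: float  # topical relevance in [0, 1], e.g. from a text matcher
    age_hours: float  # time since publication

def freshness(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for a brand-new article, 0.5 after one half-life."""
    return math.pow(0.5, age_hours / half_life_hours)

def blended_score(a: Article, alpha: float = 0.7, half_life_hours: float = 24.0) -> float:
    """Convex combination of topical relevance and freshness (alpha is an assumption)."""
    return alpha * a.relevance + (1 - alpha) * freshness(a.age_hours, half_life_hours)

def rank_news(articles, alpha=0.7, half_life_hours=24.0):
    """Sort articles by blended score, best first."""
    return sorted(articles, key=lambda a: blended_score(a, alpha, half_life_hours), reverse=True)

old = Article("background explainer", relevance=0.9, age_hours=240.0)
new = Article("breaking update", relevance=0.7, age_hours=1.0)
print([a.title for a in rank_news([old, new])])             # freshness tips the balance
print([a.title for a in rank_news([old, new], alpha=1.0)])  # pure topical relevance
```

With `alpha=0.7` the one-hour-old update outranks the ten-day-old explainer. With `alpha=1.0` freshness is ignored and the order flips, which shows why the weighting has to be chosen (or learned) carefully.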

Chapter 3 studies another important vertical, medical domain search. With the exponential growth of electronic health records (EHRs), it is imperative to find effective ways to help medical clinicians, as well as administrators and researchers, retrieve information from EHRs. Recent research advances in natural language processing (NLP) have improved the ability to automatically extract concepts from narrative clinical documents. However, these NLP-based tools are not yet widely available or versatile enough to handle the vaguely defined information retrieval needs of EHR users. Until they are, a convenient and cost-effective solution remains in great demand.

In this chapter, we introduce the concept of medical information retrieval. It provides medical professionals with a handy tool for searching unstructured clinical narratives through an interface similar to that of general-purpose Web search engines, e.g., Google. In the latter part of the chapter, we also introduce several advanced features. These include intelligent, ontology-driven medical search query recommendation services and a collaborative search feature that encourages end users of EHR search tools to share medical search knowledge.
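Here is a toy sketch of Web-style keyword search over clinical narratives combined with ontology-driven query suggestions. It builds an inverted index over free-text notes and suggests alternative queries from a small, hand-written synonym map. The `ONTOLOGY` map stands in for a real medical ontology. The notes, the map, and all function names are hypothetical and do not describe the chapter's system.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a medical ontology: concept -> related surface terms.
ONTOLOGY = {
    "myocardial infarction": ["heart attack", "mi"],
    "hypertension": ["high blood pressure", "htn"],
}

def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(notes):
    """Inverted index: token -> set of note ids."""
    index = defaultdict(set)
    for note_id, text in notes.items():
        for tok in tokenize(text):
            index[tok].add(note_id)
    return index

def search(index, query):
    """Boolean AND over query tokens, like a plain keyword search box."""
    postings = [index.get(tok, set()) for tok in tokenize(query)]
    return set.intersection(*postings) if postings else set()

def recommend(query):
    """Suggest the other terms of any ontology entry that the query matches exactly."""
    q = query.lower().strip()
    suggestions = []
    for concept, synonyms in ONTOLOGY.items():
        terms = [concept] + synonyms
        if q in terms:
            suggestions.extend(t for t in terms if t != q)
    return suggestions

notes = {
    1: "Patient with history of heart attack in 2019.",
    2: "Admitted for acute myocardial infarction.",
    3: "Hypertension controlled on medication.",
}
index = build_index(notes)
print(search(index, "heart attack"), recommend("heart attack"))
```

A lay query such as "heart attack" finds only note 1. The suggested query "myocardial infarction" reaches note 2, which is the kind of gap that query recommendation is meant to close.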

Chapter 4 introduces some fundamental and practical technologies, as well as some major emerging trends, in visual search ranking. The chapter first describes a generic visual search system and presents three categories of visual search ranking: text-based, query example-based, and concept-based. Then we describe the three categories in detail, including a review of various popular algorithms.

To further improve the initial search results, four paradigms of visual search reranking are presented:
1) self-reranking, which detects relevant patterns in the initial search results without any external knowledge;
2) example-based reranking, which discovers the relevant patterns from query examples provided by users;
3) crowd-reranking, which mines relevant patterns from crowdsourced information available on the Web;
4) interactive reranking, which uses user interaction to guide the reranking process.

In addition, we discuss the relationship between learning and visual search, since most recent visual search ranking frameworks are built on machine learning technologies. Finally, we conclude with several promising directions for future research.
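Self-reranking, the first paradigm, uses no external knowledge. A common way to realize it is pseudo-relevance feedback. Treat the top-k initial results as pseudo-positive examples and average their visual feature vectors. Then re-score every result by mixing its initial score with its cosine similarity to that centroid. This is a generic sketch rather than a specific method from the chapter. The feature vectors, `k`, and the mixing weight `beta` are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def self_rerank(results, k=3, beta=0.5):
    """Pseudo-relevance-feedback reranking.

    results: list of (image_id, initial_score, feature_vector).
    The top-k results by initial score serve as pseudo-positives; each result is
    re-scored as beta * initial_score + (1 - beta) * cosine(feature, centroid).
    """
    if not results:
        return []
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    top = ranked[:k]
    dim = len(top[0][2])
    centroid = [sum(r[2][i] for r in top) / len(top) for i in range(dim)]
    rescored = [(img, beta * s + (1 - beta) * cosine(feat, centroid))
                for img, s, feat in ranked]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [img for img, _ in rescored]

results = [
    ("a", 0.9, [1.0, 0.0]),
    ("b", 0.8, [0.9, 0.1]),
    ("c", 0.7, [0.0, 1.0]),   # visually unlike the top results
    ("d", 0.6, [1.0, 0.05]),  # visually like the top results
]
print(self_rerank(results, k=2))
```

Image "d" starts below "c" but looks like the pseudo-positives, so it moves up. Setting `beta=1.0` returns the initial order unchanged.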

Chapter 5 introduces mobile search ranking. Internet access is now widely available on mobile devices, such as phones and personal media players, and users can search and access Web information on the go. These devices also provide continuous, fine-grained location information. This has enabled mobile local search, which uses the user's location as a key factor in searching for local entities (e.g., a restaurant, store, gas station, or attraction). Mobile local search now accounts for a significant part of the query volume. The trend is also evident in the rising popularity of location-based search engines on mobile devices, such as Bing Local, Google Local, Yahoo! Local, and Yelp.

The quality of any mobile local search engine is mainly determined by its ranking function, which formally specifies how we retrieve and rank local entities in response to a user's query. Acquiring effective ranking signals and heuristics to develop an effective ranking function is arguably the single most important research problem in mobile local search. This chapter first gives an overview of the ranking signals in mobile local search (e.g., the distance to a business and its customer rating score), which are recognized to be quite different from those of general Web search. We next present a recent data analysis that uses a large-scale query log to study how mobile local search ranking signals behave. The analysis reveals interesting heuristics that can guide the use of different signals when developing effective ranking features. Finally, we discuss several interesting directions for future research.
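The two example signals, distance and customer rating, can be combined into a toy ranking function. The haversine great-circle distance is standard. The distance normalization (a hypothetical `scale_km`), the mapping of a 1-5 star rating to [0, 1], and the weights are illustrative assumptions. The chapter instead derives ranking features from large-scale log analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def local_score(user, business, w_dist=0.6, w_rating=0.4, scale_km=2.0):
    """Higher is better: nearby and well-rated businesses score higher.

    Distance d is mapped to (0, 1] by 1 / (1 + d / scale_km); a 1-5 star rating to [0, 1].
    """
    d = haversine_km(user[0], user[1], business["lat"], business["lon"])
    closeness = 1.0 / (1.0 + d / scale_km)
    rating = (business["rating"] - 1.0) / 4.0
    return w_dist * closeness + w_rating * rating

def rank_local(user, businesses, **kwargs):
    """Sort businesses by local_score, best first."""
    return sorted(businesses, key=lambda b: local_score(user, b, **kwargs), reverse=True)

user = (0.0, 0.0)
near = {"name": "near cafe", "lat": 0.0, "lon": 0.005, "rating": 3.0}  # ~0.56 km away
far = {"name": "far bistro", "lat": 0.0, "lon": 0.1, "rating": 5.0}    # ~11 km away
print([b["name"] for b in rank_local(user, [near, far])])
print([b["name"] for b in rank_local(user, [near, far], w_dist=0.1, w_rating=0.9)])
```

Shifting weight from distance to rating reverses the order. How much each signal should count is exactly the kind of heuristic the chapter's log analysis aims to inform.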

Chapter 6 is about entity ranking, a recent paradigm that refers to retrieving and ranking related objects and entities from different structured sources in various scenarios. Entities typically have associated categories and relationships with other entities. In this chapter, we introduce how to build a Web-scale entity ranking system based on machine-learned ranking models. Such a system usually takes advantage of structured knowledge bases, entity relationship graphs, and user data to derive useful features. These features support semantic search with entities directly within the learning-to-rank framework.

As in generic Web search ranking, entity pairwise preferences can be leveraged to form the objective function of entity ranking. Beyond that, this chapter introduces ways to incorporate categorization information and the preferences of related entities into the objective function for learning. The chapter further discusses how entity ranking differs from regular Web search in terms of presentation bias and the interaction between the categories of query entities and result facets.
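The basic pairwise preference objective can be sketched as a RankSVM-style hinge loss over (preferred entity, other entity) pairs, minimized with stochastic gradient descent on a linear model. The feature vectors, learning rate, and regularization constant are assumptions. The chapter's extensions are not modeled here: category information and the preferences of related entities in the objective.

```python
import random

def dot(w, x):
    """Dot product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(pairs, dim, epochs=50, lr=0.1, reg=0.01, seed=0):
    """pairs: list of (x_pref, x_other) feature vectors; x_pref should rank higher.

    Minimizes sum of max(0, 1 - w . (x_pref - x_other)) + reg / 2 * ||w||^2 by SGD.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)  # copy so shuffling does not mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x_pref, x_other in pairs:
            diff = [a - b for a, b in zip(x_pref, x_other)]
            active = dot(w, diff) < 1.0  # hinge is active inside the margin
            for i in range(dim):
                grad = reg * w[i] - (diff[i] if active else 0.0)
                w[i] -= lr * grad
    return w

# Hypothetical features per entity: [popularity, category_match].
pairs = [
    ([0.2, 1.0], [0.8, 0.0]),
    ([0.1, 1.0], [0.5, 0.0]),
    ([0.5, 1.0], [0.4, 0.0]),
]
w = train_pairwise(pairs, dim=2)
print(w)
```

The learned weights put most of their mass on `category_match`, because in every preference pair the preferred entity matches the category even when it is less popular.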

Chapter 7 presents learning to rank with multiaspect relevance for vertical searches. Many vertical searches, such as local search, focus on specific domains. The meaning of relevance in these verticals is domain-specific and usually consists of multiple well-defined aspects. For example, in local search, text matching and distance are two important aspects to assess relevance. Usually, the overall relevance between a query and a document is a tradeoff among multiple aspect relevancies. Given a single vertical, such a tradeoff can vary for different types of queries or in different contexts. In this chapter, we explore these vertical-specific aspects in the learning-to-rank setting. We propose a novel formulation in which the relevance between a query and a document is assessed with respect to each aspect, forming the multiaspect relevance. To compute a ranking function, we study two types of learning-based approaches to estimate the tradeoff among these aspect relevancies: a label aggregation method and a model aggregation method. Since there are only a few aspects, a minimal amount of training data is needed to learn the tradeoff. We conduct both offline and online bucket-test experiments on a local vertical search engine, and the experimental results show that our proposed multiaspect relevance formulation is very promising. The two types of aggregation methods perform more effectively than a set of baseline methods including a conventional learning-to-rank method.
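Label aggregation can be sketched for a local-search vertical with two aspects, text matching and distance. Start from a small set of query-document pairs that carry both aspect grades and an overall grade. Fit the tradeoff weights by ordinary least squares, then use them to turn aspect grades into overall training labels. The linear form without an intercept, the aspect names, and the toy grades are assumptions, not the chapter's exact method.

```python
def fit_aspect_weights(samples):
    """samples: list of (text_grade, distance_grade, overall_grade).

    Least-squares fit of overall ~ w_text * text + w_dist * dist (no intercept),
    solved in closed form from the 2x2 normal equations via Cramer's rule.
    """
    saa = sum(t * t for t, _, _ in samples)
    sbb = sum(d * d for _, d, _ in samples)
    sab = sum(t * d for t, d, _ in samples)
    say = sum(t * y for t, _, y in samples)
    sby = sum(d * y for _, d, y in samples)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("aspect grades are collinear; cannot separate the weights")
    w_text = (say * sbb - sab * sby) / det
    w_dist = (saa * sby - sab * say) / det
    return w_text, w_dist

def aggregate_label(weights, text_grade, distance_grade):
    """Collapse per-aspect grades into one overall label for a standard ranker."""
    w_text, w_dist = weights
    return w_text * text_grade + w_dist * distance_grade

# Toy grades on a 0-2 scale; overall grades here happen to follow 0.6 * text + 0.4 * distance.
samples = [(2, 0, 1.2), (0, 2, 0.8), (1, 1, 1.0), (2, 2, 2.0)]
weights = fit_aspect_weights(samples)
print(weights, aggregate_label(weights, 2, 1))
```

Only two parameters are estimated, which matches the chapter's observation that a few aspects need very little training data to learn the tradeoff.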

Chapter 8 focuses on aggregated vertical search. Commercial information access providers increasingly incorporate content from a large number of specialized services created for particular information-seeking tasks. For example, an aggregated Web search page may include results from image databases and news collections in addition to the traditional Web search results; a news provider may dynamically arrange related articles, photos, comments, or videos on a given article page. These auxiliary services, known as verticals, include search engines that focus on a particular domain (e.g., news, travel, sports), search engines that focus on a particular type of media (e.g., images, video, audio), and application programming interfaces (APIs) to highly targeted information (e.g., weather forecasts, map directions, or stock prices). The goal of aggregated search is to provide integrated access to all verticals within a single information context. Although aggregated search is related to classic work in distributed information retrieval, it has unique signals, techniques, and evaluation methods in the context of the Web and other production information access systems. In this chapter, we present the core problems associated with aggregated search, which include sources of predictive evidence, relevance modeling, and evaluation.
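A core decision in aggregated search is which verticals to show and where. A minimal threshold-based sketch works as follows:
- each candidate vertical gets a predicted relevance score for the query (in practice produced by a model over the sources of predictive evidence the chapter discusses);
- verticals below a threshold are suppressed;
- the rest are slotted above or below the Web results.

The scores, thresholds, and slot layout are hypothetical.

```python
from typing import Dict, List

def compose_page(vertical_scores: Dict[str, float],
                 show_threshold: float = 0.5,
                 top_threshold: float = 0.8,
                 max_verticals: int = 3) -> List[str]:
    """Return the page layout as an ordered list of blocks, with "web" for Web results.

    Verticals scoring >= top_threshold go above the Web results, those scoring
    >= show_threshold go below them, and the rest are not shown.
    """
    chosen = sorted(
        (v for v, s in vertical_scores.items() if s >= show_threshold),
        key=lambda v: vertical_scores[v],
        reverse=True,
    )[:max_verticals]
    top = [v for v in chosen if vertical_scores[v] >= top_threshold]
    bottom = [v for v in chosen if vertical_scores[v] < top_threshold]
    return top + ["web"] + bottom

scores = {"news": 0.9, "images": 0.6, "video": 0.2, "weather": 0.55}
print(compose_page(scores))
```

A real system would calibrate these thresholds per vertical and evaluate whole-page layouts. That evaluation problem is one of the core problems the chapter lists.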

Chapter 9 presents recent advances in cross-vertical ranking. A traditional Web search engine conducts ranking mainly in a single domain, i.e., it focuses on one type of data source. Effective modeling then relies on a sufficiently large number of labeled examples, which require an expensive and time-consuming labeling process. A vertical search engine, on the other hand, commonly has to rank in various verticals. This raises a more challenging ranking problem, cross-domain ranking. Although our focus in this book is on cross-vertical ranking, the proposed approaches can be applied to more general cases, such as cross-language ranking. Therefore, we use the more general term, cross-domain ranking, in this book.

In cross-domain ranking, some domains may have a relatively large amount of training data, whereas in other domains we can collect only very little. Consequently, finding a way to leverage labeled information from related heterogeneous domains to improve ranking in a target domain has become a problem of great interest. In this chapter, we propose a novel probabilistic model, the pairwise cross-domain factor (PCDF) model, to address this problem. The proposed model learns latent factors (features) for multidomain data in partially overlapped heterogeneous feature spaces. It is
