Embedding-based Retrieval Techniques in Facebook Search


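In Facebook search, embedding-based retrieval (EBR) offers a new way to deliver personalized, relevant results. Traditional Boolean matching models can handle basic query text, but their effectiveness is limited in a setting as complex as social network search, where the searcher's context, especially the social graph, has to be taken into account. Facebook search therefore began introducing EBR, using semantic embeddings to better understand user queries and relate them to content.

At its core, EBR maps text data (user queries, posts, comments, and so on) to "embeddings" in a low-dimensional vector space, and these vectors capture the semantic information in the text. This lets the system work with relationships between texts rather than relying on keyword matching alone. In Facebook's setting, an embedding framework for personalized search is key, because it takes users' social relationships and behavior patterns into account and can therefore give each user more precise results.

The unified embedding framework developed at Facebook combines user profiles, social network structure, and behavior history to build personalized semantic embeddings. Specifically, the framework might include the following steps:

1. **Pre-trained model**: Use large-scale unlabeled data, such as user-generated content, with self-supervised learning or pre-training tasks (such as masked language modeling or next sentence prediction) to produce base text embeddings.
2. **Personalization**: Fine-tune the pre-trained model with the user's social network information (for example, friend lists, interests, and interaction records) so that the embeddings fit the user's specific context.
3. **Query and content embedding**: Generate separate embedding vectors for the user's query and for Facebook content (such as posts and image descriptions).
4. **Similarity computation**: Compare query and content vectors in the embedding space using cosine similarity or another distance measure to find the best matches.
5. **Retrieval optimization**: To handle massive data volumes, an approximate nearest neighbor (ANN) algorithm may be needed to speed up retrieval while keeping recall and accuracy high.
6. **Feedback loop**: Users' search behavior and click feedback can be used to further optimize the model and improve the relevance of future results.
7. **System integration**: The EBR system needs to integrate seamlessly with the existing indexing and ranking systems to preserve overall performance and user experience.

With embedding-based retrieval of this kind, Facebook can not only return more relevant information but also identify users' latent needs, for example by recommending friends, groups, or events a user may be interested in. In addition, because embedding vectors capture semantic relationships, the search system can better handle vague queries and ambiguous words.

In summary, Facebook applies EBR in search to make it more intelligent and more personalized, using an understanding of the user's social network information to provide a more precise and richer search experience. The technique has significant practical value for social media platforms and represents an important direction in modern search engine technology.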
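The similarity and retrieval steps in the summary above can be illustrated with a minimal sketch. It is not Facebook's implementation: the function names, the toy three-dimensional vectors, and the document ids are all hypothetical, and the exhaustive scan shown here is exactly what an approximate nearest neighbor index would replace at production scale.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: Vector, docs: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Exact (brute-force) retrieval: score every document, keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from a trained model.
doc_embeddings = {
    "post_about_hiking": [0.9, 0.1, 0.0],
    "group_for_cooking": [0.0, 0.2, 0.9],
    "page_for_trail_maps": [0.8, 0.3, 0.1],
}
query_embedding = [1.0, 0.2, 0.0]

top_results = retrieve_top_k(query_embedding, doc_embeddings, k=2)
for doc_id, score in top_results:
    print(f"{doc_id}: {score:.3f}")
```

The scan costs one similarity computation per document, which is why the summary mentions ANN search as the way to keep latency manageable over a very large index.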

Embedding-based Retrieval in Facebook Search

Jui-Ting Huang

juiting@fb.com

Facebook Inc.

Ashish Sharma

ashishsharma@fb.com

Facebook Inc.

Shuying Sun

shuyingsun@fb.com

Facebook Inc.

Li Xia

xiali824@fb.com

Facebook Inc.

David Zhang

shihaoz@fb.com

Facebook Inc.

Philip Pronin

philipp@fb.com

Facebook Inc.

Janani Padmanabhan

jananip@fb.com

Facebook Inc.

Giuseppe Ottaviano

ott@fb.com

Facebook Inc.

Linjun Yang∗

yang.linjun@microsoft.com

Microsoft

ABSTRACT

Search in social networks such as Facebook poses different challenges than in classical web search: besides the query text, it is important to take into account the searcher’s context to provide relevant results. Their social graph is an integral part of this context and is a unique aspect of Facebook search. While embedding-based retrieval (EBR) has been applied in web search engines for years, Facebook search was still mainly based on a Boolean matching model. In this paper, we discuss the techniques for applying EBR to a Facebook Search system. We introduce the unified embedding framework developed to model semantic embeddings for personalized search, and the system to serve embedding-based retrieval in a typical search system based on an inverted index. We discuss various tricks and experiences on end-to-end optimization of the whole system, including ANN parameter tuning and full-stack optimization. Finally, we present our progress on two selected advanced topics about modeling. We evaluated EBR on verticals for Facebook Search with significant metrics gains observed in online A/B experiments. We believe this paper will provide useful insights and experiences to help people on developing embedding-based retrieval systems in search engines.

CCS CONCEPTS

• Information systems → Retrieval models and ranking; Search engine architectures and scalability; • Computing methodologies → Learning latent representations.

KEYWORDS

Embedding, deep learning, search, information retrieval

ACM Reference Format:

Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, and Linjun Yang. 2020. Embedding-based Retrieval in Facebook Search. In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’20), August 23–27, 2020, Virtual Event, CA, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3394486.3403305

∗This work was performed when the author was at Facebook.

In Facebook search, verticals are based on result types, e.g., people, page, group, etc.

KDD ’20, August 23–27, 2020, Virtual Event, CA, USA

ACM ISBN 978-1-4503-7998-4/20/08.

https://doi.org/10.1145/3394486.3403305

1 INTRODUCTION

Search engines have been an important tool to help people access the huge amount of information online. Various techniques have been developed to improve search quality in the last decades, especially in web search engines including Bing and Google. Since it is difficult to accurately compute the search intent from query text and represent the semantic meaning of documents, search techniques are mostly based on various term matching methods [ ], which perform well for the cases that keyword match can address. It still remains a challenging problem for semantic matching [ ], which is to address desired results that are not an exact match of the query text but can satisfy users’ search intent.

In the last years, deep learning has made significant progress in speech recognition, computer vision, and natural language understanding [ ]. Among them embedding, which is also called representation learning, has proven to be a successful technique contributing to the success [ ]. In essence, embedding is a way to represent a sparse vector of ids as a dense feature vector, which is also called semantic embedding in that it can often learn the semantics. Once the embeddings are learned, they can be used as a representation of query and documents to apply in various stages of a search engine. Due to the huge success of this technique in other domains including computer vision and recommendation systems, it has been an active research topic in the information retrieval community and search engine industry as the next generation search technology [13].
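To make the phrase "represent a sparse vector of ids as a dense feature vector" concrete, here is a minimal sketch, not the paper's model. The vocabulary, the randomly initialized table (a stand-in for learned parameters), and the mean pooling are illustrative assumptions, and the names `build_embedding_table` and `embed_ids` are hypothetical.

```python
import random
from typing import Dict, List

EMBEDDING_DIM = 4  # illustrative; real systems use far larger dimensions


def build_embedding_table(vocab: List[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Map each id to a dense vector. Random values stand in for learned weights."""
    rng = random.Random(seed)
    return {token: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for token in vocab}


def embed_ids(ids: List[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean-pool the dense vectors of the known ids; unknown ids are skipped."""
    vectors = [table[i] for i in ids if i in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# A sparse input is just a short list of ids drawn from a large vocabulary.
vocab = ["facebook", "search", "group", "hiking"]
table = build_embedding_table(vocab, EMBEDDING_DIM)

query_vector = embed_ids(["hiking", "group"], table, EMBEDDING_DIM)
print(query_vector)
```

In a trained model the table entries are parameters fitted so that related ids end up close together, which is what lets the dense vector carry semantics that the raw ids do not.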

In general, a search engine comprises a recall layer targeting to retrieve a set of relevant documents in low latency and computational cost, usually called retrieval, and a precision layer targeting to rank the most desired documents on the top with more complex algorithms or models, usually called ranking. While embeddings can be applied to both layers, there are usually more opportunities to leverage embeddings in the retrieval layer, since it is at the bottom of the system which is often the bottleneck. The application of embeddings in retrieval is called embedding-based retrieval or EBR for short. Briefly, embedding-based retrieval is a technique to use embeddings to represent query and documents, and then convert

Applied Data Science Track Paper


This work is licensed under a Creative Commons Attribution International 4.0 License.
