Learning Binary Codes in Large-Scale Recommender Systems


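"The paper Learning Binary Codes for Collaborative Filtering addresses the efficiency of recommender systems over large user and item spaces. The authors propose learning binary codes for collaborative filtering so that the time complexity of making a recommendation is independent of the total number of items. Binary codes are constructed for both users and items such that the Hamming distance between them accurately preserves users' preferences over items. Using two loss functions that measure the divergence between training and predicted ratings, the authors formulate code learning as a discrete optimization problem and propose effective relaxations that existing methods can solve efficiently. Experiments on three public data sets show that the method outperforms several baselines."

The paper is about making recommender systems efficient when the numbers of users and items are large. With traditional collaborative filtering, computation and response time typically grow with the number of items, which limits practical use at scale. To address this, the authors introduce binary codes.

Binary coding compresses high-dimensional data into fixed-length bit strings that keep the key features of the data while reducing the cost of storing and comparing it. In the recommender setting, every user and every item is represented by a bit string, and a user's preference for an item is judged by the Hamming distance between the two codes. The Hamming distance counts the positions in which two bit strings differ; the smaller the distance, the more similar the two codes.

To learn the codes, the authors use two loss functions, each of which quantifies the disagreement between the training ratings and the predicted ratings. The goal of the optimization is a set of binary codes whose Hamming-distance-based predictions are as close as possible to the observed user ratings. Because this is a discrete optimization problem, solving it directly is very hard. The authors therefore develop relaxations that turn it into a continuous problem that existing optimization algorithms can handle, and they study two methods for converting the relaxed solutions back into binary codes.

In evaluations on three public data sets, the proposed method outperforms the baseline methods, which supports its usefulness for large-scale recommendation.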
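The Hamming-distance comparison described above can be sketched in a few lines of Python. This is only an illustration: the 8-bit codes, the item names, and the helper functions `hamming_distance` and `rank_items` are made up for the example and do not come from the paper.

```python
from typing import Dict, List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two codes (stored as ints) differ."""
    return bin(a ^ b).count("1")


def rank_items(user_code: int, item_codes: Dict[str, int]) -> List[str]:
    """Order item ids by increasing Hamming distance to the user code."""
    return sorted(item_codes, key=lambda item: hamming_distance(user_code, item_codes[item]))


# Hypothetical 8-bit codes, chosen only for illustration.
user = 0b10110010
items = {
    "book": 0b10110011,   # differs from the user code in 1 bit
    "album": 0b10010110,  # differs in 2 bits
    "car": 0b01001101,    # differs in all 8 bits
}
ranking = rank_items(user, items)
print(ranking)  # ['book', 'album', 'car']
```

Storing each code as an integer means one XOR plus a bit count replaces a floating-point inner product, which is where the savings in comparison cost come from.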

Learning Binary Codes for Collaborative Filtering

Ke Zhou

College of Computing

Georgia Institute of Technology

Atlanta, GA 30032

kzhou@gatech.edu

Hongyuan Zha

College of Computing

Georgia Institute of Technology

Atlanta, GA 30032

zha@cc.gatech.edu

ABSTRACT

This paper tackles the efficiency problem of making recommendations in the context of large user and item spaces. In particular, we address the problem of learning binary codes for collaborative filtering, which enables us to efficiently make recommendations with time complexity that is independent of the total number of items. We propose to construct binary codes for users and items such that the preference of users over items can be accurately preserved by the Hamming distance between their respective binary codes. By using two loss functions measuring the degree of divergence between the training and predicted ratings, we formulate the problem of learning binary codes as a discrete optimization problem. Although this optimization problem is intractable in general, we develop effective relaxations that can be efficiently solved by existing methods. Moreover, we investigate two methods to obtain the binary codes from the relaxed solutions. Evaluations are conducted on three public-domain data sets and the results suggest that our proposed method outperforms several baseline alternatives.
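To make the phrase "discrete optimization problem" concrete, here is one way such a problem could be written down. This excerpt does not state the paper's two loss functions, so the squared loss, the similarity $s$, and the symbols $f_u$, $h_i$, $B$, $r_{ui}$ and $\mathcal{O}$ below are illustrative assumptions, not the authors' formulation. The only fact relied on is the standard identity between Hamming distance and inner product for $\pm 1$ codes.

```latex
% Assumed notation: f_u, h_i in {-1,+1}^B are the user and item codes,
% r_{ui} in [0,1] is a rescaled rating, O is the set of observed (u,i) pairs.
\begin{align*}
  d_H(f_u, h_i) &= \tfrac{1}{2}\bigl(B - f_u^{\top} h_i\bigr)
      && \text{(identity for $\pm 1$ codes)} \\
  s(f_u, h_i) &= 1 - \tfrac{1}{B}\, d_H(f_u, h_i)
      = \tfrac{1}{2} + \tfrac{1}{2B}\, f_u^{\top} h_i
      && \text{(similarity in $[0,1]$)} \\
  \min_{f_u,\, h_i \in \{-1,+1\}^B} \;&
      \sum_{(u,i) \in \mathcal{O}} \bigl(r_{ui} - s(f_u, h_i)\bigr)^2
      && \text{(discrete problem, illustrative loss)} \\
  \min_{f_u,\, h_i \in [-1,1]^B} \;&
      \sum_{(u,i) \in \mathcal{O}} \bigl(r_{ui} - s(f_u, h_i)\bigr)^2
      && \text{(one possible relaxation)}
\end{align*}
```

Under this sketch, the relaxation only replaces the discrete set $\{-1,+1\}^B$ with the box $[-1,1]^B$; a rounding step is then needed to return to binary codes.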

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]: Information filtering; I.2.6 [Artificial Intelligence]: Learning

General Terms

Algorithms, Performance, Experimentation

Keywords

Recommender systems, Collaborative filtering, Learning binary codes, Discrete optimization, Relaxed solutions

1. INTRODUCTION

With the rapid growth of E-commerce, hundreds of thousands of products, ranging from books and mp3s to automobiles, are sold through online marketplaces nowadays. In addition, millions of customers with diverse backgrounds and preferences make purchases online, generating great opportunities as well as challenges for E-commerce companies: how to match products to their potential buyers not only accurately but also efficiently. Since collaborative filtering is an essential component of many existing recommendation systems, it has been actively investigated by a wide range of previous studies to improve its accuracy [1, 19]. On the other hand, due to the nature of their applications, collaborative filtering systems are usually required to learn and predict the preferences between a large number of users and items. Therefore, for a given user, it is important to retrieve products that satisfy her preferences efficiently, leading to fast response time and better user experience. Naturally, the problem can be viewed as a similarity search problem where we seek "similar" items for a given user. Recent studies show that binary coding is a promising approach for fast similarity search [9, 13, 14, 17, 21]. The basic idea is to represent data points by binary codes that preserve the original similarities between them. One significant advantage of this approach is that the retrieval of similar data points can be conducted by searching for data points within a small Hamming distance, which can be performed in time that is independent of the total number of data points [17]. However, to the best of our knowledge, no prior studies have focused on constructing binary codes for both users and items in the context of collaborative filtering, a gap we propose to fill in this paper.
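The claim that a small-radius Hamming search can run in time independent of the data size can be illustrated with a hash-table lookup: store items under their codes, then probe every code within a given radius of the query. The sketch below is a generic version of that idea, not code from the paper or from [17]; the function names and the 4-bit toy codes are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_index(item_codes):
    """Map each binary code (an int) to the list of items stored under it."""
    index = defaultdict(list)
    for item, code in item_codes.items():
        index[code].append(item)
    return index


def codes_within_radius(code, num_bits, radius):
    """Yield every num_bits-bit code at Hamming distance <= radius from code."""
    for r in range(radius + 1):
        for positions in combinations(range(num_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p  # flip bit p
            yield flipped


def lookup(index, query_code, num_bits, radius):
    """Collect items whose code lies in the Hamming ball around query_code."""
    found = []
    for candidate in codes_within_radius(query_code, num_bits, radius):
        found.extend(index.get(candidate, []))
    return found


# Hypothetical 4-bit item codes, for illustration only.
index = build_index({"a": 0b0000, "b": 0b0001, "c": 0b0111, "d": 0b0011})
near = sorted(lookup(index, 0b0000, num_bits=4, radius=1))
wider = sorted(lookup(index, 0b0000, num_bits=4, radius=2))
print(near, wider)  # ['a', 'b'] ['a', 'b', 'd']
```

The number of probes depends only on the code length and the radius (the sum of binomial coefficients C(num_bits, r) for r up to the radius), not on how many items are indexed. The trade-off is that this count grows quickly with the radius, so the approach suits short codes and small radii.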

One key obstacle that hinders direct application of the existing approaches to learning binary codes in the collaborative filtering context is that most of them assume the similarities between any pair of data points are given explicitly, e.g., in the form of kernel functions or similarity graphs [13, 21, 24]. However, in collaborative filtering, the similarities between users and items are not known explicitly. In fact, the main goal of collaborative filtering algorithms is to estimate and predict unobserved similarities between users and items from the training data in order to make recommendations. In this paper, we address the problem of learning binary codes for collaborative filtering. Specifically, we propose to learn compact yet effective binary codes for both users and items from the training rating data. Unlike previous works on learning binary codes, we do not assume the similarities between users and items are known explicitly. Therefore, the binary codes we construct not only accurately preserve the observed preferences of users, but can also be used to predict the unobserved preferences, making the proposed method conceptually unique compared with the existing methods.
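As a rough illustration of learning codes for both users and items directly from ratings, the sketch below fits relaxed real-valued codes by projected gradient descent and then rounds them by sign. It is not the authors' algorithm: the squared loss, the rescaling of ratings to [0, 1], the similarity `0.5 + f.h / (2B)`, the step size, and all function names are assumptions chosen to keep the example small and dependency-free.

```python
import random


def predict(f, h):
    """Relaxed similarity; equals 1 - d_H/B when f and h are in {-1,+1}^B."""
    B = len(f)
    return 0.5 + sum(x * y for x, y in zip(f, h)) / (2.0 * B)


def squared_loss(ratings, F, H):
    """Sum of squared errors over the observed (user, item) pairs."""
    return sum((r - predict(F[u], H[i])) ** 2 for (u, i), r in ratings.items())


def clip(x):
    """Project a coordinate back onto the interval [-1, 1]."""
    return max(-1.0, min(1.0, x))


def fit_relaxed(ratings, n_users, n_items, B=8, steps=300, lr=1.0, seed=0):
    """Projected gradient descent over the box [-1, 1]^B (an assumed relaxation)."""
    rng = random.Random(seed)
    F = [[rng.uniform(-0.1, 0.1) for _ in range(B)] for _ in range(n_users)]
    H = [[rng.uniform(-0.1, 0.1) for _ in range(B)] for _ in range(n_items)]
    for _ in range(steps):
        for (u, i), r in ratings.items():
            err = predict(F[u], H[i]) - r
            g = err / B  # gradient of err^2 w.r.t. f_k is err * h_k / B
            fu, hi = F[u][:], H[i][:]
            F[u] = [clip(x - lr * g * y) for x, y in zip(fu, hi)]
            H[i] = [clip(y - lr * g * x) for x, y in zip(fu, hi)]
    return F, H


def round_to_codes(M):
    """Sign thresholding: map each relaxed coordinate to -1 or +1."""
    return [[1 if x >= 0 else -1 for x in row] for row in M]


# Toy ratings in [0, 1]: each user likes one item and dislikes the other.
ratings = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
loss_before = squared_loss(ratings, *fit_relaxed(ratings, 2, 2, steps=0))
F, H = fit_relaxed(ratings, 2, 2)
loss_after = squared_loss(ratings, F, H)
user_codes, item_codes = round_to_codes(F), round_to_codes(H)
```

No user-item similarity is supplied anywhere in this sketch; the codes are shaped only by the observed ratings, which is the point the paragraph above makes. Sign thresholding is just one simple rounding choice and may lose accuracy relative to the relaxed solution.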

Our approach is based on the idea that the binary codes

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

KDD’12, August 12–16, 2012, Beijing, China.
