Learning Binary Codes for Collaborative Filtering
Ke Zhou
College of Computing
Georgia Institute of Technology
Atlanta, GA 30032
kzhou@gatech.edu
Hongyuan Zha
College of Computing
Georgia Institute of Technology
Atlanta, GA 30032
zha@cc.gatech.edu
ABSTRACT
This paper tackles the efficiency problem of making recom-
mendations in the context of large user and item spaces.
In particular, we address the problem of learning binary
codes for collaborative filtering, which enables us to effi-
ciently make recommendations with t ime complexity that
is independent of the total number of items. We propose
to construct binary codes for users and items such that the
preference of users over items can be accurately preserved
by the Hamming distance between their respective binary
codes. By usin g two loss functions measuring th e degree of
divergence between the train in g and predicted ratings, we
formulate the problem of learning bina r y codes as a discrete
optimization problem. Alt h o u g h this optimization problem
is intractable in general, we develop effective relaxatio n s that
can be efficiently solved by existing methods. Moreover, we
investigate two methods to obtain the binary codes fro m
the relaxed solutions. Evaluations are conducted on three
public-domain data sets and the results suggest that our pro -
posed method outperforms several baseline alternatives.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Infor ma tio n
filtering; I.2.6 [Artificial Intelligence]: Lear n in g
General Terms
Algorithms, Performance, Experimentation
Keywords
Recommender systems, Collaborative filtering, Learning bi-
nary codes, Discrete optimization, Relaxed solutions
1. INTRODUCTION
With the rapid growth of E-commerce, hundred s of thou-
sands of products, ranging from books, mp3s to automobiles,
are so ld through online marketplaces nowadays. In addition,
millions of customers with diverse backgrounds and prefer-
ences make purchases online, generating great opportunities
as well as challenges for E-commerce compa n ies — How to
match products to their potential buyers not only accur a t ely
but also efficiently. Since collaborative filtering is an essen-
tial component for many existing recommendation systems,
it ha s been actively investigated by a wide range of previ-
ous studies to improve its a c c u r a c y [1, 19]. On the other
hand, due to the natu r e of their applicatio n s , collaborative
filtering systems are usually required to lea r n and predict
the preferences between a large number of users and items.
Therefore, for a given u s er , it is important to retrieve prod -
ucts that satisfy her preferences efficiently, leading to fast
response time and better user experience. Naturally, the
problem can be viewed as a similarity search problem where
we seek “similar” items for a given user. Recent studies show
that binary coding is a promising app r o a ch for fast s imila r ity
search [9, 13, 14, 17, 21]. The basic idea is to r ep r es ent data
points by binary codes that preserve the original similarities
between them. One significant advantage of this approach
is that the retrieval of similar data points c a n be conducted
by searching for data points within a small Hamming dis-
tance, which can be performed in time that is independent
of the total number of data [17]. However, no prior stud-
ies have b een focus ed on cons tr u c tin g binary codes for both
users and items in the context of collaborative filtering —
to the best of ou r knowledge — a gap we propose to fill in
this paper.
One key obstac le th a t hinders direct exploitation of the
existing approaches to learning binary codes to the collab-
orative filterin g context is that mos t of them assume the
similarities between any pairs o f data points are given ex-
plicitly, e.g., in the form of kernel functions or similarity
graphs [13, 21, 24]. However, in collaborative filtering, the
similarities between us ers and items are not known explic-
itly. In fact, the main goal of collaborative filtering algo-
rithms is to estimate and predict unobserved simila r ities
between users a n d items fro m the tra in in g data in order
to make recommend a tio n s . In this paper, we address the
problem of learning binary codes for collaborative filtering.
Specifically, we propose to learn compact yet effective binary
codes fo r both users and items from the training rating data.
Unlike previous works on learning binary codes, we do not
assume the similarity between users and items are known ex-
plicitly. Ther efo r e, the binary co d es we construct not only
accurately preserve the observed preferences of users, but
they also can be used to predict the unobserved preferences,
making the pro posed method conceptually unique compared
with the existing methods.
Our approach is based on the idea that the binary codes
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
KDD’12, August 12–16, 2012, Beijing, China.
Copyright 2012 ACM 978-1-4503-1462-6/12/08... $15.00.