Learning Tree-based Deep Model for Recommender Systems
Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, Kun Gai
Alibaba Group
{zhuhan.zh,yushi.lx,pengye.zpy,guozheng.lgz,jay.hj,lihan.lh,jingshi.gk}@alibaba-inc.com
ABSTRACT
Model-based methods for recommender systems have been studied extensively in recent years. In systems with a large corpus, however, the calculation cost for the learnt model to predict all user-item preferences is tremendous, which makes full corpus retrieval extremely difficult. To overcome this calculation barrier, models such as matrix factorization resort to the inner product form (i.e., modeling user-item preference as the inner product of user and item latent factors) and indexes to facilitate efficient approximate k-nearest neighbor search. However, it remains challenging to incorporate more expressive interaction forms between user and item features, e.g., interactions through deep neural networks, because of the calculation cost.
In this paper, we focus on the problem of introducing arbitrary advanced models to recommender systems with a large corpus. We propose a novel tree-based method which can provide logarithmic complexity w.r.t. corpus size even with more expressive models such as deep neural networks. Our main idea is to predict user interests from coarse to fine by traversing tree nodes in a top-down fashion and making a decision for each user-node pair. We also show that the tree structure can be jointly learnt towards better compatibility with users' interest distribution and hence facilitate both training and prediction. Experimental evaluations on two large-scale real-world datasets show that the proposed method significantly outperforms traditional methods. Online A/B test results on the Taobao display advertising platform also demonstrate the effectiveness of the proposed method in production environments.
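To make the idea concrete, below is a minimal Python sketch of the coarse-to-fine retrieval described above; the Node structure, the score(node) stand-in for the learned user-node preference model, and the beam width k are illustrative assumptions, not the exact implementation evaluated in this paper.

class Node:
    def __init__(self, item_id=None, children=()):
        self.item_id = item_id          # set only for leaf nodes
        self.children = list(children)

    @property
    def is_leaf(self):
        return not self.children

def retrieve_top_k(root, score, k):
    # Traverse top-down, keeping the k highest-scoring nodes per level;
    # score(node) stands in for a learned user-node preference model.
    candidates, frontier = [], [root]
    while frontier:
        children = [c for n in frontier for c in n.children]
        children.sort(key=score, reverse=True)
        frontier = []
        for node in children[:k]:
            (candidates if node.is_leaf else frontier).append(node)
    return sorted(candidates, key=score, reverse=True)[:k]

Since at most k nodes are expanded per level, the number of user-node model evaluations grows with the tree depth, i.e., logarithmically in the corpus size.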
CCS CONCEPTS
• Computing methodologies → Classification and regression trees; Neural networks; • Information systems → Recommender systems;
KEYWORDS
Tree-based Learning, Recommender Systems, Implicit Feedback
1 INTRODUCTION
Recommendation has been widely used by various kinds of content providers. Personalized recommendation methods, based on the intuition that users' interests can be inferred from their historical behaviors or from other users with similar preferences, have been proven effective at YouTube [7] and Amazon [22].
Designing such a recommendation model to predict the best candidate set from the entire corpus for each user poses many challenges. In systems with an enormous corpus, even well-performing recommendation algorithms may fail to predict from the entire corpus, since linear prediction complexity w.r.t. the corpus size is unacceptable. Deploying such a large-scale recommender system requires that the amount of calculation to predict for each single user be limited. Besides preciseness, the novelty of recommended items also matters for user experience: results that contain only items homogeneous to the user's historical behaviors are not desirable.
To reduce the amount of calculation and handle the enormous corpus, memory-based collaborative filtering methods are widely deployed in industry [22]. As a representative method in the collaborative filtering family, item-based collaborative filtering [31] can recommend from a very large corpus with relatively few computations, relying on the pre-calculated similarity between item pairs and using the user's historical behaviors as triggers to recall the most similar items. However, this restricts the scope of the candidate set: not all items, but only items similar to the triggers, can ultimately be recommended. This restriction prevents the recommender system from jumping out of historical behavior to explore potential user interests, which limits the accuracy of recalled results, and in practice the recommendation novelty is also criticized. Another way to reduce calculation is coarse-grained recommendation. For example, the system recommends a small number of item categories for each user and picks out all corresponding items, followed by a ranking stage. However, for a large corpus, the calculation problem is still not solved: if the category number is large, the category recommendation itself meets the calculation barrier; if not, some categories will inevitably include too many items, making the subsequent ranking calculation impracticable. Besides, the categories used are usually not designed for the recommendation problem, which can seriously harm recommendation accuracy.
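For concreteness, the following minimal sketch illustrates the item-based CF recall described above, assuming an item-to-item similarity table precomputed offline; the function and variable names are illustrative, not from [31].

from collections import defaultdict

def cf_recall(user_history, sim_table, top_n=100):
    # user_history: list of (item_id, weight) trigger behaviors.
    # sim_table: item_id -> list of (similar_item_id, similarity).
    scores = defaultdict(float)
    for trigger, weight in user_history:
        for item, sim in sim_table.get(trigger, []):
            scores[item] += weight * sim   # accumulate over triggers
    # Only items similar to some trigger can appear here, which is
    # exactly the candidate-scope restriction discussed above.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]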
In the literature of recommender systems, model-based methods are an active topic. Models such as matrix factorization (MF) [19, 30] try to decompose pairwise user-item preferences (e.g., ratings) into user and item factors, then recommend to each user its most preferred items. Factorization machine (FM) [28] further proposes a unified model that can mimic different factorization models with any kind of input data. In some real-world scenarios that have no explicit preference but only implicit user feedback (e.g., user behaviors like clicks or purchases), Bayesian personalized ranking [29] gives a solution that formulates the preference in triplets with partial order and applies it to MF models. In industry, YouTube uses a deep neural network [7] to learn both user and item embeddings, where the two kinds of embeddings are generated from their corresponding features separately. In all the above kinds of methods, the preference of a user-item pair can be formulated as the inner product of the user and item vector representations. The prediction stage is thus equivalent to retrieving the user vector's nearest neighbors in inner product space. For this vector search problem, indexes like hashing or quantization [18] for approximate k-nearest neighbor (kNN) search can ensure the efficiency of retrieval.
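As an illustration of this prediction stage, the sketch below scores items by inner product and retrieves the top k by a brute-force partial sort; in practice the brute-force scan is replaced by an approximate kNN index such as hashing or quantization [18]. The random embeddings are placeholders for learned user and item factors.

import numpy as np

def top_k_items(user_vec, item_matrix, k=10):
    # item_matrix: (num_items, dim); user_vec: (dim,).
    scores = item_matrix @ user_vec           # O(num_items * dim)
    top = np.argpartition(-scores, k)[:k]     # unordered top-k indices
    return top[np.argsort(-scores[top])]      # rank the k winners

rng = np.random.default_rng(0)
items = rng.normal(size=(100_000, 64)).astype(np.float32)
user = rng.normal(size=64).astype(np.float32)
print(top_k_items(user, items, k=10))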
However, the inner product interaction form between user and item vector representations severely limits the model's capability. There