Collaborative Autoencoder for Recommender Systems
Qibing Li
College of Computer Science
Zhejiang University
qblee@zju.edu.cn
Xiaolin Zheng
College of Computer Science
Zhejiang University
xlzheng@zju.edu.cn
Xinyue Wu
College of Computer Science
Zhejiang University
wxinyue@zju.edu.cn
ABSTRACT
In recent years, deep neural networks have yielded state-of-the-art
performance on several tasks. Although some recent works have
focused on combining deep learning with recommendation, we
highlight three issues with existing works. First, most works perform
deep content feature learning and resort to matrix factorization,
which cannot effectively model the highly complex user-item
interaction function. Second, due to the difficulty of training deep
neural networks, existing models utilize a shallow architecture, and
thus limit the expressive potential of deep learning. Third, neural
network models easily overfit in the implicit setting, because
negative interactions are not taken into account. To tackle these
issues, we present a generic recommender framework called Neural
Collaborative Autoencoder (NCAE) to perform collaborative filtering,
which works well for both explicit feedback and implicit feedback.
NCAE can effectively capture the relationship between interactions
via a non-linear matrix factorization process. To optimize the
deep architecture of NCAE, we develop a three-stage pre-training
mechanism that combines supervised and unsupervised feature
learning. Moreover, to prevent overfitting in the implicit setting,
we propose an error reweighting module and a sparsity-aware
data-augmentation strategy. Extensive experiments on three
real-world datasets demonstrate that NCAE can significantly advance
the state of the art.
KEYWORDS
Recommender System, Collaborative Filtering, Neural Network,
Deep Learning
1 INTRODUCTION
In recent years, recommender systems (RS) have played a
significant role in E-commerce services. A good recommender
system may enhance both user satisfaction and profit for
content providers. For example, nearly 80% of movies watched on
Netflix are recommended by RS [6]. The key to designing such a system
is to predict users' preferences on items based on past activities,
which is known as collaborative filtering (CF) [27]. Among the
various CF methods, matrix factorization (MF) [10, 12, 14, 29] is the
most widely used; it models the user-item interaction function
as the inner product of a user latent vector and an item latent vector.
Due to the effectiveness of MF, many integrated models have been
devised, such as CTR [35], HFT [21] and timeSVD [13]. However,
MF-based models cannot capture subtle hidden factors, since the
inner product is not sufficient for capturing the complex inherent
structure of interaction data [7].
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
XXX, 2017
© 2017 Copyright held by the owner/author(s).
ACM ISBN X-XXXXX-XX-X/XX/XX.. . $00.00
https://doi.org/00.0000/0000
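To make the MF interaction function concrete, the following minimal Python sketch scores a user-item pair as the inner product of a user latent vector p_u and an item latent vector q_i. The function name `mf_predict` and the toy two-dimensional factors for the hypothetical users and items are illustrative assumptions, not factors learned by any of the cited models; in practice the factors are fitted to observed interactions.

```python
from typing import Dict, List


def mf_predict(p_u: List[float], q_i: List[float]) -> float:
    """Predicted preference of user u for item i under MF: the inner product p_u . q_i."""
    if len(p_u) != len(q_i):
        raise ValueError("user and item latent vectors must have the same dimension")
    return sum(a * b for a, b in zip(p_u, q_i))


# Hypothetical 2-dimensional latent factors, chosen only for illustration.
P: Dict[str, List[float]] = {"alice": [1.0, 0.5], "bob": [0.2, 1.5]}
Q: Dict[str, List[float]] = {"matrix": [2.0, 0.0], "notebook": [0.5, 2.0]}

# Score every (user, item) pair; this fills in the full predicted rating matrix P Q^T.
for user, p_u in P.items():
    for item, q_i in Q.items():
        print(user, item, mf_predict(p_u, q_i))
```

Each prediction is a fixed linear combination of the latent dimensions with no interaction terms between them, which is the restriction that the neural models discussed below try to relax with non-linear layers.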
Currently, a trend in the recommendation literature is the
use of deep learning either to handle auxiliary information
[19, 36, 37] or to directly model the interaction function [7, 28, 32, 38].
Based on these two usage scenarios, deep-learning-based
recommender systems can be roughly categorized into integration
models and neural network models [41]. Integration models utilize
deep neural networks to extract hidden features from auxiliary
information. The features are then integrated into the CF framework
to perform hybrid recommendation. For example, Collaborative
Deep Learning (CDL) [36] integrates a Stacked Denoising Autoencoder
(SDAE) [34] and PMF [22] into a unified probabilistic graphical model
to jointly perform deep content feature learning and collaborative
filtering. Although integration models involve both deep learning
and CF, they actually belong to MF-based models because they use
an inner product to model the interaction data, and thus face the
same issue as MF.
On the other hand, neural network models directly perform
collaborative filtering using the interaction data. Owing to the
effectiveness of deep components, neural network models are able
to discover non-linear hidden relationships in data [17, 32].
For example, Collaborative Filtering Network (CFN) [32] is a
state-of-the-art model for explicit feedback, which utilizes a DAE [33]
to encode sparse user/item preferences and aims to reconstruct
them in the decoder layer. However, we notice that existing models
do not exploit the representation power of deep architectures,
i.e., they generally use a shallow network structure. There are two
main reasons for this. First, without a proper pre-training strategy,
training deep neural networks is difficult [5]. Second, due to the
sparse nature of RS, conventional layer-wise unsupervised
pre-training [2, 9] does not work in this case¹. Besides, existing models
primarily focus on explicit feedback, formulating it as a regression
problem in which negative interactions are not taken into account.
Thus, these models easily overfit in the implicit setting, since
they may learn to predict all ratings as 1. Although there
exist implicit neural models such as CDAE [38] and NCF [7] that
sample negative feedback from unobserved data, these sample-based
models may fall into poor local optima due to the huge item
space and data sparsity (see Sec. 4.3).
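The overfitting argument above can be checked on a toy example. The sketch below is a hypothetical illustration, not CFN, CDAE or NCF themselves: it computes a squared reconstruction error only over observed entries, as in the explicit-feedback regression formulation. On an implicit-feedback row whose observed entries are all 1, a degenerate predictor that outputs 1 everywhere achieves zero loss. Treating a few uniformly sampled unobserved entries as 0 (a simple form of negative sampling, assumed here for illustration) makes that predictor pay a penalty. All function names are hypothetical.

```python
import random
from typing import List, Optional

# A preference row; None marks an unobserved (missing) entry.
Vector = List[Optional[float]]


def masked_squared_error(target: Vector, pred: List[float]) -> float:
    """Squared reconstruction error summed over observed entries only."""
    return sum((t - p) ** 2 for t, p in zip(target, pred) if t is not None)


def trivial_all_ones(target: Vector) -> List[float]:
    """Degenerate 'model' (hypothetical) that predicts 1 for every item."""
    return [1.0] * len(target)


def with_sampled_negatives(target: Vector, num_neg: int, rng: random.Random) -> Vector:
    """Copy of target in which up to num_neg unobserved entries are labeled 0 (negatives)."""
    unobserved = [j for j, t in enumerate(target) if t is None]
    out = list(target)
    for j in rng.sample(unobserved, min(num_neg, len(unobserved))):
        out[j] = 0.0
    return out


# Hypothetical implicit-feedback row: the user interacted with items 0 and 3.
row: Vector = [1.0, None, None, 1.0, None, None]
pred = trivial_all_ones(row)

print(masked_squared_error(row, pred))  # 0.0: the trivial model looks "perfect"
augmented = with_sampled_negatives(row, num_neg=2, rng=random.Random(0))
print(masked_squared_error(augmented, pred))  # 2.0: each sampled negative costs (0 - 1)^2
```

Uniform sampling is used here only because it is the simplest choice; as the paragraph above notes, sample-based training can still converge poorly when the item space is large and the data are sparse.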
To address the aforementioned issues, we present a new deep-learning-based
recommender framework called Neural Collaborative
Autoencoder (NCAE) for both explicit feedback and implicit feedback.
By utilizing a sparse forward module and a sparse backward
module, NCAE is scalable to large datasets and robust to sparse
data. The central idea of NCAE is to learn hidden structures that
can reconstruct user/item preferences via a non-linear matrix
¹ Supervised pre-training in the first hidden layer is critical to the performance, since
an unsupervised reconstruction method may lose user/item information (see Sec. 4.2).
arXiv:1712.09043v2 [cs.LG] 30 Jan 2018