CosRec: 2D Convolutional Neural Networks for Sequential
Recommendation
An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, Julian McAuley
University of California, San Diego
{ayan,scheng,wckang,m5wan,jmcauley}@ucsd.edu
ABSTRACT
Sequential patterns play an important role in building modern rec-
ommender systems. To this end, several recommender systems have
been built on top of Markov Chains and Recurrent Models (among
others). Although these sequential models have proven successful
at a range of tasks, they still struggle to uncover complex relationships
nested in user purchase histories. In this paper, we argue
that modeling pairwise relationships directly leads to an efficient
representation of sequential features and captures complex item
correlations. Specifically, we propose a 2D convolutional network
for sequential recommendation (CosRec). It encodes a sequence
of items into a three-way tensor; learns local features using 2D
convolutional filters; and aggregates high-order interactions in a
feedforward manner.
Quantitative results on two public datasets show that our method
outperforms both conventional methods and recent sequence-based
approaches, achieving state-of-the-art performance on various eval-
uation metrics.
ACM Reference Format:
An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, Julian McAuley.
2019. CosRec: 2D Convolutional Neural Networks for Sequential Recom-
mendation. In The 28th ACM International Conference on Information and
Knowledge Management (CIKM’19), November 3–7, 2019, Beijing, China. ACM,
New York, NY, USA, 4 pages. https://doi.org/10.1145/3357384.3358113
1 INTRODUCTION
The goal of sequential recommendation is to predict users’ future
behavior based on their historical action sequences. Different from
traditional personalized recommendation algorithms (e.g. Matrix
Factorization [10]), which seek to capture users’ global tastes,
sequential models introduce additional behavioral dynamics by taking
the order of users’ historical actions into consideration.
A classic line of work to model such dynamics is based on Markov
Chains (MCs), which assumes that a user’s next interaction is
derived from the preceding few actions only [3, 12]. Recently, many
neural network based approaches have achieved success on this
task, where users’ complete interaction sequences can be incorporated
through Recurrent Neural Networks (RNNs) [5] or Convolutional
Neural Networks (CNNs) [14]. Note that most existing
models operate on ordered item representations directly, and thus
are constrained by the one-directional chain structure of action
sequences. One advantage of this constraint is that these algorithms
can preserve locally concentrated dynamics, e.g. as shown
in Figure 1a: consecutive purchases of a camera, a memory card,
and a camera lens may strongly indicate a subsequent tripod purchase.
[Figure 1. Panel (a): item sequence camera, SD card, lens, tripod; labels "RNNs/CNNs/etc." and "locally concentrated pattern". Panel (b): item sequence camera, bike, lens, tripod; labels "Pair-Wise Encoding", "2D CNN", "camera", "lens".]
Figure 1: Illustrations of (a) locally concentrated dynamics
and how they are preserved in existing models; and (b) an
example where ‘skip’ behavior (bike) exists between two
closely related items (camera and lens), and how this pattern
is preserved by the proposed framework.
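To make the chain-structure constraint concrete, the following is a minimal sketch of a first-order Markov-chain recommender, a deliberately simplified stand-in for the MC-based methods cited above and not the implementation of any of them. The function names (`fit_first_order_mc`, `predict_next`) and the toy purchase histories are hypothetical and chosen only for illustration. The sketch shows that the prediction depends solely on the most recent item, so a locally concentrated pattern is preserved while an unrelated intermediate item hides the earlier context.

```python
from collections import Counter, defaultdict


def fit_first_order_mc(sequences):
    """Count item-to-next-item transitions across all user sequences."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for prev_item, next_item in zip(seq, seq[1:]):
            transitions[prev_item][next_item] += 1
    return transitions


def predict_next(transitions, history, k=1):
    """Rank candidates using only the most recent item in the history."""
    last_item = history[-1]
    return [item for item, _ in transitions[last_item].most_common(k)]


# Toy purchase histories (hypothetical data, for illustration only).
sequences = [
    ["camera", "sd_card", "lens", "tripod"],
    ["camera", "sd_card", "lens", "tripod"],
    ["camera", "bike", "lens", "tripod"],
    ["bike", "helmet"],
    ["bike", "helmet"],
]
mc = fit_first_order_mc(sequences)

# Locally concentrated pattern: the last item (lens) points to a tripod.
after_clean = predict_next(mc, ["camera", "sd_card", "lens"])

# 'Skip' behavior: after the bike, the camera context is no longer visible.
after_skip = predict_next(mc, ["camera", "bike"])
```

In this toy setting the second prediction is driven entirely by what usually follows a bike, which is the kind of limitation that motivates looking beyond adjacent items.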
In this paper, we surprisingly find that relaxing the above structure
constraint may yield more effective recommendations. Specifically,
we propose a 2D CNN-based framework: 2D convolutional
networks for sequential recommendation (CosRec). In particular,
we enable interactions among nonadjacent items by introducing a
simple but effective pairwise encoding module. As shown in Figure 1b,
the ‘skip’ behavior within item sequences (i.e., the purchase
of a bike is less relevant to the context of consuming photography
products) may break the locality of the chain structure but can be
easily bypassed through this pairwise encoding. On account of this
module, we show that standard 2D convolutional kernels can be
applied to solve sequential recommendation problems, where small
filters (e.g. 3 × 3) can be successfully incorporated. This also allows
us to build an extendable 2D CNN framework, which can be easily
adapted to either shallow or deep structures for different tasks.
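The pairwise encoding idea described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the model itself: we assume each cell (i, j) of the three-way tensor is the concatenation of the embeddings of items i and j (the combination rule is not specified in this section), the embedding values and uniform filter weights are placeholders, and the helper names `pairwise_encode` and `conv2d_valid` are hypothetical.

```python
def pairwise_encode(item_embeddings):
    """Build an L x L x 2d tensor whose (i, j) cell concatenates items i and j."""
    return [[e_i + e_j for e_j in item_embeddings] for e_i in item_embeddings]


def conv2d_valid(tensor, kernel):
    """Apply one k x k filter over all input channels, without padding."""
    size = len(tensor)
    k = len(kernel)
    out_size = size - k + 1
    output = [[0.0] * out_size for _ in range(out_size)]
    for row in range(out_size):
        for col in range(out_size):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    cell = tensor[row + di][col + dj]
                    weights = kernel[di][dj]
                    total += sum(x * w for x, w in zip(cell, weights))
            output[row][col] = total
    return output


# Hypothetical 2-dimensional embeddings for the sequence in Figure 1b.
embeddings = {
    "camera": [1.0, 0.0],
    "bike": [0.0, 1.0],
    "lens": [1.0, 0.1],
    "tripod": [0.9, 0.0],
}
sequence = ["camera", "bike", "lens", "tripod"]
item_embeddings = [embeddings[name] for name in sequence]

pairs = pairwise_encode(item_embeddings)
# pairs[0][2] holds the (camera, lens) pair even though 'bike' sits between them.
camera_lens = pairs[0][2]

# One 3 x 3 filter with 4 input channels; uniform weights are placeholders.
kernel = [[[0.1] * 4 for _ in range(3)] for _ in range(3)]
feature_map = conv2d_valid(pairs, kernel)
```

Because every pair of positions has its own cell, a small 2D filter sliding over the tensor sees the camera and lens together regardless of the bike purchase between them. A practical implementation would use learned embeddings and a deep learning library's convolution layers instead of these nested loops.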
2 RELATED WORK
Sequential recommendation methods typically seek to capture
sequential patterns among previously consumed items, in order
to accurately predict the next item. To this end, various models
have been adopted, including Markov Chains (MCs) [12], Recurrent
Neural Networks (RNNs) [5], Temporal Convolutional
Networks (TCNs) [15], and Self Attention [7, 13], among others.
arXiv:1908.09972v1 [cs.IR] 27 Aug 2019