xDeepFM: Combining Explicit and Implicit Feature Interactions
for Recommender Systems
Jianxun Lian
University of Science and Technology
of China
jianxun.lian@outlook.com
Xiaohuan Zhou
Beijing University of Posts and
Telecommunications
maggione@bupt.edu.cn
Fuzheng Zhang
Microsoft Research
fuzzhang@microsoft.com
Zhongxia Chen
University of Science and Technology
of China
czx87@mail.ustc.edu.cn
Xing Xie
Microsoft Research
xingx@microsoft.com
Guangzhong Sun
University of Science and Technology
of China
gzsun@ustc.edu.cn
ABSTRACT
Combinatorial features are essential for the success of many com-
mercial models. Manually crafting these features usually comes
with a high cost due to the variety, volume, and velocity of raw data
in web-scale systems. Factorization-based models, which measure
interactions in terms of vector product, can learn patterns of com-
binatorial features automatically and generalize to unseen features
as well. With the great success of deep neural networks (DNNs)
in various fields, researchers have recently proposed several DNN-
based factorization models to learn both low- and high-order feature
interactions. Despite their powerful ability to learn an arbitrary
function from data, plain DNNs generate feature interactions im-
plicitly and at the bit-wise level. In this paper, we propose a novel
Compressed Interaction Network (CIN), which aims to generate
feature interactions in an explicit fashion and at the vector-wise
level. We show that the CIN shares some functionalities with con-
volutional neural networks (CNNs) and recurrent neural networks
(RNNs). We further combine a CIN and a classical DNN into one
unified model, named eXtreme Deep Factorization Machine
(xDeepFM). On one hand, xDeepFM is able
to learn certain bounded-degree feature interactions explicitly; on
the other hand, it can learn arbitrary low- and high-order feature
interactions implicitly. We conduct comprehensive experiments on
three real-world datasets. Our results demonstrate that xDeepFM
outperforms state-of-the-art models. We have released the source
code of xDeepFM at https://github.com/Leavingseason/xDeepFM.
CCS CONCEPTS
• Information systems → Personalization; • Computing methodologies → Neural networks; Factorization methods;
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
KDD ’18, August 19–23, 2018, London, United Kingdom
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-5552-0/18/08... $15.00
https://doi.org/10.1145/3219819.3220023
KEYWORDS
Factorization machines, neural network, recommender systems,
deep learning, feature interactions
ACM Reference Format:
Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie,
and Guangzhong Sun. 2018. xDeepFM: Combining Explicit and Implicit
Feature Interactions for Recommender Systems. In KDD ’18: The 24th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining,
August 19–23, 2018, London, United Kingdom. ACM, New York, NY, USA,
10 pages. https://doi.org/10.1145/3219819.3220023
1 INTRODUCTION
Features play a central role in the success of many predictive sys-
tems. Because using raw features can rarely lead to optimal results,
data scientists usually spend a great deal of effort on the transfor-
mation of raw features in order to build the best predictive systems
[14, 24] or to win data mining competitions [21, 22, 26]. One major type of feature
transformation is the cross-product transformation over categorical
features [5]. These features are called cross features or multi-way
features; they measure the interactions of multiple raw features. For
instance, a 3-way feature
AND(user_organization=msra,
item_category=deeplearning, time=monday)
has value
1 if the user works at Microsoft Research Asia and is shown a tech-
nical article about deep learning on a Monday.
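A cross-product transformation of this kind can be sketched in a few lines; the helper name below is hypothetical, and the field values simply mirror the paper's 3-way example:

```python
def cross_feature(sample, conditions):
    """Return 1 if every (field, value) condition holds for the sample, else 0.

    Mimics an AND(...) cross-product transformation over categorical
    features: the combinatorial feature fires only when all raw
    features take the specified values.
    """
    return int(all(sample.get(field) == value
                   for field, value in conditions.items()))

# Conditions mirroring the paper's example cross feature.
conditions = {"user_organization": "msra",
              "item_category": "deeplearning",
              "time": "monday"}

# A hypothetical sample that satisfies all three conditions.
sample = {"user_organization": "msra",
          "item_category": "deeplearning",
          "time": "monday"}

print(cross_feature(sample, conditions))  # → 1
```

Note that such a feature is all-or-nothing: if any one raw feature differs, the cross feature is 0, which is exactly why hand-crafted cross features cannot score combinations absent from the training data.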
There are three major downsides to traditional cross-feature en-
gineering. First, obtaining high-quality features comes at a high
cost. Because the right features are usually task-specific, data scien-
tists need to spend a lot of time exploring potential patterns in the
product data before they become domain experts and can extract
meaningful cross features. Second, in large-scale predictive systems
such as web-scale recommender systems, the huge number of raw
features makes it infeasible to extract all cross features manually.
Third, hand-crafted cross features do not generalize to interactions
unseen in the training data. Therefore, learning to interact features
without manual engineering is a meaningful task.
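Factorization-based models, discussed next, learn such interactions automatically by scoring a feature pair with the inner product of latent vectors. A minimal sketch of that pairwise score (the latent vectors here are hypothetical stand-ins for learned parameters):

```python
def fm_pairwise(v_i, v_j, x_i, x_j):
    """Pairwise FM interaction: the inner product of the two latent
    vectors, scaled by the feature values x_i and x_j."""
    return sum(a * b for a, b in zip(v_i, v_j)) * x_i * x_j

# Hypothetical D = 4 latent vectors for two features i and j;
# in a real FM these would be learned from data.
v_i = [0.1, -0.2, 0.3, 0.05]
v_j = [0.4, 0.1, -0.2, 0.3]

score = fm_pairwise(v_i, v_j, x_i=1.0, x_j=1.0)
```

Because the score is built from per-feature embeddings rather than a per-combination weight, a pair never observed together in training still receives a meaningful score from its individual latent vectors.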
Factorization Machines (FM) [32] embed each feature i into a
latent factor vector v_i = [v_i1, v_i2, . . . , v_iD], and pairwise feature
interactions are modeled as the inner product of latent vectors:
f^(2)(i, j) = ⟨v_i, v_j⟩ x_i x_j. In this paper we use the term bit to
denote an element (such as v_i1) in latent vectors. The classical FM can be
extended to arbitrary higher-order feature interactions [
2
], but one
arXiv:1803.05170v3 [cs.LG] 30 May 2018