AutoInt: Automatic Feature Interaction Learning via
Self-Attentive Neural Networks
Weiping Song∗
Department of Computer Science,
School of EECS, Peking University
weiping.song@pku.edu.cn
Chence Shi
Department of Computer Science,
School of EECS, Peking University
chenceshi@pku.edu.cn
Zhiping Xiao
Department of Computer Science,
University of California, Los Angeles
patriciaxiao@g.ucla.edu
Zhijian Duan, Yewen Xu
Department of Computer Science,
School of EECS, Peking University
{zjduan,xuyewen}@pku.edu.cn
Ming Zhang†
Department of Computer Science,
School of EECS, Peking University
mzhang_cs@pku.edu.cn
Jian Tang†
Mila-Quebec AI Institute,
HEC Montreal & CIFAR AI Chair
jian.tang@hec.ca
ABSTRACT
Click-through rate (CTR) prediction, which aims to predict the
probability of a user clicking on an ad or an item, is critical to many
online applications such as online advertising and recommender
systems. The problem is very challenging since (1) the input features
(e.g., the user id, user age, item id, item category) are usually sparse
and high-dimensional, and (2) effective prediction relies on high-order
combinatorial features (a.k.a. cross features), which are very
time-consuming for domain experts to hand-craft and impossible
to enumerate exhaustively. Therefore, there have been efforts to find
low-dimensional representations of the sparse and high-dimensional
raw features and their meaningful combinations.
In this paper, we propose an effective and efficient method called
AutoInt to automatically learn the high-order feature interactions
of input features. Our proposed algorithm is very general and
can be applied to both numerical and categorical input features.
Specifically, we map both the numerical and categorical features
into the same low-dimensional space. Afterwards, a multi-head
self-attentive neural network with residual connections is
proposed to explicitly model the feature interactions in the
low-dimensional space. With different layers of the multi-head
self-attentive neural networks, different orders of feature combinations
of input features can be modeled. The whole model can be efficiently
fit on large-scale raw data in an end-to-end fashion. Experimental
results on four real-world datasets show that our proposed approach
not only outperforms existing state-of-the-art approaches
for prediction but also offers good explainability. Code is available
at: https://github.com/DeepGraphLearning/RecommenderSystems.
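As a rough illustration of the pipeline summarized in the abstract, the following is a minimal pure-Python sketch, not the authors' implementation (which is available at the repository above). It embeds one categorical and one numerical field into the same space and passes them through two stacked multi-head self-attention layers with residual connections. All function names, dimensions, and random weights are hypothetical. The unscaled dot-product attention score, the linear residual projection, the ReLU activation, and the scalar-times-vector embedding of numerical features are assumptions made for this sketch; they are not stated in this section.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def embed_categorical(table, index):
    """Row lookup; equivalent to multiplying the table by a one-hot vector."""
    return list(table[index])


def embed_numerical(field_vector, value):
    """Assumed scheme: scale one per-field vector by the scalar value."""
    return [value * v for v in field_vector]


def interacting_layer(E, heads, W_res):
    """Multi-head self-attention over field embeddings E, plus a residual path.

    heads is a list of (W_q, W_k, W_v) triples, one per attention head.
    """
    output = []
    for e_m in E:
        concat = []
        for W_q, W_k, W_v in heads:
            query = matvec(W_q, e_m)
            # Attention of field m to every field k (unscaled dot product, assumed).
            alpha = softmax([dot(query, matvec(W_k, e_k)) for e_k in E])
            values = [matvec(W_v, e_k) for e_k in E]
            # Weighted sum of value vectors: the combined feature for this head.
            concat.extend(
                sum(a * v[j] for a, v in zip(alpha, values))
                for j in range(len(values[0]))
            )
        residual = matvec(W_res, e_m)
        # Residual connection followed by ReLU (activation choice assumed).
        output.append([max(0.0, c + r) for c, r in zip(concat, residual)])
    return output


def make_layer(d_in, d_head, n_heads, rng):
    """Create random (hypothetical) parameters for one interacting layer."""
    heads = [
        tuple(rand_matrix(d_head, d_in, rng) for _ in range(3))
        for _ in range(n_heads)
    ]
    W_res = rand_matrix(d_head * n_heads, d_in, rng)
    return heads, W_res


rng = random.Random(0)
d, d_head, n_heads = 4, 2, 2

item_table = rand_matrix(10, d, rng)    # categorical field with 10 possible values
age_vector = rand_matrix(1, d, rng)[0]  # numerical field
E = [embed_categorical(item_table, 3), embed_numerical(age_vector, 0.25)]

# Stacking layers: each extra layer combines already-combined features,
# which is how deeper stacks can model higher interaction orders.
H1 = interacting_layer(E, *make_layer(d, d_head, n_heads, rng))
H2 = interacting_layer(H1, *make_layer(d_head * n_heads, d_head, n_heads, rng))
print(len(H2), len(H2[0]))  # 2 fields, each with a 4-dimensional representation
```

In a trained model the matrices would be learned end-to-end and the final field representations would feed a prediction layer; here they are random, so only the shapes and the data flow are meaningful.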
∗ Part of this work was performed when the first author was visiting Mila.
† Corresponding authors.
CIKM ’19, November 3–7, 2019, Beijing, China
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6976-3/19/11...$15.00
https://doi.org/10.1145/3357384.3357925
CCS CONCEPTS
• Information systems → Recommender systems; • Computing
methodologies → Neural networks; Learning latent representations;
KEYWORDS
High-order feature interactions, Self attention, CTR prediction,
Explainable recommendation
ACM Reference Format:
Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming
Zhang, and Jian Tang. 2019. AutoInt: Automatic Feature Interaction Learn-
ing via Self-Attentive Neural Networks. In The 28th ACM International
Conference on Information and Knowledge Management (CIKM ’19), No-
vember 3–7, 2019, Beijing, China. ACM, New York, NY, USA, 10 pages.
https://doi.org/10.1145/3357384.3357925
1 INTRODUCTION
Predicting the probabilities of users clicking on ads or items (a.k.a.
click-through rate prediction) is a critical problem for many applications
such as online advertising and recommender systems [8, 10, 15].
The performance of the prediction has a direct impact on
the final revenue of the business providers. Due to its importance,
it has attracted growing interest in both academia and industry.
Machine learning has been playing a key role in click-through
rate prediction, which is usually formulated as supervised learning
with user profiles and item attributes as input features. The
problem is very challenging for several reasons. First, the input
features are extremely sparse and high-dimensional [8, 11, 13, 21, 32].
In real-world applications, a considerable percentage of users'
demographics and items' attributes are usually discrete and/or
categorical. To make supervised learning methods applicable, these
features are first converted to one-hot encoding vectors, which
can easily result in features with millions of dimensions. Taking
the well-known CTR prediction dataset Criteo¹ as an example, the
feature dimension is approximately 30 million with sparsity over
99.99%. With such sparse and high-dimensional input features,
machine learning models are easily overfitted. Second, as shown in
extensive literature [8, 11, 19, 32], high-order feature interactions²
¹ http://labs.criteo.com/2014/09/kaggle-contest-dataset-now-available-academic-use/
² In this paper, we will use “combinatorial feature” and “feature interaction”
interchangeably, as they are both used in the literature [11, 19, 32].
arXiv:1810.11921v2 [cs.IR] 23 Aug 2019