Automatic Feature Interaction Learning with Self-Attentive Neural Networks: The AutoInt Method - CSDN Library


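Automatic feature interaction learning (AutoInt) is a machine learning technique for click-through rate (CTR) prediction. CTR prediction is a core task in online advertising and recommender systems: the goal is to estimate the probability that a user clicks on an ad or an item. The task is hard for two reasons. First, the input features are usually high-dimensional and sparse. Features such as user ID, user age, item ID, and item category take a very large number of distinct values when there are many users and items, and most of those values occur rarely, which makes them hard for a model to learn from. Second, accurate prediction depends on high-order feature interactions. Linear models capture only simple relationships between features, whereas real user behavior often depends on several features acting together, such as intersecting user interests or combinations of item attributes. A method is therefore needed that learns these feature combinations automatically.

AutoInt addresses this with self-attentive neural networks. Self-attention lets the model weight and combine features dynamically for each input, so it can capture and quantify interactions between features. According to the authors, the approach handles high-dimensional sparse data, models complex relationships among features, and improves CTR prediction accuracy.

The model was proposed by Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, and Jian Tang, of Peking University, the University of California, Los Angeles, and Mila / HEC Montreal. Their experiments on four real-world datasets show that the method outperforms existing approaches. The work is relevant to understanding user behavior, optimizing ad delivery, and improving personalized recommendation, and it offers a starting point for later research.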

AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks

Weiping Song∗

Department of Computer Science,

School of EECS, Peking University

weiping.song@pku.edu.cn

Chence Shi

Department of Computer Science,

School of EECS, Peking University

chenceshi@pku.edu.cn

Zhiping Xiao

Department of Computer Science,

University of California, Los Angeles

patriciaxiao@g.ucla.edu

Zhijian Duan, Yewen Xu

Department of Computer Science,

School of EECS, Peking University

{zjduan,xuyewen}@pku.edu.cn

Ming Zhang†

Department of Computer Science,

School of EECS, Peking University

mzhang_cs@pku.edu.cn

Jian Tang†

Mila-Quebec AI Institute,

HEC Montreal & CIFAR AI Chair

jian.tang@hec.ca

ABSTRACT

Click-through rate (CTR) prediction, which aims to predict the probability of a user clicking on an ad or an item, is critical to many online applications such as online advertising and recommender systems. The problem is very challenging since (1) the input features (e.g., the user id, user age, item id, item category) are usually sparse and high-dimensional, and (2) an effective prediction relies on high-order combinatorial features (a.k.a. cross features), which are very time-consuming to hand-craft by domain experts and are impossible to be enumerated. Therefore, there have been efforts in finding low-dimensional representations of the sparse and high-dimensional raw features and their meaningful combinations.

In this paper, we propose an effective and efficient method called the AutoInt to automatically learn the high-order feature interactions of input features. Our proposed algorithm is very general, which can be applied to both numerical and categorical input features. Specifically, we map both the numerical and categorical features into the same low-dimensional space. Afterwards, a multi-head self-attentive neural network with residual connections is proposed to explicitly model the feature interactions in the low-dimensional space. With different layers of the multi-head self-attentive neural networks, different orders of feature combinations of input features can be modeled. The whole model can be efficiently fit on large-scale raw data in an end-to-end fashion. Experimental results on four real-world datasets show that our proposed approach not only outperforms existing state-of-the-art approaches for prediction but also offers good explainability. Code is available at: https://github.com/DeepGraphLearning/RecommenderSystems.
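The pipeline the abstract describes (embed every field into one low-dimensional space, then apply multi-head self-attention with a residual connection) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation, which lives in the repository linked above. The names `embed_fields` and `interacting_layer`, the dimensions, scaling a per-field vector by the numerical value, the unscaled dot-product scores, and the ReLU after the residual sum are all choices made for this sketch and are not given in the text above.

```python
import numpy as np

rng = np.random.default_rng(0)


def embed_fields(cat_ids, num_values, cat_tables, num_vectors):
    """Map every input field to a vector in the same d-dimensional space."""
    # Categorical field: look up one row of that field's embedding table.
    cat_part = [table[idx] for table, idx in zip(cat_tables, cat_ids)]
    # Numerical field: scale that field's single vector by the scalar value
    # (an assumed scheme for this sketch).
    num_part = [value * vec for vec, value in zip(num_vectors, num_values)]
    return np.stack(cat_part + num_part)  # shape: (num_fields, d)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def interacting_layer(E, Wq, Wk, Wv, Wres):
    """One multi-head self-attention layer with a residual connection.

    E: (M, d) field embeddings; Wq, Wk, Wv: (H, d, d_head);
    Wres: (d, H * d_head) projection used on the residual path.
    """
    Q = np.einsum("md,hde->hme", E, Wq)  # (H, M, d_head)
    K = np.einsum("md,hde->hme", E, Wk)
    V = np.einsum("md,hde->hme", E, Wv)
    scores = Q @ K.transpose(0, 2, 1)    # (H, M, M): field-to-field scores
    attn = softmax(scores, axis=-1)      # each row sums to 1
    heads = attn @ V                     # (H, M, d_head)
    concat = heads.transpose(1, 0, 2).reshape(E.shape[0], -1)  # (M, H * d_head)
    out = np.maximum(concat + E @ Wres, 0.0)  # residual sum, then ReLU
    return out, attn


d, H, d_head = 8, 2, 4  # H * d_head == d, so layers can be stacked


def random_layer():
    """Random weights for one layer (hypothetical initialisation)."""
    return (0.1 * rng.normal(size=(H, d, d_head)),
            0.1 * rng.normal(size=(H, d, d_head)),
            0.1 * rng.normal(size=(H, d, d_head)),
            0.1 * rng.normal(size=(d, H * d_head)))


# Two categorical fields (vocabulary sizes 10 and 5) and one numerical field.
cat_tables = [rng.normal(size=(10, d)), rng.normal(size=(5, d))]
num_vectors = [rng.normal(size=d)]
E = embed_fields(cat_ids=[3, 1], num_values=[0.7],
                 cat_tables=cat_tables, num_vectors=num_vectors)

out1, attn1 = interacting_layer(E, *random_layer())     # first layer
out2, attn2 = interacting_layer(out1, *random_layer())  # stacked second layer
print(out2.shape, attn1.shape)
```

Stacking the layer twice mirrors the abstract's statement that different layers model different orders of feature combinations. The returned attention matrices are the natural place to look when inspecting which fields were combined.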

∗ Part of this work was performed when the first author was visiting Mila.

† Corresponding authors.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

CIKM ’19, November 3–7, 2019, Beijing, China

© 2019 Association for Computing Machinery.

ACM ISBN 978-1-4503-6976-3/19/11...$15.00

https://doi.org/10.1145/3357384.3357925

CCS CONCEPTS

• Information systems → Recommender systems; • Computing methodologies → Neural networks; Learning latent representations;

KEYWORDS

High-order feature interactions, Self attention, CTR prediction, Explainable recommendation

ACM Reference Format:

Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, and Jian Tang. 2019. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks. In The 28th ACM International Conference on Information and Knowledge Management (CIKM ’19), November 3–7, 2019, Beijing, China. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3357384.3357925

1 INTRODUCTION

Predicting the probabilities of users clicking on ads or items (a.k.a., click-through rate prediction) is a critical problem for many applications such as online advertising and recommender systems [8, 10, 15]. The performance of the prediction has a direct impact on the final revenue of the business providers. Due to its importance, it has attracted growing interest in both academia and industry communities.

Machine learning has been playing a key role in click-through rate prediction, which is usually formulated as supervised learning with user profiles and item attributes as input features. The problem is very challenging for several reasons. First, the input features are extremely sparse and high-dimensional [8, 11, 13, 21, 32]. In real-world applications, a considerable percentage of user’s demographics and item’s attributes are usually discrete and/or categorical. To make supervised learning methods applicable, these features are first converted to a one-hot encoding vector, which can easily result in features with millions of dimensions. Taking the well-known CTR prediction data Criteo¹ as an example, the feature dimension is approximately 30 million with sparsity over 99.99%. With such sparse and high-dimensional input features, the machine learning models are easily overfitted. Second, as shown in extensive literature [8, 11, 19, 32], high-order feature interactions²

¹ http://labs.criteo.com/2014/09/kaggle-contest-dataset-now-available-academic-use/

² In this paper, we will use “combinatorial feature” and “feature interaction” interchangeably as they are both used in the literature [11, 19, 32].
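The one-hot encoding step and the sparsity figure quoted in the introduction can be checked with a little arithmetic. The sketch below assumes each field activates exactly one dimension of the concatenated one-hot vector. The helper names and the toy vocabulary sizes are hypothetical, and the count of 39 fields per sample is an assumption for illustration that the text above does not state. The dimension of about 30 million is the figure from the text.

```python
def one_hot_indices(field_values, vocab_sizes):
    """Positions of the non-zero entries in the concatenated one-hot vector.

    field_values[i] is the category index of field i (0-based);
    vocab_sizes[i] is the number of distinct values field i can take.
    """
    offsets, total = [], 0
    for size in vocab_sizes:
        offsets.append(total)  # where this field's block starts
        total += size
    indices = [off + v for off, v in zip(offsets, field_values)]
    return indices, total


def one_hot_sparsity(num_fields, total_dim):
    """Fraction of zero entries when each field sets exactly one dimension."""
    return 1.0 - num_fields / total_dim


# Toy example: three fields with 3, 2 and 5 possible values.
indices, total_dim = one_hot_indices([2, 0, 4], [3, 2, 5])
print(indices, total_dim)

# Assumed 39 fields per sample; ~30 million dimensions as quoted in the text.
sparsity = one_hot_sparsity(39, 30_000_000)
print(f"{sparsity:.6%}")
```

Under these assumptions only a few dozen of the roughly 30 million entries are non-zero for any sample, which is consistent with the "over 99.99%" sparsity stated in the text.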

arXiv:1810.11921v2 [cs.IR] 23 Aug 2019
