Field-aware Factorization Machines for CTR Prediction
Yuchin Juan∗
Criteo Research
Palo Alto, CA
yc.juan@criteo.com

Yong Zhuang∗
Dept. of ECE
Carnegie Mellon Univ., USA
yong.zhuang22@gmail.com

Wei-Sheng Chin
Dept. of Computer Science
National Taiwan Univ., Taiwan
d01944006@csie.ntu.edu.tw

Chih-Jen Lin
Dept. of Computer Science
National Taiwan Univ., Taiwan
cjlin@csie.ntu.edu.tw
ABSTRACT
Click-through rate (CTR) prediction plays an important role in computational advertising. Models based on degree-2 polynomial mappings and factorization machines (FMs) are widely used for this task. Recently, a variant of FMs, field-aware factorization machines (FFMs), has outperformed existing models in several world-wide CTR-prediction competitions. Based on our experience winning two of them, in this paper we establish FFMs as an effective method for classifying large sparse data, including data from CTR prediction. First, we propose efficient implementations for training FFMs. Then we comprehensively analyze FFMs and compare this approach with competing models. Experiments show that FFMs are very useful for certain classification problems. Finally, we have released a package of FFMs for public use.
Keywords
Machine learning; Click-through rate prediction; Computational advertising; Factorization machines
1. INTRODUCTION
Click-through rate (CTR) prediction plays an important role in the advertising industry [1, 2, 3]. Logistic regression is probably the most widely used model for this task [3]. Given a data set with $m$ instances $(y_i, \mathbf{x}_i)$, $i = 1, \ldots, m$, where $y_i$ is the label and $\mathbf{x}_i$ is an $n$-dimensional feature vector, the model $\mathbf{w}$ is obtained by solving the following optimization problem:
$$\min_{\mathbf{w}} \quad \frac{\lambda}{2}\|\mathbf{w}\|_2^2 + \sum_{i=1}^{m} \log\bigl(1 + \exp(-y_i\,\phi_{\text{LM}}(\mathbf{w}, \mathbf{x}_i))\bigr). \qquad (1)$$
∗Part of the work was done when these authors were in National Taiwan University.
Publisher  Advertiser    +    −
ESPN       Nike         80   20
ESPN       Gucci        10   90
ESPN       Adidas        0    1
Vogue      Nike         15   85
Vogue      Gucci        90   10
Vogue      Adidas       10   90
NBC        Nike         85   15
NBC        Gucci         0    0
NBC        Adidas       90   10

Table 1: An artificial CTR data set, where + (−) represents the number of clicked (unclicked) impressions.
In problem (1), $\lambda$ is the regularization parameter, and in the loss function we consider the linear model:
$$\phi_{\text{LM}}(\mathbf{w}, \mathbf{x}) = \mathbf{w} \cdot \mathbf{x}.$$
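To make (1) concrete, the sketch below evaluates the regularized logistic loss of a linear model on a tiny sparse data set. This is a minimal illustration assuming a NumPy/SciPy setting; the function name and data are our own, not part of the released package.

```python
import numpy as np
from scipy import sparse

def logistic_loss_linear(w, X, y, lam):
    """Objective (1) with the linear model phi_LM(w, x) = w . x.

    w   : (n,) weight vector
    X   : (m, n) sparse feature matrix, one row per instance
    y   : (m,) labels in {-1, +1}
    lam : regularization parameter lambda
    """
    margins = y * (X @ w)                       # y_i * phi_LM(w, x_i)
    loss = np.sum(np.logaddexp(0.0, -margins))  # sum_i log(1 + exp(-y_i * phi))
    return 0.5 * lam * (w @ w) + loss

# Tiny example: 3 instances, 4 binary features.
X = sparse.csr_matrix(np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [1, 1, 0, 0]], dtype=float))
y = np.array([1.0, -1.0, 1.0])
w = np.zeros(4)
print(logistic_loss_linear(w, X, y, lam=0.1))  # 3 * log(2) at w = 0
```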
Learning the effect of feature conjunctions seems to be crucial for CTR prediction; see, for example, [1]. To better understand feature conjunctions, consider the artificial data set in Table 1. Ads from Gucci have a particularly high CTR on Vogue (90/(90+10) = 90%). This information is difficult for linear models to learn, because they learn the weights of Gucci and Vogue separately. To address this problem, two models have been used to learn the effect of feature conjunctions. The first, degree-2 polynomial mappings (Poly2) [4, 5], learns a dedicated weight for each feature conjunction. The second, factorization machines (FMs) [6], learns the effect of each feature conjunction by factorizing it into a product of two latent vectors. We will discuss details about Poly2 and FMs in Section 2; a sketch of their prediction functions follows below.
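As a preview of Section 2, the following sketch contrasts the two prediction functions on sparse features. The function names, the dictionary-of-weights representation for Poly2, and the toy numbers are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from itertools import combinations

def phi_poly2(W, x):
    """Poly2: a dedicated weight W[(j1, j2)] for every feature pair.

    W : dict mapping a feature-index pair to a scalar weight
    x : dict mapping feature index -> value (sparse representation)
    """
    return sum(W.get((j1, j2), 0.0) * x[j1] * x[j2]
               for j1, j2 in combinations(sorted(x), 2))

def phi_fm(V, x):
    """FM: the pair weight is factorized into the dot product of two
    k-dimensional latent vectors, V[j1] . V[j2]."""
    return sum((V[j1] @ V[j2]) * x[j1] * x[j2]
               for j1, j2 in combinations(sorted(x), 2))

# An impression with Publisher=Vogue and Advertiser=Gucci,
# encoded as feature indices 0 and 1 with binary values.
x = {0: 1.0, 1: 1.0}
W = {(0, 1): 0.5}                 # Poly2: one weight for this pair
V = np.array([[0.1, 0.2],         # FM: latent vectors with k = 2
              [0.3, 0.4]])
print(phi_poly2(W, x))  # 0.5
print(phi_fm(V, x))     # 0.1*0.3 + 0.2*0.4 = 0.11
```

The contrast drives the rest of the paper: Poly2 needs data for every individual pair, while FM shares latent vectors across pairs, which helps when conjunctions are rarely or never observed.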
A variant of FM called pairwise interaction tensor factorization (PITF) [7] was proposed for personalized tag recommendation. In KDD Cup 2012, a generalization of PITF called "factor model" was proposed by "Team Opera Solutions" [8]. Because this term is too general and may easily be confused with factorization machines, we refer to it as "field-aware factorization machines" (FFMs) in this paper. The difference between PITF and FFM is that PITF considers three special fields, "user," "item," and "tag," while FFM is more general. Because [8] is about the overall solution for the competition, its discussion of FFM is limited. We can conclude the following results from [8]: