E-Commerce Repeat Buyer Prediction: Data Mining and Cross-Feature Analysis

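The paper "Repeat Buyer Prediction for E-Commerce" examines how data mining can be used to predict repeat buyers in e-commerce, in particular how to identify one-time buyers acquired during a large promotion who can be converted into loyal regular customers. At the International Joint Conference on Artificial Intelligence (IJCAI) 2015, Alibaba hosted a repeat buyer prediction competition based on sales data from the 2014 Tmall "Double 11" shopping festival, and the authors' team took first place in stage 1.

In data mining, a cross feature combines two or more existing features into a new one that may be more predictive than any of its parts. In e-commerce this helps in understanding user behavior, optimizing marketing strategy, and improving return on investment (ROI). The authors describe their winning solution in detail, and cross features play a key role in it.

Data preprocessing is the foundation of any data mining project. In this case it may involve cleaning the sales data, handling missing values, standardizing numerical features, and one-hot encoding categorical variables, so that later analysis rests on consistent, good-quality data.

Building cross features is the core step. For example, the authors may cross a user's purchase time with browsing history, purchase frequency with product category, or user attributes (such as age and gender) with other shopping behavior, producing new features that reflect the user's shopping patterns. Such features can expose associations that no single feature captures.

For model selection and training, the authors may have tried several machine learning algorithms, such as logistic regression, decision trees, random forests, support vector machines, and neural networks, to find the model that best predicts repeat purchases. Model performance is evaluated on a validation set or by cross-validation, with metrics that may include accuracy, recall, F1 score, and the area under the ROC curve (AUC).

Model optimization may include feature selection, regularization to avoid overfitting, hyperparameter tuning, and ensemble methods (such as bagging and boosting) to improve overall predictive performance.

The competition result reported in the paper indicates that building and using cross features well is essential for identifying potentially loyal buyers. By predicting which one-time buyers are likely to become repeat buyers, merchants can allocate marketing resources more effectively, cut unnecessary promotion costs, and so improve ROI. The case study shows the practical value of data mining and cross features in e-commerce and offers useful experience to other companies and researchers.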

Repeat Buyer Prediction for E-Commerce

Guimei Liu⋆, Tam T. Nguyen⋆, Gang Zhao, Wei Zha⋆, Jianbo Yang, Jianneng Cao⋆, Min Wu⋆, Peilin Zhao⋆, Wei Chen⋆

Data Analytics Department, Institute for Infocomm Research, Singapore 138632,

{liug,nguyentt,zhaw,caojn,wumin,zhaop}@i2r.a-star.edu.sg

Development Bank of Singapore, {george.g.zhao, nus.waltchan}@gmail.com

General Electric, jianbo.yang@ge.com

ABSTRACT

A large number of new buyers are often acquired by merchants during promotions. However, many of the attracted buyers are one-time deal hunters, and the promotions may have little long-lasting impact on sales. It is important for merchants to identify who can be converted to regular loyal buyers and then target them to reduce promotion cost and increase the return on investment (ROI). At International Joint Conferences on Artificial Intelligence (IJCAI) 2015, Alibaba hosted an international competition for repeat buyer prediction based on the sales data of the "Double 11" shopping event in 2014 at Tmall.com. We won the first place at stage 1 of the competition out of 753 teams. In this paper, we present our winning solution, which consists of comprehensive feature engineering and model training. We created profiles for users, merchants, brands, categories, items and their interactions via extensive feature engineering. These profiles are not only useful for this particular prediction task, but can also be used for other important tasks in e-commerce, such as customer segmentation, product recommendation, and customer base augmentation for brands. Feature engineering is often the most important factor for the success of a prediction task, but not much work can be found in the literature on feature engineering for prediction tasks in e-commerce. Our work provides some useful hints and insights for data science practitioners in e-commerce.

Keywords

Repeat Buyer Prediction; Feature Engineering; E-commerce

1. INTRODUCTION

Large business-to-consumer (B2C) e-commerce websites, such as Amazon and Alibaba, often run nationwide sales promotions on special days like Black Friday and Double 11 (Singles' Day). Merchants acquire new customers during these events. However, most new customers are one-time deal hunters, and promotions to them usually do not generate the return on investment (ROI) expected by merchants. Therefore, merchants need to identify the potentially loyal ones among these new customers, so as to target advertisements (and promotions) at them and lower the promotion cost. It is difficult for any individual merchant to identify its potential loyal customers, as it has little information on its new customers. B2C e-commerce websites, by contrast, have the click stream data and purchase history of all the customers at all the merchants on their platforms. Thus, they can learn the preferences and habits of the new customers from their historical data, and then predict how likely a new customer is to buy again from the same merchant.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

KDD '16, August 13-17, 2016, San Francisco, CA, USA

© 2016 ACM. ISBN 978-1-4503-4232-2/16/08...$15.00

DOI: http://dx.doi.org/10.1145/2939672.2939674

At IJCAI 2015, Alibaba hosted an international competition for repeat buyer prediction based on the sales data of the "Double 11" day of 2014 at Tmall.com, the largest B2C platform in China. Double 11 is the biggest online shopping event in China with sales (in Tmall and Taobao) at US$5.8 billion in 2013, US$9.3 billion in 2014, and over US$14.3 billion in 2015. Data provided to the competition include a number of merchants and their new buyers acquired during the event, and six months of user activity log data before the event. The task is to predict which new customers of a given merchant would buy items from the same merchant again within six months. These new buyers are called repeat buyers of the respective merchants.
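The task definition above can be made concrete with a small sketch. The competition supplied its own labels, so the code below only illustrates the definition of a repeat buyer. The record layout `(user_id, merchant_id, purchase_date)`, the function name `label_repeat_buyers`, and the approximation of "six months" as 183 days are all assumptions made for illustration, not details from the paper.

```python
from datetime import date, timedelta

PROMO_DAY = date(2014, 11, 11)   # "Double 11" of 2014
HORIZON = timedelta(days=183)    # assumption: "six months" taken as 183 days


def label_repeat_buyers(purchases):
    """Label new buyers of each merchant as repeat (1) or one-time (0).

    purchases: iterable of (user_id, merchant_id, purchase_date) tuples
    (a hypothetical layout). A buyer is "new" to a merchant when the first
    purchase from that merchant falls on the promotion day.
    """
    days_by_pair = {}
    for user, merchant, day in purchases:
        days_by_pair.setdefault((user, merchant), []).append(day)

    labels = {}
    for pair, days in days_by_pair.items():
        if min(days) != PROMO_DAY:
            continue  # existing customer, or never bought during the event
        # Repeat buyer: any later purchase from the same merchant in the horizon.
        repeat = any(PROMO_DAY < d <= PROMO_DAY + HORIZON for d in days)
        labels[pair] = int(repeat)
    return labels


purchases = [
    ("u1", "m1", date(2014, 11, 11)),
    ("u1", "m1", date(2015, 1, 5)),    # comes back within the horizon
    ("u2", "m1", date(2014, 11, 11)),  # never comes back
    ("u3", "m1", date(2014, 9, 1)),    # bought before the event: not a new buyer
    ("u3", "m1", date(2014, 11, 11)),
    ("u2", "m2", date(2014, 11, 11)),
    ("u2", "m2", date(2015, 8, 1)),    # comes back, but after the horizon
]
labels = label_repeat_buyers(purchases)
```

Each labelled (user, merchant) pair then becomes one training example for a binary classifier.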

We won the first place at stage 1 of the competition. Our winning solution consists of comprehensive feature engineering and model training. In particular, we generated various types of features to describe users, merchants, brands, categories, items and their interactions from different aspects. We have trained various classification models, including Factorization Machine [14, 11], Logistic Regression [1, 2], Random Forest [5], GBM [10], and XGBoost [6]. We have also used ensemble techniques to blend multiple classifiers together to further improve the performance.
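As a rough illustration of what profiles for users, merchants and their interactions can look like, the sketch below derives count and ratio features from an activity log. It is a minimal example under stated assumptions, not the feature set used in the winning solution: the log layout `(user_id, merchant_id, action)`, the action names `"click"` and `"purchase"`, and the functions `build_profiles` and `feature_vector` are all hypothetical.

```python
from collections import Counter, defaultdict


def build_profiles(log):
    """Count actions per user, per merchant, and per (user, merchant) pair.

    log: iterable of (user_id, merchant_id, action) tuples (hypothetical layout).
    """
    user = defaultdict(Counter)
    merchant = defaultdict(Counter)
    pair = defaultdict(Counter)
    for u, m, action in log:
        user[u][action] += 1
        merchant[m][action] += 1
        pair[(u, m)][action] += 1
    return user, merchant, pair


def buy_per_click(counts):
    """Purchases per click, or 0.0 when there are no clicks."""
    return counts["purchase"] / counts["click"] if counts["click"] else 0.0


def feature_vector(u, m, user, merchant, pair):
    """Combine the three profiles into features for one (user, merchant) example."""
    return {
        "user_purchases": user[u]["purchase"],
        "user_buy_per_click": buy_per_click(user[u]),
        "merchant_purchases": merchant[m]["purchase"],
        "merchant_buy_per_click": buy_per_click(merchant[m]),
        "pair_clicks": pair[(u, m)]["click"],        # interaction features
        "pair_purchases": pair[(u, m)]["purchase"],
    }


log = [
    ("u1", "m1", "click"),
    ("u1", "m1", "click"),
    ("u1", "m1", "purchase"),
    ("u1", "m2", "click"),
    ("u2", "m1", "click"),
    ("u2", "m1", "purchase"),
]
user, merchant, pair = build_profiles(log)
features = feature_vector("u1", "m1", user, merchant, pair)
```

The same counting pattern extends to brands, categories and items by keying the counters on those identifiers; the resulting vectors can be fed to any of the classifiers listed above.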

The repeat buyer prediction problem can be formulated as a typical classification problem, as most of the competition participants did. Model training of this task is not much different from that of other classification tasks. Instead, feature engineering is the main component that distinguishes this task from others. Feature engineering, an integral part of data science, is often the key to the success of a machine learning project. It can be more difficult than learning

http://ijcai-15.org/index.php/repeat-buyers-prediction-competition

https://en.wikipedia.org/wiki/Singles_Day
