Repeat Buyer Prediction for E-Commerce
Guimei Liu⋆, Tam T. Nguyen⋆, Gang Zhao#, Wei Zha⋆, Jianbo Yang§, Jianneng Cao⋆, Min Wu⋆, Peilin Zhao⋆, Wei Chen#
⋆Data Analytics Department, Institute for Infocomm Research, Singapore 138632, {liug,nguyentt,zhaw,caojn,wumin,zhaop}@i2r.a-star.edu.sg
#Development Bank of Singapore, {george.g.zhao, nus.waltchan}@gmail.com
§General Electric, jianbo.yang@ge.com
ABSTRACT
Merchants often acquire a large number of new buyers during promotions. However, many of these buyers are one-time deal hunters, and the promotions may have little long-lasting impact on sales. Merchants therefore need to identify which of these buyers can be converted into regular, loyal buyers. Targeting only those buyers reduces promotion cost and increases the return on investment (ROI). At the International Joint Conference on Artificial Intelligence (IJCAI) 2015, Alibaba hosted an international competition for repeat buyer prediction. The competition was based on the sales data of the 2014 “Double 11” shopping event at Tmall.com. We won first place at stage 1 of the competition out of 753 teams. In this paper, we present our winning solution, which consists of comprehensive feature engineering and model training. Through extensive feature engineering, we created profiles for users, merchants, brands, categories, items and their interactions. These profiles are useful for this particular prediction task. They can also be used for other important tasks in e-commerce, such as customer segmentation, product recommendation, and customer base augmentation for brands. Feature engineering is often the most important factor in the success of a prediction task. However, little work in the literature addresses feature engineering for prediction tasks in e-commerce. Our work provides some useful hints and insights for data science practitioners in e-commerce.
Keywords
Repeat Buyer Prediction; Feature Engineering; E-commerce
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
KDD ’16, August 13-17, 2016, San Francisco, CA, USA
© 2016 ACM. ISBN 978-1-4503-4232-2/16/08...$15.00
DOI: http://dx.doi.org/10.1145/2939672.2939674
1. INTRODUCTION
Large business-to-consumer (B2C) e-commerce websites, such as Amazon and Alibaba, often run nationwide sales promotions on special days like Black Friday and Double 11 (Singles’ Day). Merchants acquire new customers during these events. However, most new customers are one-time deal hunters. Promotions aimed at them usually do not generate the return on investment (ROI) that merchants expect. Merchants therefore need to identify the potentially loyal customers among these new customers. They can then target advertisements (and promotions) at those customers and lower the promotion cost. It is difficult for any individual merchant to identify its potentially loyal customers, because it has little information about its new customers. B2C e-commerce websites, in contrast, have the clickstream data and purchase history of all customers at all merchants on their platforms. They can therefore learn the preferences and habits of new customers from their historical data. From this, they can predict how likely a new customer is to buy again from the same merchant.
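The sketch below is a rough illustration of how a platform could turn its activity logs into customer profiles. It aggregates a log into action counts per (user, merchant) pair, and per user across all merchants. The cross-merchant view is the information a single merchant lacks. Several details are hypothetical: the log layout `(user_id, merchant_id, action)`, the action names in `ACTIONS`, and the function names. This is a minimal sketch under those assumptions, not the feature set of the winning solution.

```python
from collections import Counter, defaultdict

# Hypothetical action vocabulary; the real log schema may use other codes.
ACTIONS = ("click", "cart", "purchase", "favourite")


def interaction_profiles(log):
    """Count each action type per (user, merchant) pair.

    `log` is an iterable of (user_id, merchant_id, action) tuples (assumed layout).
    Returns {(user_id, merchant_id): [count for each action in ACTIONS]}.
    """
    counts = defaultdict(Counter)
    for user_id, merchant_id, action in log:
        counts[(user_id, merchant_id)][action] += 1
    # Counter yields 0 for actions that never occurred for a pair.
    return {pair: [c[a] for a in ACTIONS] for pair, c in counts.items()}


def user_profiles(log):
    """Per-user action totals across all merchants, plus distinct merchant count."""
    totals = defaultdict(Counter)
    merchants = defaultdict(set)
    for user_id, merchant_id, action in log:
        totals[user_id][action] += 1
        merchants[user_id].add(merchant_id)
    return {
        user: [c[a] for a in ACTIONS] + [len(merchants[user])]
        for user, c in totals.items()
    }


log = [
    (1, "A", "click"), (1, "A", "click"), (1, "A", "purchase"),
    (1, "B", "favourite"), (2, "A", "cart"),
]
print(interaction_profiles(log)[(1, "A")])  # [2, 0, 1, 0]
print(user_profiles(log)[1])                # [2, 0, 1, 1, 2]
```

Each profile is a fixed-length vector, so pair-level and user-level profiles can be concatenated into one feature row per (user, merchant) pair.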
At IJCAI 2015, Alibaba hosted an international competition¹ for repeat buyer prediction. The competition used the sales data of the 2014 “Double 11” day at Tmall.com, the largest B2C platform in China. Double 11 is the biggest online shopping event in China. Its sales (on Tmall and Taobao) were US$5.8 billion in 2013, US$9.3 billion in 2014, and over US$14.3 billion in 2015². The data provided for the competition include a number of merchants and the new buyers they acquired during the event. They also include six months of user activity logs preceding the event. The task is to predict which new customers of a given merchant will buy items from the same merchant again within six months. Such new buyers are called repeat buyers of the respective merchants.
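The following minimal sketch makes the prediction target concrete by deriving repeat-buyer labels. Each new buyer's (user, merchant) pair gets label 1 if the user buys from the same merchant again within the window after the event, and 0 otherwise. Several details are assumptions for illustration: the record layout, the function name, and approximating “six months” as 183 days. The exact labeling rule used by the organizers is not described here.

```python
from datetime import date, timedelta

# Assumptions for illustration: Double 11 of 2014 is the event day, and
# "within six months" is approximated as a 183-day window after it.
EVENT_DAY = date(2014, 11, 11)
WINDOW = timedelta(days=183)


def repeat_buyer_labels(new_buyers, later_purchases):
    """Label new-buyer pairs as repeat buyers (1) or not (0).

    new_buyers: iterable of (user_id, merchant_id) pairs acquired on the event day.
    later_purchases: iterable of (user_id, merchant_id, purchase_date) records.
    """
    # Pairs with at least one purchase strictly after the event and inside the window.
    repeated = {
        (user, merchant)
        for user, merchant, day in later_purchases
        if EVENT_DAY < day <= EVENT_DAY + WINDOW
    }
    return {pair: int(pair in repeated) for pair in new_buyers}


labels = repeat_buyer_labels(
    new_buyers=[(1, "A"), (2, "A"), (1, "B")],
    later_purchases=[(1, "A", date(2015, 1, 5)), (1, "B", date(2015, 8, 1))],
)
print(labels)  # {(1, 'A'): 1, (2, 'A'): 0, (1, 'B'): 0}
```

The purchase by user 1 at merchant B falls after the window closes (2015-05-13), so that pair is labeled 0.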
We won first place at stage 1 of the competition. Our winning solution consists of comprehensive feature engineering and model training. In particular, we generated various types of features that describe users, merchants, brands, categories, items and their interactions from different aspects. We trained various classification models, including Factorization Machine [14, 11], Logistic Regression [1, 2], Random Forest [5], GBM [10], and XGBoost [6]. We also used ensemble techniques to blend multiple classifiers together and further improve performance.
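One simple way to blend classifiers is to take a weighted average of their predicted probabilities for the same examples. The sketch below assumes that each model outputs one probability per example, with all models using a shared example order. The weights are hypothetical and would normally be tuned on held-out data. This is only one possible blending scheme, shown for illustration; it is not necessarily the ensemble method used in the winning solution.

```python
def blend(predictions, weights):
    """Weighted average of several models' probabilities for the same examples.

    predictions: list of per-model lists, each holding one probability per example,
                 all in the same example order (assumed).
    weights: one non-negative weight per model (hypothetical; tune on held-out data).
    """
    if len(predictions) != len(weights):
        raise ValueError("expected one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must score the same examples")
    # Normalize by the weight total so the blend stays a probability in [0, 1].
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n)
    ]


# Two hypothetical models scoring three (user, merchant) pairs.
scores = blend([[0.2, 0.8, 0.5], [0.4, 0.6, 0.5]], weights=[3, 1])
print([round(s, 3) for s in scores])  # [0.25, 0.75, 0.5]
```

Averaging probabilities assumes the models are reasonably calibrated against one another. When they are not, averaging ranks instead of raw scores is a common alternative.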
Like most of the competition participants, we formulate repeat buyer prediction as a typical classification problem. Model training for this task is not much different from that of other classification tasks. Instead, feature engineering is the main component that distinguishes this task from others. Feature engineering, an integral part of data science, is often the key to the success of a machine learning project. It can be more difficult than learning
¹ http://ijcai-15.org/index.php/repeat-buyers-prediction-competition
² https://en.wikipedia.org/wiki/Singles_Day