Image Matters: Visually modeling user behaviors using
Advanced Model Server
Tiezheng Ge, Liqin Zhao, Guorui Zhou, Keyu Chen, Shuying Liu
Huiming Yi, Zelin Hu, Bochao Liu, Peng Sun, Haoyu Liu, Pengtao Yi, Sui Huang
Zhiqiang Zhang, Xiaoqiang Zhu, Yu Zhang, Kun Gai
Alibaba Inc.
{tiezheng.gtz, zhang.zhiqiang, jingshi.gk}@alibaba-inc.com
ABSTRACT
In Taobao, the largest e-commerce platform in China, billions of
items are provided and typically displayed with their images. For
better user experience and business effectiveness, Click Through Rate (CTR) prediction in the online advertising system exploits abundant user historical behaviors to identify whether a user is interested in a candidate ad. Enhancing behavior representations with user behavior images captures the user's visual preference and can greatly help CTR prediction. We therefore propose to model user preference jointly with user behavior ID features and behavior images.
However, compared with utilizing the candidate ad image in CTR prediction, which introduces only one image per sample, training with user behavior images brings tens to hundreds of images into one sample, giving rise to a great challenge in both communication and computation. With the well-known Parameter Server (PS) framework, implementing such a model requires communicating the raw image features, leading to an unacceptable communication load. This indicates that PS is not suitable for this scenario. In this paper, we propose
a novel and efficient distributed machine learning paradigm called Advanced Model Server (AMS). In AMS, the forward/backward process can also happen on the server side, and only high-level semantic features with a much smaller size need to be sent to the workers. AMS thus dramatically reduces the communication load, which makes the otherwise arduous joint training feasible. Based on AMS, we carefully study methods of effectively combining the images and ID features, and then propose a Deep Image CTR Model. Our approach achieves significant improvements in both online and offline evaluations, and has been deployed in the Taobao display advertising system, serving the main traffic.
CCS CONCEPTS
• Information systems → Online advertising; Recommender systems;
KEYWORDS
Online advertising; User modeling; Computer vision
1 INTRODUCTION
Taobao is the largest e-commerce platform in China, serving hun-
dreds of millions of users with billions of items through both mobile
app and PC website. Users come to Taobao to browse these items
through search or personalized recommendation. Each item is usually displayed with an item image along with some descriptive text. When interested in an item, users can click that image to see the details. Fig 1(a) shows an example of recommended items in the Taobao mobile app.
Taobao also established one of the world’s leading display adver-
tising systems, helping millions of advertisers to connect to users.
Display advertising is an indispensable form of online advertising. By identifying user interests, it can be presented in various spots like Guess What You Like and efficiently delivers marketing messages to the right customers. The cost-per-click (CPC) pricing method is adopted for Taobao display advertising and is sufficiently effective [32]. In CPC mode, the ad publisher ranks the candidate ads by effective cost per mille (eCPM), which can be estimated by multiplying the bid price by the estimated click through rate (CTR). Such a strategy makes CTR prediction the core task in the advertising system.
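Concretely, for a candidate ad with per-click bid price $\mathrm{bid}$ and predicted click through rate $\mathrm{pCTR}$, the ranking score follows (scaled to a per-thousand-impression basis, the usual eCPM convention):
\[
\mathrm{eCPM} = \mathrm{bid} \times \mathrm{pCTR} \times 1000 .
\]
With bids fixed by advertisers, the accuracy of the estimated pCTR directly determines ranking quality and, in turn, both user experience and revenue.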
CTR prediction scores a user's preference for an item, and largely
relies on understanding user interests from historical behaviors.
Users browse and click items billions of times in Taobao every day, and these visits bring a huge amount of log data weakly reflecting user interests. Traditional research on CTR prediction focuses on carefully designed feedback features [1, 28] and shallow models, e.g., Logistic Regression [23]. In recent years, deep learning based CTR prediction systems have emerged overwhelmingly [30]. These methods mainly involve sparse ID features, e.g., ad ID, user-interacted item ID, etc. However, when an ID occurs infrequently in the data, its parameters may not be well trained. Images provide intrinsic visual descriptions and thus bring better generalization to the model. Considering that item images are what users directly interact with, these images can provide more visual information about user interests. We propose to naturally describe each behavior by such images, and to jointly model them with ID features in CTR prediction.
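To make the idea of joint modeling concrete, the following is a minimal sketch, not the architecture proposed in this paper; the pooling scheme, layer sizes, and class names are illustrative assumptions. It shows how sparse behavior ID embeddings and behavior image features (e.g., extracted by a pretrained CNN) could be combined in a deep CTR model, written in Python with PyTorch:

import torch
import torch.nn as nn

class JointIDImageCTR(nn.Module):
    """Illustrative sketch: fuse behavior ID embeddings with behavior
    image features to predict CTR. Not the paper's actual model."""
    def __init__(self, num_ids, id_dim=16, img_dim=128, hidden=200):
        super().__init__()
        self.id_emb = nn.Embedding(num_ids, id_dim)    # sparse behavior ID features
        self.img_proj = nn.Linear(img_dim, id_dim)     # project image features to a shared space
        self.mlp = nn.Sequential(
            nn.Linear(2 * id_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, behavior_ids, behavior_imgs):
        # behavior_ids:  (batch, T)            IDs of the user's T behaviors
        # behavior_imgs: (batch, T, img_dim)   image features of the same behaviors
        id_vec = self.id_emb(behavior_ids).mean(dim=1)     # simple average pooling over behaviors
        img_vec = self.img_proj(behavior_imgs).mean(dim=1)
        logit = self.mlp(torch.cat([id_vec, img_vec], dim=-1))
        return torch.sigmoid(logit)                        # predicted CTR

In the system studied in this paper, each sample carries tens to hundreds of such behavior image features, which is precisely what makes naive PS-style distributed training impractical and motivates AMS.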
Training CTR models with image data requires huge computation and storage consumption. There are pioneering works [3, 21] dedicated to representing the ad with image features in CTR prediction. These studies did not explore user behavior images. Modeling user behavior images can help understand user visual preference and improve the accuracy of CTR prediction. Moreover, combining both user visual preference and ad visual information could further benefit CTR prediction. However, modeling user preference with interacted images is more challenging. Because the number of a typical user's behaviors ranges from tens to hundreds, the consumption is tens to hundreds of times that of modeling only the ad image. Considering that Taobao is serving hundreds of millions of users with billions of items, it is a non-trivial problem