Wide & Deep Learning for Recommender Systems
Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra,
Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil,
Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, Hemal Shah
Google Inc.
Corresponding author: hengtze@google.com
ABSTRACT
Generalized linear models with nonlinear feature transformations
are widely used for large-scale regression and classification
problems with sparse inputs. Memorization of feature interactions
through a wide set of cross-product feature transformations is
effective and interpretable, while generalization requires more
feature engineering effort. With less
feature engineering, deep neural networks can generalize bet-
ter to unseen feature combinations through low-dimensional
dense embeddings learned for the sparse features. However,
deep neural networks with embeddings can over-generalize
and recommend less relevant items when the user-item inter-
actions are sparse and high-rank. In this paper, we present
Wide & Deep learning—jointly trained wide linear models
and deep neural networks—to combine the benefits of
memorization and generalization for recommender systems. We
productionized and evaluated the system on Google Play,
a commercial mobile app store with over one billion active
users and over one million apps. Online experiment results
show that Wide & Deep significantly increased app acquisi-
tions compared with wide-only and deep-only models. We
have also open-sourced our implementation in TensorFlow.
CCS Concepts
• Computing methodologies → Machine learning; Neural
networks; Supervised learning; • Information systems →
Recommender systems;
Keywords
Wide & Deep Learning, Recommender Systems.
1. INTRODUCTION
A recommender system can be viewed as a search ranking
system, where the input query is a set of user and contextual
information, and the output is a ranked list of items. Given
a query, the recommendation task is to find the relevant
items in a database and then rank the items based on certain
objectives, such as clicks or purchases.
One challenge in recommender systems, similar to the gen-
eral search ranking problem, is to achieve both memorization
and generalization. Memorization can be loosely defined as
learning the frequent co-occurrence of items or features and
exploiting the correlation available in the historical data.
Generalization, on the other hand, is based on transitivity
of correlation and explores new feature combinations that
have never or rarely occurred in the past. Recommenda-
tions based on memorization are usually more topical and
directly relevant to the items on which users have already
performed actions. Compared with memorization, general-
ization tends to improve the diversity of the recommended
items. In this paper, we focus on the apps recommendation
problem for the Google Play store, but the approach should
apply to generic recommender systems.
For massive-scale online recommendation and ranking sys-
tems in an industrial setting, generalized linear models such
as logistic regression are widely used because they are sim-
ple, scalable and interpretable. The models are often trained
on binarized sparse features with one-hot encoding. For example,
the binary feature “user_installed_app=netflix” has value 1
if the user installed Netflix. Memorization can be achieved
effectively using cross-product transformations over sparse
features, such as AND(user_installed_app=netflix, impression_app=pandora),
whose value is 1 if the user installed Netflix and is later
shown Pandora. This explains how the co-occurrence of a feature
pair correlates with the target label. Generalization can be
added by using features that are
less granular, such as AND(user_installed_category=video,
impression_category=music), but manual feature engineer-
ing is often required. One limitation of cross-product trans-
formations is that they do not generalize to query-item fea-
ture pairs that have not appeared in the training data.
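As a concrete illustration (not the production pipeline), the
following minimal Python sketch shows one-hot encoding and a
cross-product transformation over two hypothetical app
vocabularies; all names and values are illustrative.

from itertools import product

def one_hot(value, vocabulary):
    """Binary indicator vector: 1 at the position of `value`, 0 elsewhere."""
    return [1 if v == value else 0 for v in vocabulary]

# Hypothetical 3-app vocabulary, for illustration only.
apps = ["netflix", "pandora", "spotify"]
installed = one_hot("netflix", apps)    # user_installed_app=netflix
impression = one_hot("pandora", apps)   # impression_app=pandora

# Crossing the two one-hot vectors yields 3 x 3 = 9 binary interaction
# features; exactly one is 1 here, corresponding to
# AND(user_installed_app=netflix, impression_app=pandora).
crossed = [a & b for a, b in product(installed, impression)]

Each crossed feature fires only when every component feature is 1,
which is why such a feature can never fire for a query-item pair
absent from the training data.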
Embedding-based models, such as factorization machines
[5] or deep neural networks, can generalize to previously un-
seen query-item feature pairs by learning a low-dimensional
dense embedding vector for each query and item feature,
with less burden of feature engineering. However, it is
difficult to learn effective low-dimensional representations for
queries and items when the underlying query-item matrix is
sparse and high-rank, such as users with specific preferences
or niche items with a narrow appeal. In such cases, there
should be no interactions between most query-item pairs,
but dense embeddings will lead to nonzero predictions for all
query-item pairs, and thus can over-generalize and make less
relevant recommendations. On the other hand, linear models
with cross-product feature transformations can memorize these
“exception rules” with far fewer parameters.
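The following minimal NumPy sketch illustrates the point: a
dot-product score over low-dimensional dense embeddings is defined
for every query-item pair, so even pairs with no training
interactions receive nonzero predictions. The sizes and random
values here are purely illustrative stand-ins for learned
parameters.

import numpy as np

rng = np.random.default_rng(0)
n_queries, n_items, dim = 1000, 500, 8   # dim << n: low-dimensional, dense

# In a real model these embeddings are learned; random values stand in.
query_emb = rng.normal(scale=0.1, size=(n_queries, dim))
item_emb = rng.normal(scale=0.1, size=(n_items, dim))

def score(q, i):
    """Dot-product score, defined for *every* query-item pair,
    including pairs never observed in training."""
    return float(query_emb[q] @ item_emb[i])

# An arbitrary unseen pair still gets a (generally nonzero) prediction,
# which is the over-generalization behavior described above.
print(score(42, 321))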
In this paper, we present the Wide & Deep learning frame-
work to achieve both memorization and generalization in one
model, by jointly training a linear model component and a
neural network component as shown in Figure 1.
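Concretely, the joint model can be sketched as a single logistic
output whose logit sums the wide and deep contributions. The
minimal NumPy sketch below assumes a ReLU feed-forward network for
the deep component; all names are illustrative and not the
released TensorFlow API.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_deep_predict(x_wide, x_deep, w_wide, hidden, w_deep, b):
    """x_wide: raw plus cross-product binary features;
    x_deep: concatenated dense embeddings of the sparse features;
    hidden: list of (W, c) weight/bias pairs for the ReLU layers.
    Shapes and names are illustrative assumptions."""
    h = x_deep
    for W, c in hidden:                       # deep component (MLP)
        h = np.maximum(0.0, W @ h + c)
    logit = w_wide @ x_wide + w_deep @ h + b  # single shared logit
    return sigmoid(logit)

Because both components feed one shared logit, their gradients are
backpropagated simultaneously from the same loss, which is what
distinguishes joint training from ensembling separately trained
models.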
The main contributions of the paper include:
• The Wide & Deep learning framework for jointly train-
ing feed-forward neural networks with embeddings and