Recommending What Video to Watch Next: A Multitask Ranking System
Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar,
Maheswaran Sathiamoorthy, Xinyang Yi, Ed Chi
Google, Inc.
{zhezhao,lichan,liwei,jilinc,aniruddhnath,shawnandrews,aditeek,nlogn,xinyang,edchi}@google.com
ABSTRACT
In this paper, we introduce a large-scale multi-objective ranking system for recommending what video to watch next on an industrial video sharing platform. The system faces many real-world challenges, including the presence of multiple competing ranking objectives, as well as implicit selection biases in user feedback. To tackle these challenges, we explored a variety of soft-parameter sharing techniques such as Multi-gate Mixture-of-Experts so as to efficiently optimize for multiple ranking objectives. Additionally, we mitigated the selection biases by adopting a Wide & Deep framework. We demonstrated that our proposed techniques can lead to substantial improvements in recommendation quality on one of the world’s largest video sharing platforms.
CCS CONCEPTS
• Information systems → Retrieval models and ranking; Recommender systems; • Computing methodologies → Ranking; Multi-task learning; Learning from implicit feedback.
KEYWORDS
Recommendation and Ranking, Multitask Learning, Selection Bias
ACM Reference Format:
Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews,
Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, Ed Chi. 2019.
Recommending What Video to Watch Next: A Multitask Ranking System. In
Thirteenth ACM Conference on Recommender Systems (RecSys ’19), September
16–20, 2019, Copenhagen, Denmark. ACM, New York, NY, USA, 9 pages.
https://doi.org/10.1145/3298689.3346997
1 INTRODUCTION
In this paper, we describe a large-scale ranking system for video recommendation: given a video that a user is currently watching, recommend the next video that the user might watch and enjoy. Typical recommendation systems follow a two-stage design consisting of candidate generation and ranking [10, 20]. This paper focuses on the ranking stage. In this stage, the recommender has a few hundred candidates retrieved by candidate generation (e.g., matrix factorization [45] or neural models [25]), and applies a sophisticated large-capacity model to rank and sort the most promising items. We present experiments and lessons learned from building such a ranking system on a large-scale industrial video publishing and sharing platform.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Designing and developing a real-world large-scale video recommendation system is full of challenges, including:
• There are often different and sometimes conflicting objectives that we want to optimize for. For example, we may want to recommend videos that users rate highly and share with their friends, in addition to watching.
• There is often implicit bias in the system. For example, a user might have clicked and watched a video simply because it was ranked high, not because it was the one that the user liked the most. Therefore, models trained on data generated by the current system will be biased, causing a feedback loop effect [33]. How to learn to reduce such biases effectively and efficiently is an open question.
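To make the feedback-loop concern concrete, the following Python sketch simulates a hypothetical position-based click model in which the probability of a click is the probability that a position is examined times the probability that the item is relevant. The item names, relevance values, and examination probabilities are invented for illustration and do not come from the paper. A naive model that ranks items by the click rate logged under the current ranking simply reproduces that ranking, even though item B is the most relevant. Dividing by the (here assumed known) examination probability recovers the true order; that inverse-propensity correction is shown only to expose the bias and is not the technique this paper proposes.

```python
# Hypothetical position-based click model (an illustration, not from the paper):
# P(click) = P(position examined) * P(item relevant).
TRUE_RELEVANCE = {"A": 0.40, "B": 0.50, "C": 0.45}
EXAMINATION_BY_POSITION = [1.0, 0.5, 0.25]  # assumed known here; unknown in practice


def logged_ctr(ranking):
    """Expected click-through rate of each item when shown in the given order."""
    return {item: TRUE_RELEVANCE[item] * EXAMINATION_BY_POSITION[pos]
            for pos, item in enumerate(ranking)}


def naive_rerank(ranking):
    """A naive 'model' that ranks items by the click rate logged under `ranking`."""
    ctr = logged_ctr(ranking)
    return sorted(ranking, key=lambda item: ctr[item], reverse=True)


def run_feedback_loop(ranking, rounds=5):
    """Repeatedly retrain on data produced by the previous round's ranking."""
    for _ in range(rounds):
        ranking = naive_rerank(ranking)
    return ranking


def propensity_corrected_rerank(ranking):
    """Divide each logged CTR by its position's examination probability."""
    ctr = logged_ctr(ranking)
    position = {item: pos for pos, item in enumerate(ranking)}
    return sorted(ranking,
                  key=lambda item: ctr[item] / EXAMINATION_BY_POSITION[position[item]],
                  reverse=True)


if __name__ == "__main__":
    print(run_feedback_loop(["A", "B", "C"]))            # ['A', 'B', 'C']: the loop locks in
    print(propensity_corrected_rerank(["A", "B", "C"]))  # ['B', 'C', 'A']: true relevance order
```

Starting from a different order, such as B, A, C, the naive loop locks in that order instead: the logged data mostly reflects where items were shown, not how much users liked them.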
To address these challenges, we propose an efficient multitask neural network architecture for the ranking system, as shown in Figure 1. It extends the Wide & Deep [9] model architecture by adopting Multi-gate Mixture-of-Experts (MMoE) [30] for multitask learning. In addition, it introduces a shallow tower to model and remove selection bias. We apply the architecture to video recommendation as a case study: given what a user is currently watching, recommend the next video to watch. We present experiments with our proposed ranking system on an industrial large-scale video publishing and sharing platform. Experimental results show significant improvements from our proposed system.
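The idea behind the shallow tower can be illustrated with a small logistic model in the Wide & Deep spirit of summing logits: the main model contributes a per-item logit, the shallow tower contributes a per-position bias logit, and the two are added before the sigmoid. This is a minimal pure-Python sketch under assumptions that are not taken from the paper: a single task, one scalar per item standing in for the deep main model, synthetic labels generated from the same additive form, every item observed at every position, and the bias term simply dropped at serving time. The names `train` and `serve_ranking` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_logged_data(true_item_logit, true_position_bias):
    """Hypothetical logs: each item shown at each position; label = expected click rate."""
    return [(item, pos, sigmoid(true_item_logit[item] + bias))
            for item in true_item_logit
            for pos, bias in enumerate(true_position_bias)]


def train(data, items, num_positions, lr=0.5, steps=2000):
    """Fit logit = item_logit[item] + position_bias[pos] by gradient descent on log loss."""
    item_logit = {item: 0.0 for item in items}   # stand-in for the main model
    position_bias = [0.0] * num_positions        # stand-in for the shallow tower
    for _ in range(steps):
        grad_item = {item: 0.0 for item in items}
        grad_pos = [0.0] * num_positions
        for item, pos, label in data:
            # For a sigmoid output with log loss, d(loss)/d(logit) = prediction - label.
            err = sigmoid(item_logit[item] + position_bias[pos]) - label
            grad_item[item] += err
            grad_pos[pos] += err
        for item in items:
            item_logit[item] -= lr * grad_item[item]
        for pos in range(num_positions):
            position_bias[pos] -= lr * grad_pos[pos]
    return item_logit, position_bias


def serve_ranking(item_logit):
    """At serving time the bias term is dropped: rank by the main-model logit alone."""
    return sorted(item_logit, key=item_logit.get, reverse=True)


if __name__ == "__main__":
    true_items = {"A": -0.4, "B": 0.0, "C": -0.2}
    true_bias = [1.0, 0.0, -1.0]
    data = make_logged_data(true_items, true_bias)
    item_logit, position_bias = train(data, list(true_items), len(true_bias))
    print(serve_ranking(item_logit))  # ['B', 'C', 'A']
```

Only differences between item logits (and between position biases) are identifiable here, since a constant can shift freely between the two sets of terms; this does not affect the serving order.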
Specifically, we first group our multiple objectives into two categories: 1) engagement objectives, such as user clicks and degree of engagement with recommended videos; 2) satisfaction objectives, such as a user liking a video on YouTube and leaving a rating on the recommendation. To learn and estimate multiple types of user behaviors, we use MMoE to automatically learn parameters to share across potentially conflicting objectives. The Mixture-of-Experts [21] architecture modularizes the input layer into experts, each of which focuses on different aspects of the input. This improves the representation learned from the complicated feature space generated from multiple modalities. Then, by utilizing multiple gating networks, each objective can choose which experts to share or not share with the others.
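A forward pass through an MMoE layer can be sketched as follows: every expert transforms the shared input, each task's gating network produces a softmax distribution over the experts, and each task's tower consumes the gate-weighted sum of expert outputs. The sketch is a minimal pure-Python illustration with randomly initialized weights, not the production model; the layer sizes, ReLU experts, single linear tower per task, sigmoid output for both tasks, and the `MMoE` class name are all assumptions.

```python
import math
import random


def relu(v):
    return [max(0.0, a) for a in v]


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class MMoE:
    """Hypothetical MMoE layer: shared experts, one softmax gate and one tower per task."""

    def __init__(self, input_dim, expert_dim, num_experts, num_tasks, seed=0):
        rng = random.Random(seed)
        self.experts = [random_matrix(expert_dim, input_dim, rng) for _ in range(num_experts)]
        self.gates = [random_matrix(num_experts, input_dim, rng) for _ in range(num_tasks)]
        self.towers = [random_matrix(1, expert_dim, rng) for _ in range(num_tasks)]

    def forward(self, x):
        # Every expert sees the same input; ReLU is an assumed activation.
        expert_outputs = [relu(matvec(w, x)) for w in self.experts]
        predictions, gate_weights = [], []
        for gate_w, tower_w in zip(self.gates, self.towers):
            g = softmax(matvec(gate_w, x))  # task-specific weights over the experts
            mixed = [sum(g[i] * out[j] for i, out in enumerate(expert_outputs))
                     for j in range(len(expert_outputs[0]))]
            logit = matvec(tower_w, mixed)[0]
            predictions.append(1.0 / (1.0 + math.exp(-logit)))
            gate_weights.append(g)
        return predictions, gate_weights


if __name__ == "__main__":
    model = MMoE(input_dim=4, expert_dim=3, num_experts=3, num_tasks=2)
    preds, gates = model.forward([0.2, -0.1, 0.5, 1.0])
    for name, p, g in zip(["engagement", "satisfaction"], preds, gates):
        print(name, round(p, 3), [round(w, 3) for w in g])
```

Because each task has its own gate, the two tasks can put weight on different experts; in a trained model, this is how an engagement objective and a satisfaction objective decide which experts to share.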
RecSys ’19, September 16–20, 2019, Copenhagen, Denmark
© 2019 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-6243-6/19/09.
To model and reduce the selection bias (e.g., position bias) from biased training data, we propose adding a shallow tower to the main model, as shown on the left side of Figure 1. The shallow tower takes input related to the selection bias, e.g., the ranking order decided by the current system, and outputs a scalar serving as a bias term to the final prediction of the main model. This model