arXiv:1003.0146v2 [cs.LG] 1 Mar 2012
A Contextual-Bandit Approach to
Personalized News Article Recommendation
Lihong Li
†
, Wei Chu
†
,
†
Yahoo! Labs
lihong,chuwei@yahoo-
inc.com
John Langford
‡
‡
Yahoo! Labs
jl@yahoo-inc.com
Robert E. Schapire
+
∗
+
Dept of Computer Science
Princeton University
schapire@cs.princeton.edu
ABSTRACT
Personalized web services strive to adapt their services (advertise-
ments, news articles, etc.) to individual users by making use of
both content and user information. Despite a few recent advances,
this problem remains challenging for at least two reasons. First,
web service is featured with dynamically changing pools of con-
tent, rendering traditional collaborative filtering methods inappli-
cable. Second, the scale of most web services of practical interest
calls for solutions that are both fast in learning and computation.
In this work, we model personalized recommendation of news
articles as a contextual bandit problem, a principled approach in
which a learning algorithm sequentially selects articles to serve
users based on contextual information about the users and articles,
while simultaneously adapting its article-selection strategy based
on user-click feedback to maximize total user clicks.
The contributions of this work are three-fold. First, we propose
a new, general contextual bandit algorithm that is computationally
efficient and well motivated from learning theory. Second, we ar-
gue that any bandit algorithm can be reliably evaluated offline us-
ing previously recorded random traffic. Finally, using this offline
evaluation method, we successfully applied our new algorithm to
a Yahoo! Front Page Today Module dataset containing over 33
million events. Results showed a 12.5% click lift compared to a
standard context-free bandit algorithm, and the advantage becomes
even greater when data gets more scarce.
Categories and Subject Descriptors
H.3.5 [Information Systems]: On-line Information Services; I.2.6
[Computing Methodologies]: Learning
General Terms
Algorithms, Experimentation
Keywords
Contextual bandit, web service, personalization, recommender sys-
tems, exploration/exploitation dilemma
1. INTRODUCTION
This paper addresses the challenge of identifying the most appro-
priate web-based content at the best time for individual users. Most
∗
This work was done while R. Schapire visited Yahoo! Labs.
A version of this paper appears at WWW 2010, April 26–30, 2010,
Raleigh, North Carolina, USA.
.
service vendors acquire and maintain a large amount of content in
their repository, for instance, for filtering news articles [14] or for
the display of advertisements [5]. Moreover, the content of such a
web-service repository changes dynamically, undergoing frequent
insertions and deletions. In such a setting, it is crucial to quickly
identify interesting content for users. For instance, a news filter
must promptly identify the popularity of breaking news, while also
adapting to the fading value of existing, aging news stories.
It is generally difficult to model popularity and temporal changes
based solely on content information. In practice, we usually ex-
plore the unknown by collecting consumers’ feedback in real time
to evaluate the popularity of new content while monitoring changes
in its value [3]. For instance, a small amount of traffic can be des-
ignated for such exploration. Based on the users’ response (such
as clicks) to randomly selected content on this small slice of traf-
fic, the most popular content can be identified and exploited on the
remaining traffic. This strategy, with random exploration on an ǫ
fraction of the traffic and greedy exploitation on the rest, is known
as ǫ-greedy. Advanced exploration approaches such as EXP3 [8]
or UCB1 [7] could be applied as well. Intuitively, we need to dis-
tribute more traffic to new content to learn its value more quickly,
and fewer users to track temporal changes of existing content.
Recently, personalized recommendation has become a desirable
feature for websites to improve user satisfaction by tailoring con-
tent presentation to suit individual users’ needs [10]. Personal-
ization involves a process of gathering and storing user attributes,
managing content assets, and, based on an analysis of current and
past users’ behavior, delivering the individually best content to the
present user being served.
Often, both users and content are represented by sets of fea-
tures. User features may include historical activities at an aggre-
gated level as well as declared demographic information. Content
features may contain descriptive information and categories. In this
scenario, exploration and exploitation have to be deployed at an in-
dividual level since the views of different users on the same con-
tent can vary significantly. Since there may be a very large number
of possible choices or actions available, it becomes critical to rec-
ognize commonalities between content items and to transfer that
knowledge across the content pool.
Traditional recommender systems, including collaborative fil-
tering, content-based filtering and hybrid approaches, can provide
meaningful recommendations at an individual level by leveraging
users’ interests as demonstrated by their past activity. Collaborative
filtering [25], by recognizing similarities across users based on their
consumption history, provides a good recommendation solution to
the scenarios where overlap in historical consumption across users
is relatively high and the content universe is almost static. Content-
based filtering helps to identify new items which well match an