Personalized News Recommendation: A Context-Aware Bandit Approach


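The file "LinUCB.pdf" treats personalized news article recommendation as a contextual bandit problem. Traditional multi-armed bandit (MAB) algorithms such as ε-greedy and UCB1 are context-free: they apply the same selection strategy to every user and ignore individual interests and preferences. Because user responses depend strongly on who the user is and what is being shown, a purely context-free MAB algorithm often performs poorly in practice.

The authors propose a contextual bandit algorithm, linear UCB (LinUCB), to address this. To recommend news articles accurately, the algorithm uses contextual information about both users and articles; user features may include aggregated historical activity and declared demographic information, while content features may include descriptive information and categories. The algorithm learns from user-click feedback and adjusts its article-selection strategy at every recommendation, with the goal of maximizing total user clicks. The approach rests on the following points:

1. **Problem formulation**: News recommendation is treated as an online learning process in which each candidate article is an arm, and each user visit is a round that arrives with its own context. From click feedback, the algorithm gradually learns which articles suit which users.
2. **Use of context**: Unlike context-free algorithms such as UCB1, LinUCB takes user and article features into account. These features help predict how a user will respond to an article they have not seen before, and let knowledge transfer across the content pool.
3. **Learning efficiency**: The content pool changes constantly and the traffic is large, so the algorithm has to be fast in both learning and computation in order to adapt to new articles and new user behavior.
4. **Theoretical and empirical support**: The paper (arXiv:1003.0146v2 [cs.LG], 1 Mar 2012) describes the algorithm as well motivated from learning theory. On a Yahoo! Front Page Today Module dataset with over 33 million events, it reports a 12.5% click lift over a standard context-free bandit algorithm, and the advantage grows as data becomes scarcer.

In summary, the core contribution of "LinUCB.pdf" is to cast personalized news recommendation as a contextual bandit problem and to combine user and content context so that recommendations are better targeted. The paper also argues that bandit algorithms can be evaluated reliably offline using previously recorded random traffic.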
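The summary above does not reproduce the algorithm itself, so the following is only a minimal sketch of a linear UCB rule of the kind it describes: one ridge-regression model per article (arm), with a confidence bonus added to each predicted click rate. The class name `DisjointLinUCB`, the parameter `alpha`, and the simplification of one shared context vector per round are assumptions made for illustration, not details taken from this page.

```python
import numpy as np


class DisjointLinUCB:
    """Illustrative linear UCB: an independent ridge model for each arm."""

    def __init__(self, n_arms: int, n_features: int, alpha: float = 1.0):
        self.alpha = alpha  # exploration strength (hypothetical tuning knob)
        # Per-arm statistics: A = I + sum(x x^T), b = sum(reward * x).
        self.A = np.stack([np.eye(n_features) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_features))

    def scores(self, x: np.ndarray) -> np.ndarray:
        """Return an upper confidence bound on the reward of every arm."""
        ucb = np.empty(len(self.A))
        for a in range(len(self.A)):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]        # ridge-regression estimate
            width = np.sqrt(x @ A_inv @ x)   # uncertainty in direction x
            ucb[a] = theta @ x + self.alpha * width
        return ucb

    def select(self, x: np.ndarray) -> int:
        """Pick the arm with the largest upper confidence bound."""
        return int(np.argmax(self.scores(x)))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        """Fold one observed (context, reward) pair into the chosen arm."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In use, each user visit would call `select(x)` with that visit's feature vector, show the chosen article, and then call `update(arm, x, reward)` with reward 1 for a click and 0 otherwise. An arm that has been shown often in a given context gets a smaller bonus, so exploration shifts toward articles whose value is still uncertain.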

arXiv:1003.0146v2 [cs.LG] 1 Mar 2012

A Contextual-Bandit Approach to Personalized News Article Recommendation

Lihong Li†, Wei Chu†
Yahoo! Labs
lihong,chuwei@yahoo-inc.com

John Langford‡
Yahoo! Labs
jl@yahoo-inc.com

Robert E. Schapire∗
Dept of Computer Science
Princeton University
schapire@cs.princeton.edu

ABSTRACT

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation.

In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data gets more scarce.
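The abstract states that a bandit algorithm can be evaluated offline on previously recorded random traffic, but this excerpt does not show how. One plausible reading, sketched below, is a replay estimator: step through the log, keep only the events where the policy under test picks the same article that the random logging policy showed, and average the clicks on those events. The event layout `(context, logged_arm, reward)` and the names `replay_ctr`, `choose`, and `learn` are hypothetical, and the sketch assumes the logged articles were chosen uniformly at random.

```python
from typing import Callable, Hashable, Iterable, Tuple

# Hypothetical log record: (context, article actually shown, click as 0.0 or 1.0).
Event = Tuple[Hashable, int, float]


def replay_ctr(events: Iterable[Event],
               choose: Callable[[Hashable], int],
               learn: Callable[[Hashable, int, float], None]) -> float:
    """Estimate a policy's click-through rate from randomly logged traffic."""
    matched = 0
    clicks = 0.0
    for context, logged_arm, reward in events:
        if choose(context) != logged_arm:
            # The log holds no feedback for the article the policy wanted,
            # so this event is skipped entirely.
            continue
        matched += 1
        clicks += reward
        # Only matched events are revealed to the learning algorithm.
        learn(context, logged_arm, reward)
    return clicks / matched if matched else 0.0
```

Because skipped events contribute nothing, only a fraction of the log is usable, which is one reason a large random-traffic dataset matters for this kind of evaluation.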

Categories and Subject Descriptors

H.3.5 [Information Systems]: On-line Information Services; I.2.6 [Computing Methodologies]: Learning

General Terms

Algorithms, Experimentation

Keywords

Contextual bandit, web service, personalization, recommender systems, exploration/exploitation dilemma

1. INTRODUCTION

This paper addresses the challenge of identifying the most appropriate web-based content at the best time for individual users. Most service vendors acquire and maintain a large amount of content in their repository, for instance, for filtering news articles [14] or for the display of advertisements [5]. Moreover, the content of such a web-service repository changes dynamically, undergoing frequent insertions and deletions. In such a setting, it is crucial to quickly identify interesting content for users. For instance, a news filter must promptly identify the popularity of breaking news, while also adapting to the fading value of existing, aging news stories.

∗ This work was done while R. Schapire visited Yahoo! Labs.

A version of this paper appears at WWW 2010, April 26–30, 2010, Raleigh, North Carolina, USA.

It is generally difficult to model popularity and temporal changes based solely on content information. In practice, we usually explore the unknown by collecting consumers' feedback in real time to evaluate the popularity of new content while monitoring changes in its value [3]. For instance, a small amount of traffic can be designated for such exploration. Based on the users' response (such as clicks) to randomly selected content on this small slice of traffic, the most popular content can be identified and exploited on the remaining traffic. This strategy, with random exploration on an ε fraction of the traffic and greedy exploitation on the rest, is known as ε-greedy. Advanced exploration approaches such as EXP3 [8] or UCB1 [7] could be applied as well. Intuitively, we need to distribute more traffic to new content to learn its value more quickly, and fewer users to track temporal changes of existing content.
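The ε-greedy strategy described above can be sketched in a few lines. This is an illustrative, context-free version only: the class name `EpsilonGreedy`, its methods, and the choice to treat an article that has never been shown as having a click rate of zero are assumptions made for the example.

```python
import random
from typing import Optional


class EpsilonGreedy:
    """Context-free epsilon-greedy selection over a fixed set of articles."""

    def __init__(self, n_arms: int, epsilon: float = 0.1,
                 seed: Optional[int] = None):
        self.epsilon = epsilon          # fraction of traffic used to explore
        self.shows = [0] * n_arms       # how often each article was shown
        self.clicks = [0.0] * n_arms    # total clicks each article received
        self.rng = random.Random(seed)

    def select(self) -> int:
        if self.rng.random() < self.epsilon:
            # Explore: show a uniformly random article.
            return self.rng.randrange(len(self.shows))
        # Exploit: show the article with the highest observed click rate.
        rates = [c / s if s else 0.0
                 for c, s in zip(self.clicks, self.shows)]
        return max(range(len(rates)), key=rates.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        """Record one impression of `arm` and whether it was clicked."""
        self.shows[arm] += 1
        self.clicks[arm] += reward
```

Note that `select` takes no user information at all, so every user sees the same greedy choice. That is the limitation that motivates exploring and exploiting at the level of individual users.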

Recently, personalized recommendation has become a desirable feature for websites to improve user satisfaction by tailoring content presentation to suit individual users' needs [10]. Personalization involves a process of gathering and storing user attributes, managing content assets, and, based on an analysis of current and past users' behavior, delivering the individually best content to the present user being served.

Often, both users and content are represented by sets of features. User features may include historical activities at an aggregated level as well as declared demographic information. Content features may contain descriptive information and categories. In this scenario, exploration and exploitation have to be deployed at an individual level since the views of different users on the same content can vary significantly. Since there may be a very large number of possible choices or actions available, it becomes critical to recognize commonalities between content items and to transfer that knowledge across the content pool.

Traditional recommender systems, including collaborative filtering, content-based filtering and hybrid approaches, can provide meaningful recommendations at an individual level by leveraging users' interests as demonstrated by their past activity. Collaborative filtering [25], by recognizing similarities across users based on their consumption history, provides a good recommendation solution to the scenarios where overlap in historical consumption across users is relatively high and the content universe is almost static. Content-based filtering helps to identify new items which well match an
