The BellKor Solution to the Netflix Grand Prize
Yehuda Koren
Yahoo! Research, Haifa, Israel (yehuda@yahoo-inc.com)
August 2009
I. INTRODUCTION
This article describes part of our contribution to the “BellKor’s Pragmatic Chaos” final solution, which won the Netflix Grand Prize. The other portion of the contribution was created while working at AT&T with Robert Bell and Chris Volinsky, as reported in our 2008 Progress Prize report [3]. The final solution includes all the predictors described there. In this article we describe only the newer predictors.
So what is new over last year’s solution? First, we further improved the baseline predictors (Sec. III). This in turn improves our other models, which incorporate those predictors, like the matrix factorization model (Sec. IV). In addition, an extension of the neighborhood model that addresses temporal dynamics was introduced (Sec. V). On the Restricted Boltzmann Machines (RBM) front, we use a new RBM model with superior accuracy by conditioning the visible units (Sec. VI). The final addition is a new blending algorithm based on gradient boosted decision trees (GBDT) (Sec. VII).
II. PRELIMINARIES
The Netflix dataset contains more than 100 million date-stamped movie ratings performed by anonymous Netflix customers between Dec 31, 1999 and Dec 31, 2005 [4]. The dataset spans m = 480,189 users and n = 17,770 movies (a.k.a. items).
The contest was designed in a training-test set format. A Hold-out set of about 4.2 million ratings was created, consisting of the last nine movies rated by each user (or fewer if a user had not rated at least 18 movies over the entire period). The remaining data made up the training set. The Hold-out set was randomly split three ways, into subsets called Probe, Quiz, and Test. The Probe set was attached to the training set together with its labels (the rating that the user gave the movie). The Quiz and Test sets made up an evaluation set, known as the Qualifying set, for which competitors were required to predict ratings. Once a competitor submits predictions, the prizemaster returns the root mean squared error (RMSE) achieved on the Quiz set, which is posted on a public leaderboard (www.netflixprize.com/leaderboard). RMSE values mentioned in this article correspond to the Quiz set. Ultimately, the winner of the prize is the one who scores best on the Test set; those scores were never disclosed by Netflix. This precludes clever systems that might “game” the competition by learning about the Quiz set through repeated submissions.
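Since this quantity recurs throughout the article, here is a minimal sketch of computing RMSE over paired lists of predictions and true ratings; the function and variable names are ours for illustration, not part of the contest tooling.

import math

def rmse(predictions, targets):
    # Root mean squared error over paired lists of predicted and true ratings.
    assert len(predictions) == len(targets) and len(predictions) > 0
    squared_error = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared_error / len(predictions))

# Example: three predictions against 1-5 star ratings -> about 0.61
print(rmse([3.8, 2.5, 4.1], [4, 3, 5]))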
Compared with the training data, the Hold-out set contains many more ratings by users who do not rate much and are therefore harder to predict. In a way, this represents real requirements for a collaborative filtering (CF) system, which needs to predict new ratings from older ones, and to address all users equally, not just the heavy raters.
We reserve special indexing letters to distinguish users from movies: for users u, v, and for movies i, j. A rating $r_{ui}$ indicates the preference by user u of movie i. Values range from 1 (star), indicating no interest, to 5 (stars), indicating a strong interest. We distinguish predicted ratings from known ones by using the notation $\hat{r}_{ui}$ for the predicted value of $r_{ui}$. The scalar $t_{ui}$ denotes the time of rating $r_{ui}$. Here, time is measured in days, so $t_{ui}$ counts the number of days elapsed since some early time point. About 99% of the possible ratings are missing, because a user typically rates only a small portion of the movies. The (u, i) pairs for which $r_{ui}$ is known are stored in the training set $K = \{(u,i) \mid r_{ui} \text{ is known}\}$. Notice that K also includes the Probe set. Each user u is associated with a set of items denoted by R(u), which contains all the items for which ratings by u are available. Likewise, R(i) denotes the set of users who rated item i. Sometimes we also use a set denoted by N(u), which contains all items for which u provided a rating, even if the rating value is unknown. Thus, N(u) extends R(u) by also considering the ratings in the Qualifying set.
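To make these sets concrete, the sketch below builds K, R(u), R(i), and N(u) from rating tuples; the tuple layout and all names here are illustrative assumptions rather than the official dataset format.

from collections import defaultdict

def build_index_sets(training_ratings, qualifying_pairs):
    # training_ratings: iterable of (u, i, r_ui, t_ui) with known rating values
    # qualifying_pairs: iterable of (u, i, t_ui) whose rating values are withheld
    K = set()                # (u, i) pairs with a known rating
    R_u = defaultdict(set)   # R(u): items rated by u with a known value
    R_i = defaultdict(set)   # R(i): users who rated item i
    N_u = defaultdict(set)   # N(u): items u rated, even if the value is unknown
    for u, i, r, t in training_ratings:
        K.add((u, i))
        R_u[u].add(i)
        R_i[i].add(u)
        N_u[u].add(i)
    for u, i, t in qualifying_pairs:
        N_u[u].add(i)        # rating value unknown, but the rating event itself is informative
    return K, R_u, R_i, N_u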
Models for the rating data are learned by fitting the previously observed ratings (the training set). However, our goal is to generalize from those ratings in a way that allows us to predict future, unknown ratings (the Qualifying set). Thus, caution should be exercised to avoid overfitting the observed data. We achieve this by regularizing the learned parameters, penalizing their magnitudes. The extent of regularization is controlled by tunable constants. Unless otherwise stated, we use L2 regularization.
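As an illustration of how such constants enter the learning procedure, a single stochastic-gradient update on a model parameter typically takes the form sketched below, where gamma (the step size) and lam (the L2 regularization constant) stand for the tunable constants mentioned above; the names are ours, and the sketch is generic rather than any specific predictor of ours.

def sgd_step(theta, e_ui, grad, gamma, lam):
    # One stochastic-gradient update with L2 regularization (weight decay).
    # theta: a learned parameter entering the prediction of r_ui
    # e_ui:  the prediction error r_ui - r_hat_ui on one training example
    # grad:  the derivative of r_hat_ui with respect to theta
    # A larger lam shrinks parameters more strongly toward zero, trading
    # training-set fit for better generalization to the Qualifying set.
    return theta + gamma * (e_ui * grad - lam * theta)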
This is a good place to add some words on the constants controlling our algorithms (including step sizes, regularization, and number of iterations). Exact values of these constants are determined by validation on the Probe set. In all cases but one (to be mentioned below), such validation is done in a manual, greedy manner. That is, when a newly introduced constant needs to be tuned, we execute multiple runs of the algorithm and pick the value that yields the best RMSE on the Netflix Probe set [4].
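In pseudocode terms, this manual procedure resembles the greedy sweep sketched below, with all previously fixed constants baked into the training routine and only the new constant varying; every name here is hypothetical.

def tune_new_constant(candidate_values, train_with, rmse_on_probe):
    # Greedy tuning of a single newly introduced constant: run the algorithm
    # once per candidate value and keep whichever minimizes Probe-set RMSE.
    best_value, best_rmse = None, float("inf")
    for value in candidate_values:
        model = train_with(value)          # full training run at this setting
        current = rmse_on_probe(model)     # evaluate on the Probe set
        if current < best_rmse:
            best_value, best_rmse = value, current
    return best_value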
This scheme does not result in optimal settings for several reasons. First, once a constant is set we do not revisit its value, even though the later introduction of other constants may call for modifying earlier settings. Second, we use the same constants under multiple variants of the same algorithm (e.g., multiple dimensionalities of a factorization model), whereas a more delicate tuning would require a different setting for each variant. We chose this convenient, but less accurate, method because our experience showed that over-tuning the accuracy of a single predictor does