Next-Generation Recommender Systems: Current State and Outlook

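"The report from Web Intelligence Research at the WIC Beijing Center focuses on the development and frontier techniques of recommender systems, in particular a survey of next-generation recommender systems. It describes the current mainstream approaches (content-based, collaborative filtering, and hybrid recommendation) and discusses their limitations and possible extensions, such as a deeper understanding of users and items, the incorporation of contextual information, support for multicriteria ratings, and more flexible, less intrusive recommendation."

Recommender systems play a central role in modern information technology: they give users personalized product or service suggestions based on their interests and behavioral history. The report first outlines the state of the field, which divides into three basic approaches:

1. **Content-based recommendation**: analyzes the user's past preferences and the properties of items. When the system recognizes that a new item is similar to items the user liked in the past, it recommends that item.
2. **Collaborative filtering**: a user-to-user or item-to-item strategy that analyzes the historical behavior of many users, discovers patterns of interest, and predicts what other users are likely to be interested in.
3. **Hybrid recommendation**: combines the strengths of the two approaches above, integrating several recommendation strategies to improve accuracy and coverage.

Although existing recommender systems have achieved a great deal, the report points out several limitations. Their understanding of users and items is not deep enough to capture changing and complex user needs. The recommendation process usually ignores contextual factors, which often weigh heavily on user decisions. In addition, a single rating criterion may not satisfy diverse user needs.

To overcome these limitations, the report explores possible extensions, including:

- **Deeper understanding of users and items**: more sophisticated user modeling and machine learning algorithms improve the understanding of user preferences and item features. This may involve sentiment analysis, social network analysis, and similar techniques.
- **Incorporating contextual information**: the system can take time, location, social setting, and other context into account to make recommendations that are more timely and relevant.
- **Supporting multicriteria ratings**: users rate items against several criteria (such as price, quality, and convenience), and the system builds more comprehensive recommendations from this multidimensional feedback.
- **Flexible, non-intrusive recommendation**: reduce the disruption that recommendations cause to the user's everyday experience, for example through background recommendation or an adjustable recommendation intensity, so that users get suggestions when they need them without feeling pressured.

The report also covers key techniques and methods in recommender systems, such as rating estimation and the optimization of collaborative filtering algorithms, as well as possible directions for future research. With these extensions and improvements, recommender systems can be expected to play a larger role in e-commerce, media recommendation, social networks, and other areas, and to give users a more personalized experience.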

Besides the traditional heuristics that are based mostly on information retrieval methods, other techniques for content-based recommendation have also been used, such as Bayesian classifiers [70], [77] and various machine learning techniques, including clustering, decision trees, and artificial neural networks [77]. These techniques differ from information retrieval-based approaches in that they calculate utility predictions based not on a heuristic formula, such as a cosine similarity measure, but rather on a model learned from the underlying data using statistical learning and machine learning techniques. For example, based on a set of Web pages that were rated as “relevant” or “irrelevant” by the user, [77] uses the naive Bayesian classifier [31] to classify unrated Web pages. More specifically, the naive Bayesian classifier is used to estimate the following probability that page $p_j$ belongs to a certain class $C_i$ (e.g., relevant or irrelevant) given the set of keywords $k_{1,j}, \ldots, k_{n,j}$ on that page:

$$P(C_i \mid k_{1,j} \,\&\, \cdots \,\&\, k_{n,j}). \tag{7}$$

Moreover, [77] uses the assumption that keywords are independent and, therefore, the above probability is proportional to

$$P(C_i) \prod_{x} P(k_{x,j} \mid C_i). \tag{8}$$

While the keyword independence assumption does not necessarily apply in many applications, experimental results demonstrate that naïve Bayesian classifiers still produce high classification accuracy [77]. Furthermore, both $P(k_{x,j} \mid C_i)$ and $P(C_i)$ can be estimated from the underlying training data. Therefore, for each page $p_j$, the probability $P(C_i \mid k_{1,j} \,\&\, \cdots \,\&\, k_{n,j})$ is computed for each class $C_i$ and page $p_j$ is assigned to the class $C_i$ having the highest probability [77].
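The classification rule in (7) and (8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: pages are reduced to keyword sets, probabilities are estimated by simple frequency counts, and add-one smoothing is applied to avoid zero probabilities. The smoothing, the function names `train` and `classify`, and the toy pages are hypothetical and are not taken from [77].

```python
import math
from collections import Counter, defaultdict


def train(pages):
    """Count classes and keywords from (keyword_set, label) training pairs."""
    class_counts = Counter(label for _, label in pages)
    keyword_counts = defaultdict(Counter)  # label -> keyword -> pages containing it
    vocabulary = set()
    for keywords, label in pages:
        keyword_counts[label].update(set(keywords))
        vocabulary.update(keywords)
    return class_counts, keyword_counts, vocabulary


def classify(keywords, model):
    """Return the class C_i that maximizes P(C_i) * prod_x P(k_x | C_i)."""
    class_counts, keyword_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log P(C_i)
        for k in keywords:
            if k not in vocabulary:
                continue  # keywords never seen in training carry no evidence
            # Add-one smoothing (an assumption of this sketch).
            p = (keyword_counts[label][k] + 1) / (n_label + 2)
            score += math.log(p)  # log P(k_x | C_i)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training_pages = [
    ({"python", "tutorial"}, "relevant"),
    ({"python", "library"}, "relevant"),
    ({"celebrity", "gossip"}, "irrelevant"),
    ({"gossip", "fashion"}, "irrelevant"),
]
model = train(training_pages)
print(classify({"python", "library"}, model))
print(classify({"gossip"}, model))
```

Working in log space keeps the product in (8) numerically stable when a page has many keywords; the class ranking is unchanged because the logarithm is monotonic.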

While not explicitly dealing with providing recommendations, the text retrieval community has contributed several techniques that are being used in content-based recommender systems. One example of such a technique would be the research on adaptive filtering [101], [112], which focuses on becoming more accurate at identifying relevant documents incrementally by observing the documents one-by-one in a continuous document stream. Another example would be the work on threshold setting [84], [111], which focuses on determining the extent to which documents should match a given query in order to be relevant to the user. Other text retrieval methods are described in [50] and can also be found in the proceedings of the Text Retrieval Conference (TREC) (http://trec.nist.gov).

As was observed in [8], [97], content-based recommender systems have several limitations that are described in the rest of this section.

2.1.1 Limited Content Analysis

Content-based techniques are limited by the features that are explicitly associated with the objects that these systems recommend. Therefore, in order to have a sufficient set of features, the content must either be in a form that can be parsed automatically by a computer (e.g., text) or the features should be assigned to items manually. While information retrieval techniques work well in extracting features from text documents, some other domains have an inherent problem with automatic feature extraction. For example, automatic feature extraction methods are much harder to apply to multimedia data, e.g., graphical images, audio streams, and video streams. Moreover, it is often not practical to assign attributes by hand due to limitations of resources [97].

Another problem with limited content analysis is that, if two different items are represented by the same set of features, they are indistinguishable. Therefore, since text-based documents are usually represented by their most important keywords, content-based systems cannot distinguish between a well-written article and a badly written one, if they happen to use the same terms [97].
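A small illustration of this point, assuming a deliberately simple representation in which a document is reduced to the set of vocabulary keywords it contains. The function `keyword_profile`, the vocabulary, and both sample texts are hypothetical.

```python
def keyword_profile(text, vocabulary):
    """Represent a document by the vocabulary keywords that occur in it."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return frozenset(words & vocabulary)


vocabulary = {"recommender", "systems", "survey"}
well_written = "A careful survey of modern recommender systems."
badly_written = "systems survey recommender recommender stuff systems"

# Both documents collapse to the same feature set, so a system that sees
# only these features has no basis for preferring one over the other.
same = keyword_profile(well_written, vocabulary) == keyword_profile(badly_written, vocabulary)
print(same)
```

Weighted representations such as term frequencies would separate these two particular texts, but the underlying limitation remains: quality that is not encoded in the features is invisible to the recommender.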

2.1.2 Overspecialization

When the system can only recommend items that score highly against a user’s profile, the user is limited to being recommended items that are similar to those already rated. For example, a person with no experience with Greek cuisine would never receive a recommendation for even the greatest Greek restaurant in town. This problem, which has also been studied in other domains, is often addressed by introducing some randomness. For example, the use of genetic algorithms has been proposed as a possible solution in the context of information filtering [98]. In addition, the problem with overspecialization is not only that the content-based systems cannot recommend items that are different from anything the user has seen before. In certain cases, items should not be recommended if they are too similar to something the user has already seen, such as a different news article describing the same event. Therefore, some content-based recommender systems, such as DailyLearner [13], filter out items not only if they are too different from the user’s preferences, but also if they are too similar to something the user has seen before. Furthermore, Zhang et al. [112] provide a set of five redundancy measures to evaluate whether a document that is deemed to be relevant contains some novel information as well. In summary, the diversity of recommendations is often a desirable feature in recommender systems. Ideally, the user should be presented with a range of options and not with a homogeneous set of alternatives. For example, it is not necessarily a good idea to recommend all movies by Woody Allen to a user who liked one of them.
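The two-sided filter described above (drop items that are too different from the profile, and also items that are too similar to something already seen) might look roughly like the sketch below. Jaccard similarity over keyword sets and both threshold values are assumptions made for illustration; they are not the measures used by DailyLearner [13] or the redundancy measures of Zhang et al. [112].

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_candidates(candidates, profile, seen, min_relevance=0.2, max_redundancy=0.8):
    """Keep items relevant to the profile that are not near-duplicates of seen items.

    candidates: dict mapping item id -> set of keywords
    profile:    set of keywords describing the user's interests
    seen:       list of keyword sets for items the user has already seen
    """
    selected = []
    for item_id, keywords in candidates.items():
        if jaccard(keywords, profile) < min_relevance:
            continue  # too different from the user's preferences
        if any(jaccard(keywords, s) > max_redundancy for s in seen):
            continue  # too similar to something the user has already seen
        selected.append(item_id)
    return selected


profile = {"election", "economy", "policy", "results"}
seen = [{"election", "results", "senate"}]
candidates = {
    "same_event": {"election", "results", "senate"},
    "new_angle": {"election", "economy", "polls"},
    "off_topic": {"football", "transfer"},
}
print(select_candidates(candidates, profile, seen))
```

Here `same_event` is relevant but redundant, `off_topic` fails the relevance check, and only `new_angle` passes both tests.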

2.1.3 New User Problem

The user has to rate a sufficient number of items before a content-based recommender system can really understand the user’s preferences and present the user with reliable recommendations. Therefore, a new user, having very few ratings, would not be able to get accurate recommendations.

2.2 Collaborative Methods

Unlike content-based recommendation methods, collaborative recommender systems (or collaborative filtering systems) try to predict the utility of items for a particular user based on the items previously rated by other users. More formally, the utility $u(c, s)$ of item $s$ for user $c$ is estimated based on the utilities $u(c_j, s)$ assigned to item $s$ by those users $c_j \in C$ who are “similar” to user $c$. For example, in a movie

ADOMAVICIUS AND TUZHILIN: TOWARD THE NEXT GENERATION OF RECOMMENDER SYSTEMS: A SURVEY OF THE STATE-OF-THE-ART... 737
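The estimate of $u(c, s)$ from the ratings of similar users, introduced in Section 2.2, can be sketched as follows. The excerpt does not specify how similarity is measured or how the ratings are combined, so this sketch assumes cosine similarity over co-rated items and a similarity-weighted average; the function names and the toy ratings are hypothetical.

```python
import math


def similarity(ratings_a, ratings_b):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[s] * ratings_b[s] for s in common)
    norm_a = math.sqrt(sum(ratings_a[s] ** 2 for s in common))
    norm_b = math.sqrt(sum(ratings_b[s] ** 2 for s in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings, c, s):
    """Estimate u(c, s) as a similarity-weighted average of u(c_j, s)."""
    weighted_sum, weight_total = 0.0, 0.0
    for c_j, user_ratings in ratings.items():
        if c_j == c or s not in user_ratings:
            continue  # skip the target user and users who have not rated s
        w = similarity(ratings[c], user_ratings)
        weighted_sum += w * user_ratings[s]
        weight_total += w
    return weighted_sum / weight_total if weight_total else None


ratings = {
    "alice": {"m1": 5, "m2": 1},
    "bob": {"m1": 5, "m2": 1, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
print(predict(ratings, "alice", "m3"))
```

Because alice's ratings match bob's much more closely than carol's, the prediction for `m3` lands nearer to bob's rating of 4 than to carol's rating of 2. The function returns `None` when no other user has rated the item.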
