Improving Item Recommendation in Collaborative Tagging Systems with Heuristic Data Fusion


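This paper examines how heuristic data fusion can improve item recommendation in collaborative tagging systems. As the Internet grows and information multiplies, users face increasingly severe information overload, and demand for personalized recommendation services grows with it. Against this background, researchers have been looking for effective ways to improve these systems, particularly the key step of item recommendation. The authors, Hao Wu, Yijian Pei, Bo Li and colleagues from the School of Information Science and Engineering at Yunnan University, study the current state of recommender systems in depth.

The article first surveys existing recommendation methods, grouping them by algorithmic principle so that the strengths and weaknesses of each technique are easier to understand. These include, but are not limited to, content-based recommendation, collaborative filtering, and matrix factorization. In the experimental part, the researchers select about forty recommender components and test them on different datasets in order to assess the practical effect of data fusion on item recommendation. Data fusion, a technique for integrating information from multiple sources, aims to combine the strengths of different recommendation algorithms to improve the accuracy and diversity of recommendations, thereby reducing information overload and increasing user satisfaction.

To this end, the article may discuss how to design and implement effective data fusion strategies, such as weighted averaging, ensemble learning, or deep learning methods. It may also cover metrics for the fusion effect, such as precision, recall, F1 score, and user coverage, in order to evaluate recommendation performance comprehensively. The authors may have carried out detailed performance comparisons, examining how data fusion behaves in different scenarios and how it compares with other recommendation strategies. They may also have considered scalability and real-time performance, since timely recommendations and user experience are critical in a collaborative tagging system.

The paper offers a theoretical framework and practical experience, showing the feasibility and potential of heuristic data fusion for item recommendation in collaborative tagging systems. It helps advance academic research in the field and provides practical reference strategies for online platforms. Through its comparative experiments and analysis, it is a valuable reference for professionals who want to optimize recommender systems under information overload.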

3.3. Recommender components

In collaborative tagging systems, item recommendation is formulated as follows: given a target user $u \in U$, we first use various components based on different principles to predict the likelihood that an unseen item (or resource) $i \in I$ will be accessed by $u$, namely, to estimate $p(i \mid u)$; the unseen items are then ranked according to $p(i \mid u)$ and suggested to $u$.

3.3.1. Collaborative filtering

User-based CF (UCF). User-based collaborative filtering [2,44] rests on the assumption that a natural way to find content of interest to a user $u$ is to first find other like-minded users, and then recommend to $u$ the content these users are interested in, since “birds of a feather flock together”. Following this, we give the UCF model for estimating $p(i \mid u)$ in Eq. (5),

$$p(i \mid u) \propto \frac{\sum_{v \in NS(u)} p(i \mid v)\, \mathrm{sim}(u, v)}{\sum_{v \in NS(u)} \mathrm{sim}(u, v)} \tag{5}$$

where $NS(u)$ is the neighbor set of $u$ and $p(i \mid v)$ indicates the preference of user $v$ for item $i$. Here, we let $p(i \mid v) = 1$ if $v$ has tagged $i$, and $p(i \mid v) = 0$ otherwise. $\mathrm{sim}(u, v)$ represents the similarity between $u$ and $v$, and can be calculated using cosine-based or Jaccard-based functions in combination with different feature weighting schemes.
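As a rough illustration of the UCF scoring in Eq. (5), the sketch below scores unseen items by a similarity-weighted vote of the nearest neighbors. It assumes binary tagging profiles, plain cosine similarity without any feature weighting, and a top-k neighbor set; the names `cosine_sim`, `ucf_scores`, `profiles`, and the parameter `k` are illustrative and do not come from the paper.

```python
from math import sqrt

def cosine_sim(items_a, items_b):
    """Cosine similarity between two users' binary item sets."""
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / sqrt(len(items_a) * len(items_b))

def ucf_scores(target, profiles, k=2):
    """Score unseen items for `target` by a similarity-weighted neighbor vote."""
    mine = profiles[target]
    sims = {v: cosine_sim(mine, items)
            for v, items in profiles.items() if v != target}
    # Neighbor set NS(u): the k most similar users with non-zero similarity
    # (an assumption; the paper does not fix how NS(u) is chosen here).
    neighbors = sorted((v for v in sims if sims[v] > 0),
                       key=lambda v: -sims[v])[:k]
    norm = sum(sims[v] for v in neighbors)
    if norm == 0:
        return {}
    candidates = set().union(*(profiles[v] for v in neighbors)) - mine
    # p(i|v) is 1 if v tagged i and 0 otherwise, so only taggers contribute.
    return {
        i: sum(sims[v] for v in neighbors if i in profiles[v]) / norm
        for i in candidates
    }

# Toy data: user -> set of tagged items (hypothetical).
profiles = {
    "u": {"a", "b"},
    "v1": {"a", "b", "c"},
    "v2": {"a", "d"},
    "v3": {"e"},
}
scores = ucf_scores("u", profiles)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy data, item `c` ranks above `d` because it comes from the neighbor whose profile overlaps more with the target user's.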

Item-based CF (ICF). Unlike UCF, item-based CF makes suggestions by considering items that are similar to those collected by the target user [15,2]. To estimate $p(i \mid u)$ with ICF, we use Eq. (6),

$$p(i \mid u) \propto \sum_{j \in I_u} \frac{\mathrm{sim}(i, j)}{\sum_{i' \in NS(j)} \mathrm{sim}(i', j)}\, p(j \mid u) \tag{6}$$

where $NS(j)$ is the neighbor set of item $j$ and $p(j \mid u)$ indicates the preference of $u$ for $j$. Here, we let $p(j \mid u) = 1/|I_u|$, where $I_u$ is the item set collected by $u$. $\mathrm{sim}(i, j)$ can be computed using different combinations of similarity functions and profiling models.
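The item-based scoring of Eq. (6) can be sketched in the same spirit. This is a minimal example under stated assumptions: items are profiled by the set of users who tagged them, similarity is Jaccard, and the neighbor set of an item is taken to be every other item with non-zero similarity. The names `jaccard_sim`, `icf_scores`, and `item_users` are hypothetical.

```python
def jaccard_sim(users_a, users_b):
    """Jaccard similarity between the user sets of two items."""
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0

def icf_scores(target_items, item_users):
    """Score uncollected items; item_users maps item -> users who tagged it."""
    if not target_items:
        return {}
    pref = 1.0 / len(target_items)  # p(j|u) = 1/|I_u|
    scores = {}
    for j in target_items:
        # NS(j): all other items (assumption), weighted by similarity to j.
        sims = {i: jaccard_sim(item_users[i], item_users[j])
                for i in item_users if i != j}
        norm = sum(sims.values())
        if norm == 0:
            continue
        for i, s in sims.items():
            if i not in target_items and s > 0:
                scores[i] = scores.get(i, 0.0) + (s / norm) * pref
    return scores

# Toy data: item -> set of users who tagged it (hypothetical).
item_users = {
    "a": {"u", "v1", "v2"},
    "b": {"u", "v1"},
    "c": {"v1"},
    "d": {"v2"},
}
scores = icf_scores({"a", "b"}, item_users)
```

Item `c` receives contributions from both collected items, whereas `d` is similar only to `a`, so `c` ends up with the higher score.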

Social-based CF (SCF). Generally, users in social systems are strongly influenced by their friends and prefer to adopt recommendations coming from them. Social relations therefore provide a trustworthy basis for making recommendations in social tagging systems. If enough social information is available, more reliable and accurate recommendations should be achievable. For this, we introduce another UCF-like recommender proposed in Ben-Shimon et al. [6]. The recommender explicitly introduces distances between users, measured on a social graph, into the general UCF framework, as in Eq. (7),

$$p(i \mid u) \propto \frac{\sum_{v \in FS(u, L)} p(i \mid v)\, \beta^{-l(u, v)}\, \mathrm{sim}(u, v)}{\sum_{v \in FS(u, L)} \mathrm{sim}(u, v)} \tag{7}$$

where $\beta$ is an attenuation coefficient of the social network that adjusts the effect of the distance, $l(u, v)$, between two users. $\mathrm{sim}(u, v)$ can be estimated in the same way as in the UCF. $FS(u, L)$ is the set of neighbors who have a social relation with the user $u$ within a maximum path length $L$. Here, we can use a breadth-first search algorithm to find $FS(u, L)$ for each user [35]. The search starts from the node $u$, visits $u$'s directly connected nodes, marks these nodes with distance $l = 1$, and adds them to $FS$. Then, for every node in $FS$ at distance $l$, it marks the neighbors not yet included in $FS$ with distance $l + 1$ and adds them to $FS$. This process is repeated until the maximum path length $L$ is reached, which yields $FS(u, L)$.
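The breadth-first search described above can be sketched as follows. This is an illustrative version, assuming the social graph is given as an adjacency list; the function name `friends_within` and the toy `graph` are hypothetical. It returns each reachable friend together with its distance, which is the quantity the distance-based attenuation in Eq. (7) needs.

```python
from collections import deque

def friends_within(graph, u, max_len):
    """Return {friend: distance} for users within max_len hops of u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if dist[node] == max_len:
            continue  # do not expand beyond the maximum path length L
        for nb in graph.get(node, ()):
            if nb not in dist:  # first visit gives the shortest distance
                dist[nb] = dist[node] + 1
                queue.append(nb)
    del dist[u]  # the target user is not a member of FS(u, L)
    return dist

# Toy social graph as an adjacency list (hypothetical).
graph = {
    "u": ["a", "b"],
    "a": ["u", "c"],
    "b": ["u"],
    "c": ["a", "d"],
    "d": ["c"],
}
fs = friends_within(graph, "u", 2)
```

With a maximum path length of 2, user `d` (three hops away) is excluded from the friend set.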

3.3.2. Random walks

Unlike CF-based methods, network-based methods consider the propagation of relevance or similarity on the folksonomy graph and attempt to fully utilize the information provided by folksonomy data, and can thus alleviate the problem of data sparsity.

Random Walk with Restart (RWR). RWR-based similarity has proved to be a good measure for personalized recommendation [26,48]. Given a directed graph, it considers a random particle that starts from node $x$. The particle iteratively moves to its neighbors with probability proportional to the edge weights. At each step, it also returns to $x$ with probability $1 - d$ (the classic setting is $d = 0.85$). The relevance score of node $y$ w.r.t. node $x$, $p(y \mid x)$, is defined as the steady-state probability that the particle will finally stay at node $y$. This steady-state probability can be obtained by iterating the power method until convergence, as shown in Eq. (8),

$$p^{(t)} = d A p^{(t-1)} + (1 - d)\, e \tag{8}$$

where $A$ is the column-normalized adjacency matrix of the graph: $A_{ji} = 1/O_i$ if node $i$ links to node $j$, where $O_i$ is the out-degree of node $i$; otherwise, $A_{ji} = 0$. The vector $e$ is the preference vector, usually configured as $e_x = 1$ and $e_y = 0$ for any other node $y$.
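The power iteration of Eq. (8) can be sketched in a few lines. The example assumes an unweighted directed graph given as out-link lists in which every node has at least one successor, and it reads Eq. (8) as weighting propagation by `d` and the restart by `1 - d`. The names `rwr`, `out_links`, `tol`, and `max_iter` are illustrative.

```python
def rwr(out_links, start, d=0.85, tol=1e-10, max_iter=1000):
    """Random walk with restart via power iteration.

    out_links maps node -> list of successors; every node must appear as a
    key and have at least one successor (assumption of this sketch).
    """
    nodes = list(out_links)
    e = {n: 1.0 if n == start else 0.0 for n in nodes}  # preference vector
    p = dict(e)
    for _ in range(max_iter):
        nxt = {n: (1.0 - d) * e[n] for n in nodes}  # restart term
        for i in nodes:
            # Column-normalized adjacency: node i sends 1/O_i to each successor.
            share = d * p[i] / len(out_links[i])
            for j in out_links[i]:
                nxt[j] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy graph (hypothetical): x links to y and z, both link back to x.
out_links = {"x": ["y", "z"], "y": ["x"], "z": ["x"]}
p = rwr(out_links, "x")
```

For this small graph the steady state can be checked by hand: the score of `x` is `1 / (1 + d)`, and `y` and `z` split the remainder equally.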

To use the RWR model for item recommendation, the social tagging graph should adhere to the folksonomy schema. In this graph, the relevance score of resource $i$ w.r.t. user $u$ is correspondingly defined as the steady-state probability that the particle will finally stay at node $i$, namely, $p(i \mid u)$. To estimate $p(i \mid u)$, we suggest projecting the folksonomy schema into several sub-schemas and applying RWR-based similarity on the user-resource (UR) and resource-tag (RT) bipartite graphs, respectively. Correspondingly, for the UR bigraph, we set $e$ to prefer the node representing $u$, and for the RT bigraph, we set $e$ to prefer the set of resource nodes $I_u$ in the profile of $u$ [48]. The configuration of $e$ is described in Eq. (9),

$$e_x = \begin{cases} 1 & \text{if } x = u \ (\text{UR bigraph}) \\ 1/|I_u| & \text{if } x \in I_u \ (\text{RT bigraph}) \\ 0 & \text{otherwise} \end{cases} \tag{9}$$
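A small helper makes the two configurations of the preference vector in Eq. (9) concrete. It is only a sketch: the function name `preference_vector` and its parameters are hypothetical, and nodes are assumed to be identified by plain strings.

```python
def preference_vector(nodes, user=None, profile_items=None):
    """Build e for the UR bigraph (user given) or RT bigraph (items given)."""
    e = {x: 0.0 for x in nodes}
    if user is not None:
        e[user] = 1.0  # UR bigraph: all restart mass on the target user
    elif profile_items:
        for x in profile_items:
            e[x] = 1.0 / len(profile_items)  # RT bigraph: uniform over I_u
    return e

# UR bigraph: nodes are users and resources.
e_ur = preference_vector(["u", "r1", "r2"], user="u")
# RT bigraph: nodes are resources and tags; the user's profile is {r1, r2}.
e_rt = preference_vector(["r1", "r2", "t1"], profile_items={"r1", "r2"})
```

In both cases the entries of the vector sum to one, so it can be used directly as the restart distribution of a random walk.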

Probability Spreading (ProbS). Suppose that a kind of token is initially located on items; each item distributes its token equally among all neighboring users, and then each user redistributes the received token equally among all of his/her collected items. Given a user-resource bigraph, let $o_i$ and $o_u$ denote, respectively, the number of users who have collected resource $i$ and the number of resources collected by user $u$. ProbS works by assigning items an initial level of “tokens” denoted by the vector $p$ (where $p_i$ is the “token” possessed by item $i$), and then redistributing it via the transformation $p' = Wp$, where

$$W_{ij} = \frac{1}{o_j} \sum_{u \in U} \frac{A_{iu} A_{ju}}{o_u} \tag{10}$$

is a column-normalized $n \times n$ matrix. The adjacency matrix $A$ corresponds to the user-resource bigraph, where $A_{iu} = 1$ if item $i$ is collected by the user $u$, and $A_{iu} = 0$ otherwise [56]. Recommendations for a given user $u$ are obtained by setting the initial token vector $p$ in accordance with the items the user has previously collected, that is, by setting $p_i = A_{iu}$. In this case, the initial token can be understood as giving a unit recommending capacity to each collected item, and the initial token vectors for different users capture their personalized preferences. The resulting recommendation list of uncollected items is then sorted according to $p'$ in descending order.

ProbS originally operates on the user-resource bipartite graph (ProbS-UR). If we replace the users with tags, we obtain a further variant, ProbS-RT [54].
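The two-step token spreading on the user-resource bigraph can be sketched without building the matrix explicitly. The example below is a minimal illustration, assuming each collected item starts with one unit of token; the names `probs_scores` and `user_items` are hypothetical.

```python
def probs_scores(user_items, target):
    """ProbS on a user-resource bigraph; user_items maps user -> item set."""
    # o_i: number of users who collected each item.
    item_degree = {}
    for items in user_items.values():
        for i in items:
            item_degree[i] = item_degree.get(i, 0) + 1
    # Step 1: each collected item splits its unit token among its users.
    user_token = {}
    for i in user_items[target]:
        for u, items in user_items.items():
            if i in items:
                user_token[u] = user_token.get(u, 0.0) + 1.0 / item_degree[i]
    # Step 2: each user splits the received token among collected items.
    final = {}
    for u, tok in user_token.items():
        for i in user_items[u]:
            final[i] = final.get(i, 0.0) + tok / len(user_items[u])
    # Only items the target has not collected are recommended.
    return {i: s for i, s in final.items() if i not in user_items[target]}

# Toy data: user -> set of collected items (hypothetical).
user_items = {
    "u": {"a", "b"},
    "v1": {"a", "c"},
    "v2": {"b", "c", "d"},
}
scores = probs_scores(user_items, "u")
ranking = sorted(scores, key=scores.get, reverse=True)
```

Spreading conserves the total amount of token, so a popular item such as `c` accumulates more of it than the rarely collected `d`.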

3.3.3. Semantic models

Topic Models. A topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Two typical models, Probabilistic Latent Semantic Analysis [21] and Latent Dirichlet Allocation [7], have recently gained many

H. Wu et al. / Knowledge-Based Systems 75 (2015) 124–140
