An Approach for Discovering User Similarity in Social Networks Based on the Bayesian Network and MapReduce


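This paper presents a method for discovering user similarity in social networks based on the Bayesian network (BN) and MapReduce. The motivation is that, as social networks grow rapidly, analyzing user behavior and relationships effectively and mining latent similarities have become increasingly important. The authors, Juan Xu, Kun Yue (corresponding author), Jin Li, Feng Wang and Weiyi Liu, are affiliated with the School of Information Science and Engineering and the School of Software at Yunnan University, and with the Yunnan Computer Technology Application Key Lab at Kunming University of Science and Technology.

First, the authors build a BN-based model called the social user Bayesian network (SUBN). The SUBN describes the direct similarity relationships between users, using a probabilistic model to capture the dependencies and uncertainties in user behavior, interests, or interaction patterns. This structure helps represent complex connections among users, covering the combined effect of multiple factors rather than a single attribute.

Second, to cope with the scale of social network data, the paper proposes a distributed storage method based on HBase that stores the SUBN and supports efficient probabilistic inference over large datasets.

Third, the authors design a SUBN-based method for finding indirect similarity relationships, that is, users who have no direct interaction but show similar behavior. The method relies on probabilistic inference over the SUBN. The experimental results reported by the authors show that the method is both efficient and accurate.

The keywords are social network, user similarity, Bayesian network (BN), HBase, MapReduce, and probabilistic inference. Overall, the paper combines Bayesian networks with distributed computing to address user similarity in social networks, with the aim of improving the efficiency and accuracy of social network data analysis.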

An Approach for Discovering User Similarity in Social Networks Based on the Bayesian Network and MapReduce

Juan Xu, Kun Yue (Corresponding author), Jin Li, Feng Wang, Weiyi Liu

School of Information Science and Engineering, Yunnan University, Kunming, China

kyue@ynu.edu.cn

School of Software, Yunnan University, Kunming, China

Yunnan Computer Technology Application Key Lab, Kunming University of Science and Technology, Kunming, China

Abstract—Adopting the Bayesian network (BN) as an effective framework for representing and inferring dependencies and uncertainties among variables, in this paper we establish a BN-based model to discover user similarities in social networks. First, we build a BN to describe the direct similarity relationships between users, called the social user BN and abbreviated as SUBN. Second, we propose a distributed storage method based on HBase to store the SUBN and support efficient probabilistic inference. Building on this, we propose a SUBN-based method to find indirect similarity relationships between users. Experimental results show the efficiency and accuracy of our method.

Keywords—Social network, User similarity, Bayesian network (BN), HBase, MapReduce, Probabilistic inference

I. INTRODUCTION

With the rapid development of Web 2.0 applications and social networks, social media have come to be regarded as highly valuable, since they can broaden our horizons in keeping track of daily life, in learning about societies, or simply in making advertising more profitable [1]. Recently, researchers have studied many aspects of social networks, such as community evolution [2] and product recommendation [3]. Among these studies, finding similar users can help us understand how user relationships will evolve, what paths should be taken to spread specific news, ads, or political views, and which factors should be targeted in these scenarios [4]. As an important kind of relationship among social users, user similarities in a social network reflect user preferences in social activities and can be put to good use in product recommendation, since similar users tend to share their opinions with each other and recommendations from similar users are more likely to be trusted [3]. User similarities can also be used to explore one user’s evaluation of another [5] and to discover users’ topics and roles. Therefore, user similarity establishes the basis for various perspectives of social network analysis.

User similarities in a social network are generally implied by and reflected in the users' social interactions, which we call transactions by analogy with those in frequent pattern mining. For example, the similarities among co-authors of academic papers are implied by and reflected in the DBLP records [6] (i.e., transactions). Likewise, multiple users may relate to various products in the Epinions social network [7], and multiple users may be connected to the same video in the YouTube social network [8].

In recent years, many researchers have proposed methods for discovering the similarities of social network users from user interaction data by means of collaborative filtering, link-based, and score-based ideas [9, 10]. However, when confronted with massive transaction datasets, the accuracy and scalability of these methods cannot be guaranteed. To this end, we consider the following two aspects. On the one hand, given the nature of similarity, uncertainty is inherent in both the representation and the derivation of user similarities. On the other hand, given the rapid expansion of real-world social network applications, large-scale transactions of user interactions need to be retrieved efficiently. Therefore, it is natural to discover the similarities of social users from large-scale transactions of historical interactions by focusing on uncertainty representation and derivation. This is exactly the problem that we address in this paper.

The Bayesian network (BN) is a well-adopted framework for uncertainty representation and inference. A BN is a directed acyclic graph (DAG), where nodes represent random variables and edges represent dependencies among these random variables. Each variable in a BN is associated with a conditional probability table (CPT) that gives the probability of each state of the variable for every combination of states of its parents. By combining graph theory and probability theory, uncertainties can be represented directly and inferred effectively. These mechanisms of uncertainty representation and inference lead us to use the BN to describe and discover the direct and indirect user similarities in this paper.
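
As a rough illustration of the BN description above, the following Python sketch encodes a tiny DAG with CPTs and computes conditional probabilities by brute-force enumeration. It is not the SUBN construction or inference algorithm of this paper: the three-user chain, the variable names `u1`, `u2`, `u3`, the binary states, and all probabilities are assumptions made up for the example.

```python
from itertools import product

# Hypothetical three-user chain u1 -> u2 -> u3; each variable is binary
# (1 = the user takes part in a transaction, 0 = does not).
parents = {"u1": [], "u2": ["u1"], "u3": ["u2"]}

# CPTs: map a tuple of parent states to P(variable = 1 | parent states).
# All numbers are invented for illustration.
cpt = {
    "u1": {(): 0.3},
    "u2": {(0,): 0.1, (1,): 0.8},
    "u3": {(0,): 0.2, (1,): 0.7},
}

def joint(assignment):
    """P(assignment) as the product of one CPT entry per variable."""
    p = 1.0
    for var, pars in parents.items():
        p1 = cpt[var][tuple(assignment[q] for q in pars)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

def conditional(target, evidence):
    """P(target = 1 | evidence), summing the joint over all other variables."""
    names = list(parents)
    num = den = 0.0
    for values in product([0, 1], repeat=len(names)):
        a = dict(zip(names, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue  # assignment contradicts the evidence
        p = joint(a)
        den += p
        if a[target] == 1:
            num += p
    return num / den

direct = conditional("u2", {"u1": 1})    # corresponds to an edge in the DAG
indirect = conditional("u3", {"u1": 1})  # no edge u1 -> u3; inferred through u2
```

In this toy chain, `direct` is simply an entry of the CPT of `u2`, while `indirect` has to be inferred through the intermediate variable, which is the sense in which a BN can expose relationships that are not stored as edges. Enumeration is exponential in the number of variables and is used here only because it is short.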

Meanwhile, MapReduce is a programming model for processing and analyzing large data sets [11]. It offers a parallel programming model that scales to massive data. Thus, we discuss a method for BN-based modeling and induction of user similarities that adopts MapReduce as the mechanism for processing massive transactions of social user behaviors.
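
To make the map and reduce roles concrete, here is a minimal single-process Python sketch that counts how often two users appear in the same transaction. It only imitates the MapReduce data flow (map, group by key, reduce) and is not the algorithm proposed in this paper; the user names, the sample transactions, and the helper names `map_phase`, `shuffle`, and `reduce_phase` are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical transactions: each is the set of users involved in one
# interaction (for example, the co-authors of one paper).
transactions = [
    {"alice", "bob"},
    {"alice", "bob", "carol"},
    {"bob", "carol"},
    {"dave"},
]

def map_phase(transaction):
    """Emit ((u, v), 1) for every unordered user pair in one transaction."""
    for u, v in combinations(sorted(transaction), 2):
        yield (u, v), 1

def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one user pair."""
    return key, sum(values)

emitted = (kv for t in transactions for kv in map_phase(t))
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())
```

Because each transaction is mapped independently and each key is reduced independently, both phases can be spread over many machines in a real MapReduce deployment. Pair counts of this kind are one plausible input for estimating dependencies between users, under the assumptions stated above.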

Generally, the contributions of this paper are as follows:

• We propose a BN of users to represent the direct similarity relationships between social network users, called social user BN and abbreviated as SUBN. To construct the SUBN from transaction datasets, we give the MapReduce-based algorithm to obtain the DAG by
