Distributed SimRank Algorithm: A New Random-Walk-Path Approach


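"Research on a Distributed SimRank Algorithm Based on Random Walk Path." SimRank is an important method for computing the similarity between nodes in a graph. Its core principle is that two nodes are similar if they have similar neighbors. On large-scale network data, the traditional centralized SimRank algorithm can no longer meet timeliness and efficiency requirements because of its high computational complexity and memory demand. To address this, a paper published in 2014 proposed a distributed SimRank algorithm based on random walk paths, which works in two main stages.

In the first stage, data are preprocessed under the BSP (Bulk Synchronous Parallel) model to build index information for random walk paths. BSP is a parallel computing model that divides a computation into a sequence of supersteps: within each superstep the processors compute locally and exchange messages in parallel, and a barrier synchronization ends the superstep before the next one begins. For this stage the paper designs a Find-K-Paths algorithm that generates and stores random walk paths efficiently and uses threshold filtering to avoid generating unnecessary paths, saving both time and storage. A key property of this stage is that new paths can be added dynamically, so the index can be extended as the network structure changes.

In the second stage, SimRank values are computed from the index built in the first stage by a distributed SimRank algorithm implemented on MapReduce, which splits the computation into tasks that run in parallel on many machines and so improves efficiency and scalability. Similarity between nodes is obtained by querying the random walk path index instead of iterating over the whole graph, which lowers the computational cost.

Because random walk paths capture indirect as well as direct relationships between nodes, the resulting scores reflect node similarity more fully, and distributed processing removes the bottleneck that centralized algorithms face on large data volumes. By combining random walk theory with parallel computing, the approach is of practical value for network analysis, recommender systems, and social network mining.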
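The "similar neighbors" principle is usually formalized (Jeh and Widom's definition) as s(a, a) = 1 and s(a, b) = C / (|I(a)|·|I(b)|) · Σ_{x∈I(a)} Σ_{y∈I(b)} s(x, y), where I(v) is the set of in-neighbors of v and C ∈ (0, 1) is a decay factor; s(a, b) = 0 when either node has no in-neighbors. Below is a minimal sketch of the naive centralized iteration, assuming the graph fits in memory as a dict of in-neighbor lists and C = 0.8 (a common choice, assumed here rather than taken from this summary). It illustrates the baseline the paper sets out to improve, not the paper's own algorithm.

```python
from itertools import product


def simrank(in_nbrs: dict[str, list[str]], c: float = 0.8,
            iters: int = 10) -> dict[tuple[str, str], float]:
    """Naive iterative SimRank over in-neighbor lists (centralized baseline).

    in_nbrs maps every node to the list of nodes with an edge into it;
    c is the decay factor (0.8 is an assumed, commonly used value).
    """
    nodes = list(in_nbrs)
    # s_0(a, b) = 1 if a == b else 0
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iters):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0  # a node is maximally similar to itself
                continue
            ia, ib = in_nbrs[a], in_nbrs[b]
            if not ia or not ib:
                new[(a, b)] = 0.0  # no in-neighbors: similarity is defined as 0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by c
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new
    return sim
```

For example, simrank({"w": [], "u": ["w"], "v": ["w"]}) gives s(u, v) = 0.8, because u and v share their only in-neighbor w. Each iteration touches all n² node pairs and, for each pair, all pairs of their in-neighbors; that cost is what makes the centralized approach impractical on large graphs.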

Distributed SimRank Algorithm Based on Random Walk Path

LIU Heng, KOU Yue+, SHEN Derong, WANG Taiming, YU Ge

College of Information Science and Engineering, Northeastern University, Shenyang 110004, China

+ Corresponding author: E-mail: kouyue@ise.neu.edu.cn

LIU Heng, KOU Yue, SHEN Derong, et al. Distributed SimRank algorithm based on random walk path. Journal of Frontiers of Computer Science and Technology, 2014, 8(12): 1422-1431.

Abstract: SimRank is a widely used model for computing similarity; it measures the similarity between objects based on graph topology. With the rapid growth of data, centralized SimRank is no longer applicable, and current distributed SimRank approaches have drawbacks in efficiency and scalability. This paper presents a two-stage distributed SimRank algorithm based on random walk paths. The first stage is data preprocessing, for which a Find-K-Paths algorithm based on the BSP (bulk synchronous parallel) model is proposed. The algorithm can effectively build the index information of random walk paths and supports the dynamic addition of new paths. The number of generated paths can be reduced by threshold filtering. In the second stage, based on the index information, a distributed SimRank algorithm is proposed under MapReduce. The experiments demonstrate the feasibility and effectiveness of the proposed algorithm.
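The two stages can be illustrated with SimRank's random-surfer-pairs interpretation: s(a, b) is the expected value of C^τ, where τ is the first step at which two independent backward random walks started at a and b land on the same node (pairs that never meet contribute 0). The following is a minimal single-process sketch under that assumption, not the authors' Find-K-Paths or MapReduce code. The hypothetical `generate_walks` advances K walks per node by one step per simulated BSP superstep, and the hypothetical `simrank_mapreduce` mimics the map and reduce phases with in-memory dictionaries. The decay C = 0.8 and the threshold rule (stop once C^t < eps) are assumed choices; the abstract does not specify the paper's filtering criterion.

```python
import random
from collections import defaultdict
from itertools import combinations


def generate_walks(in_nbrs, k=100, c=0.8, eps=1e-3, seed=0):
    """Stage 1 (sketch): K backward random walks per node, one step per
    BSP-style superstep. A walk stops at a node with no in-neighbors, and all
    walks stop once c**t < eps (hypothetical threshold filter)."""
    rng = random.Random(seed)
    walks = {(v, i): [v] for v in in_nbrs for i in range(k)}  # (source, id) -> path
    active = set(walks)
    t = 0
    while active:
        t += 1
        if c ** t < eps:  # a meeting at step t would add less than eps
            break
        # One superstep: every active walk takes one step along an in-edge
        # (sorted so the seeded RNG gives reproducible paths) ...
        for key in sorted(active):
            preds = in_nbrs[walks[key][-1]]
            if preds:
                walks[key].append(rng.choice(preds))
            else:
                active.discard(key)
        # ... then an implicit barrier before the next superstep.
    return walks


def simrank_mapreduce(walks, k, c=0.8):
    """Stage 2 (sketch): estimate s(a, b) as the mean over i of c**tau_i,
    where tau_i is the first step at which walk i from a meets walk i from b."""
    # Map: emit key (walk id, step, node) -> source node
    groups = defaultdict(list)
    for (src, i), path in walks.items():
        for t, node in enumerate(path):
            groups[(i, t, node)].append(src)
    # Reduce 1: sources sharing a key have met; keep the earliest step per pair
    first_meet = {}
    for (i, t, _node), srcs in groups.items():
        for a, b in combinations(sorted(srcs), 2):
            key = (a, b, i)
            if key not in first_meet or t < first_meet[key]:
                first_meet[key] = t
    # Reduce 2: average c**tau over the K walk pairs (pairs that never meet add 0)
    sim = defaultdict(float)
    for (a, b, _i), t in first_meet.items():
        sim[(a, b)] += c ** t / k
    return dict(sim)
```

On the graph {"w": [], "u": ["w"], "v": ["w"]}, every walk from u and from v steps to w at superstep 1, so simrank_mapreduce(generate_walks(g, k=50), k=50) estimates s(u, v) as 0.8 up to floating-point rounding. On general graphs the result is a Monte Carlo estimate whose variance shrinks as K grows, with a small downward bias from truncating walks at the threshold.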

Key words: distributed SimRank; random walk path; BSP model; MapReduce

* The National Natural Science Foundation of China under Grant Nos. 61472070, 61033007; the National Basic Research Program of China (973 Program) under Grant No. 2012CB316201; the Fundamental Research Funds for the Central Universities of China under Grant Nos. 110404007, 130404015; the Specialized Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20120042110028; the MOE-Intel Special Fund of Information Technology under Grant No. MOE-INTEL-2012-06.

Received 2014-05, Accepted 2014-07.

CNKI online-first publication: 2014-07-11, http://www.cnki.net/kcms/doi/10.3778/j.issn.1673-9418.1405053.html


ISSN 1673-9418 CODEN JKYTA8

Journal of Frontiers of Computer Science and Technology

1673-9418/2014/08(12)-1422-10

doi: 10.3778/j.issn.1673-9418.1405053

E-mail: fcst@vip.163.com

http://www.ceaj.org

Tel: +86-10-89056056



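SimRank algorithm

SimRank algorithm implementation in Java

Distributed SimRank Algorithm Based on Random Walk Path.pdf

Random-walk-based community detection algorithm, Hadoop version

PageRank algorithm based on the random walk model in Python, with applications.zip

Random-walk-based PersonalRank algorithm

Gravitational search algorithm with a Lévy flight mechanism (2014)

PageRank algorithm based on a Python random walk model, with applications [100011665]

Dynamic distributed routing algorithm based on the ant algorithm.rar (general documents)

RandomWalk.rar: random walk algorithm, randomwalk, randomwalk R, random walk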
