SGP: A Graph-Partition-Based Sampling Algorithm for Large Social Networks


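SGP: sampling big social networks based on graph partition. The algorithm aims to draw a representative sample from a big social network so that Internet services can analyze social data accurately. By partitioning the original network into several sub-networks and sampling them evenly, SGP keeps the sampled network similar to the original in both topology and community structure. The authors are Xiaolin Du and Yunming Ye (Shenzhen Graduate School, Harbin Institute of Technology) and Yan Li and Yueping Li (School of Computer Engineering, Shenzhen Polytechnic). Experiments on several well-known data sets show that SGP performs well, which supports its effectiveness and applicability.

In social network analysis, obtaining a representative sample of a big social network is crucial, because it directly affects how accurately large volumes of social data can be analyzed. SGP (Sampling Big Social Network Based on Graph Partition) is a new sampling algorithm designed for this need. Its core idea is graph partition: the original big social network is split into several smaller sub-networks, and these sub-networks are then sampled evenly.

Graph partition is meant to ensure that each sub-network reflects key features of the original network. This helps maintain topological similarity, meaning that the structure of the sampled network stays as close as possible to that of the original. Because social networks usually have pronounced community structure, SGP also pays particular attention to preserving community structure similarity. Community structure refers to regions of the network whose nodes are densely connected to one another, and it is important for understanding group behavior and relationships among users.

In its implementation, SGP may use a graph partitioning algorithm such as METIS or Girvan-Newman to divide the large network into sub-networks that are densely connected internally and relatively sparsely connected to one another. Nodes and edges are then selectively sampled from the sub-networks to produce a smaller network whose structure closely resembles the original.

To validate SGP, the researchers ran experiments on several well-known data sets, which may include data from real social networks such as Facebook, Twitter, or LinkedIn. The results indicate that SGP substantially reduces network size while preserving key network properties, which lowers the complexity and computational cost of subsequent analysis. Comparisons between sampled and original networks on tasks such as community detection and node attribute prediction also support SGP's advantage in preserving network and community structure.

In summary, SGP is a sampling method for social networks that uses a graph partition strategy to draw representative samples from large-scale networks while keeping topology and community structure similar to the original. This has practical value for the many Internet services that depend on social network data.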

SGP: Sampling Big Social Network Based on Graph Partition

Xiaolin Du, Yunming Ye

Key Laboratory of Internet Information Collaboration, Shenzhen Graduate School, Harbin Institute of Technology, China

duxiaolinhitsz@gmail.com

yeyunming@hit.edu.cn

Yan Li, Yueping Li

School of Computer Engineering, Shenzhen Polytechnic, Shenzhen, China

liyan@szpt.edu.cn

leeyueping@gmail.com

Abstract—Deriving a representative sample from a big social network is essential for many Internet services that rely on accurate analysis of big social data. A good sampling method for social networks should generate small sample networks whose structure is similar to that of the original big network. In this paper, we propose SGP, a new big social network sampling algorithm based on graph partition. In SGP, the original network is first partitioned into several sub-networks, which are then sampled evenly. This procedure enables SGP to effectively maintain the topological similarity and community structure similarity between the sampled network and its original network. We have evaluated SGP on several well-known data sets. The experimental results show that SGP outperforms six state-of-the-art methods.

Keywords-sampling algorithms; social networks; graph partition; community structure; topology structure

I. INTRODUCTION

In the era of the Social Web, social networks (Twitter, micro-blogs, MSN, Facebook, co-citation relations, credit networks) appear everywhere. The last few years have witnessed an explosive growth of online social networks, which have attracted attention from all over the world [1]. This rapid growth has brought new challenges to research on social networks. The modern science of networks has brought significant advances in our understanding of complex systems [2]. In research, social networks are usually represented by different types of graphs: vertices represent entities, and edges represent interactions between pairs of entities. Graph mining techniques (graph visualization, graph structure analysis, etc.) are then employed to assist the analysis of big social networks. However, for a large-scale graph with millions of vertices, it is very difficult to apply graph mining approaches to the entire graph directly, so finding ways to accelerate large-scale graph mining is an essential issue. One popular solution is to obtain a sub-graph that represents the original graph effectively, so that the sub-graph can be used for simulation and analysis. Obtaining such a sub-graph relies on a graph sampling process, which aims to select a set of vertices and edges such that the resulting sub-graph preserves some general characteristics of the original graph. In this paper, we focus on developing a large-scale graph sampling technique.

Generally, sampling a large graph raises three questions [3]: What is a good sampling method? What is a good sample size? How do we measure the goodness of a single sample, as well as the goodness of a whole sampling method? Many researchers have proposed solutions for sampling social networks. State-of-the-art sampling algorithms include Random Node (RN) sampling, Random PageRank Node (RPN) sampling, Random Degree Node (RDN) sampling, Random Edge (RE) sampling, Random Walk (RW) sampling, Random Jump (RJ) sampling, Forest Fire (FF) sampling, Breadth-First Sampling (BFS), and other strategies, which are introduced briefly in Section II. For these algorithms, the sample size is preset by the user, so that users obtain a sampled graph of the size they want. During sampling, it is important to keep the properties of the sampled graph similar to those of the original graph: only if the sampled graph represents the original graph well can we study it in place of the original. How can we evaluate whether the sampled graph and the original graph have similar properties? Several techniques for measuring this similarity are introduced in Section II.
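
As an illustration of what such a similarity measurement can look like, the sketch below compares the degree distributions of an original graph and a sample using the Kolmogorov-Smirnov distance, that is, the largest gap between the two empirical cumulative distributions. Choosing this particular measure is an assumption made for the example, and the helper names `degree_sequence` and `ks_distance` are hypothetical.

```python
from bisect import bisect_right


def degree_sequence(adj):
    """Sorted list of vertex degrees for a graph given as vertex -> neighbour set."""
    return sorted(len(neighbours) for neighbours in adj.values())


def ks_distance(xs, ys):
    """Largest gap between the empirical CDFs of two sorted samples."""
    def cdf(sorted_vals, x):
        # Fraction of values that are <= x.
        return bisect_right(sorted_vals, x) / len(sorted_vals)

    return max(abs(cdf(xs, x) - cdf(ys, x)) for x in set(xs) | set(ys))


# Original graph: a star with centre 0 and five leaves.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
# Sample: the sub-graph induced by the centre and two of the leaves.
sample = {0: {1, 2}, 1: {0}, 2: {0}}

distance = ks_distance(degree_sequence(star), degree_sequence(sample))
```

A distance of 0 means the two degree distributions coincide, and values closer to 1 mean they differ more. The same function can be applied to any other per-vertex property, such as clustering coefficients.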

In this paper, we propose a new big social network sampling algorithm based on graph partition (SGP). SGP first partitions the original network into several sub-networks and then performs stratified sampling of the vertices in each sub-network. This procedure enables SGP to effectively maintain the topological similarity and community structure similarity between the sampled network and its original network.
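
The two steps described above, partitioning and then sampling each part evenly, can be sketched as follows. This is a minimal illustration and not the authors' implementation. The partition is passed in as a ready-made list of vertex sets, because this passage does not say how SGP computes it. The function name `sample_by_partition`, the `fraction` parameter, and the rule of keeping at least one vertex per part are all assumptions.

```python
import random


def sample_by_partition(adj, parts, fraction, seed=0):
    """Sample the same fraction of vertices from every part, then
    return the sub-graph induced by the chosen vertices.

    adj      -- dict: vertex -> set of neighbouring vertices (undirected)
    parts    -- list of disjoint vertex sets covering the graph; how they
                are produced is left open here (assumption)
    fraction -- share of vertices to keep from each part, 0 < fraction <= 1
    """
    rng = random.Random(seed)
    kept = set()
    for part in parts:
        # Keep at least one vertex so that no part vanishes from the sample.
        k = max(1, round(fraction * len(part)))
        kept.update(rng.sample(sorted(part), k))
    # Induced sub-graph: keep an edge only if both endpoints were sampled.
    return {v: adj[v] & kept for v in kept}


# Toy graph: two 4-cliques joined by the single edge 3-4.
adj = {v: set() for v in range(8)}
for group in (range(0, 4), range(4, 8)):
    for u in group:
        for v in group:
            if u != v:
                adj[u].add(v)
adj[3].add(4)
adj[4].add(3)

parts = [{0, 1, 2, 3}, {4, 5, 6, 7}]
sample = sample_by_partition(adj, parts, fraction=0.5)
```

Because every part contributes in proportion to its size, both cliques stay represented in the sample. Uniform sampling over the whole vertex set could, by chance, draw most vertices from one clique and lose the other community.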

The rest of the paper is organized as follows. Section II presents related work. Section III describes the proposed SGP algorithm based on graph partition. The experimental process and results are presented in Section IV. Finally, Section V concludes the paper.

II. RELATED WORKS

In this section, we introduce some network sampling algorithms and the measures used to evaluate their performance.

A. Graph Sampling Algorithms

Several state-of-the-art graph sampling algorithms currently exist. Conceptually, they can be split into three groups [3]: methods based on randomly selecting vertices, methods based on randomly selecting edges, and exploration techniques that simulate random walks or virus propagation to find a representative sample of the vertices.
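
To make the three groups concrete, the sketch below gives one simplified member of each: uniform vertex selection, uniform edge selection, and a plain random walk. These are minimal illustrations, not the exact procedures from the cited works. The function names, the `max_steps` cap, and the assumption that vertex labels are sortable are choices made only for this example.

```python
import random


def random_node_sample(adj, k, rng):
    """Vertex-based: pick k vertices uniformly, keep the induced edges."""
    kept = set(rng.sample(sorted(adj), k))
    return {v: adj[v] & kept for v in kept}


def random_edge_sample(adj, m, rng):
    """Edge-based: pick m edges uniformly, keep their endpoints."""
    edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    sub = {}
    for u, v in rng.sample(edges, m):
        sub.setdefault(u, set()).add(v)
        sub.setdefault(v, set()).add(u)
    return sub


def random_walk_sample(adj, k, rng, max_steps=10_000):
    """Exploration-based: walk along edges until k distinct vertices are seen."""
    current = rng.choice(sorted(adj))
    kept = {current}
    for _ in range(max_steps):
        if len(kept) >= k or not adj[current]:
            break
        current = rng.choice(sorted(adj[current]))
        kept.add(current)
    return {v: adj[v] & kept for v in kept}


# Toy graph: a ring of 10 vertices.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
rng = random.Random(42)
by_node = random_node_sample(ring, 4, rng)
by_edge = random_edge_sample(ring, 3, rng)
by_walk = random_walk_sample(ring, 4, rng)
```

On the ring, the walk-based sample is always a connected arc, whereas the vertex-based sample may consist of isolated vertices. This shows how strongly the sampling family shapes the topology of the result.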
