A novel green algorithm for sampling complex networks
Chao Tong, Yu Lian, Jianwei Niu
n
, Zhongyu Xie, Yang Zhang
School of Computer Science and Engineering, Beihang University, Beijing, China
article info
Available online 8 July 2015
Keywords:
Network sampling
Green computing
Community structure
PageRank
abstract
Researches of complex networks such as social networks are becoming popular in recent years. Due to
the large scale and complex structure of these networks, analysis and studies on a complete network
require a lot of computational resources and storage space, which will also consume a large amount of
energy. Sampling algorithms provide a new green approach for this problem. Especially some researches
related to network communities with high energy consumption can be directly conducted on the
sampled networks, which maintain the community structure of original networks. In this paper, we
propose a sampling algorithm named Improved Forest Fire Sampling algorithm based on PageRank
(IFFST-PR) based on the idea of Forest Fire Sampling and PageRank algorithm. IFFST-PR can maintain the
community structure of original networks. We select a set of key nodes called community cluster center,
according to a coefficient named community coefficient. Besides, we adopt PageRank to decide the order
of initiative sampling nodes. To make a comprehensive comparison of IFFST-PR with other 6 algorithms,
we use network community profile and Kolmogorov–Smirno D statistics to prove the consistency
between sampled networks and original networks. Experiments applied on 3 different data sets show
that IFFST-PR has better performance in terms of most parameters defined in network community
profile than those of the other 6 algorithms.
& 2015 Elsevier Ltd. All rights reserved.
1. Introduction
Recently, people found that research of complex networks is
penetrating into many other areas such as mathematics discipline,
life science and engineering discipline (Hunt and Cooke, 1996;
Newman, 2003; van Mieghem, 2014; Tu, 2000). Research con-
ducted on complex networks requires large amount of energy and
storage space, because complex networks have intricate structure
and huge amount of nodes and edges. For instance, a large
national power system consists of numerous of power plants
and substations. There are millions of users in social networks
such as Facebook and Twitter. The nervous system of human
beings contains billions of neurons (Crossley et al., 2013). All these
systems are associated with complex networks. Besides, the
number of nodes and edges of complex networks may dynamically
change. Computing and searching on such a large-scale network is
not feasible on the aspects of space and time. Therefore, how to
efficiently conduct research on complex networks is a pressing
problem needed to be solved. Implementing sampling algorithms
on a complex network has been proved to be a possible approach
to solve this problem. Methods aiming at reducing the scale of
networks have been tested (Krishnamurthy et al., 2007). Sampling
methods have been used to sample a representative small-scale
network from Facebook (Gjoka et al., 2010). Sampling algorithms
can also help recommendation systems to improve their efficiency
(Gjoka et al., 2011).
With the deepening of research, the community structure of
complex networks has been attracting a lot attention. The com-
munity structure of networks is extremely important in the study
of complex networks. Because it depicts the aggression of nodes
and the uneven distribution of edges (Palla et al., 2005; Newman,
2004). Generally, nodes in the same community tend to connect
closely, while nodes in different communities have fewer edges.
For example, as for a social network, users in the same community
usually share the same interests. Operators of a recommendation
system can predict a user's preference according to the preferences
of other users in the same community. Community structure also
has been used to detect terrorists, predict features of unknown
protein, search web sites, enhance scoring systems and so on.
Obviously, community structure plays an important role in the
study of networks. In Bader and McCloskey (2009) adefinition of
community structure is proposed. It pointed out that a community
is a group of nodes which has close interconnected. While Radicchi
et al. (2004) believe that nodes in the same community should
closely interconnect, meanwhile nodes in different communities
should have fewer edges. These two definitions are currently
widely accepted definitions. In this paper, we adopt the second
definition as the criteria of community structure.
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/jnca
Journal of Network and Computer Applications
http://dx.doi.org/10.1016/j.jnca.2015.05.021
1084-8045/& 2015 Elsevier Ltd. All rights reserved.
n
Corresponding author.
E-mail address: niujianwei@buaa.edu.cn (J. Niu).
Journal of Network and Computer Applications 59 (2016) 55–62