Community Detecting and Feature Analysis in
Real Directed Weighted Social Networks
Yao Liu, Qiao Liu, and Zhiguang Qin
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu
610054, China
E_mail: liuyao@uestc.edu.cn, {qliu, qinzg}@uestc.edu.cn
Abstract—Real social networks usually have some
structural features of the complex networks, such as
community structure, the scale-free degree distribution,
clustering, "small world" network, dynamic evolution and
so on. A new community detecting algorithm for directed
and weighted social networks is proposed in this paper. Due
to the use of more reference information, the accuracy of
the algorithm is better than some of the typical detecting
algorithms. And because of the use of heap structure and
multi-task modular architecture, the algorithm also got a
high computational efficiency than other algorithms. The
effectiveness and efficiency of the algorithm is validated by
experiments on real social networks. Based on the theories
and models of complex networks, the features of the real
large social networks are analyzed.
Index Terms
—modularity, degree distribution, clustering
coefficient, hierarchy
I. INTRODUCTION
Recently, many researchers in different fields use the
topological properties and evolutionary processes of the
complex networks to describe the relationships and
collective behaviors in their own fields [1, 2]. This
methodology is called network analysis. New analysis
methods and topology properties are proposed by this
approach.
Network modeling is becoming an essential tool to
study and understand the complexity of many natural and
artificial systems [3]. This understanding firstly analyzed
the topological features of these systems, which usually
related to the complex networks. Examples are the degree
distribution, the average degree, the clustering coefficient,
the node betweenness, the short path length and the
assortative mixing describing the node correlations in the
networks.
Nowadays, the detecting of community structure has
become a hot area in the research of the complex
networks, which is also known as graph partitioning.
Many definitions of community are presented in this area.
In essence, it means the gathering of nodes into groups
such that there is a higher density of edges within groups
than between them [4]. The main problem in the area is to
find communities within large networks in some
automated fashion with the shortest possible execution
time.
Several algorithms have been proposed to find
reasonably good partitions in a reasonably fast way. Early
community detecting algorithms are always related to the
theories of graph partitioning and hierarchical clustering,
such as the agglomerative algorithm [5] and the
divisive algorithm [2, 3, 6] using the hierarchical
clustering theory; the Kernighan-Lin algorithm [7] and
the spectral bisection algorithm [8] using the graph
partitioning theory. But they all face the problem that
they can’t determine the exact value of the community
size and number, or the optimal community partition.
To solve this problem, a number of new algorithms
have been proposed in recent years. Girvan and Newman
proposed the GN divisive algorithm that uses edge
betweenness as a metric to identify the boundaries of
communities [2]. But the GN algorithm makes heavy
demands on computational resources, running in
2
OM N
time on an arbitrary network with
edges
and
nodes, or
3
ON
time on a sparse graph. This
restricts the algorithm’s use to large networks with
thousands of nodes.
In 2004, Newman proposed the modularity concept,
which can be used to measure the quality of community
partition [6]. The modularity of a partition is a scalar
value between -1 and 1 that measures the density of links
inside communities as compared to links between
communities [1, 2]. High values of modularity are
supposed to indicate high values of internal link densities
for the communities, which are distinct from the groups
with randomly linked nodes.
Based on modularity, a number of faster algorithms
have been proposed. Newman proposed an algorithm
based on the greedy optimization of the quantity
modularity [5]. This method has a running time in
(( ) )O M NN+
or
on a sparse graph, which is
substantially faster than the GN algorithm. So it can be
used in large networks.
Clauset proposed the CNM algorithms [4] that
performs the same greedy optimization as the algorithm
of [5]. By exploiting some shortcuts in the optimization
problem and using more sophisticated data structures, it
runs far more quickly, in time
( log )O dM N
where
d
is
the depth of the “dendrogram” describing the network’s
community structure [4]. For networks that have a
hierarchical structure
, the CNM algorithm has
an essentially linear running time
2
( log ( ))ON N
. However
the CNM algorithm has a tendency to produce
super-communities that contain a large fraction of the
JOURNAL OF NETWORKS, VOL. 8, NO. 6, JUNE 2013
doi:10.4304/jnw.8.6.1432-1439