A new blockmodeling based hierarchical clustering algorithm
for web social networks
$
Shaojie Qiao
a,
n
, Tianrui Li
a
, Hong Li
a
, Jing Peng
b
, Hongmei Chen
a
a
School of Information Science and Technology, Southwest Jiaotong University, No. 111, Erhuanlu Beiyiduan, Chengdu, Sichuan 610031, China
b
Department of Science and Technology, Chengdu Municipal Public Security Bureau, No. 136, Wenwu Road, Chengdu, Sichuan 610017, China
article info
Article history:
Received 18 October 2010
Received in revised form
11 April 2011
Accepted 5 January 2012
Available online 31 January 2012
Keywords:
Web social networks
Hierarchical clustering
Blockmodeling
Structural equivalence
Optimization
abstract
Cluster analysis for web social networks becomes an important and challenging problem because of the
rapid development of the Internet community like YouTube, Facebook and TravelBlog. To accurately
partition web social networks, we propose a hierarchical clustering algorithm called HCUBE based on
blockmodeling which is particularly suitable for clustering networks with complex link relations.
HCUBE uses structural equivalence to compute the similarity among web pages and reduces a large and
incoherent netwo rk into a set of smaller comprehensible subnet works. HCUBE is actually a bottom-up
agglomerative hierarchical clustering algorithm which uses the inter-connectivity and the closeness of
clusters to group structurally equivalent pages in an effective fashion. In addition, we address the
preliminaries of the proposed blockmodeling and the theoretical foundations of HCUBE clustering
algorithm. In order to improv e the efficiency of HCUBE, we optimize it by reducing its time complexity
from Oð9V9
2
Þ to Oð9 V 9
2
=pÞ, where p is a constant representing the number of initial partitions. Finally,
we conduct experiments on real data and the results show that HCUBE is effective at partitioning web
social networks compared to the Chamel eon and k-means algorithms.
& 2012 Elsevier Ltd. All rights reserved.
1. Introduction
Recently, social network analysis (SNA) has been recognized as
a promising and effective technology for studying complex net-
works, especially for the Internet, which contains a large amount
of web pages that grow in the ‘‘TB’’ order of magnitude each day.
Because of the characteristic of dynamical structure on the Web,
there is an increasing trend that the Web is partitioned into
distinct subnetworks. We call such subnetwork as web social
network (WSN for short), since it is a social structure made of web
pages called ‘‘nodes’’ which are connected by one or more specific
types of interdependency, such as similar theme or content
related to friendship, kinship, financial exchange, dislike, relation-
ships of beliefs, knowledge or prestige.
SNA is an alternative solution to analyze patterns of relation-
ships and interactions between social actors in order to discover
an underlying social structure ( Breiger, 2004). It focuses practi-
cally on the relationships between social structures and semantic
configurations (Pattison, 1994) while the structure of the social
network constitutes. There are several SNA techniques that have
been applied to analysis the dynamical structure (Strogatz, 2001),
discover the essential players (Qiao et al., 2008; Xu and Chen, 2005;
Qiao et al., 2011), and predict the evolving trend of a social
network (Qiu et al., 2009). But there is rare work focusing
on to discover new emerging groups of similar objects in WSNs,
which can help electronic dealers or online sailers classify their
users by their interactions and interests in order to provide special
services for distinct categories of people. How to develop an
accurate and efficient clustering algorithm for partitioning WSNs
is a challenging problem due to the following characteristics in web
social networks:
The relations among pages in a WSN are very intricate, and the
links between pages could have distinct weights, directions
and signs. The traditional clustering algorithms including the
distance, hierarchical, density based approaches that cannot be
directly applied to partition the WSN containing a lot of pages
with complex interactions.
The network structure of the Web could change dynamically
over time. For example, on the WWW, pages or links are
created and disappeared every minute, which is difficult for
us to accurately partition the dynamically evolving network
structure.
Meta-complication (Strogatz, 2001): the various complications
can influence each other. For example, the present layout of a
power complex network depends on how it has grown over the
years—a case where network evolution. The pages could be
Contents lists available at SciVerse ScienceDirect
journal homepage: www.elsevier.com/locate/engappai
Engineering Applications of Artificial Intelligence
0952-1976/$ - see front matter & 2012 Elsevier Ltd. All rights reserved.
doi:10.1016/j.engappai.2012.01.003
$
This paper is an extended version of a communication given at FLINS 2010.
n
Corresponding author. Tel.: þ86 13551192302.
E-mail addresses: qiaoshaojie@gmail.com, sjqiao@swjtu.edu.cn (S. Qiao).
Engineering Applications of Artificial Intelligence 25 (2012) 640–647