1736 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 7, JULY 2014
the effectiveness and efficiency of the proposed method in
Section 5. The paper is concluded in Section 6.
2RELATED WORK
As revealed in [2], [3], [32], although there are a large
number of algorithms for discovering static communities,
the study of detecting dynamic communities is still in
its infancy. One of the earliest studies was conducted by
Hopcroft et al. [9], which detects evolving communities via
agglomerative hierarchical clustering. Backstrom et al. con-
ducted a case study on two large sources of data [10], which
addresses the questions concerning community member-
ship, growth and evolution. In [4], the dynamic relationship
between vertices and communities was explored, where
communities are found by the MCL method [33]. The tem-
poral smoothness framework trading off the history quality
with the snapshot quality has been investigated in [5],
[6], [8]. Based on the latent space model, Fu et al. [34]
proposed a dynamic mixed membership stochastic block
model (dMMSB) to track the evolving roles of the actors
across timestamps. Similarly, a dynamic stochastic block
model was developed for modeling communities and their
evolution in a unified probabilistic framework, where a
Bayesian treatment is presented for parameter estima-
tion [32]. Focusing on identifying communities in a multi-
mode network, where both actor membership and inter-
actions can evolve, an evolutionary multi-mode clustering
was proposed by Tang et al. using the temporal informa-
tion [7], [35]. Apart from the aforementioned approaches,
evolving communities can also be discovered by means of
information compression, e.g., [11]. Readers can refer to [2],
[3] for an overview of community detection in evolving
networks.
In the case of static graphs, some recent works have
shown significant improvements achieved by integrating
node/edge content and linkage structure in community
detection [17]–[21]. A discriminative model was proposed
in [18] to combine linkage and content analysis for commu-
nity detection, where a conditional model and a discrimi-
native model were respectively used for linkage analysis
and node content analysis. In [20], an edge-induced matrix
factorization (EIMF) approach was used to integrate link-
age structure and edge content for community detection.
Liu et al. [17] developed a Topic-Link LDA model, which
combines the topic similarity (edge content similarity) and
linkage structure to jointly model topics and author com-
munity. Zhou et al. [19] proposed to integrate the structural
and attribute similarities into a unified framework through
graph augmentation, so as to consider both linkage struc-
ture and node attribute. Sachan et al. [21] addressed the
problem of discovering topically meaningful communities
from social networks by combining three types of informa-
tion, namely, discussed topics, graph topology and nature
of user interactions, whereby generative Bayesian models
were introduced for extracting latent communities.
Apart from that, some works on supervised node classi-
fication have also validated the effectiveness of combining
linkage structure and content information in the supervised
learning scenario [22], [23]. However, previous works are
either limited to static graphs [17]–[21] or proposed for
supervised learning [22], [23]. Additionally, it is difficult
to extend these methods for discovering dynamic com-
munities in content-based networks in an unsupervised
manner. For instance, to extend [20], a full recomputa-
tion from scratch is required at each timestamp, which is
time-consuming.
Random walk has been widely used in various research
fields, e.g. [26], [27], [36]–[38]. In [26], an information the-
oretic approach was proposed, which uses the probability
flow of random walks in a network as a proxy for informa-
tion flows. Community structure can then be discovered by
compressing the description of the probability flow. In [27],
a new distance metric was proposed between vertices and
between sets of vertices to quantify the structural similari-
ties using random walks. Then a hierarchical agglomerative
clustering algorithm is performed using the distance metric
between vertices and between sets of vertices to find com-
munity structures at different scales. Finally the community
structure containing the required number of communities
is output. These two approaches are batch processing in
nature and hence do not meet the efficiency requirement
of the on-line dynamic detection of communities. Similarly,
based on the Markov-chain model of random walk, Fouss
et al. computed quantities such as average commute time
as the similarity between vertices, which are used in collab-
orative recommendation [37]. In [38], a random walk based
clustering algorithm was developed, which is based on the
asymmetric pairwise similarity measure of random walk
hitting time and K-destinations.
3 NEI NETWORK TRANSFORMATION
To combine linkage structure, node content and edge con-
tent for dynamic community detection, one needs to design
some structure for simultaneously representing link-based
similarity and content-based similarity. To this end, we pro-
pose a Node-Edge Interaction (NEI) network, into which
the content-based network is transformed. In this section,
we will first describe the construction of the NEI network
from a content-based network. Then we introduce how
to update the NEI network as the content-based network
evolves.
3.1 NEI Network Construction
For notational simplicity, we ignore the time-subscript in
this subsection. Assume that we are given a content-based
network containing a node set
N ={v
1
, v
2
,...,v
m
} and an
undirected edge set
E ={e
1
, e
2
,...,e
n
}. There are contents
associated with each node v
i
and each edge e
p
. To inte-
grate linkage structure, node content and edge content, this
content-based network is transformed into a Node-Edge
Interaction (NEI) network, which is a multi-mode network
comprising two types of nodes and three types of edges, as
shown in Fig. 2. The two types of nodes correspond to
N
and E, which are called n-nodes and e-nodes respectively.
The three types of edges are defined as follows:
1) Each n-node v
i
and each e-node e
p
are connected
if the edge e
p
is incident upon the node v
i
in the
original network, i.e., ∃v
j
∈ N s.t. e
p
= (v
i
, v
j
).Inthe
NEI network shown in Fig. 2, such edge is indicated