Top-k Structure Holes Detection Algorithm in Social Network
Jiang Zhu, Lu Zhu, Chongming Bao, Lihua Zhou, Chongyun Wang, Bing Kong
Yunnan University
Kunming, China
e-mail: kongbing@ynu.edu.cn
Abstract—Structure holes were first proposed in relational
sociology, and then introduced into social network research.
The structure holes is the vertex set at the key positions in the
network, so the detection of such vertex is of great significance
to the control of disease spread network, the analysis of the
influence of social network, the discovery of the weak point of
the network security, the rapid promotion of the information
and so on. Aiming at the problem of Structure holes detection,
this paper proposes an algorithm of Top-k structure holes
discovery based on shortest path increment, mainly through
the analysis of the shortest path increment, the number of sub
connected components and the variance of the vertices in
connected component to determine the structure hole attribute
values of the vertices, then the vertices are sorted according to
this value and obtained the top-k vertices. The experiment uses
the real network and the LFR simulation complex network,
and compares the proposed algorithm with other congeneric
algorithms by using NDCG evaluation method and the SIR
communicable disease propagation diffusion model. The
experimental results show that the NDCG score of the
proposed algorithm is higher, and its diffusion effect in the SIR
model is obviously better than other methods.
Keywords- social network; graph shortest path incremental;
structure holes; information diffusion
I. INTRODUCTION
In recent years, the relationship between the network and
people's production and life is becoming more and more
closely. All kinds of network are turning into the developing
direction towards diversified, complex and massive. How to
quickly grasp the key vertices and obtain effective
information under such a background becomes the key to
further improve production efficiency and quality of life. For
example, The influence of key vertices on Internet public
opinion control, the analysis of user influence in social
network, the discovery of weaknesses in network security
and the rapid promotion of information, etc.
The concept of structure holes is first proposed by the
Burt [1], which explains and analyzes the key position of the
individual in the group. It is believed that in the social
structure, the individual in the key position will be able to
gain more competitive advantage. As shown in Fig. 1, a
simple example network, of which three dashed areas
represent three communities respectively, dark vertices
directly restrict the flow of information between
communities even the entire network, so they are regarded as
structure holes. But the real network is much more complex
than the example network. As the research goes deep, the
understanding of the structure holes is no longer limited to
the key vertices of the flow of information between the
communities. And the algorithms for detecting structure
holes are also increasing and improving. Some of them are
based on community detection [2], and these algorithms
needs to detect the community in advance, so it will become
complicated and lengthy in the calculation process, and the
quality of the structure hole is fully determined by the found
communities; some are optimized for centrality algorithms
and key sorting algorithms [3], these algorithms could reach
convergence and stability quickly, but the accuracy is
relatively weak, such as PageRank [4], Betweenness-
Centrality [5, 6], Closeness-Centrality [7, 8], etc.; machine
learning is also used to integrate multiple data to rank key
vertices [9, 10]. And there are many other algorithms for
different ideas.
Figure 1. Structure Holes.
Our contributions can be summarized as follows:
We are mainly considers from the structure of
network, and judge the attribute strength of vertex by
calculating the increment of shortest path. From the
principle of the algorithm, the proposed algorithm is
similar to the optimization algorithm of centrality
algorithm. And we put forward a new optimization
scheme. As far as we know, this is the first attempt
to use VAR and NCC instead of the maximum value
to describe and solve the unreachable shortest path.
Compared with the centrality algorithm, we inherit
its efficiency and improve the accuracy of the result.
Compared with the algorithm which based on
community detection, we just incorporate the
concept of community into the algorithm through
simple processing, thus avoiding the complexity of
the algorithm.
The results of SIR diffusion experiments on several
networks show that our algorithm is better than other
algorithms.