Proceedings of CCIS2014
Link Prediction in Sina Microblog using Comprehensive
Features and Improved SVM Algorithm
Yun Li, Kai Niu, Baoyu Tian
Key Laboratory of Universal Wireless Communications,
Beijing University of Posts and Telecommunications,
Beijing, 100876, P. R. China
liyun_bupt@126.com
Abstract: Sina Microblog has become one of the most
popular social networks in recent years. As a result,
many research directions from traditional social network
analysis have been extended to it. However, the link
prediction problem in Sina Microblog has not drawn
much attention so far. In this paper, we study link
prediction in Sina Microblog. According
to the characteristics of Sina Microblog, we propose an
effective and comprehensive feature set for link
prediction in Sina Microblog. We then apply the fast
classification algorithm for polynomial kernel support
vector machines (FCPKSVM) to train our classifier. By
transferring most of the computation from the prediction
phase to the training phase, the time complexity of the
prediction phase is greatly reduced. We show that a
machine learning classifier trained on the proposed
feature set achieves good prediction performance for
link prediction in Sina Microblog, comparable to that of
other classical classifiers, and that, by introducing
FCPKSVM, our method has far lower time complexity in
the prediction phase than those classifiers.
Keywords: link prediction, Sina Microblog,
FCPKSVM
1 Introduction
Sina Microblog is the earliest and largest microblogging
service in China, with over 100 million active users,
and it has attracted increasing attention in recent
years. Much work has been done to examine the
structural and behavioral properties of Sina Microblog,
but few efforts have been made to solve the link
prediction problem in Sina Microblog.
M. A. Hasan and M. J. Zaki [1] show that research on the
link prediction problem is dominated by topological
studies of the graphs used to represent social networks.
To obtain the graph of a social network, each user is
represented by a node and the social relationship
between two users is represented by a link. These graphs
change over time as users interact in the social
network. Understanding the dynamics of these graphs
can start with analyzing how the association between
any two nodes evolves, which is the link prediction
problem.
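The graph view described above can be made concrete with a small sketch. It assumes that a link (u, v) is directed and means "u follows v", and it uses made-up user names and two hypothetical snapshots of the same network; none of these details come from the dataset in this paper. Pairs that are unlinked in the earlier snapshot are the candidates, and a candidate is a positive example if the link appears in the later snapshot.

```python
# Illustrative sketch: user names and edges are invented for this example.

def new_links(old_edges, new_edges):
    """Return the directed links that exist only in the later snapshot."""
    return sorted(set(new_edges) - set(old_edges))

def candidate_pairs(nodes, edges):
    """Return all ordered pairs (u, v), u != v, that are not yet linked."""
    existing = set(edges)
    return [(u, v) for u in nodes for v in nodes
            if u != v and (u, v) not in existing]

# Assumption: a link (u, v) means "u follows v".
nodes = ["a", "b", "c"]
snapshot_t0 = [("a", "b"), ("b", "c")]
snapshot_t1 = [("a", "b"), ("b", "c"), ("a", "c")]

candidates = candidate_pairs(nodes, snapshot_t0)
positives = new_links(snapshot_t0, snapshot_t1)

# Label each candidate pair: True if the link formed by the later snapshot.
labels = {pair: pair in positives for pair in candidates}
```

A classifier for link prediction would be trained on feature vectors computed for the pairs in `candidates`, with `labels` as the target.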
Early research on link prediction was carried out mainly
in computer science. R. R. Sarukkai [2] used
Markov chains to conduct link prediction and path
analysis in computer networks. J. Zhu et al. [3] then
applied link prediction based on Markov chains to
adaptive web sites. Because of its unsatisfactory
prediction performance, the Markov chain method was not
widely adopted for link prediction in social networks.
Later, driven by the requirements of practical
applications, research on link prediction spread to
various domains, including social network analysis. Link
prediction was used to find interactions between proteins
in bioinformatics [4], to help build recommendation
systems in e-commerce [5], and to identify hidden links
in social networks [6].
Supervised machine learning is an effective approach to
the link prediction problem. It was first
used by Liben-Nowell and Kleinberg [7], was then
extended repeatedly [6, 8, 9, 10, 11], and achieved very
good prediction performance on most datasets.
Many kinds of features can describe the link
between two users [6], but most work using
supervised learning used only topological features,
because they apply equally to all domains. These methods
also commonly had high time complexity in the prediction
phase, which restricted their real-time application in
huge social networks.
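To illustrate where the prediction-phase cost comes from and how a polynomial kernel allows it to be moved into training, the following minimal sketch uses a degree-2 kernel K(s, x) = (s . x + c)^2. It is a generic illustration of the idea, not the FCPKSVM algorithm itself, whose details are not given in this section. The support vectors, coefficients, and function names are hypothetical. Expanding the square turns the sum over support vectors into one d x d matrix, one d-vector, and one constant that can be computed once after training.

```python
# Illustrative sketch only; not the FCPKSVM algorithm from the paper.
# Assumed kernel: K(s, x) = (s . x + c) ** 2.
# Each entry of `alphas` is a dual coefficient already multiplied by its label.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def decide_direct(svs, alphas, b, c, x):
    """Standard decision value: one kernel evaluation per support vector."""
    return sum(a * (dot(s, x) + c) ** 2 for s, a in zip(svs, alphas)) + b

def precompute(svs, alphas, b, c):
    """Fold all support vectors into a d x d matrix, a d-vector, a constant."""
    d = len(svs[0])
    A = [[sum(a * s[i] * s[j] for s, a in zip(svs, alphas)) for j in range(d)]
         for i in range(d)]
    w = [2 * c * sum(a * s[i] for s, a in zip(svs, alphas)) for i in range(d)]
    const = c * c * sum(alphas) + b
    return A, w, const

def decide_fast(model, x):
    """Decision value whose cost does not depend on the number of support vectors."""
    A, w, const = model
    quad = sum(x[i] * dot(A[i], x) for i in range(len(x)))
    return quad + dot(w, x) + const

# Hypothetical trained model and query point.
svs = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.0]]
alphas = [0.5, -1.0, 0.25]
b, c = 0.1, 1.0
x = [0.3, -0.7]

model = precompute(svs, alphas, b, c)
direct_value = decide_direct(svs, alphas, b, c, x)
fast_value = decide_fast(model, x)
```

The direct form costs on the order of (number of support vectors) x d operations per prediction, while the precomputed form costs on the order of d^2, which pays off when the number of support vectors is much larger than the feature dimension d.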
In this paper, we propose an effective solution to the link
prediction problem in Sina Microblog based on
FCPKSVM trained on a feature set that not only
considers topological features, but also includes features
extracted from the microblogs users have issued and
users’ attributes. The proposed method is compared with
several classical classification algorithms. In addition,
we compare the effectiveness of the individual features
in our feature set using the information gain of each
feature.
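As a reminder of what information gain measures, the sketch below computes it for a single discrete feature: the entropy of the class labels minus the expected entropy after splitting on the feature. It assumes discrete feature values (continuous features would first need to be discretized), and the toy data and function names are hypothetical, not the paper's features or evaluation procedure.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy given the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data: 1 = link formed, 0 = no link (invented for illustration).
labels = [1, 1, 0, 0]
informative = ["y", "y", "n", "n"]    # separates the classes perfectly
uninformative = ["y", "n", "y", "n"]  # carries no information about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

A feature with a higher information gain reduces more of the uncertainty about whether a link exists, so ranking features by this value gives a simple view of their relative contribution.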
The rest of the paper is organized as follows. Section
2 gives a detailed description of Sina Microblog and
the dataset we collected. Section 3 introduces each
feature in our feature set. Section 4 presents
our experimental setup, the effectiveness and efficiency
of the selected algorithms, and the information gain
of each feature, which shows how much each feature
contributes. Section 5 gives our
conclusion.
2 Dataset of Sina Microblog