Yan-Tao Jia et al.: Learning to Predict Links by Structure and Interaction Information 831
full advantage of the property of Twitter as a social
media (see [14]) or an information diffusion channel.
One Twitter user A can address tweets of user B, and
then mentions B obliquely in his or her tweets, which
is syntaxed as “RT @B”. Another common practice is
that A “retweets” or rebroa dc asts B’s message, which
is syntaxed by @B. For a tweet messag e, the behavior-
based methods extract the usernames after the symbol
@, and consider that A and B have an interaction rela-
tionship. Hopcroft et al.
[9]
considered these interaction
relationships and defined four features to represent the
number of retweets or replies fr om user A to user B
and from user B to user A, respectively. By integrat-
ing other feature s, they propose d a supervised method,
i.e., the Triad Factor Graph model, to predict the reci-
procity link. Similar work can refer to that by Lou et
al.
[10]
Our model is also behavior-ba sed. The difference
is that we integrate the structure and the interaction
behavior into a simpler matrix factorization framework.
As for the matrix factorization method used in the
link prediction problem, it is motivated by the success-
ful application of matrix factorization used in recom-
mender systems, where the model a ims to find latent
features for users and items by factoriz ing the observed
matrix, see [15-17]. Converting the user-item pair to
the use r-user pair leads to the link prediction problem
as a link recommendation problem. Related work can
be fo und in the work o f Menon and Elkan
[18]
and Yin
et al.
[11]
Their models learned the latent features just
from the topological structures of the network. For ex-
ample, Yin et al.
[11]
analyzed the role of the interme-
diate user between two users, and divided its contri-
bution into two parts: one is the recommendation of
the intermediate user, and the other is the accepta nc e
of the recommendation of the intermediate user. Very
recently, Zhang et al.
[19]
enhanced Yin et al.’s work to
find the real intermediate users and studied how they
contribute to the link formation process. To better pre-
dict new links in time-evolving social networks, Gao et
al.
[20]
integrated three types of information: the global
network structure, the content of nodes in the network
and the local information of a given vertex to derive
a matrix factorization model. Similar work by using
the matrix factorization method or the tensor factori-
zation method can refer to [8, 21 -22], etc. However,
these methods lack the consideration of the impact of
interactions between users on the link prediction. Our
work mixes the interaction information betwe e n user s
with the structure of the network.
3 Test the Existence of the Gap by Experiment
In this section, we first examine the different per-
formances of the S-Model by Yin et al.
[11]
on datasets
with different sparseness and get its best predictive per-
formance, i.e., the maximum F 1-measure obtained by
S-Model. E xp eriments show that this maximum F 1-
measure does not ta ke its theoretical maximum 1. This
leads to a hypothesis that ther e exists a gap between
the predictive performa nc e and the ground truth for
S-Model. To narrow this gap, we propose the idea to
use the interaction information between users in the
dataset.
Before reconstructing the experiment of Yin et
al.
[11]
, let us simply recall S-Model as follows. The idea
of S-Model is to pre dic t new follower v
i
(called the tar-
get user) of the source user v
u
via the contributions of
some intermediate user v
k
. The contributions of v
k
can
be divided into two parts: one is the recommendation
of v
i
to v
u
, and the other is v
u
’s acceptance of the reco-
mmendation of v
k
for v
i
. Then S-Model studies the in-
fluence of the network structures on v
k
’s contributions
by introducing the structure similarity between users.
After using the matrix factorization technique, S-Model
can predict new link formation for one static network
as well as two snapshots of the network in Twitter. To
find the best performance of S-Model, we conduct the
experiment on the static dataset with different sparse-
ness. Here the sparseness, denoted by nf, means the
average number of non-followers for a number of users.
We tune the sparseness of the dataset recursively from
one original dataset by randomly converting some num-
ber of follower s to non-followers. In other words, if we
construct a rating-like matrix with the row correspond-
ing to the source users and the column corresponding
to the target users, denoted by R
n×m
= (r
ui
), where n
is the number of source users and m is the number of
target users, r
ui
= 1 if v
u
follows v
i
and r
ui
= 0 other-
wise, the tuning process is to randomly replace some
number of 1’s for each row with 0 respectively. The ini-
tial da taset corresponds to the matrix with all elements
being 1 except the diagonal elements, with the sparse-
ness nf = 1. It is easy to see that 1 6 nf 6 m −1. The
experiment is carried out by fixing both the smoothing
factor and the structural factor being 0.01 in S-Model
and setting m = 10 000 to find the relation between the
F 1-measure and the value nf. We depict the relation
for nf = 1, . . . , 31 as follows, since for the rest part, the
tendency of the curves is similar.
From Fig.1, we can see that the maximum F 1-
measure obtained by S-Model is 0.007 when nf = 29.