AUC ¼
H þ 0:5 E
N
ð2Þ
3.2.2. Precision
This metric considers N links with the highest S
xy
values in all
unknown links. If there are T existing yet unknown links in the
top N unknown links [5,8], the Precision is defined as:
Precision ¼
T
N
ð3Þ
4. Node-coupling clustering approaches
In this section, we present our approaches for link prediction.
Firstly, we present a new node-coupling degree metric – node-
coupling clustering coefficient. Then, we present the process of
our approaches. Finally, we give the complexity analysis of our
approaches.
4.1. Node-coupling clustering coefficient
Many similarity-based methods only consider the number or
degrees of common neighbor nodes of a predicted node-pair in link
prediction, and few exploit further the coupling degrees among the
common neighbor nodes and the clustering information to
improve the prediction accuracy. Based on the above reason, we
propose a new node-coupling degree metric – node-coupling clus-
tering coefficient (NCCC), which can capture the clustering infor-
mation of a network and evaluate the coupling degrees between
the common neighbor nodes of a predicted node-pair. It also con-
siders different roles of the common neighbor nodes of a predicted
node-pair in a network. Now, we introduce this metric through a
simple example.
Fig. 1 shows an example for predicting the link between nodes
M and N in two networks. Two original networks are described in
Fig. 1(a) and (b). Fig. 1(c) and (d) are two subgraphs that consist of
nodes M; N and their common neighbors in Fig. 1(a) and (b),
respectively. We aim to predict which link between nodes M and
N is more likely to exist in Fig. 1(a) and (b). In general, we find that
the coupling degrees of nodes M; N and their common neighbors
are higher in Fig. 1(c) than (d). Thus, we believe the link of nodes
M; N in Fig. 1(a) is more likely to exist than in Fig. 1(b). If we apply
CN; AA; RA; PA to predict the link of nodes M; N in these two orig-
inal networks, we can gain the same prediction result for each
method. The reasons are as follows: from Fig. 1(a) and (b), we
can see that the common neighbor set fbdf g of nodes M; N are
the same, and every corresponding common neighbor node has
the same degree value in these two original networks. The similar-
ity metric is the number of the common neighbor nodes of a pre-
dicted node-pair in CN. CN has the same prediction results
because of the same common neighbor node set fbdf g of nodes
M; N in these two original networks.
RA; AA are based on the
degree values of the common neighbor nodes. RA has the same
prediction probability as AA because of the same degree value of
every corresponding common neighbor node in fbdf g in these
two original networks. For the same reason, PA provides the same
prediction result because that there are the corresponding same
degree values for nodes M and N in these two original networks.
However, the link probabilities between node M and node N in
Fig. 1(a) and (b) are not likely to be the same.
In the above case, inspired by [10,19], we propose a new node-
coupling degree metric based on the clustering information and
node degree – node-coupling clustering coefficient. This metric
cannot only resolve the above prediction problem in Fig. 1, but also
capture the clustering information of a network. If node n is a com-
mon neighbor node of the predicted node-pair ðM; NÞ, the node-
coupling clustering coefficient of node n; NCCCðnÞ, can be defined
as follows:
NCCCðnÞ¼
P
i2C
n
1
d
i
þ CðiÞ
P
j2
C
ðnÞ
1
d
j
þ CðjÞ
ð4Þ
where CðnÞ is the neighbor node set of node n. ðM; NÞ denotes a pre-
dicted node-pair. n 2
CðMÞ\CðNÞ. C
n
denotes the common neigh-
bor node set of the node-pair ðM; NÞ in
CðnÞ, which includes
nodes M; N. Namely C
n
¼ CðMÞ\CðNÞ\CðnÞ[fM; Ng. d
i
denotes
the degree value of node i. CðiÞ denotes the clustering coefficient
of node i. In this metric,
1
d
i
þ CðiÞ is considered as the contribution
of node i to the coupling degree of the common neighbor nodes of
the predicted node-pair ðM; NÞ. The node-coupling clustering coeffi-
cient of node n is the ratio of the contribution sum of all nodes in C
n
to that in CðnÞ. In this way, our approaches can apply this metric
that incorporates the clustering information and different roles of
each related node to improve the prediction accuracy for link
prediction.
In Eq. (4), since C
n
# CðnÞ,
P
i2C
n
1
d
i
þ CðiÞ
6
P
j2
C
ðnÞ
1
d
j
þ Cð jÞ
.
As a result, NCCCðnÞ2ð0; 1. Specially, NCCCðnÞ¼1 when
C
n
¼ CðnÞ.
4.2. Node-coupling clustering approach based on probability theory
(NCCPT)
From probability theory, we propose a new link prediction
approach based on the node-coupling clustering coefficient (NCCC)
in Section 4.1. Given a pair of predicted nodes ðx; yÞ, node n is a com-
mon neighbor node of the node-pair ðx; yÞ. NCCCðnÞ can be consid-
ered as the contribution of node n to the connecting probability for
the predicted node-pair ðx; yÞ. PðnÞ denotes the link existence
probability that node x and node y connect because of node n.
PðnÞ
denotes the link non-existence probability that node n
connects node x to node y. Therefore, PðnÞ¼NCCCðnÞ and
PðnÞ¼1 NCCCðnÞ. fA
1
; A
2
; ...; A
i
; ...; A
m
g is the common neighbor
set of the predicted node-pair ðx; yÞ, namely
CðxÞ\CðyÞ¼
fA
1
; A
2
; ...; A
i
; ...; A
m
g. We assume that these common neighbor
nodes of the node-pair ðx; yÞ are independent to each other. If there
Fig. 1. An example for predicting the link between nodes M and N in two original networks.
F. Li et al. / Knowledge-Based Systems 89 (2015) 669–680
671