Page 4 of 17
Peng
et al. Algorithms Mol Biol (2015) 10:21
share functions not only with their direct neighbors but
also with their indirect neighbors, and even with their
level k neighbors, some potential mappings between
proteins of two species can be inferred from their
direct, indirect or level k neighbors. Furthermore, the
level of neighbors with which a protein tend to share
functions varies with species due to the structural and
topological difference of their PPI networks. Hence,
we should infer potential protein–protein mappings
from unequal level of neighbors for different species.
In this work, we adopt an unbalanced Bi-random walk
algorithm to find potential mapping between proteins
of two species. is method has also been used in our
previous study [35] that gets protein-function asso
-
ciations by walking different number of steps in PPI
network and functional interrelationship network. To
formally define our method, some variables are intro
-
duced in advance.
Let P(N*N) and H(M*M) be adjacent matrixes of two
input PPI networks respectively. P(N*N) is row-normal
-
ized and H(M*M) is column-normalized. e element
p(i, j) of matrix P(N*N) and h(i, j) of matrix H(M*M) is
defined as follows.
where degree(i) denotes sum of interactions of node i .
Let matrix A(N*M) represent known protein–pro
-
tein mappings measured by sequence-based similari-
ties. Its element a(i,j) is 1, if there exists an mapping
between protein i of one species and protein j of the
other one, 0 otherwise. R(N*M) denotes the final
protein–protein mappings. The value of its element
r(i,j) represents the probability that protein i will be
mapped to protein j.
Given matrix P, H and A, we want to calculate matrix R.
Since proteins and their level k neighbors in one PPI net
-
work may map to the same proteins in the counterpart
network, several random walk steps are taken on the two
PPI networks, respectively. At each walking step, multi
-
plying P on the left and H on the right respectively can
detect some potential protein–protein mappings (Eqs.3,
4). en the weighted average of the multiply results
updates matrix R (Eq.5). Consider the difference of the
two input networks, the level of neighbors from which
the proteins infer mapping information should be dif
-
ferent. To address this problem, two parameters (l and r)
are adopted to control maximal iteration steps in the two
networks. Mathematically, the process can be expressed
as Algorithm 1.
(1)
p(i, j) =
=
degree(i)
if degree(i)>0
0
(2)
(i, j) =
=
degree(j)
if degree(j)>0
0
where t (=1, 2,
) represents the walking steps. Matrix
A storing known protein–protein mappings can regu
-
late the iteration process. e parameter
(0
1) is
used to adjust the weight of regulation of network and
of prior knowledge stored in Matrix A (in this work,
is set to 0.5).
or
are indicators which are 1 if the
number of walk steps on PPI network One or Two are
less than their thresholds (l or r), respectively, 0 oth
-
erwise. ISORank [11] adopts similar strategy to obtain
potential mappings between proteins of two different PPI
networks and computes their global network alignment.
In ISORank, however, random walks are taken simulta
-
neously on the two networks until the global networks.
Actually, ISORank treats the two networks equally. How
-
ever, Our work separately takes random walks on two
networks, which walks only several steps (t is set to 1,
2,
) and is convenient for controlling different walk-
ing steps taken on the two networks according to their
difference in topology and structure. Consequently, our
method is more flexible to get protein–protein mappings
between two PPI networks.
Detecting conserved protein complexes fromPPI networks
e basic idea of UEDAMAlign is first dividing PPI net-
works into small subnetworks and then mapping pro-
teins of subnetworks to the other PPI network. Many
computational methods, such as Coach[36], MCL [37,
38], CMC [39], CFinder [40] and so on, have been pro
-
posed to detect protein complexes form a single PPI
network and achieve good performance. Moreover,
biological experiments have been implemented on sev
-
eral species and the data of known protein complexes
is available. Consequently those known protein com
-
plexes or those predicted by computational methods
can be conveniently used as partition of a PPI network.
e main challenge of UEDAMAlign lies in mapping
proteins in subnetworks of a PPI network to the other
one in order to find common connected components. In
the course of finding common connected components,
Algorithm1Finding potentialmappings
1: Input:Matrix P ,H,A parameter α,iterationsteps l , r;
2: Output:predicted association matrix R ;
3: R
0
= A =
A
sum(A)
4: for (t =1to max(l , r)) do
5: λ
p
= λ
h
=0;
6: if ( t<l) then
7: R
t
p
= αP ∗ R
t−1
+(1 − α)A (3) //PPInetwork One
8: λ
p
=1
9: end if
: if (t<r) then
: R
t
h
= αR
t−1
∗ H +(1 − α)A (4) //PPInetwork Two
: λ
h
=1
: end if
: R
t
=(λ
p
∗ R
t
p
+ λ
h
∗ R
t
h
)/(λ
p
+ λ
h
) (5) //Mergetwo results
: end for
: return R