UEDA-MAlign: A Divide-and-Match Method for Detecting Conserved Protein Complexes

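"This research paper explores a new method for detecting protein complexes that are conserved across species, based on a divide-and-match algorithm and network comparison under unequally relaxed criteria. Locally aligning protein-protein interaction (PPI) data from different species identifies their common subnetworks, which helps in studying biological evolution. The proposed method, UEDA-MAlign, addresses the difficulty of finding common subnetworks under strictly similar topology, and it accounts for the differences between the input networks by applying relaxed criteria of unequal degree."

In biology, a protein complex is a structure made up of several protein molecules that work together to carry out a specific biological function. As PPI data accumulate for more species, researchers can compare these data to look for protein complexes shared across species and so better understand evolution. Traditional local alignment algorithms compare the PPI networks of different species at both the protein-sequence level and the network-structure level. Because of computational complexity and biological variation, however, common subnetworks with exactly identical topology are hard to find, so some methods introduce relaxed similarity criteria. A drawback of these methods is that they relax the criteria to the same degree for both input PPI networks and do not fully account for the differences between the networks.

The UEDA-MAlign method proposed in the paper addresses this problem. It uses a divide-and-match strategy: the PPI network is first partitioned into smaller modules, and the modules are then matched against each other. The key innovation is the use of unequal criteria, meaning that different parts of the networks can be relaxed to different degrees according to their characteristics and complexity. This makes the method more flexible and better able to adapt to variation in network structure, which improves the accuracy of identifying conserved protein complexes.

With this strategy, UEDA-MAlign can capture protein complexes that are conserved between species even when their network structures vary to some extent. Beyond revealing conserved mechanisms in evolution, the method may also suggest potential targets for drug design and disease treatment, because abnormalities in protein complexes are often associated with disease. In summary, the paper proposes an algorithm that combines divide-and-match with unequal network comparison criteria, improving the identification of conserved protein complexes across the PPI networks of multiple species and offering a new tool for biological network analysis and evolutionary biology.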

Peng et al. Algorithms Mol Biol (2015) 10:21

share functions not only with their direct neighbors but also with their indirect neighbors, and even with their level-k neighbors, some potential mappings between proteins of two species can be inferred from their direct, indirect or level-k neighbors. Furthermore, the level of neighbors with which a protein tends to share functions varies with species, owing to the structural and topological differences of their PPI networks. Hence, we should infer potential protein–protein mappings from unequal levels of neighbors for different species. In this work, we adopt an unbalanced Bi-random walk algorithm to find potential mappings between proteins of two species. This method was also used in our previous study [35], which obtains protein-function associations by walking different numbers of steps in the PPI network and in the functional interrelationship network. To formally define our method, some variables are introduced in advance.

Let P(N*N) and H(M*M) be the adjacency matrices of the two input PPI networks, respectively. P(N*N) is row-normalized and H(M*M) is column-normalized. The elements p(i, j) of matrix P(N*N) and h(i, j) of matrix H(M*M) are defined in Eqs. (1) and (2), where degree(i) denotes the sum of interactions of node i.
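The row and column normalization described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function names `row_normalize` and `column_normalize` are hypothetical, and the sketch assumes that nodes with degree 0 keep an all-zero row or column.

```python
import numpy as np

def row_normalize(adj):
    """Divide each row by its degree (row sum).

    Assumption: rows of isolated nodes (degree 0) are left as zeros.
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1, keepdims=True)  # shape (N, 1)
    return np.divide(adj, degree, out=np.zeros_like(adj), where=degree > 0)

def column_normalize(adj):
    """Divide each column by its degree (column sum).

    Assumption: columns of isolated nodes (degree 0) are left as zeros.
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=0, keepdims=True)  # shape (1, M)
    return np.divide(adj, degree, out=np.zeros_like(adj), where=degree > 0)

# Toy network: node 0 interacts with nodes 1 and 2; node 3 is isolated.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 0]])
P = row_normalize(adj)     # every non-isolated row sums to 1
H = column_normalize(adj)  # every non-isolated column sums to 1
```

Passing `out` and `where` to `np.divide` avoids division-by-zero warnings for isolated nodes while leaving their entries at zero.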

Let matrix A(N*M) represent known protein–protein mappings measured by sequence-based similarities. Its element a(i, j) is 1 if there exists a mapping between protein i of one species and protein j of the other, and 0 otherwise. R(N*M) denotes the final protein–protein mappings. The value of its element r(i, j) represents the probability that protein i will be mapped to protein j.
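As a small illustration of the prior matrix A, the sketch below builds a 0/1 matrix from a list of known mapping pairs. The helper name `build_prior_matrix` and the index-pair input format are assumptions made for this example; the paper does not specify how the sequence-based mappings are stored.

```python
import numpy as np

def build_prior_matrix(n, m, known_pairs):
    """Return an n-by-m matrix with a(i, j) = 1 for each known mapping (i, j).

    n, m        : numbers of proteins in the first and second network
    known_pairs : iterable of (i, j) index pairs (hypothetical input format)
    """
    A = np.zeros((n, m))
    for i, j in known_pairs:
        A[i, j] = 1.0
    return A

# Three proteins in species one, two in species two, two known mappings.
A = build_prior_matrix(3, 2, [(0, 1), (2, 0)])
```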

Given matrices P, H and A, we want to calculate matrix R. Since proteins and their level-k neighbors in one PPI network may map to the same proteins in the counterpart network, several random walk steps are taken on the two PPI networks, respectively. At each walking step, multiplying by P on the left and by H on the right, respectively, can detect some potential protein–protein mappings (Eqs. 3, 4). Then the weighted average of the two products updates matrix R (Eq. 5). Considering the differences between the two input networks, the level of neighbors from which the proteins infer mapping information should be different. To address this problem, two parameters (l and r) are adopted to control the maximal iteration steps in the two networks. Mathematically, the process can be expressed as Algorithm 1.

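$$p(i, j) = \begin{cases} \dfrac{P(i, j)}{\mathrm{degree}(i)} & \text{if } \mathrm{degree}(i) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

$$h(i, j) = \begin{cases} \dfrac{H(i, j)}{\mathrm{degree}(j)} & \text{if } \mathrm{degree}(j) > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

Here P(i, j) and H(i, j) on the right-hand sides denote the adjacency entries before normalization, consistent with P being row-normalized and H being column-normalized.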

where t (= 1, 2, ...) represents the walking steps. Matrix A, which stores the known protein–protein mappings, can regulate the iteration process. The parameter α (0 < α < 1) is used to adjust the weight of the regulation by the network against that of the prior knowledge stored in matrix A (in this work, α is set to 0.5). λ_l and λ_r are indicators, which are 1 if the number of walk steps on PPI network One or Two is less than its threshold (l or r), respectively, and 0 otherwise. ISORank [11] adopts a similar strategy to obtain potential mappings between proteins of two different PPI networks and computes their global network alignment. In ISORank, however, random walks are taken simultaneously on the two networks until they converge over the global networks. In effect, ISORank treats the two networks equally. By contrast, our work takes random walks on the two networks separately, walks only several steps (t is set to 1, ...), and makes it convenient to control the different numbers of walking steps taken on the two networks according to their differences in topology and structure. Consequently, our method is more flexible in obtaining protein–protein mappings between two PPI networks.

Detecting conserved protein complexes from PPI networks

The basic idea of UEDAMAlign is to first divide PPI networks into small subnetworks and then map the proteins of the subnetworks to the other PPI network. Many computational methods, such as Coach [36], MCL [37, 38], CMC [39] and CFinder [40], have been proposed to detect protein complexes from a single PPI network and achieve good performance. Moreover, biological experiments have been carried out on several species, and data on known protein complexes are available. Consequently, those known protein complexes, or those predicted by computational methods, can conveniently be used as a partition of a PPI network. The main challenge of UEDAMAlign lies in mapping proteins in the subnetworks of one PPI network to the other in order to find common connected components. In the course of finding common connected components,

Algorithm 1 Finding potential mappings

Input: matrices P, H, A; parameter α; iteration steps l, r
Output: predicted association matrix R

R_0 = A = A / sum(A)
for (t = 1 to max(l, r)) do
    λ_l = λ_r = 0
    if (t < l) then
        R_l = αP ∗ R_{t−1} + (1 − α)A    (3)  // PPI network One
        λ_l = 1
    end if
    if (t < r) then
        R_r = αR_{t−1} ∗ H + (1 − α)A    (4)  // PPI network Two
        λ_r = 1
    end if
    R_t = (λ_l ∗ R_l + λ_r ∗ R_r) / (λ_l + λ_r)    (5)  // Merge two results
end for
return R_t
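Algorithm 1 can be sketched in NumPy as below. This is an illustrative reading of the pseudocode, not the authors' implementation. The function name and the variable names `lam_l` and `lam_r` are hypothetical, `∗` is interpreted as matrix multiplication, and one assumption is added: when both indicators are 0 in an iteration (which happens at t = max(l, r) under the strict `t < l` and `t < r` tests), R is left unchanged rather than dividing by zero.

```python
import numpy as np

def unbalanced_bi_random_walk(P, H, A, alpha=0.5, l=2, r=3):
    """Sketch of Algorithm 1: infer protein-protein mappings R (N x M).

    P : (N, N) row-normalized adjacency matrix of PPI network One
    H : (M, M) column-normalized adjacency matrix of PPI network Two
    A : (N, M) known mappings (0/1), normalized here by its total sum
    l, r : step thresholds for network One and network Two
    """
    P = np.asarray(P, dtype=float)
    H = np.asarray(H, dtype=float)
    A = np.asarray(A, dtype=float)
    total = A.sum()
    if total > 0:
        A = A / total                      # A = A / sum(A)
    R = A.copy()                           # R_0 = A
    for t in range(1, max(l, r) + 1):
        lam_l = lam_r = 0
        R_l = R_r = 0.0
        if t < l:                          # walk on PPI network One, Eq. (3)
            R_l = alpha * (P @ R) + (1 - alpha) * A
            lam_l = 1
        if t < r:                          # walk on PPI network Two, Eq. (4)
            R_r = alpha * (R @ H) + (1 - alpha) * A
            lam_r = 1
        if lam_l + lam_r > 0:              # assumption: skip when both are 0
            R = (lam_l * R_l + lam_r * R_r) / (lam_l + lam_r)  # Eq. (5)
    return R

# Toy example: a 3-node star network and a 2-node network with one edge.
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])
A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])
R_equal = unbalanced_bi_random_walk(P, H, A, alpha=0.5, l=2, r=2)
R_right_only = unbalanced_bi_random_walk(P, H, A, alpha=0.5, l=1, r=2)
```

With `l=2, r=2` a single step is taken on both networks and the two results are averaged; with `l=1, r=2` only network Two contributes, which is the kind of asymmetry the two thresholds are meant to allow.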
