Protein Localization in Yeast and Gneg Bacteria: A Prediction Method Based on Random Walks on Graphs

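This article discusses a novel bioinformatics method that uses a random walk algorithm to predict protein localization. The research paper, "Protein localization prediction using random walks on graphs", was presented at the 2012 International Conference on Intelligent Computing (ICIC 2012), held in Huangshan, China, on 25-29 July 2012.

Background: Where a protein is located in the cell is vital to understanding its function and its possible interactions. Determining the subcellular compartment in which a protein resides (for example, the nucleus or the mitochondria) is therefore a key task in protein classification. Because labeled data are limited, predicting localization labels effectively is a challenging problem. Graph representations combined with random walk techniques have performed well in sequence classification and functional prediction, but had not previously been applied to protein localization.

Methods: The authors propose a graph theory model that converts protein data into a graph structure and uses random walks to predict localization. They build and validate the model on yeast and gram-negative (Gneg) bacteria datasets. The approach aims to capture functional and spatial relationships between proteins and uses the diffusion behavior of a random walk to infer the likely subcellular location of unlabeled proteins.

Results: The paper tunes the laziness value and the number of random walk steps, and evaluates the classifier with 10-fold cross-validation. It reports an accuracy above 61% on the yeast data and about 93% on the gram-negative bacteria data, which the authors present as an improvement over previous methods such as support vector machine (SVM)-based classifiers.

Conclusion: The work offers a new approach to the protein localization problem and suggests that random walks on graphs may be useful more broadly in bioinformatics, particularly when labeled data are scarce. Future research could further optimize the model, improve prediction accuracy, and explore applying the technique to other types of biological data.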

PROCEEDINGS Open Access

Protein localization prediction using random walks on graphs

Xiaohua Xu*†, Lin Lu†, Ping He, Ling Chen

From The 2012 International Conference on Intelligent Computing (ICIC 2012)

Huangshan, China. 25-29 July 2012

Abstract

Background: Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification issue thus involves predicting labels in a dataset with a limited number of labeled data points available. By utilizing a graph representation of protein data, random walk techniques have performed well in sequence classification and functional prediction; however, this method has not yet been applied to protein localization. Accordingly, we propose a novel classifier in the site prediction of proteins based on random walks on a graph.

Results: We propose a graph theory model for predicting protein localization using data generated in yeast and gram-negative (Gneg) bacteria. We tested the performance of our classifier on the two datasets, optimizing the model training parameters by varying the laziness values and the number of steps taken during the random walk. Using 10-fold cross-validation, we achieved an accuracy of above 61% for yeast data and about 93% for gram-negative bacteria.

Conclusions: This study presents a new classifier derived from the random walk technique and applies this classifier to investigate the cellular localization of proteins. The prediction accuracy and additional validation demonstrate an improvement over previous methods, such as support vector machine (SVM)-based classifiers.
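The abstract names two tunable parameters, the laziness value and the number of steps in the walk. The following is a minimal Python sketch of how a lazy random walk on a weighted graph could propagate localization labels. It rests on assumptions that do not come from the paper: the toy similarity matrix, the function names `lazy_transition` and `walk_predict`, and the rule of picking the class with the largest landing probability are illustrative only and should not be read as the authors' implementation.

```python
def lazy_transition(weights, laziness):
    """Build a row-stochastic lazy transition matrix (assumed convention).

    With probability `laziness` the walker stays where it is; otherwise it
    moves to a neighbor with probability proportional to the edge weight.
    """
    n = len(weights)
    matrix = []
    for i in range(n):
        degree = sum(weights[i])
        row = []
        for j in range(n):
            stay = 1.0 if i == j else 0.0
            if degree == 0:
                row.append(stay)  # isolated node: the walker can only stay
            else:
                move = weights[i][j] / degree
                row.append(laziness * stay + (1.0 - laziness) * move)
        matrix.append(row)
    return matrix


def walk_predict(weights, labels, laziness=0.5, steps=3):
    """Label each unlabeled node (None) by where a short walk from it lands."""
    n = len(weights)
    transition = lazy_transition(weights, laziness)
    classes = sorted({c for c in labels if c is not None})

    # dist[i][j]: probability that a walk started at i is at j after t steps.
    dist = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        dist = [
            [sum(dist[i][k] * transition[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]

    predictions = []
    for i in range(n):
        if labels[i] is not None:
            predictions.append(labels[i])  # keep known labels unchanged
            continue
        # Sum the landing probability on labeled nodes, per class.
        mass = {c: 0.0 for c in classes}
        for j in range(n):
            if labels[j] is not None:
                mass[labels[j]] += dist[i][j]
        predictions.append(max(classes, key=mass.get))
    return predictions


# Hypothetical toy graph: two tight clusters joined by one weak edge (2-3).
weights = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
labels = ["nucleus", None, None, None, None, "mitochondrion"]
predictions = walk_predict(weights, labels, laziness=0.5, steps=3)
print(predictions)
```

In this sketch, sweeping `laziness` and `steps` over a small grid and scoring each setting by cross-validation would correspond to the kind of parameter tuning the abstract describes.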

Background

Protein localization is a general term that refers to the study of where proteins are located within the cell. In many cases, proteins cannot perform their designated function until they are transported to the proper location at the appropriate time. Improper localization of proteins can exert a significant impact on cellular processes or on the entire organism. Therefore, a central issue for biologists is to predict the (sub)cellular localization of proteins [1-3], which has implications for the functions and interactions [4,5] of proteins.

With the development of new approaches in computer science, coupled with an improved dataset of proteins with known localization, computational tools can now provide fast and accurate localization predictions for many organisms as an alternative to laboratory-based methods. Therefore, many studies have begun to address this issue. Soon after proposing a probabilistic classification system for 336 E. coli proteins and 1484 yeast proteins [6], Paul Horton and Kenta Nakai [7] compared their specifically designed probabilistic model with three other classifiers on the same datasets: the k-nearest-neighbor (kNN) classifier, the binary decision tree classifier, and the naive Bayes classifier. The resulting accuracy using stratified cross-validation showed that the kNN classifier performed better than the other methods, with an accuracy of approximately 60% for 10 yeast classes and 86% for 8 E. coli classes.

Feng [8] presented an overview about the prediction of protein subcellular localization, and in 2004, Donnes and Hoglund [9] introduced past and current work on this

* Correspondence: arterx@gmail.com

† Contributed equally

Department of Computer Science, Yangzhou University, Yangzhou 225009, China

Xu et al. BMC Bioinformatics 2013, 14(Suppl 8):S4

http://www.biomedcentral.com/1471-2105/14/S8/S4

Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
