Deep Learning Accelerates RNA-Binding Protein Prediction: Deep-RBPPred


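This study uses deep learning to improve the prediction of RNA-binding proteins (RBPs). The paper was published in Scientific Reports (2018), DOI:10.1038/s41598-018-33654-x, and introduces a new model called Deep-RBPPred. Compared with the earlier RBP predictor RBPPred, the authors report three advantages.

First, RBPPred needs a Position-Specific Scoring Matrix (PSSM) as a feature, and generating it is slow. Deep-RBPPred combines the protein features of RBPPred with a convolutional neural network (CNN) and needs only a few physicochemical properties derived from the protein sequence, which makes prediction far more efficient. Second, Deep-RBPPred predicts faster than the earlier, computation-heavy approach. This matters for proteome-scale studies, where running time decides whether a method is practical. Third, the authors trained two models, Deep-RBPPred-balance and Deep-RBPPred-imbalance, on a balanced and an imbalanced training set to evaluate performance under different data distributions. The results indicate good generalization even with the imbalanced training set, which suggests the model copes with the diversity of real data rather than overfitting.

In short, Deep-RBPPred simplifies feature engineering, speeds up prediction, and remains accurate and broadly applicable. The work is a useful technical advance for bioinformatics, and for RBP research in particular, and should support later work on RBP identification and functional analysis. As deep learning is applied more widely in bioinformatics, more efficient and accurate prediction tools can be expected.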

Scientific REPORTS | (2018) 8:15264 | DOI:10.1038/s41598-018-33654-x

www.nature.com/scientificreports

Deep-RBPPred: Predicting RNA binding proteins in the proteome scale based on deep learning

Jinfang Zheng, Xiaoli Zhang, Xunyi Zhao, Xiaoxue Tong, Xu Hong, Juan Xie & Shiyong Liu

RNA binding proteins (RBPs) play an important role in cellular processes. Identifying RBPs by both computation and experiment is essential. Recently, our group proposed an RBP predictor, RBPPred. However, RBPPred is slow because it needs to generate a PSSM as a feature. Herein, based on the protein features of RBPPred and a Convolutional Neural Network (CNN), we develop a deep learning model called Deep-RBPPred. With balanced and imbalanced training sets, we obtain the Deep-RBPPred-balance and Deep-RBPPred-imbalance models. Deep-RBPPred has three advantages compared to previous methods. (1) Deep-RBPPred needs only a few physicochemical properties based on protein sequences. (2) Deep-RBPPred runs much faster. (3) Deep-RBPPred has good generalization ability. Meanwhile, Deep-RBPPred is still as good as the state-of-the-art method. When tested on the A. thaliana, S. cerevisiae and H. sapiens proteomes, the MCC values are 0.82 (0.82), 0.65 (0.69) and 0.85 (0.80), respectively, for the balance model (imbalance model) with the score cutoff set to 0.5. On the same testing dataset, different machine learning algorithms (CNN and SVM) are also compared. The results show that the CNN-based model can identify more RBPs than the SVM-based model. In comparing the balance and imbalance models, both the CNN-based and SVM-based models tend to favor the majority class in the imbalance set. Deep-RBPPred predicts 280 (balance model) and 265 (imbalance model) of 299 new RBPs. The sensitivity of the balance model is about 7% higher than that of the state-of-the-art method. We also apply Deep-RBPPred to 30 eukaryote and 109 bacteria proteomes downloaded from UniProt to estimate all possible RBPs. The estimates show that the rates of RBPs in eukaryote proteomes are much higher than in bacteria proteomes.
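The abstract reports results as MCC values at a score cutoff of 0.5 and as sensitivity on 299 new RBPs. The sketch below shows how these two metrics are conventionally computed from prediction scores and true labels. It is an illustration, not the authors' evaluation code: the function names are hypothetical, and treating a score equal to the cutoff as a positive call is an assumption.

```python
import math

def confusion_counts(scores, labels, cutoff=0.5):
    """Count TP, FP, TN, FN for binary labels (1 = RBP, 0 = non-RBP).

    Assumption: a protein is called an RBP when score >= cutoff.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_rbp = score >= cutoff
        if predicted_rbp and label:
            tp += 1
        elif predicted_rbp and not label:
            fp += 1
        elif not predicted_rbp and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Fraction of true RBPs that were recovered: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Sensitivity implied by the counts quoted in the abstract:
# 280 and 265 of 299 new RBPs were predicted as RBPs.
balance_sens = sensitivity(280, 299 - 280)
imbalance_sens = sensitivity(265, 299 - 265)
print(f"balance: {balance_sens:.3f}, imbalance: {imbalance_sens:.3f}")
```

Because the 299 proteins are all positives, only sensitivity can be derived from those counts; MCC additionally needs the false-positive and true-negative counts from a test set that contains non-RBPs.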

RNA binding proteins (RBPs) perform important functions in many cellular processes, such as post-transcriptional gene regulation, RNA subcellular localization and alternative splicing. Given their biological significance, many high-throughput experimental techniques have been developed to identify new RBPs in human, mouse, S. cerevisiae and C. elegans [1–10]. After RBPs have been identified, CLIP-related experimental technologies [11–14] are applied to reveal the binding sites in RNAs. Many computational methods have also been proposed to predict protein-RNA interactions [15–18] and RBPs [19–25]. RBP predictors can predict the RBPs, and CLIP-related techniques can then reveal the RNAs interacting with these RBPs. However, previous computational methods considered only a subset of features, or only known RNA binding domains (RBDs), which play a significant role in RBP prediction. We therefore proposed RBPPred, which integrates as many features as possible to address this problem. Benchmarking on datasets shows that RBPPred performs better than other approaches. But RBPPred runs slowly because it must run BLAST against the huge NR protein database to generate the PSSM. Prediction speed is important because a large number of RBPs are still unknown in many species. To overcome this shortcoming, we present Deep-RBPPred, which is based on deep learning.

In recent years, deep learning has been used in many areas of bioinformatics and has proved to be a powerful tool [26–32]. For predicting protein binding sites in RNA sequences, DeepBind is the first CNN-based model to predict binding affinity. Deep-rbp and iDeep [30,31] are two deep learning methods that both take RNA structure into consideration. These methods outperform conventional approaches in terms of prediction accuracy. However, deep learning has not yet been applied to RBP prediction. In Deep-RBPPred, we apply a deep convolutional neural network instead of an SVM. Since a CNN-based method requires a fixed-length feature vector as input, two solutions are used to meet this requirement. The first solution is to pad all sequences to a fixed length and then encode them with one-hot encoding. The second solution is to design

School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China. Correspondence and requests for materials should be addressed to S.L. (email: liushiyong@gmail.com)

Received: 17 May 2018

Accepted: 28 September 2018

Published: xx xx xxxx

OPEN
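The first solution mentioned in the introduction, padding every sequence to a fixed length and then one-hot encoding it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 20-letter standard amino acid alphabet, zero-padding at the end, truncation of longer sequences, all-zero rows for non-standard residues, and the names `one_hot_padded` and `max_len` are all choices made here for the example.

```python
# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_padded(sequence, max_len):
    """Encode a protein sequence as a max_len x 20 one-hot matrix.

    Sequences shorter than max_len are padded with all-zero rows at the end;
    longer sequences are truncated. Residues outside the 20-letter alphabet
    (for example 'X') are left as all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence[:max_len].upper()):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix

# Every sequence now yields a matrix of the same shape, which is what a
# CNN input layer with a fixed size needs.
encoded = one_hot_padded("MKV", max_len=5)
for row in encoded:
    print("".join(str(v) for v in row))
```

The fixed shape is the point of the exercise: a three-residue and a thousand-residue protein both map to a `max_len` by 20 matrix, at the cost of wasted zeros for short proteins and lost residues for long ones.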
