Deep Learning for Protein Secondary Structure Prediction


"这篇论文探讨了使用深度学习方法预测蛋白质二级结构的问题，是计算生物学中的重大挑战之一。预测过程可以分解为多个子问题，其中二级结构预测是最基础的。尽管已有多种计算方法被提出，但能准确建模氨基酸序列与结构之间的映射关系以及残基间的相互作用关系的方法并不多见。文章主要关注的是使用深度学习，特别是编码器-解码器网络和循环神经网络来解决这个问题。" 蛋白质二级结构预测是生物信息学中的核心任务，它涉及到对蛋白质氨基酸序列如何折叠成三维结构的理解。蛋白质的功能与其特定的三维结构密切相关，因此预测结构对于药物设计、疾病机制研究以及蛋白质工程等领域至关重要。传统的预测方法通常基于统计模型或物理模型，但这些方法往往难以捕捉到序列与结构之间的复杂关系。深度学习是一种人工智能领域的先进技术，因其在图像识别、语音识别等领域的成功应用，近年来在生物信息学领域也得到了广泛应用。本文提出的深度学习方法，尤其是编码器-解码器网络和循环神经网络（RNN），能够处理序列数据并捕获长距离依赖性，这使得它们特别适合于蛋白质二级结构预测。编码器-解码器网络是一种用于序列到序列学习的架构，其中编码器将输入序列转化为一个固定长度的向量，而解码器则从这个向量中生成目标序列。这种框架允许模型学习到序列的全局表示，同时保持局部信息，对于理解蛋白质氨基酸序列与结构的关系非常有帮助。循环神经网络，如长短期记忆网络（LSTM）或门控循环单元（GRU），具有记忆单元，能处理序列数据的时序依赖性。在蛋白质二级结构预测中，RNN可以捕捉到氨基酸序列中残基间的相互作用，这对于预测相邻残基的结构状态至关重要。文章指出，尽管已有许多方法尝试解决蛋白质二级结构预测问题，但大多数方法在建模氨基酸序列与二级结构之间的复杂关系以及残基间相互作用方面存在局限。深度学习方法，尤其是结合编码器-解码器和RNN，有望提供更准确的预测，因为它们能够更好地模拟这些关系。这项工作强调了深度学习在蛋白质二级结构预测中的潜力，并为解决这一难题提供了新的思路。未来的研究可能会进一步优化这些模型，提高预测精度，从而推动整个生物信息学领域的进步。

Knowledge-Based Systems 118 (2017) 115–123


Protein secondary structure prediction by using deep learning method

Yangxu Wang, Hua Mao∗, Zhang Yi

Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, People’s Republic of China

Article info

Article history:

Received 9 March 2016

Revised 16 November 2016

Accepted 16 November 2016

Available online 17 November 2016

Keywords:

Deep learning

Secondary structure prediction

Encoder–decoder networks

Recurrent neural networks

Abstract

The prediction of protein structures directly from amino acid sequences is one of the biggest challenges in computational biology. It can be divided into several independent sub-problems, of which protein secondary structure (SS) prediction is fundamental. Many computational methods have been proposed for the SS prediction problem, but few of them model well both the sequence-structure mapping between input protein features and SS and the interactions among residues, both of which are important for SS prediction. In this paper, we propose deep recurrent encoder–decoder networks, called Secondary Structure Recurrent Encoder–Decoder Networks (SSREDNs), to solve the SS prediction problem. Deep architectures and recurrent structures are employed in the SSREDNs to model both the complex nonlinear mapping between input protein features and SS and the mutual interactions among consecutive residues of the protein chain. A series of techniques is also used to refine the model's performance. The proposed model is applied to the open datasets CullPDB and CB513. Experimental results demonstrate that our method improves both Q3 and Q8 accuracy compared with some publicly available methods. For the Q8 prediction problem, it achieves 68.20% and 73.1% accuracy on the CB513 and CullPDB datasets, respectively, in fewer epochs, outperforming the previous state-of-the-art method.

Introduction

Discovering proteins' structures and biological functions is very important for understanding their biological processes, such as protein-protein interactions [1], protein complex identification [2] and protein structure prediction. Protein structure prediction, which elucidates the complex relationship between a protein sequence and its structure, is one of the most important challenges in computational biology [3]. The most elemental task of protein structure prediction is protein secondary structure (SS) prediction, which aims to determine the structural states of amino acids. SS represents the local conformation of the polypeptide backbone of proteins and provides a bridge between the primary sequence and the tertiary structure, which is very helpful for many structural and functional analysis tools [4,5].

∗ Corresponding author.

E-mail addresses: mellowxu@gmail.com (Y. Wang), huamao@scu.edu.cn (H. Mao), zhangyi@scu.edu.cn (Z. Yi).

Typically, protein secondary structures are either divided into three states (α-helix (H), β-strand (E) and coil region (C)) or further classified into eight fine-grained states (3₁₀-helix (G), α-helix (H), π-helix (I), β-strand (E), β-bridge (B), β-turn (T), high-curvature region (S) and irregular loop (L)). SS prediction is usually evaluated by Q3 and Q8 accuracy, which measure the percentage of residues for which the 3-state or 8-state SS is correctly predicted. Extensive research effort has been spent on applying computational methods to the Q3 prediction problem, but very little on the more challenging Q8 prediction problem.
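The Q3 and Q8 measures defined above are per-residue accuracies, which the short sketch below computes for a pair of label strings. The 8-to-3 reduction it uses (G, H, I to H; E, B to E; T, S, L to C) is a commonly used convention and is an assumption here, since the text does not state which reduction is applied. The function names and the example strings are hypothetical.

```python
# Assumed 8-state to 3-state reduction; the excerpt does not specify one.
EIGHT_TO_THREE = {
    "G": "H", "H": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "L": "C",   # turns, bends, loops
}


def reduce_to_three(states):
    """Map a string of 8-state labels to 3-state labels."""
    return "".join(EIGHT_TO_THREE[s] for s in states)


def q_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up labels for a 14-residue chain, for illustration only.
observed8 = "LHHHHGGTEEEBSL"
predicted8 = "LHHHHHHTEEEESL"

q8 = q_accuracy(predicted8, observed8)
q3 = q_accuracy(reduce_to_three(predicted8), reduce_to_three(observed8))
```

In this made-up example the prediction confuses G with H twice and B with E once, so Q8 is 11 correct out of 14 residues, while Q3 is 100% because those confusions vanish under the reduction. This illustrates why Q8 is the harder target.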

Hidden Markov models (HMMs) have been applied to the 3-state SS prediction problem [6]. Although an HMM can describe the interactions among adjacent residues, it is very challenging for an HMM to model the complex nonlinear relationship between input protein features and SS. Support vector machines (SVMs) [7] can deal with this complex nonlinear mapping, but it is challenging for an SVM to take into consideration the interactions among adjacent residues. To the best of our knowledge, the best Q3 accuracy so far, about 80%, is obtained with a 2-stage neural network (NN) method [8]. For the Q8 prediction problem, existing methods [9,10] fail to provide promising results. The problem may be that most of these methods are shallow architectures: it is very difficult for a relatively shallow architecture to model well both the complex sequence-structure relationship between input protein features and SS and the mutual interactions among residues, yet both are important for SS prediction [10,11].
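One way to see why a per-residue classifier such as an SVM has trouble with residue interactions is to look at the input it typically receives. The sketch below builds fixed-size sliding-window features, a common setup for such classifiers; it is an assumption for illustration and is not taken from the cited methods. The window size of 7, the one-hot encoding, and the function names are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"                             # marks positions beyond the chain ends
ALPHABET = AMINO_ACIDS + PAD          # 21 symbols in total


def encode_residue(symbol):
    """One-hot encode a residue or padding symbol as a 21-dimensional vector."""
    return [1.0 if s == symbol else 0.0 for s in ALPHABET]


def window_features(sequence, window=7):
    """Return one flat feature vector per residue, built from its neighbors."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a center residue")
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    features = []
    for center in range(len(sequence)):
        vector = []
        # The slice covers `half` residues on each side of the center.
        for symbol in padded[center:center + window]:
            vector.extend(encode_residue(symbol))
        features.append(vector)
    return features
```

Each residue is then classified independently from a vector of length `window * 21`. Anything outside the window is invisible to the classifier, and the labels assigned to neighboring residues do not influence one another, which is the limitation the paragraph above attributes to this family of methods.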

Nowadays, NNs with deep architectures, also called deep neural networks (DNNs), have become the most powerful machine learning techniques for pattern recognition [12,13]. With the ability of mapping unorganized low-level features into high-level latent data representations which are more suitable for a final classification prob-

http://dx.doi.org/10.1016/j.knosys.2016.11.015

