利用氨基酸序列的局部联合三联体预测蛋白质相互作用

128 浏览量更新于2024-07-14 收藏 620KB PDF 举报

"使用新型的氨基酸序列的局部联合三联体描述符预测蛋白质-蛋白质相互作用" 这篇研究论文探讨了预测蛋白质-蛋白质相互作用（PPIs）的新方法，利用了氨基酸序列的局部联合三联体描述符。PPIs在细胞的各种生物学过程中起着至关重要的作用，对生命系统的正常运作至关重要。尽管过去几十年通过高通量技术已经验证了大量的PPIs，但目前所知的PPI配对仍远远不完整，因此预测新PPIs的方法具有巨大的研究价值。蛋白质相互作用预测是生物信息学的一个重要领域，它有助于理解复杂的生物网络，如信号转导途径和代谢通路，以及疾病的发生机制，如癌症和传染性疾病。传统的实验方法如酵母双杂交和免疫共沉淀等虽然有效，但成本高、耗时且难以规模化。因此，开发基于序列信息的预测模型成为一种高效而经济的替代方案。本文提出的方法创新在于使用局部联合三联体描述符，这是一种将氨基酸序列信息编码为数学特征的方式。三联体描述符考虑了氨基酸序列中的连续三个位置，而局部联合则意味着这些三联体信息是结合在一起考虑的，以捕捉蛋白质表面的结构和功能特性。这种描述符可以捕获序列的局部模式，可能包括二级结构信息和亲水性等特性，从而帮助预测蛋白质是否能形成相互作用。论文中可能会涉及以下步骤： 1. 数据预处理：首先，需要收集已知的蛋白质序列及其相互作用信息作为训练和测试数据集。这通常来自公共数据库，如STRING、BioGRID等。 2. 三联体编码：对于每个蛋白质序列，提取所有可能的三联体，并将它们转化为数值向量，这一步即为局部联合三联体描述符的生成。 3. 特征选择与降维：为了减少计算复杂度并提高预测准确性，可能需要进行特征选择和降维操作，例如使用PCA（主成分分析）或LASSO（套索回归）。 4. 构建预测模型：将处理后的特征输入到机器学习算法中，如支持向量机（SVM）、随机森林（Random Forest）或深度学习模型，如卷积神经网络（CNN），训练模型以区分有相互作用和无相互作用的蛋白质对。 5. 模型评估：使用交叉验证和独立测试集评估模型的性能，主要指标可能包括精确率、召回率、F1分数和AUC值。 6. 应用与优化：优化模型参数，提高预测性能，并将其应用于未知蛋白质序列的PPI预测，以发现新的生物学联系。这项研究通过提出新的氨基酸序列描述符，为蛋白质相互作用预测提供了一种有效工具，有助于加快生物学研究的步伐，特别是在系统生物学和药物发现等领域。

Int. J. Mol. Sci. 2017, 18, 2373 4 of 17

2.2. Experimental Setup

DNN-LCTD is implemented on Tensorlfow platform https://www.tensorﬂow.org. The ﬂowchart

of DNN-LCTD is shown in Figure 1. DNN-LCTD ﬁrstly encodes the amino acid sequences using

the novel LCTD. After that, we train a 3-hidden layers neural network with GPU based on the

encoded feature sets. Finally, we apply the learned DNN to predict new PPIs. Hyper-parameters

of the DNN model heavily impact the experimental results. Deep learning algorithms have ten

or more hyper-parameters to be properly speciﬁed, trying all of them is impossible in practice [

We summarize the recommended conﬁguration of DNN-LCTD in Table 1. As to the parameters setup of

the comparing methods, we use the grid search approach to obtain the optimal parameters. The optimal

parameters is shown in Table 2. The details of the parameters of comparing methods are available

at http://scikit-learn.org. For Du et al. work [

], there are too many parameters need to be set,

the information of parameters can be accessed via http://ailab.ahu.edu.cn:8087/DeepPPI/index.html.

All the experiments are carried out on a server with conﬁguration: CentOS 7.3, 256 GB RAM, and Intel

Exon E5-2678 v3. DNN-LCTD uses NVIDIA Corporation GK110BGL [Tesla K40c] to accelerate training

of DNNs.

DIP

PIR

Uniprot

Obtain Protein

Interaction Data

Positive Set

PPI

Negative Set

PPI

Testing

Random Pairing

LCTD

DNN with

GPU

Learned network

Evaluation &

Comparison

Obtain Protein

Sequence Data

neg

= No

pos

Final

Dataset

Figure 1.

The ﬂowchart of DNN-LCTD for predicting protein-protein interactions. There are some

abbreviations in this ﬁgure, including database of interacting proteins (DIP), protein information

resource, local conjoint triad descriptor (LCTD), protein-protein interactions (PPIs), and graphics

processing unit (GPU). The

neg

is the number of non-interacting protein pairs,

pos

is the number

of interacting protein pairs. Y/N means yes/no.

2.3. Results on PPIs of S. cerevisiae

In order to achieve good experimental results, the corresponding hyper-parameters for deep

neural network are ﬁrstly optimized. Table 1 provides the recommended hyper-parameters that are

chosen by a large number of experiments. Considering the numerous samples used in this work,

ﬁve-fold cross validation is adopted to reduce the impact of data dependency and to minimize the

risk of over-ﬁtting. Thus, ﬁve models are generated for the ﬁve sets of data. Table 3 reports the

results of DNN-LCTD on ﬁve individual folds (fold 1–5) and the overall average results of ﬁve folds.

From Table 3, we can observe that all the prediction accuracies are nearly

≥

93.1%, the precisions

are

≥

93.35%, all the recalls are almost

≥

93.4%, the speciﬁcities are

≥

92.75%, and the

are

≥

92.4%.

In order to comprehensively evaluate the performance of DNN-LCTD, the MCC and AUC are also

calculated. DNN-LCTD achieves superior prediction performance with an average accuracy as 93.11%,

precision as 93.75%, recall as 92.40%, speciﬁcity as 92.75%, MCC as 86.24%,

as 93.06%, and AUC

as 97.95%.

剩余16页未读，继续阅读

weixin_38715831

粉丝: 4
资源: 990

利用氨基酸序列的局部联合三联体预测蛋白质相互作用

通过蛋白质序列的多元互信息预测蛋白质-蛋白质相互作用

三联体光敏剂的（-）-表没食子儿茶素-3-没食子酸酯的猝灭效应和光敏化过程中的单加氧作用

Gentein核心：DNA到蛋白质序列perl应用

电子政务-三联体电除尘灰斗.zip

基于现有K字的频率和位置熵的蛋白质序列特征量度

论文研究 - 腹腔镜治疗晚期三联体异位妊娠的诊断。

基于深度学习的蛋白质亚细胞定位预测.pdf

AdS中的费米离子高自旋三联体

三联体类半带滤波器组的设计方法

计算文件中的碱基三联体的种类及数目

最新资源