ProtDec-LTR: Learning to Rank Improves Protein Remote Homology Detection

Research paper


"这篇研究论文探讨了学习等级在蛋白质远程同源性检测中的应用，旨在改进现有的计算方法，提升预测性能。" 蛋白质远程同源性检测是计算生物学中的基础问题，其目标是寻找已知结构数据库中与给定查询蛋白质具有进化关系的蛋白质序列。这个问题在生物信息学领域至关重要，因为识别这些同源蛋白质可以帮助科学家理解蛋白质的功能、结构以及它们在进化过程中的关系。当前，一些计算方法将蛋白质远程同源性检测视为排名问题来处理，如PSI-BLAST（Position-Specific Iterated Basic Local Alignment Search Tool）、HHblits（Hierarchical HMM-HMM search for protein domain families）和ProtEmbed等，这些方法已经在实践中表现出最先进的性能。它们通过比较和排序蛋白质序列，找出与查询蛋白最相似的序列，从而确定可能的同源关系。论文作者Bin Liu、Junjie Chen和Xiaolong Wang提出了一种名为ProtDec-LTR（Protein Decoder with Learning to Rank）的新方法。这种方法结合了学习排名的策略，目的是通过优化现有技术的组合来提高预测的准确性。学习排名是一种机器学习技术，它通过对多个对象进行排序来解决预测问题，通过训练数据来学习如何最好地排列这些对象，以最大化某个性能度量。在论文中，作者可能详细讨论了ProtDec-LTR的工作原理，包括如何构建模型、如何训练学习算法以及如何评估预测性能。他们可能对比了ProtDec-LTR与其他现有方法的实验结果，展示了新方法在精度、召回率或F1分数等方面的优势。此外，论文还可能涵盖了数据预处理、特征选择、参数调优等关键步骤，这些都是优化蛋白质远程同源性检测的关键。这篇研究论文深入探讨了如何利用学习排名的理论来改进蛋白质远程同源性检测，有望为生物信息学领域的蛋白质分析提供更精确的工具，并促进对生命科学的深入理解。


Sequence analysis

Application of learning to rank to protein remote homology detection

Bin Liu (1,2,3,*), Junjie Chen and Xiaolong Wang (1,2)

(1) School of Computer Science and Technology, (2) Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China and (3) Gordon Life Science Institute, Belmont, MA 02478, USA

*To whom correspondence should be addressed.

Associate Editor: John Hancock

Received on June 4, 2015; revised on July 3, 2015; accepted on July 7, 2015

Abstract

Motivation: Protein remote homology detection is one of the fundamental problems in computational biology, aiming to find protein sequences in a database of known structures that are evolutionarily related to a given query protein. Some computational methods, such as PSI-BLAST, HHblits and ProtEmbed, treat this problem as a ranking problem and achieve state-of-the-art performance. This raises the possibility of combining these methods to improve predictive performance. In this regard, we propose a new computational method called ProtDec-LTR for protein remote homology detection, which combines various ranking methods in a supervised manner using the Learning to Rank (LTR) algorithm derived from natural language processing.

Results: Experimental results on a widely used benchmark dataset showed that ProtDec-LTR can achieve an ROC1 score of 0.8442 and an ROC50 score of 0.9023, outperforming all the individual predictors and some state-of-the-art methods. These results indicate that it is appropriate to treat protein remote homology detection as a ranking problem, and that predictive performance can be improved by combining different ranking approaches in a supervised manner using LTR.

Availability and implementation: For users' convenience, the software tools for the three basic ranking predictors and the Learning to Rank algorithm are provided at http://bioinformatics.hitsz.edu.cn/ProtDec-LTR/home/

Contact: bliu@insun.hit.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.
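The abstract reports ROC1 and ROC50 scores without defining them. A minimal sketch of the commonly used ROC-n score follows, assuming the usual definition: the area under the ROC curve up to the n-th false positive, normalised by n times the number of true homologs. The function name, the handling of lists with fewer than n false positives and the toy ranking are assumptions for illustration, not taken from the paper.

```python
def roc_n_score(ranked_labels, n):
    """ROC-n score of a ranked hit list.

    ranked_labels: booleans ordered from best-ranked to worst-ranked hit,
    True for a true homolog and False for a false positive.
    n: number of false positives up to which the ROC curve is integrated.
    """
    total_pos = sum(1 for is_pos in ranked_labels if is_pos)
    if total_pos == 0 or n <= 0:
        return 0.0

    tp_seen = 0  # true positives ranked above the current position
    fp_seen = 0  # false positives encountered so far
    area = 0     # sum over false positives of the true positives ranked above
    for is_pos in ranked_labels:
        if is_pos:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break

    # Assumed convention: if the list holds fewer than n false positives,
    # the missing ones are treated as ranked below every hit in the list.
    area += (n - fp_seen) * tp_seen
    return area / (n * total_pos)


# Toy ranking (hypothetical): homolog, homolog, non-homolog, homolog, non-homolog.
toy_ranking = [True, True, False, True, False]
roc1 = roc_n_score(toy_ranking, 1)  # 2 of 3 homologs precede the first false positive
roc2 = roc_n_score(toy_ranking, 2)
```

Under this definition a score of 1.0 means every true homolog is ranked above the first n false positives, so ROC1 is the stricter of the two measures.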

1 Introduction

Using sequence similarity between protein pairs to detect evolutionary relationships is one of the central tasks in bioinformatics, which can be applied to the protein 3D structure and function prediction (Bork and Koonin, 1998). Unfortunately, remote homology protein pairs have similar structures and functions, but they lack easily detectable sequence similarity, because the protein tertiary structure is more conserved than protein sequence. Therefore, it is often difficult to detect protein remote homology by computational approaches.

Some effective computational methods have been developed to address this challenging problem. They can be divided into two main groups: discriminative methods and ranking methods. The first group, discriminative methods, treats protein remote homology detection as a classification problem, using both positive and negative samples to train classification models, which are then used to predict unseen samples. Among these approaches, the methods based on Support Vector Machines (SVMs) achieve state-of-the-art performance with appropriate kernel functions, which measure the similarity between any

Bioinformatics, 31(21), 2015, 3492–3498

doi: 10.1093/bioinformatics/btv413

Advance Access Publication Date: 10 July 2015

Original Paper
