GPCR-GIA: an efficient tool for G-protein-coupled receptor identification and family classification


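GPCR-GIA is a web server for identifying G-protein-coupled receptors (GPCRs) and their families by means of grey incidence analysis. GPCRs play a central role in regulating physiological processes and the activity of virtually all cells, and different GPCR families are responsible for different biological functions. With the flood of protein sequence data in the post-genomic era, an automated tool is needed to answer two questions: given the sequence of a query protein, is it a GPCR? If it is, which family does it belong to?

GPCR-GIA uses a two-layer classification scheme. It first scores how closely the input sequence is associated with GPCRs using a scale called the grey incident degree (GID). Rather than a hard yes-or-no match, the GID is a graded measure of association, so it can reflect uncertainty in the comparison. With this scale, GPCR-GIA reaches an overall accuracy of about 95% in distinguishing GPCRs from non-GPCRs. In use, a query sequence is submitted to the server, which then performs comparison, feature extraction and grey incidence analysis. The server may also draw on bioinformatics techniques such as sequence similarity search, structural feature analysis and pattern recognition, and it may integrate databases of known GPCR families and their evolutionary relationships to place a protein in the correct family.

With GPCR-GIA, researchers can screen newly discovered proteins quickly, which saves time and experimental resources. The tool is useful for studying the diversity and complexity of GPCRs and for related drug development. Despite the high accuracy, a prediction based on the grey incident degree remains uncertain and may need to be confirmed with other evidence.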
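To make the idea of a graded association score concrete, the sketch below computes the standard grey relational (incidence) degree from grey system theory between a query feature vector and a set of reference vectors. It is only an illustration under stated assumptions: the function name `grey_incidence_degrees`, the distinguishing coefficient `rho=0.5`, and the toy vectors are not taken from the paper, whose exact formulation may differ.

```python
def grey_incidence_degrees(query, references, rho=0.5):
    """Return the grey relational degree of `query` to each reference vector.

    Assumption: this is the textbook (Deng) definition, not necessarily
    the exact scale used by GPCR-GIA.
    """
    # Absolute differences between the query and each reference, per component.
    deltas = [[abs(q - r) for q, r in zip(query, ref)] for ref in references]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every reference is identical to the query.
        return [1.0] * len(references)
    # The degree is the mean of the per-component relational coefficients.
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]


if __name__ == "__main__":
    # Hypothetical 3-component feature vectors, for illustration only.
    query = [0.2, 0.5, 0.3]
    references = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]]
    print(grey_incidence_degrees(query, references))
```

A degree of 1 means the query coincides with the reference in every component; lower values mean a weaker association. A classifier could then assign the query to the class of the reference with the highest degree.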

GPCR-GIA: a web-server for identifying G-protein coupled receptors and their families with grey incidence analysis

Wei-Zhong Lin, Xuan Xiao 1,3 and Kuo-Chen Chou

Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333001, China and Gordon Life Science Institute, 13784 Torrey Del Mar Drive, San Diego, CA 92130, USA

To whom correspondence should be addressed. E-mail: xiaoxuan0326@yahoo.com.cn

G-protein-coupled receptors (GPCRs) play fundamental roles in regulating various physiological processes as well as the activity of virtually all cells. Different GPCR families are responsible for different functions. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop an automated method to address two problems: given the sequence of a query protein, can we identify whether it is a GPCR? If it is, what family class does it belong to? Here, a two-layer ensemble classifier called GPCR-GIA was proposed by introducing a novel scale called 'grey incident degree'. The overall success rate of GPCR-GIA in distinguishing GPCRs from non-GPCRs was about 95%, and that in assigning GPCRs to their nine family classes was about 80%. These rates were obtained by jackknife cross-validation tests on stringent benchmark data sets in which none of the proteins has ≥50% pairwise sequence identity to any other in the same class. Moreover, a user-friendly web server was established at http://218.65.61.89:8080/bioinfo/GPCR-GIA. For users' convenience, a step-by-step guide on how to use the GPCR-GIA web server is provided. Generally speaking, one can get the desired two-level results in around 10 s for a query protein sequence of 300–400 amino acids; the longer the sequence, the more time is needed.

Keywords: ensemble classifier/fusion/K nearest neighbor algorithm/pseudo amino acid composition/web server
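The two-layer scheme and the jackknife test mentioned in the abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses plain 20-component amino acid composition in place of pseudo amino acid composition, a single nearest neighbor with Euclidean distance in place of the grey incident degree and the ensemble fusion, and hypothetical family labels and toy sequences.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in `seq`."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_label(query_vec, training):
    """Label of the training sample closest to `query_vec`.

    `training` is a list of (feature_vector, label) pairs.
    """
    return min(training, key=lambda item: math.dist(query_vec, item[0]))[1]


def predict_two_layer(seq, layer1, layer2):
    """Layer 1 decides GPCR vs non-GPCR; layer 2 assigns a family."""
    vec = composition(seq)
    if nearest_label(vec, layer1) != "GPCR":
        return ("non-GPCR", None)
    return ("GPCR", nearest_label(vec, layer2))


def jackknife_accuracy(samples):
    """Leave-one-out test: predict each sample from all the others."""
    hits = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += nearest_label(vec, rest) == label
    return hits / len(samples)


if __name__ == "__main__":
    # Toy training sets with made-up sequences and hypothetical labels.
    layer1 = [(composition("LLLLVVVVIIII"), "GPCR"),
              (composition("DDDDEEEEKKKK"), "non-GPCR")]
    layer2 = [(composition("LLLLVVVVIIII"), "family-A"),
              (composition("FFFFWWWWLLLL"), "family-B")]
    print(predict_two_layer("LLLVVVIIIA", layer1, layer2))
```

In the jackknife test each protein is removed in turn, the predictor is built from the remaining proteins, and the removed protein is then predicted, so no sample is ever used to predict itself.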

Introduction

G-protein-coupled receptors (GPCRs) are seven-helix transmembrane proteins that provide a molecular link between extracellular signals and intracellular reactions ranging from cell–cell communication processes to physiological responses (Heuss and Gerber, 2000; Milligan and White, 2001; Hall and Lefkowitz, 2002; Chou, 2005a). They are among the largest and most diverse protein families in mammalian genomes. Owing to their close relevance to a variety of diseases, such as cancer, diabetes, neurodegenerative, inflammatory and respiratory disorders, GPCRs are of utmost interest in drug development: over half of all prescription drugs currently on the market act by targeting these receptors directly or indirectly.

Much effort has been invested in studying GPCRs by both academic institutions and the pharmaceutical industry. However, as membrane proteins, GPCRs are very difficult to crystallize and most of them will not dissolve in normal solvents. Accordingly, very few GPCR crystal structures have been determined so far. Although the recently developed state-of-the-art NMR technique is a very powerful tool for determining the three-dimensional structures of membrane proteins (Oxenoid and Chou, 2005; Call et al., 2006; Douglas et al., 2007; Schnell and Chou, 2008), it is time-consuming and costly. Although some membrane protein structures can be derived with homology approaches (Chou, 2004), the number of templates for transmembrane proteins is very limited. In contrast, more than a thousand GPCR sequences are known, and many more are expected in the near future. In view of this, it would be very useful to develop a computational method that can predict the families and subfamilies of GPCRs from their primary sequences.

In a pioneering study (Chou and Elrod, 2002), Chou and Elrod attempted to identify the subfamily classes of the rhodopsin-like GPCR family by using the covariant-discriminant algorithm (Chou and Elrod, 1999). With more data available later, the study was extended to identify the main family classes of GPCRs (Chou, 2005b) with a similar approach. Stimulated by the encouraging results, some follow-up studies were conducted with various other approaches, as reported in Bhasin and Raghava (2005), Gao and Wang (2006) and Wen et al. (2007).

Although considerable progress has been achieved in this area during the past 6 years, further studies are needed for the following reasons. First, the data sets constructed to train the existing predictors cover very limited GPCR family classes. With the development of protein databases, more classes should be included to widen the coverage for practical use. Secondly, the reported success rates were derived from benchmark data sets that had not been rigorously screened by a clear data-culling operation to avoid redundancy and homology bias, and hence those success rates might be overestimated. As is well known, the more family classes are covered, the lower the odds of getting a correct prediction. Also, the more stringent the benchmark data set is in excluding homologous sequences, the harder it becomes to get a high success rate in a cross-validation test (Xiao et al., 2005; Chou and Shen, 2007c; Chou and Shen, 2008). The present study was devoted to addressing these problems by developing a new GPCR predictor. Moreover, a user-friendly web server, called GPCR-GIA, was designed for the new predictor. For the convenience of experimental scientists who wish to use the predictor to generate the desired data but find it difficult to follow the detailed mathematics and processes, a step-by-step guide on how to use the web server predictor is provided.
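The data-culling step discussed above can be illustrated with a greedy filter that keeps a sequence only if it is less than 50% similar to every sequence already kept in the same class. This is a minimal sketch under loud assumptions: `difflib.SequenceMatcher.ratio()` is used as a crude stand-in for alignment-based pairwise sequence identity, and the function names and toy sequences are hypothetical. The source does not describe the culling procedure actually used for the benchmark.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for pairwise sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cull_class(sequences, threshold=0.5):
    """Greedily keep sequences whose similarity to all kept ones is below `threshold`."""
    kept = []
    for seq in sequences:
        if all(similarity(seq, k) < threshold for k in kept):
            kept.append(seq)
    return kept


def cull_benchmark(by_class, threshold=0.5):
    """Apply the culling separately within each class, as the benchmark requires."""
    return {label: cull_class(seqs, threshold) for label, seqs in by_class.items()}


if __name__ == "__main__":
    # Made-up sequences and a hypothetical class label, for illustration only.
    toy = {"family-A": ["AAAAAAAAAA", "AAAAAAAAAC", "WWWWWWWWWW"]}
    print(cull_benchmark(toy))
```

Because the filter is greedy, the result depends on the input order. A real pipeline would compute identity from sequence alignments and would typically rely on a dedicated redundancy-removal tool.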


Protein Engineering, Design & Selection vol. 22 no. 11 pp. 699–705, 2009

Published online September 22, 2009 doi:10.1093/protein/gzp057
