Computational identification and functional annotation of substrates of the yeast E3 ubiquitin ligase Rsp5


"本文主要探讨了通过计算机模拟方法（in silico）来识别和功能注释酵母E3泛素连接酶Rsp5的底物。Rsp5是一种在酵母到哺乳动物中高度保守的E3连接酶，对多种生物过程具有关键作用。然而，对其所有底物的认识仍不完全。研究团队系统地分析了可能与Rsp5底物识别相关的多个特征，发现PPxYmotif、跨膜区域和无序区是影响Rsp5与其底物相互作用的重要分子决定因素。" 文章详细介绍了对Rsp5 E3泛素连接酶的研究，这是生物学和蛋白质组学领域的一个重要主题。E3泛素连接酶是泛素化途径的关键组成部分，该途径负责蛋白质的标记和降解，从而参与细胞周期调控、信号转导、应激反应和多种疾病的发生。Rsp5在酵母中具有广泛的功能，包括蛋白质运输、囊泡形成、细胞周期控制和应激响应等。研究者利用计算生物学方法，即在计算机上进行模拟分析（in silico analysis），来预测和识别Rsp5的新底物。这种方法可以减少实验成本，提高效率，并可能揭示尚未被发现的Rsp5作用机制。他们关注了几个可能影响Rsp5底物识别的特征： 1. PPxY motif：这是一种特定的氨基酸序列模式，被认为是Rsp5识别其底物的标志性结构。这种模体可能作为Rsp5结合的接头，促进泛素化过程。 2. 跨膜区域：研究发现，含有跨膜结构域的蛋白质更有可能成为Rsp5的底物。这可能是因为跨膜蛋白在细胞膜上的定位使其更易于与Rsp5相互作用。 3. 无序区：蛋白质的无序区域可能增加其灵活性，有利于Rsp5的结合和泛素化。这些区域通常在蛋白质功能中扮演重要角色，可能涉及信号传导或蛋白质复合体的组装。此外，文章还可能涵盖了功能注释的过程，这涉及到将预测的Rsp5底物与其潜在的生物学功能联系起来。这可能包括蛋白质互作网络分析、通路富集分析以及对这些底物在酵母生理过程中的作用的深入理解。这项研究为理解Rsp5的生物学功能提供了新的见解，同时也为寻找其他生物体中E3泛素连接酶的底物提供了研究策略。其结果可能有助于揭示新的药物靶点，尤其是在疾病关联的蛋白质异常泛素化过程中。


redundant proteins using 40% sequence identity as the cut-off in order to obtain a non-redundant negative data set. After filtering, we obtained 154 qualified full-length yeast proteins as the negative data set.

In order to discover novel Rsp5 substrates in the yeast proteome, we also collected a comprehensive pool of 3450 proteins to be scored by our method (Belle et al., 2006).

2.2 Informative features

The interaction between an enzyme and its substrate is determined in part by structure and sequence, so the amino acid sequence is the basis for investigating Rsp5 substrates (Jaakkola et al., 2000; Liao and Noble, 2002; Saigo et al., 2004). We analysed the amino acid composition and distribution of each protein sequence.

We examined a number of features, computed from protein sequences and secondary structures, that are possibly relevant to the recognition of Rsp5 substrates. Some features are included because they are known to be relevant to substrates of Rsp5, while others are included because of their statistical relevance to our classification problem.

First, we calculated the frequencies of monopeptides and dipeptides, normalised by the sequence length. This yielded 420 amino acid composition features (20 monopeptide and 400 dipeptide frequencies). Because polypeptide sequences vary in the course of evolution, analysing the composition of grouped amino acids is more reasonable. We divided the amino acids into six groups according to their physical and chemical properties: class a (I, V, L, M), class b (F, Y, W), class c (H, K, R), class d (D, E), class e (Q, N, T, P) and class f (A, C, G, S). We then analysed the composition (C) of grouped monopeptides, grouped dipeptides and grouped tripeptides.
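The composition features described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, every k-mer count is divided by the full sequence length (as the text states), and residues outside the 20 standard amino acids are simply ignored, which is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# The six physicochemical classes listed in the text.
GROUPS = {"a": "IVLM", "b": "FYW", "c": "HKR", "d": "DE", "e": "QNTP", "f": "ACGS"}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def kmer_composition(seq, alphabet, k):
    """Count every possible k-mer over `alphabet`, divided by len(seq)."""
    if not seq:
        raise ValueError("empty sequence")
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing non-standard residues
            counts[kmer] += 1
    return {kmer: c / len(seq) for kmer, c in counts.items()}


def composition_features(seq):
    """20 monopeptide + 400 dipeptide frequencies = 420 features."""
    feats = {}
    for k in (1, 2):
        feats.update(kmer_composition(seq, AMINO_ACIDS, k))
    return feats


def grouped_composition_features(seq):
    """Composition (C) of grouped mono-, di- and tripeptides: 6 + 36 + 216."""
    grouped = "".join(RESIDUE_TO_GROUP[aa] for aa in seq if aa in RESIDUE_TO_GROUP)
    feats = {}
    for k in (1, 2, 3):
        feats.update(kmer_composition(grouped, "abcdef", k))
    return feats
```

Enumerating all k-mers up front keeps the feature vector the same length for every protein, which is what a downstream classifier needs.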

Besides the amino acid composition descriptors, the transition (T) and distribution (D) of amino acid groups are also used to describe the global composition of amino acid groups. T denotes the relative frequency of changes between amino acid groups along the protein sequence, and D denotes the chain length within which the first 25%, 50%, 75% and 100% of the amino acids of a particular group are located (Dubchak et al., 1995; Cai et al., 2003; Cui et al., 2007). We also included some general features such as the protein length, hydrophobicity value, sulphur content, isoelectric point, signal peptide and N-end amino acid.
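A minimal sketch of the T and D descriptors, under stated assumptions: T is computed for each unordered pair of the six groups as the share of adjacent positions that switch between the two groups, and D is reported as a fraction of the sequence length. The function names, the feature keys and the rounding rule for the 25%, 50% and 75% marks are illustrative choices, not taken from the paper.

```python
import math
from itertools import combinations

# The six physicochemical classes listed in the text.
GROUPS = {"a": "IVLM", "b": "FYW", "c": "HKR", "d": "DE", "e": "QNTP", "f": "ACGS"}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def encode(seq):
    """Map each standard residue to its group label; others are dropped."""
    return [RESIDUE_TO_GROUP[aa] for aa in seq if aa in RESIDUE_TO_GROUP]


def transition_features(seq):
    """T: relative frequency of x<->y changes between adjacent residues."""
    g = encode(seq)
    if len(g) < 2:
        raise ValueError("need at least two residues")
    pairs = len(g) - 1
    feats = {}
    for x, y in combinations(sorted(GROUPS), 2):
        changes = sum(1 for p, q in zip(g, g[1:]) if {p, q} == {x, y})
        feats[f"T_{x}{y}"] = changes / pairs
    return feats


def distribution_features(seq, fractions=(0.25, 0.5, 0.75, 1.0)):
    """D: relative chain position by which a given share of a group has occurred."""
    g = encode(seq)
    feats = {}
    for group in sorted(GROUPS):
        positions = [i + 1 for i, x in enumerate(g) if x == group]
        for frac in fractions:
            key = f"D_{group}_{int(frac * 100)}"
            if not positions:
                feats[key] = 0.0  # group absent from this sequence
                continue
            # Assumption: round the rank up, so at least `frac` of the group is covered.
            rank = max(1, math.ceil(frac * len(positions)))
            feats[key] = positions[rank - 1] / len(g)
    return feats
```

With six groups this gives 15 transition features and 24 distribution features per protein.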

We also took into account the Low-Complexity Region (LCR) as an important feature. LCRs are regions of a protein sequence with little diversity in amino acid composition. Using the program SEG, we determined the number of LCRs, the length of the longest LCR and the total length of LCRs in every sequence, in order to investigate their implications for protein stability (Wootton, 1994). In total, three LCR features were included in the initial list.
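The three LCR features can be derived from a list of low-complexity segments once those have been predicted. The sketch below does not run or parse SEG; it assumes the segments are already available as non-overlapping `(start, end)` pairs with 1-based inclusive coordinates, and the function name and feature keys are hypothetical.

```python
def lcr_features(segments):
    """Summarise low-complexity segments given as (start, end), 1-based inclusive."""
    lengths = [end - start + 1 for start, end in segments]
    return {
        "lcr_count": len(lengths),
        "lcr_max_length": max(lengths, default=0),  # 0 when no LCR is present
        "lcr_total_length": sum(lengths),
    }
```

For example, `lcr_features([(5, 14), (30, 34)])` describes a protein with two LCRs of lengths 10 and 5.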

Functional proteins with partly disordered structures are highly abundant in nature, and disordered proteins are especially widespread in eukaryotic proteomes. We therefore used four features to describe disordered regions: the number of disordered regions, the total length of disordered regions, the length of the longest disordered region and the average score of disordered regions. Disordered regions were predicted with IUPred (Dosztanyi et al., 2005).
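As an illustration, the four disorder features could be computed from a per-residue disorder score profile as follows. The sketch takes a plain list of scores rather than calling IUPred. The 0.5 cut-off, the absence of a minimum region length and the reading of "average score" as the mean over all residues in disordered regions are assumptions, and the function name is hypothetical.

```python
from itertools import groupby


def disorder_features(scores, threshold=0.5):
    """Summarise runs of consecutive residues whose score exceeds `threshold`."""
    regions = []
    for is_disordered, run in groupby(scores, key=lambda s: s > threshold):
        run = list(run)
        if is_disordered:
            regions.append(run)
    lengths = [len(r) for r in regions]
    total = sum(lengths)
    # Assumption: average over every residue that falls in a disordered region.
    mean_score = sum(sum(r) for r in regions) / total if total else 0.0
    return {
        "disorder_count": len(regions),
        "disorder_total_length": total,
        "disorder_max_length": max(lengths, default=0),
        "disorder_mean_score": mean_score,
    }
```

Grouping consecutive residues by the thresholded score keeps the region boundaries and the summary statistics in a single pass over the profile.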

We included the presence of a transmembrane region and the total length of transmembrane regions in the protein as two features in the initial list. Transmembrane regions were predicted with SMART (Letunic et al., 2009).
