Weak-Label Learning Improves Protein Function Prediction: Methods for Handling Incomplete Annotations

Updated 2024-08-27 · 2.06 MB PDF
"蛋白质功能预测是计算生物学中的一个重要挑战,尤其是在面对大量未标注蛋白时,准确预测其功能至关重要。传统的多标签学习方法通常基于完全标注的假设,即已知功能的蛋白质没有遗漏任何功能。然而,在实际应用中,蛋白质的功能并非总是完全可获取的,可能存在部分缺失的标签或者未知的功能。 本文提出了一种名为Protein Function Prediction with Weak-label Learning (ProWL)的方法,以及其增强版本ProWL-IF,旨在解决蛋白质功能预测中因缺失标注导致的问题。ProWL的核心思想是利用弱标签(weak labels)来填补蛋白质功能的空白,即通过分析部分已知的功能信息,推断可能存在的其他功能。弱标签学习允许模型在一定程度上处理不确定性,提高预测的稳健性。 ProWL-IF在此基础上更进一步,除了补全缺失的功能,它还引入了一个额外的考量——蛋白质不可能具有某些特定功能的知识。这种逆向思维策略可以作为额外的约束,增强模型对蛋白质功能的识别能力。这种方法的优势在于能够充分利用现有的有限信息,并考虑到功能的互补性和排斥性,从而提高预测精度。 实验结果在蛋白质相互作用网络和基因表达数据集上进行了验证,显示ProWL和ProWL-IF在面对不完整标注的情况下,相较于传统方法有显著的优势,能够在实际场景中提供更准确的蛋白质功能预测。这些研究成果对于理解生物系统、疾病机制以及药物设计等领域具有重要意义,也为未来在生物信息学领域中处理复杂、不完整数据提供了新的思路和工具。"

Condense the following passage: Existing protein function prediction methods integrate PPI networks with multivariate bioinformatics data to improve prediction performance. Combining multivariate information makes the interactions between proteins more diverse, and different types of interaction play different roles in function prediction. Simply merging multiple interactions between two proteins can reduce the effect of false negatives and increase the number of predicted functions, but it can also increase the number of false positive functions, so the overall gain in prediction performance is not obvious. In this article, we have presented a framework for protein function prediction algorithms based on the PPI network and semantic similarity, extended with hierarchical protein functions. The framework relies on diverse clustering algorithms and on the calculation of protein semantic similarity. Classification and similarity calculations are more accurate and reliable for protein pairs clustered by functional features, which allows protein function to be predicted at different functional levels across different proteomes and gives biological applications greater flexibility. The method proposed in this paper performs well on protein data from wine yeast cells, but how well it transfers to other data remains to be verified. Until now, the functions of most unknown proteins could be predicted only by computing similarity to their homologues. Predictions for unknown proteins without homologues are unstable because those proteins are relatively isolated in the protein interaction network, which makes it difficult to find a protein with high similarity. In the framework proposed in this article, the number of features selected after clustering and the number of protein features selected for each functional layer have a significant impact on the accuracy of subsequent function predictions.
Therefore, feature selection should retain as many as possible of the functional features that are important for the whole interaction network. When an incorrect feature is selected, the prediction will deviate somewhat from the actual function. Overall, the method proposed in this article improves the accuracy of PPI-network-based protein function prediction to a certain extent and reduces the probability of false positive predictions.

Uploaded 2023-02-27