sdpCNN: Applying Convolutional Neural Networks to Protein-Protein Relation Extraction

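
"This research paper proposes a convolutional neural network model based on the shortest dependency path (sdpCNN), designed specifically for protein-protein relation extraction. Traditional protein-protein interaction (PPI) extraction methods rely mainly on kernel methods with handcrafted features, whereas sdpCNN uses a CNN and takes only the shortest dependency path (sdp) and word embeddings as input. This reduces the dependence on manual feature engineering and identifies protein relations by learning patterns in the semantic context."

Body: In bioinformatics, the study of protein-protein interactions (PPIs) is important for understanding cell function, disease mechanisms, and drug discovery. Traditional PPI extraction methods usually involve complex feature engineering, in which experts design a set of features to capture the relations between proteins. The performance of these methods is therefore limited by the quality of the feature selection. The sdpCNN model proposed in the paper is a deep learning approach that tries to address this problem.

The core idea of sdpCNN is to use the shortest dependency path (sdp) to capture the context of a protein relation within a sentence. A dependency path is a syntactic structure used in natural language processing; it represents the syntactic relations between words and so provides clues about the protein relation. Convolutional neural networks (CNNs) have achieved notable results in image recognition and natural language processing, and their feature learning ability lets sdpCNN learn patterns automatically from the sdp and the word embeddings. Word embedding is a technique that maps words to continuous vectors and captures their semantic information. By combining the two, sdpCNN can identify signals of protein interaction without relying on a large number of handcrafted features.

The sdpCNN workflow consists of the following steps: first, the shortest dependency path between the two protein entities in a sentence is computed to obtain the key syntactic information; second, each word is represented by its word embedding, which turns the sdp into a sequence of vectors; then, the CNN applies convolutions to this sequence to detect local patterns of different lengths; finally, a pooling layer and a fully connected layer combine the features and output the predicted protein relation.

The advantage of this approach is that it learns a representation of protein relations automatically while reducing the dependence on domain knowledge. Because it is trained end to end, sdpCNN can adapt to different datasets and may improve the accuracy and generalization of PPI extraction. The paper may also cover training details, such as the choice of optimizer, the definition of the loss function, and hyperparameter tuning, as well as a comparison of experimental results. The authors may have compared sdpCNN with other traditional methods (such as models based on support vector machines or conditional random fields) to show its advantage on the PPI relation extraction task. In summary, the paper proposes sdpCNN, a deep learning model that uses the shortest dependency path and word embeddings to extract protein relations, offering a new approach to the automatic identification of PPIs.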

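The four workflow steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the token list is a hand-written hypothetical path rather than parser output, the embeddings and weights are random rather than pretrained or learned, and the embedding size, window width, filter count, and `tanh` activation are arbitrary choices made for the example.

```python
import math
import random

random.seed(0)

EMB_DIM = 4     # assumed toy embedding size
WINDOW = 3      # assumed convolution window, in tokens
N_FILTERS = 5   # assumed number of convolution filters


def rand_vec(n):
    """Return n small random weights (stand-in for learned parameters)."""
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


# Step 1 (assumed given): tokens on the shortest dependency path.
sdp_tokens = ["actin", "filaments", "distribution",
              "altered", "overexpression", "profilin"]

# Step 2: replace each token with its embedding vector (random here).
embeddings = {tok: rand_vec(EMB_DIM) for tok in sdp_tokens}
sequence = [embeddings[tok] for tok in sdp_tokens]

# Step 3: each filter spans WINDOW consecutive token vectors.
filters = [rand_vec(WINDOW * EMB_DIM) for _ in range(N_FILTERS)]


def convolve(seq, filt, window):
    """Slide one filter over the sequence; one activation per position."""
    activations = []
    for i in range(len(seq) - window + 1):
        patch = [x for vec in seq[i:i + window] for x in vec]
        activations.append(math.tanh(sum(w * x for w, x in zip(filt, patch))))
    return activations


# Step 4a: max-over-time pooling keeps each filter's strongest response.
pooled = [max(convolve(sequence, f, WINDOW)) for f in filters]

# Step 4b: fully connected layer plus softmax over two classes
# (interaction / no interaction).
dense = [rand_vec(N_FILTERS) for _ in range(2)]
logits = [sum(w * p for w, p in zip(row, pooled)) for row in dense]
peak = max(logits)
exps = [math.exp(z - peak) for z in logits]
probs = [e / sum(exps) for e in exps]
print(probs)
```

Pooling is what makes the output size independent of the path length: however many tokens the sdp contains, each filter contributes exactly one value to the fully connected layer.
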
Research Article

A Shortest Dependency Path Based Convolutional Neural Network for Protein-Protein Relation Extraction

Lei Hua and Chanqin Quan

Department of Computer and Information Sciences, Hefei University of Technology, Hefei 230009, China

Department of Computer and Information Sciences, Kobe University, Kobe 6578501, Japan

Correspondence should be addressed to Lei Hua; hualeilxf@.com

Received  March ; Revised  June ; Accepted  June 

Academic Editor: Rita Casadio

which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The state-of-the-art methods for protein-protein interaction (PPI) extraction are primarily based on kernel methods, and their performance strongly depends on handcrafted features. In this paper, we tackle PPI extraction by using convolutional neural networks (CNN) and propose a shortest dependency path based CNN (sdpCNN) model. The proposed method (1) only takes the sdp and word embedding as input and (2) could avoid bias from feature selection by using CNN. We performed experiments on the standard AIMed and BioInfer datasets, and the experimental results demonstrated that our approach outperformed state-of-the-art kernel based methods. In particular, by tracking the sdpCNN model, we find that sdpCNN could extract key features automatically, and it is verified that pretrained word embedding is crucial in the PPI task.

1. Introduction

Biomedical relations play an important role in biological processes and are widely researched in the field of biomedical natural language processing (BioNLP). The PPI task aims to extract protein interactions; for example, in the sentence “The distribution of actin filaments is altered by profilin overexpression,” the interaction between the protein entities “actin” and “profilin” would be extracted. A number of databases, such as BIND [], MINT [], and IntAct [], have been created to store structured interactions. However, the biomedical literature regarding protein interactions is expanding rapidly, making it difficult for these databases to keep up with the latest protein-protein interactions. Consequently, effective and automatic protein-protein relation extraction systems are becoming more significant.

Previous research has illustrated the effectiveness of the shortest dependency path (sdp) between entities for relation extraction in many fields [–]. For example, in the PPI task, [] proposed an edit-distance kernel based on sdp and classified the relations by SVM. Reference [] made a detailed investigation into the relevant work on relation extraction and elaborated the important role of sdp in relation extraction. However, how to preprocess the sdp (e.g., using a variety of kernels) and how to combine different features (e.g., part-of-speech, n-grams, and parse tree) are still open problems. In this work, the proposed approach takes the raw sdp as its only input, and it can learn features automatically. Thus, unlike previous research, manual feature selection and feature combination are not necessary in our approach.
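To make the notion of a shortest dependency path concrete, the sketch below finds one by breadth-first search over an undirected graph of dependency edges. The edge list is hand-written for the example sentence from the introduction and is an assumption, not the output of any particular parser; the function name `shortest_dependency_path` is likewise illustrative. Tokens are keyed by their surface string for brevity, which only works because no word repeats in this sentence; a real pipeline would key nodes by token position.

```python
from collections import deque

# Hypothetical (head, dependent) edges for the sentence:
# "The distribution of actin filaments is altered by profilin overexpression"
edges = [
    ("altered", "distribution"),
    ("altered", "is"),
    ("altered", "overexpression"),
    ("distribution", "The"),
    ("distribution", "filaments"),
    ("filaments", "of"),
    ("filaments", "actin"),
    ("overexpression", "by"),
    ("overexpression", "profilin"),
]


def shortest_dependency_path(edges, source, target):
    """Return the shortest token path from source to target, or None."""
    # Build an undirected adjacency map: path search ignores edge direction.
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    if source not in graph or target not in graph:
        return None

    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


path = shortest_dependency_path(edges, "actin", "profilin")
print(path)
# ['actin', 'filaments', 'distribution', 'altered', 'overexpression', 'profilin']
```

The path drops function words such as "The", "of", "is", and "by" and keeps the tokens that connect the two entities, which is why the sdp is a compact input for relation extraction.
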

Many eorts have been done on PPI task, especially

the kernel based methods. Most of these methods take the

PPI task as a binary classication problem by determining

whether there is an interaction between the two entities. e

kernels include bag-of-words kernel [], all-path kernel [],

subset-tree kernel [], edit-distance kernel [], and graph

kernel [], and they have shown eectiveness in PPI task.

Considering that single kernel partly calculates the similarity

of two instances, hybrid kernel [–] has been proposed

and demonstrated much better performance than single

kernel. Kernel methods are eective, because they integrate

a large amount of manually selected features. e problem

of existing kernel based method is how to combine dierent

features; in most cases, sophisticated design is required.
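The binary classification framing can be illustrated with a short sketch: every unordered pair of protein mentions in a sentence becomes one candidate instance, labeled 1 if the pair interacts and 0 otherwise. The protein names and the gold annotation below are placeholders invented for the example, and `candidate_pairs` is a hypothetical helper, not part of any corpus toolkit.

```python
from itertools import combinations


def candidate_pairs(proteins, gold_interactions):
    """Yield ((p1, p2), label) for every unordered protein pair."""
    # frozenset makes the lookup independent of the order within a pair.
    gold = {frozenset(pair) for pair in gold_interactions}
    for p1, p2 in combinations(proteins, 2):
        yield (p1, p2), int(frozenset((p1, p2)) in gold)


# Placeholder sentence-level annotations (not real data).
proteins = ["ProtA", "ProtB", "ProtC"]
gold_interactions = [("ProtB", "ProtA")]

instances = list(candidate_pairs(proteins, gold_interactions))
for pair, label in instances:
    print(pair, label)
```

A sentence with n protein mentions yields n(n-1)/2 candidates, so negative instances typically outnumber positive ones.
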

Deep learning methods have achieved remarkable results in computer vision [] and speech recognition [], and due

Hindawi Publishing Corporation

BioMed Research International

Volume 2016, Article ID 8479587, 9 pages

http://dx.doi.org/10.1155/2016/8479587
