Research Article
A Shortest Dependency Path Based Convolutional Neural
Network for Protein-Protein Relation Extraction
Lei Hua¹ and Chanqin Quan²
¹Department of Computer and Information Sciences, Hefei University of Technology, Hefei 230009, China
²Department of Computer and Information Sciences, Kobe University, Kobe 6578501, Japan
Correspondence should be addressed to Lei Hua; hualeilxf@.com
Received March ; Revised June ; Accepted June
Academic Editor: Rita Casadio
Copyright © L. Hua and C. Quan. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The state-of-the-art methods for protein-protein interaction (PPI) extraction are primarily based on kernel methods, and their
performance strongly depends on handcrafted features. In this paper, we tackle PPI extraction by using convolutional neural
networks (CNN) and propose a shortest dependency path based CNN (sdpCNN) model. The proposed method (1) takes only the
sdp and word embedding as input and (2) could avoid bias from feature selection by using CNN. We performed experiments on
the standard AIMed and BioInfer datasets, and the experimental results demonstrated that our approach outperformed state-of-the-art
kernel based methods. In particular, by tracking the sdpCNN model, we find that sdpCNN could extract key features automatically,
and it is verified that pretrained word embedding is crucial in the PPI task.
1. Introduction
Biomedical relations play an important role in biological
processes and are widely researched in the field of biomedical
natural language processing (BioNLP). The PPI task aims to
extract protein interactions; for example, in the sentence “The
distribution of actin filaments is altered by profilin overexpression,”
the interaction between the protein entities “actin” and
“profilin” would be extracted. A number of databases, such as
BIND [], MINT [], and IntAct [], have been created to store
structured interactions. However, the biomedical literature
regarding protein interactions is expanding rapidly, making it
difficult for these databases to keep up with the latest
protein-protein interactions. Consequently, effective and automatic
protein-protein relation extraction systems are becoming more
significant.
Previous studies have demonstrated the effectiveness of
the shortest dependency path (sdp) between entities for
relation extraction in many fields [–]. For example, in the
PPI task, [] proposed an edit-distance kernel based on sdp
and classified the relations by SVM. Reference [] made
a detailed investigation into the relevant work on relation
extraction and elaborated on the important role of sdp in relation
extraction. However, how to preprocess the sdp (e.g., using
a variety of kernels) and how to combine different features
(e.g., part-of-speech, n-grams, and parse tree) are still open
problems. In this work, the proposed approach takes the raw
sdp as the only input, and it can learn features automatically.
Thus, unlike previous work, manual feature
selection and feature combination are not necessary in our
approach.
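As an illustration of what an sdp looks like, the following minimal Python sketch finds the shortest path between two entity tokens in a dependency graph by breadth-first search. The dependency edges for the example sentence are written by hand and are an assumption made for illustration only (they are not the output of the parser used in this work), and the function name `shortest_dependency_path` is hypothetical.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the shortest token path between source and target, or None.

    edges is an iterable of (head, dependent) pairs; edge direction is
    ignored, so the dependency tree is searched as an undirected graph.
    """
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    if source not in graph or target not in graph:
        return None

    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hand-written (assumed) dependency edges for the example sentence
# "The distribution of actin filaments is altered by profilin overexpression".
EXAMPLE_EDGES = [
    ("altered", "distribution"),
    ("distribution", "The"),
    ("distribution", "filaments"),
    ("filaments", "of"),
    ("filaments", "actin"),
    ("altered", "is"),
    ("altered", "overexpression"),
    ("overexpression", "by"),
    ("overexpression", "profilin"),
]

sdp = shortest_dependency_path(EXAMPLE_EDGES, "actin", "profilin")
print(sdp)
# ['actin', 'filaments', 'distribution', 'altered', 'overexpression', 'profilin']
```

Under these assumed edges, the path drops function words such as "The", "of", "is", and "by" and keeps the tokens that connect the two entities, which is the intuition behind using the sdp as a compact input.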
Much effort has been devoted to the PPI task, especially
to kernel based methods. Most of these methods treat the
PPI task as a binary classification problem by determining
whether there is an interaction between the two entities. The
kernels include bag-of-words kernel [], all-path kernel [],
subset-tree kernel [], edit-distance kernel [], and graph
kernel [], and they have shown effectiveness in the PPI task.
Because a single kernel captures only part of the similarity
between two instances, hybrid kernels [–] have been proposed
and have demonstrated much better performance than single
kernels. Kernel methods are effective because they integrate
a large number of manually selected features. The problem
with existing kernel based methods is how to combine different
features; in most cases, sophisticated design is required.
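To make the idea of combining kernels concrete, the sketch below computes a toy hybrid kernel as a weighted sum of two count-based kernels, one over single words and one over word bigrams. This is only an assumed, simplified illustration: it does not reproduce the definitions of the cited kernels, and the names `ngram_counts`, `count_kernel`, and `hybrid_kernel`, as well as the weights, are hypothetical. A sum of valid kernels with nonnegative weights is itself a valid kernel, which is what makes this kind of combination possible.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def count_kernel(counts_a, counts_b):
    """Dot product of two sparse count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    return sum(counts_a[key] * counts_b[key] for key in shared)


def hybrid_kernel(tokens_a, tokens_b, weights=(1.0, 1.0)):
    """Weighted sum of a unigram kernel and a bigram kernel (toy example)."""
    unigram = count_kernel(ngram_counts(tokens_a, 1), ngram_counts(tokens_b, 1))
    bigram = count_kernel(ngram_counts(tokens_a, 2), ngram_counts(tokens_b, 2))
    return weights[0] * unigram + weights[1] * bigram


# Two made-up token sequences used only to exercise the functions.
instance_a = ["actin", "filaments", "altered", "profilin"]
instance_b = ["actin", "binds", "profilin"]

print(hybrid_kernel(instance_a, instance_b))  # 2.0 (two shared words, no shared bigrams)
print(hybrid_kernel(instance_a, instance_a))  # 7.0 (four unigrams plus three bigrams)
```

Even in this toy form, the choice of component kernels and their weights has to be made by hand, which reflects the design burden described above.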
Deep learning methods have achieved remarkable results
in computer vision [] and speech recognition [], and due
Hindawi Publishing Corporation
BioMed Research International
Volume 2016, Article ID 8479587, 9 pages
http://dx.doi.org/10.1155/2016/8479587