Journal of Engineering Science and Technology Review 8 (4) (2015) 51-55
Research Article
Extracting Nested Biomedical Entity Relations by Tagging Dependency Chains
Xiaomei Wei¹,², Yu Huang², Chen Lyu¹,³, and Donghong Ji¹,*
¹ Computer School, Wuhan University, Wuhan 430072, China
² College of informatics, Huazhong Agriculture University, Wuhan 430070, China
³ Singapore University of Technology and Design, 138682, Singapore
Received 24 June 2015; Accepted 14 October 2015
___________________________________________________________________________________________
Abstract
Biomedical event extraction is an important research topic in the field of biomedical text mining. However, much
research work is required before event extraction systems become applicable in practice. We therefore propose a novel and
efficient approach for extracting nested biomedical events. First, using dependency parsing, we extract the target sequences
that contain biomedical entity (trigger/argument) chains. Second, a Conditional Random Fields (CRFs) model is used to
tag the entity chains, which represent the nested argument-trigger edges. Third, a post-processing step is used to
output the events. This method is a new attempt to treat biomedical event extraction as a sequence tagging problem.
The experimental results show that our approach achieves an F-score of 47.3, which is promising when compared with
the joint ML-based system in BioNLP-ST 2013. Furthermore, we evaluate trigger detection separately, and the results
outperform the state-of-the-art systems on the same corpus. Our work is therefore a positive contribution to the
biomedical text mining community.
Keywords: Joint; Event extraction; Entity chain; Dependency; Tag
__________________________________________________________________________________________
1. Introduction
Biomedical event extraction has become an important
research topic in the field of biomedical natural language
processing in recent years [1]. Biomedical events describe
the fine-grained relations among biomedical entities. The
biomedical literature contains substantial information
regarding relations among biomedical entities, and these
relations must be extracted to construct a knowledge
database for researchers. This effort led to the BioNLP GE
shared task (BioNLP-ST, hereafter) series [2-4], which aims
to extract nested bio-molecular events from biomedical text.
BioNLP-ST addressed nine types of biomedical molecular
events related to protein biology. These events can be
grouped into three categories: Simple, Binding, and
Regulation. Simple events (Gene_expression, Transcription,
Protein_catabolism, Phosphorylation, Localization) take one
protein argument. Binding events (Binding) have one or
more protein arguments. Regulation events
(Positive_regulation, Negative_regulation and Regulation)
have one obligatory Theme and one optional Cause
argument. Each argument of Regulation events could be
either a protein or another event. A Regulation event is
considered nested if it has another event as its argument. A
sample event annotation for a sentence (Sen.1) from the
training corpus is illustrated in Fig. 1.
Sen.1: BMP-6 did not induce significant changes in the
protein expression of Id2 and Id3.
In this sentence, the trigger words are presented in bold
font, whereas the protein arguments are underlined.
In the definition of BioNLP09-ST [2], both
triggers and arguments are called entities. In the upper
textbox of the figure, the proteins “BMP-6”, “Id2”, and “Id3”
are labeled as T73, T74, and T75, respectively. In the lower
textbox, T50 and T51 label two triggers, and E27 and
E28 label two events.
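The event typology described above can be sketched as a small data structure. The following Python fragment is a minimal illustration, not part of the system described in this paper: the class name `Event`, the helper functions, and the convention that argument identifiers starting with "E" refer to events while those starting with "T" refer to proteins are assumptions made for the example. The sketch checks only the argument constraints stated in the text for the three event categories.

```python
from dataclasses import dataclass
from typing import List, Optional

# The nine event types, grouped into the three categories named in the text.
SIMPLE = {"Gene_expression", "Transcription", "Protein_catabolism",
          "Phosphorylation", "Localization"}
BINDING = {"Binding"}
REGULATION = {"Positive_regulation", "Negative_regulation", "Regulation"}


@dataclass
class Event:
    """Hypothetical container for one event (names are illustrative)."""
    event_id: str
    event_type: str
    themes: List[str]            # ids of protein ("T...") or event ("E...") arguments
    cause: Optional[str] = None  # optional Cause, only meaningful for Regulation


def category(event_type: str) -> str:
    """Map an event type to its category."""
    if event_type in SIMPLE:
        return "Simple"
    if event_type in BINDING:
        return "Binding"
    if event_type in REGULATION:
        return "Regulation"
    raise ValueError(f"unknown event type: {event_type}")


def is_event_ref(arg_id: str) -> bool:
    """Assumed id convention: 'E' prefix marks an event, 'T' a protein."""
    return arg_id.startswith("E")


def is_valid(event: Event) -> bool:
    """Check the argument constraints described in the text."""
    cat = category(event.event_type)
    if cat == "Simple":
        # Exactly one protein argument, no Cause.
        return (len(event.themes) == 1
                and not is_event_ref(event.themes[0])
                and event.cause is None)
    if cat == "Binding":
        # One or more protein arguments, no Cause.
        return (len(event.themes) >= 1
                and not any(is_event_ref(t) for t in event.themes)
                and event.cause is None)
    # Regulation: one obligatory Theme, optional Cause; each may be a
    # protein or another event.
    return len(event.themes) == 1


def is_nested(event: Event) -> bool:
    """A Regulation event is nested if any argument is itself an event."""
    if category(event.event_type) != "Regulation":
        return False
    args = list(event.themes)
    if event.cause is not None:
        args.append(event.cause)
    return any(is_event_ref(a) for a in args)
```

For instance, with made-up identifiers, `Event("E1", "Gene_expression", ["T1"])` is a valid Simple event, and `Event("E2", "Positive_regulation", ["E1"], cause="T2")` is a valid Regulation event that `is_nested` reports as nested.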
Biomedical event extraction is a complex task that
requires further study before it can be applied. The complexity
of event extraction stems from two aspects. First, the sentences
in the biomedical literature are typically very complex. Second,
many biomedical events are nested and thus differ
from events as defined in the general domain, such as in the
ACE2005 [5] event task. As shown in Fig. 1, event E79
contains the trigger word T169 and the protein argument
T74. Meanwhile, event E79 is the argument of another event,
E76. Event E76 is therefore a nested event, and it is in turn the
argument of event E75. When multiple nested layers exist,
extracting events becomes more difficult because errors in
the lower layers could lead to errors in the upper layers.
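The nesting chain described above (E79 as the argument of E76, which is in turn the argument of E75) can be made concrete with a short sketch. The Python fragment below is an illustration under stated assumptions, not the method of this paper: the dictionary `arguments` and the functions `nesting_depth` and `dependents` are hypothetical names, and only the argument links mentioned in the text are encoded. It computes how many layers lie beneath an event and which events would be affected if a lower-layer event were extracted incorrectly.

```python
from typing import Dict, List, Set, Tuple

# Argument links taken from the description in the text; event types and
# the triggers of E76 and E75 are not given there, so they are left out.
arguments: Dict[str, List[str]] = {
    "E79": ["T74"],  # protein argument
    "E76": ["E79"],  # event argument
    "E75": ["E76"],  # event argument
}


def nesting_depth(event_id: str,
                  args: Dict[str, List[str]],
                  _path: Tuple[str, ...] = ()) -> int:
    """Number of event layers below event_id (0 if all arguments are proteins)."""
    if event_id in _path:
        raise ValueError(f"cyclic event reference at {event_id}")
    # An argument counts as an event if it has its own entry in args.
    event_args = [a for a in args.get(event_id, []) if a in args]
    if not event_args:
        return 0
    return 1 + max(nesting_depth(a, args, _path + (event_id,))
                   for a in event_args)


def dependents(target: str, args: Dict[str, List[str]]) -> Set[str]:
    """Events that directly or indirectly take target as an argument."""
    result: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for event_id, event_args in args.items():
            if event_id in result:
                continue
            if any(a == target or a in result for a in event_args):
                result.add(event_id)
                changed = True
    return result
```

With the links above, `nesting_depth("E75", arguments)` evaluates to 2, and `dependents("E79", arguments)` contains both E76 and E75, which mirrors the observation that an error in a lower layer can propagate to every layer above it.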
2. Related works
To date, researchers have proposed many experimental
methods to extract biomedical events based on