2 Related Work
As described in [5], there are three kinds of methods for named entity recognition:
dictionary-based methods, rule-based methods, and statistical machine learning
methods, which rely on different theories. NER can be solved by machine learning
methods such as Conditional Random Fields (CRF) [6,7], Support Vector Machines (SVM) [8],
Hidden Markov Models (HMM) [9], etc. In recent years these methods have commonly been
applied to NER in a supervised learning setting. In addition, semi-supervised methods
offer another route to this task when labeled data is difficult to obtain.
Recently, while probabilistic statistical models perform well in many fields,
deep neural networks, as a new wave in machine learning, have achieved strong
performance in many domains such as image classification [10], knowledge dis-
covery [11], and translation [12]. Collobert et al. [13] propose a unified neural
network architecture and learning algorithm for various NLP tasks and also
achieve good results on NER. Compared with the well-known Convolutional
Neural Network (CNN), which has achieved remarkable performance in the
image domain, the Recurrent Neural Network (RNN) can exploit recurrent feedback
over time and thus capture dependencies beyond the input window. Therefore, the RNN
architecture is more suitable for NER. Song et al. [14] build a simple and efficient
system for bio-NER based on an RNN. Chiu and Nichols [15] present a novel neural
network architecture that automatically detects word- and character-level features
using a hybrid bidirectional Long Short-Term Memory (LSTM) and CNN architecture.
On the other hand, as described in [16], a deep neural network is characterized
by a set of weight matrices, bias vectors, and a nonlinear activation function,
which gives it the ability to learn hierarchical nonlinear mappings. During
parameter training, however, the weight matrices and bias vectors are updated
by an error back-propagation algorithm, whereas the activation function is not.
The choice of activation function is therefore important for a neural network:
it can speed up model training [17] and enhance stability [18]. In this paper, we
adopt the RNN model and modify its activation function for the NER task.
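To make the role of the activation function concrete, consider a standard Elman-style
recurrence (a generic formulation given here only for illustration, not necessarily
identical to the model used later in this paper): h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h)
and y_t = g(W_{hy} h_t + b_y), where x_t is the input at time t, h_t the hidden state,
and y_t the output. The weight matrices W and bias vectors b are learned by
back-propagation through time, while the hidden-layer activation f is fixed by design;
replacing f changes the nonlinear mapping without adding any parameters, which is the
lever exploited here.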
Another problem with RNNs is that they need plenty of training data. Hence, in this
paper we consider co-training, one of the useful solutions when training data is
scarce. Co-training, a semi-supervised learning method, was first proposed in 1998
and has also been used in NER. Tsendsuren et al. [19] present an Active Co-Training
(ACT) algorithm for biomedical named-entity recognition. Li et al. [20] propose a
semi-supervised approach to extract bilingual named entities and use a bilingual
co-training algorithm to improve the quality of named entity annotation. However,
studies that use an RNN for co-training remain rare in NER research [21], and most
of them concern the biomedical domain. In this paper, we aim to explore the
performance obtained by co-training an improved RNN with probabilistic statistical
models on the NER task; the general procedure is sketched below.
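As a rough illustration of the generic co-training idea only (the two models, the
confidence measure, and the selection size k below are illustrative placeholders,
not our actual configuration), two classifiers trained on a small labeled seed set
repeatedly label an unlabeled pool and move their most confident predictions into
the shared training set:

def co_train(model_a, model_b, X_seed, y_seed, pool, rounds=5, k=100):
    # model_a / model_b: any classifiers exposing fit(X, y) and predict_proba(X),
    # e.g. a neural tagger and a probabilistic statistical model.
    X, y = list(X_seed), list(y_seed)
    pool = list(pool)
    for _ in range(rounds):
        model_a.fit(X, y)
        model_b.fit(X, y)
        if not pool:
            break
        used = set()
        for model in (model_a, model_b):
            probs = model.predict_proba(pool)  # one probability vector per pool item
            ranked = sorted(range(len(pool)),
                            key=lambda i: max(probs[i]), reverse=True)
            # keep the k examples this model labels most confidently
            for i in ranked[:k]:
                if i in used:
                    continue
                used.add(i)
                X.append(pool[i])
                y.append(max(range(len(probs[i])), key=lambda c: probs[i][c]))
        pool = [x for i, x in enumerate(pool) if i not in used]
    return model_a, model_b

In practice each model may be given its own view of the features, and the newly
labeled examples may be added only to the other model's training set; the shared-set
variant above is simply the most compact form of the loop.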