A Deep Neural Network Sentence Level Classification Method with
Context Information
Xingyi Song and Johann Petrak
Department of Computer Science
University of Sheffield
Sheffield, UK
{x.song, johann.petrak}@sheffield.ac.uk
Angus Roberts
NIHR Biomedical Research Centre
Institute of Psychiatry, Psychology and Neuroscience
King's College London
London, UK
angus.roberts@kcl.ac.uk
Abstract
In the sentence classification task, context formed from sentences adjacent to the sentence being classified can provide important information for classification. This context is, however, often ignored. Where methods do make use of context, only small amounts are considered, making it difficult to scale. We present a new method for sentence classification, Context-LSTM-CNN, that makes use of potentially large contexts. The method also utilizes long-range dependencies within the sentence being classified, using an LSTM, and short-span features, using a stacked CNN. Our experiments demonstrate that this approach consistently improves over previous methods on two different datasets.
1 Introduction
Artificial neural networks (ANN), and especially deep neural networks (DNN), give state-of-the-art results for sentence classification tasks. Usually, sentences are treated as separate instances for the task. However, in many situations the sentence that is the focus of classification appears in a context that can provide additional information. For example, in the sentences below from the IEMOCAP dataset, it is difficult to classify M02 as showing excitement without the prior context:
• M01: I got it. I got accepted to U.S.C..
• F01: Oh, for real?
• M02: Yes! I just found out today. I just got the letter.
Our work is motivated by sentence classification in the text of medical records, in which complex judgements may be made across several sentences, each adding weight and nuance to a point. We believe, however, that the technique is more widely applicable. In order to test generalisability and to allow reproducibility, we therefore present an evaluation of the method with publicly available, non-medical corpora.
Previous work on using context for sentence classification used LSTM and CNN network layers to encode the surrounding context, giving an improvement in classification accuracy (Lee and Dernoncourt, 2016). However, the use of CNN and LSTM layers imposes a significant computational cost when training the network, especially if the size of the context is large. For this reason, the approach presented by Lee and Dernoncourt (2016) is explicitly intended for sequential, short-text classification.
In many cases, however, the context available is of significant size. We therefore introduce a new method, Context-LSTM-CNN¹, which is based on the computationally efficient Fixed-size Ordinally-Forgetting Encoding (FOFE) method (Zhang et al., 2015), and an architecture that combines an LSTM and CNN for the focus sentence. The method consistently improves over results obtained from either LSTM alone, CNN alone, or these two combined, with little increase in training time.
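FOFE compresses a variable-length token sequence into a fixed-size vector with a single parameter-free recursion, z_t = α·z_{t−1} + e_t with z_0 = 0, so encoding a context of any length costs one linear pass. The following is a minimal NumPy sketch of that recursion; applying it to word embeddings rather than the one-hot vectors used by Zhang et al. (2015), and the choice α = 0.9, are assumptions made here for illustration only.

```python
import numpy as np

def fofe_encode(embeddings, alpha=0.9):
    """Fixed-size Ordinally-Forgetting Encoding (Zhang et al., 2015):
    z_0 = 0;  z_t = alpha * z_{t-1} + e_t.
    Tokens further in the past are down-weighted by alpha^distance, so the
    result is one fixed-size vector regardless of sequence length."""
    z = np.zeros(embeddings.shape[1])
    for e in embeddings:      # one pass over the sequence, no trainable weights
        z = alpha * z + e
    return z

# Toy usage: a 5-token context with 4-dimensional word vectors.
context = np.random.randn(5, 4)
print(fofe_encode(context))   # a single 4-dimensional context encoding
```

Because the recursion has no trainable parameters, enlarging the context adds almost nothing to training cost, which is what allows contexts of arbitrary size.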
This paper makes three contributions: 1) a demonstration of the importance of context in some sentence classification tasks; 2) an adaptation of existing datasets for such sentence classification tasks, in order to support reproducibility of evaluations; 3) a neural architecture for sentence classification that outperforms previous methods, and can include context of arbitrary size without incurring a large computational cost.
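To make the architecture concrete, below is a minimal PyTorch-style sketch of how such a model could be assembled from the components named above: an LSTM and a stacked CNN over the focus sentence, with FOFE-encoded left and right contexts appended before the output layer. The class name, layer sizes, max-pooling, and the simple concatenation of features are assumptions made for illustration; the actual configuration is described later in the paper and may differ.

```python
import torch
import torch.nn as nn

class ContextLSTMCNN(nn.Module):
    """Illustrative sketch only: the focus sentence is encoded by an LSTM
    (long-range dependencies) and a stacked CNN (short-span features);
    FOFE vectors for the left and right contexts are concatenated with
    both encodings before classification."""

    def __init__(self, emb_dim=100, hidden=128, n_filters=100, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.convs = nn.Sequential(      # stacked CNN over the focus sentence
            nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.out = nn.Linear(hidden + n_filters + 2 * emb_dim, n_classes)

    def forward(self, sent_emb, left_fofe, right_fofe):
        # sent_emb: (batch, seq_len, emb_dim); *_fofe: (batch, emb_dim)
        _, (h, _) = self.lstm(sent_emb)                            # long-range features
        cnn = self.convs(sent_emb.transpose(1, 2)).max(2).values   # max-pool over time
        return self.out(torch.cat([h[-1], cnn, left_fofe, right_fofe], dim=1))

# Toy usage with random tensors standing in for real embeddings.
model = ContextLSTMCNN()
logits = model(torch.randn(2, 20, 100), torch.randn(2, 100), torch.randn(2, 100))
print(logits.shape)  # torch.Size([2, 4])
```

Note that only the focus sentence passes through the LSTM and CNN; the contexts enter as precomputed fixed-size FOFE vectors, which is why context size barely affects training time.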
2 Related work
Since their introduction (Collobert et al., 2011), CNNs with word embedding language models have become common for text classification tasks (Kim, 2014; Conneau et al., 2017). One limitation of the original CNN approach is the loss
¹ The code is publicly available at https://github.com/deansong/contextLSTMCNN