Classification Algorithm of Chinese Sentiment Orientation
Based on Dictionary and LSTM
Ge Bin
Science and Technology on Information Systems
Engineering Key Laboratory, National University of
Defense Technology, Changsha, Hunan, P.R. China,
410073
gebin@nudt.edu.cn
He Chunhui*
Science and Technology on Information Systems
Engineering Key Laboratory, National University of
Defense Technology, Changsha, Hunan, P.R. China,
410073
xtuhch@163.com
Zhang Chong
Science and Technology on Information Systems
Engineering Key Laboratory, National University of
Defense Technology, Changsha, Hunan, P.R. China,
410073
leocheung8286@qq.com
Hu Yanli
Science and Technology on Information Systems
Engineering Key Laboratory, National University of
Defense Technology, Changsha, Hunan, P.R. China,
410073
huyanli@nudt.edu.cn
ABSTRACT
Chinese sentiment analysis is an active research topic in information
analysis, but labeled corpora suitable for training machine learning
algorithms are scarce. Moreover, when machine learning algorithms are
used for text sentiment classification, they generally output only the
category and cannot extract the sentiment words. This paper
proposes an automatic labeling strategy for the training corpus and a
classification algorithm for Chinese sentiment orientation based
on a dictionary and LSTM. The approach labels the training corpus
automatically, accurately, and efficiently, and also extracts
sentiment words. Experiments show that the method is effective:
the LSTM algorithm reaches an accuracy of 93.51% on the
mixed sentiment classification data set.
CCS Concepts
Applied computing ➝ Document management and text
processing ➝ Document capture ➝ Document analysis.
Keywords
Sentiment Analysis; Automatic Annotation; Long Short-Term
Memory Neural Network
1. INTRODUCTION
With the rapid development of the Internet, social media, and
e-commerce platforms, users generate large amounts of text with
sentiment tendencies in a short period of time. In recent years,
mining the hidden negative or positive sentiment tendencies in such
text has become a valuable research direction in natural language
processing, and many research results have been obtained.
A survey of the relevant literature shows that current mainstream
text sentiment analysis methods fall into two groups: methods based
on a sentiment database and a template rule base, and statistical
methods that use manually labeled corpora to train machine learning
algorithms, after which the trained algorithm or model is used to
classify the sentiment tendency of a text. As sentiment analysis
techniques and theory have developed, these two approaches have
often borrowed from each other, driving sentiment analysis
technology forward. For English sentiment analysis in particular,
researchers have proposed many efficient algorithms and mature
tools. Chinese sentiment analysis, however, started relatively late
and still faces problems and challenges such as the lack of
large-scale annotated data sets.
As deep learning techniques and frameworks have matured, some
researchers have proposed using deep neural network algorithms to
mine the sentiment tendencies in text. Although this approach can
greatly improve performance under certain conditions, it also has
shortcomings. First, it requires a large amount of labeled
training data as input. Because high-quality labeled training
corpora are particularly scarce for Chinese, this is a major
challenge for Chinese sentiment analysis. Second, such methods can
only classify the sentiment tendency of a text. They do not give the
sentiment words that appear in the document, which is a drawback for
many fine-grained Chinese sentiment analysis tasks. They also often
cannot explain their sentiment orientation classification
results.
To address these deficiencies, this paper proposes
a classification algorithm for Chinese sentiment orientation based
on a dictionary and a long short-term memory (LSTM) neural
network. This algorithm, combined with the sentiment
dictionary and a sentiment score calculation method, can give the
ICBDR 2018, October 27–29, 2018, Weihai, China
© 2018 Association for Computing Machinery.
978-1-4503-6476-8/18/10…$15.00
DOI: http://dx.doi.org/10.1145/3291801.3291835