Dictionary- and LSTM-Based Sentiment Analysis: A Classification Algorithm for Chinese Text


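This article examines a key research problem in Chinese sentiment analysis: how to improve the performance of machine learning algorithms on Chinese sentiment orientation classification, particularly when high-quality annotated training corpora are scarce. The authors, from the Science and Technology on Information Systems Engineering Key Laboratory at the National University of Defense Technology (Changsha, Hunan, P.R. China), propose a classification algorithm based on a dictionary and long short-term memory (LSTM).

Traditionally, Chinese sentiment analysis has often relied on manually annotated sentiment vocabulary, but such resources are limited in size and struggle to cover the subtle ways emotion is expressed. The paper's contribution is to use a dictionary as the foundation and combine it with the sequence-modeling ability of LSTM to capture the context in which sentiment words appear. LSTM is a recurrent neural network that is well suited to long-range dependencies in natural language processing tasks, which helps identify and classify subjective sentiment in text.

The research team may have used preprocessing steps such as word segmentation, part-of-speech tagging, and sentiment dictionary matching to turn text into feature representations suitable for machine learning. The LSTM model learns how word order affects sentiment polarity, which improves classification accuracy. The team may also have evaluated the algorithm with methods such as cross-validation and compared it against other common classifiers (such as naive Bayes and support vector machines).

The listing title, "(Indexed by EI + Web of Science Core Collection) Classification Algorithm of Chinese Sentiment Orientation", indicates that the paper is indexed by two major international databases, EI (Engineering Village) and Web of Science.

The key points of the paper are:

1. **Challenge in Chinese sentiment analysis**: the lack of large-scale, high-quality annotated corpora.
2. **Method**: a dictionary- and LSTM-based classification algorithm that emphasizes the context of sentiment words and the capture of long-range dependencies.
3. **Research background**: a project of the Science and Technology on Information Systems Engineering Key Laboratory at the National University of Defense Technology.
4. **Research process**: may involve text preprocessing, feature extraction, and model training.
5. **Assessment**: the paper appeared in the proceedings of ICBDR 2018 and is indexed by EI and Web of Science.

The paper offers a new perspective and technique for applying machine learning to Chinese sentiment analysis, and serves as a useful reference for researchers and engineers working in this area.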
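To make the remark about word order concrete, here is a minimal sketch of a single LSTM time step in plain Python. It is not the architecture from the paper: the helper names (`lstm_step`, `init_params`, `encode`), the tiny layer sizes, and the random weights are all assumptions chosen for illustration, and a real model would be trained and would use a numerical library.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(row, vec):
    return sum(w * v for w, v in zip(row, vec))


def lstm_step(x, h_prev, c_prev, params):
    """Run one LSTM time step.

    params maps a gate name ("i", "f", "o", "g") to (W, b), where each row
    of W multiplies the concatenated vector [h_prev; x].
    """
    z = h_prev + x  # list concatenation, not element-wise addition
    gates = {}
    for name in ("i", "f", "o", "g"):
        W, b = params[name]
        pre = [dot(row, z) + b_k for row, b_k in zip(W, b)]
        act = math.tanh if name == "g" else sigmoid
        gates[name] = [act(p) for p in pre]
    # The cell state keeps part of the old memory (forget gate) and adds
    # part of the new candidate (input gate).
    c = [f * cp + i * g
         for f, cp, i, g in zip(gates["f"], c_prev, gates["i"], gates["g"])]
    # The hidden state is the gated, squashed cell state.
    h = [o * math.tanh(ck) for o, ck in zip(gates["o"], c)]
    return h, c


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical initializer: small random weights, zero biases."""
    rng = random.Random(seed)

    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size + input_size)]
                for _ in range(hidden_size)]

    return {name: (matrix(), [0.0] * hidden_size) for name in ("i", "f", "o", "g")}


def encode(sequence, params, hidden_size):
    """Feed a sequence of input vectors through the cell; return the last h."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h
```

Because the cell state is carried from step to step, feeding the same word vectors in a different order generally produces a different final hidden state, which is the property a downstream sentiment classifier would rely on.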

Classification Algorithm of Chinese Sentiment Orientation

Based on Dictionary and LSTM

Ge Bin

Science and Technology on Information Systems Engineering Key Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China, 410073

gebin@nudt.edu.cn

He Chunhui*

Science and Technology on Information Systems Engineering Key Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China, 410073

xtuhch@163.com

Zhang Chong

Science and Technology on Information Systems Engineering Key Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China, 410073

leocheung8286@qq.com

Hu Yanli

Science and Technology on Information Systems Engineering Key Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China, 410073

huyanli@nudt.edu.cn

ABSTRACT

Chinese sentiment analysis is a hot research topic in information analysis, but tagged corpora suitable for training machine learning algorithms are scarce. When machine learning algorithms are used for text sentiment classification, they generally output only categories and cannot extract sentiment words. This paper proposes an automatic tagging strategy for training corpora and a classification algorithm for Chinese sentiment orientation based on a dictionary and LSTM. The method labels the training corpus automatically, accurately, and efficiently, and also extracts sentiment words. Experiments show that the method is effective: the LSTM algorithm reaches an accuracy of 93.51% on the mixed data set for sentiment classification.
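The idea of labeling a corpus automatically from a dictionary, while also returning the matched sentiment words, can be sketched as follows. This is only an illustration under stated assumptions: the toy lexicon, the negation rule, the threshold, and the function names `score_tokens` and `auto_label` are hypothetical and are not taken from the paper, and the input is assumed to be already word-segmented.

```python
# Toy lexicon: hypothetical entries, not the dictionary used in the paper.
POSITIVE = {"好": 1.0, "喜欢": 1.0, "满意": 1.0}
NEGATIVE = {"差": -1.0, "失望": -1.0, "糟糕": -1.0}
NEGATORS = {"不", "没"}

LEXICON = {**POSITIVE, **NEGATIVE}


def score_tokens(tokens):
    """Return (score, matched sentiment words) for a pre-segmented sentence."""
    score = 0.0
    matched = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        polarity = LEXICON[tok]
        # Assumption: a negator directly before a sentiment word flips it.
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
        matched.append(tok)
    return score, matched


def auto_label(tokens, threshold=0.0):
    """Label a sentence by the sign of its lexicon score.

    Returns (label, matched words); label is None when the score does not
    clear the threshold, so ambiguous sentences can be left out of the
    automatically labeled training set.
    """
    score, matched = score_tokens(tokens)
    if score > threshold:
        return "positive", matched
    if score < -threshold:
        return "negative", matched
    return None, matched


corpus = [
    ["这", "家", "店", "很", "好"],
    ["服务", "不", "好"],
    ["今天", "下雨"],
]
labeled = [(tokens, *auto_label(tokens)) for tokens in corpus]
```

Sentences that receive a label could then serve as training examples for a classifier such as an LSTM, while the matched words give a simple explanation of each label.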

CCS Concepts

Applied computing ➝ Document management and text processing ➝ Document capture ➝ Document analysis.

Keywords

Sentiment Analysis; Automatic Annotation; Long Short-Term Memory Neural Network

1. INTRODUCTION

With the rapid development of the Internet, social media, and e-commerce platforms, users have generated a large amount of text with sentiment tendencies in a short period of time. In recent years, mining the hidden negative or positive sentiment tendencies in this text has become a valuable research direction in natural language processing, and many research results have been obtained. A survey of the relevant literature shows that current mainstream text sentiment analysis methods fall into two groups: methods based on a sentiment database and a template rule base, and statistical methods that use manually labeled corpora to train machine learning algorithms, after which the trained algorithm or model is used to classify the sentiment tendency of text. As sentiment analysis techniques and theory have developed, these two approaches have often borrowed from each other, driving the technology forward. For English sentiment analysis in particular, researchers have proposed many efficient algorithms and mature tools. Chinese sentiment analysis, however, started relatively late and still faces problems and challenges such as the lack of large-scale annotated data sets.

As deep learning techniques and frameworks have matured, some researchers have proposed using deep neural network algorithms to mine the sentiment tendencies in text. Although this approach can greatly improve performance under a certain premise, it also has shortcomings. First, the premise is that a large amount of labeled training data is needed as input to the algorithm. Since high-quality labeled training corpora are particularly scarce for Chinese, this is a major challenge for Chinese sentiment analysis. Second, such methods can only classify the sentiment tendency of a text. They do not give the sentiment words that appear in the document, which is unfriendly to many fine-grained Chinese sentiment analysis tasks. At the same time, they often cannot explain their sentiment orientation classification results.

To better address the above deficiencies, this paper proposes a classification algorithm for Chinese sentiment orientation based on a dictionary and a long short-term memory (LSTM) neural network. This algorithm, combined with the sentiment dictionary and sentiment score calculation method, can give the

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

ICBDR 2018, October 27–29, 2018, Weihai, China

978-1-4503-6476-8/18/10…$15.00

DOI: http://dx.doi.org/10.1145/3291801.3291835
