Recursive Autoencoder with HowNet Lexicon for
Sentence-Level Sentiment Analysis
Xianghua Fu
College of Computer Science and Software Engineering
Shenzhen University, Shenzhen Guangdong
518060, China
fuxh@szu.edu.cn
Yingying Xu
College of Computer Science and Software Engineering
Shenzhen University, Shenzhen Guangdong
518060, China
yingyingyulia@foxmail.com
ABSTRACT
Semantic word representations have proven very useful but usually ignore syntactic relationships. In the task of sentiment analysis, compositional vector representations require more structural information from natural language text and richer supervised training to make more accurate predictions. However, labeled data are generally expensive to acquire in practice. To remedy this, we propose a new method that trains our model on fully labeled parse trees with supervised learning but without manual annotation. Our method not only significantly reduces the burden of manual labeling, but also allows the compositional representations to capture syntactic and semantic information jointly. We show the effectiveness of this model on the task of sentence-level sentiment classification and conduct preliminary experiments to investigate its performance. The model accurately predicts the sentiment distribution and outperforms other approaches.
CCS Concepts
• Information systems ➝ Information retrieval ➝ Retrieval tasks and goals ➝ Sentiment analysis • Information systems ➝ Information systems applications ➝ Data mining • Computing methodologies ➝ Artificial intelligence ➝ Natural language processing.
Keywords
Sentiment Analysis; Deep Learning; HowNet Lexicon; Parse Tree;
Word Embedding; Data Mining; Sentiment Label.
1. INTRODUCTION
Sentiment analysis is the task of identifying the subjectivity, polarity (positive or negative) and polarity strength of a piece of text. The granularity of the analysis varies with the subjective text being considered. In this research, we target the task of sentence-level sentiment analysis, which aims to classify the sentiment polarity (such as positive or negative) of a sentence based on its textual content.
Most previous studies follow Pang et al.'s approach [14] and regard sentiment analysis as a special case of text categorization. Traditional methods mainly adopt bag-of-words representations, which are better suited to longer documents because they can rely on a few words with strong sentiment such as 'awesome' or 'exciting', but may not be optimal for short messages. With the deepening research on vector representations in recent years, word embeddings for sentiment analysis have attracted wide attention. Unlike primitive word representations, a word embedding represents a single word as a dense, low-dimensional vector in a meaning space [2]. However, since embeddings only represent individual words, semantic composition must be considered to represent phrases and sentences. Socher et al. [18] exploit hierarchical structure and use compositional semantics to understand sentiment. However, two problems remain. (1) They use a greedy approximation to construct the tree structure, which does not necessarily follow standard syntactic constraints. (2) The sentiment labels of internal nodes, needed to compute the (cross-entropy) loss, are missing. Further progress towards understanding compositionality in tasks such as sentiment analysis requires richer supervised training.
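To illustrate the kind of composition such models perform, the sketch below (our own notation, not the exact formulation of [18]) embeds two words and composes their vectors into a parent vector p = tanh(W[c1; c2] + b); because the parent has the same dimensionality as a word vector, the step can be applied recursively over a tree. The dimensionality, toy vocabulary and random parameters are illustrative assumptions.

import numpy as np

d = 50                                   # embedding dimensionality (illustrative)
rng = np.random.default_rng(0)

# toy embedding table; in practice these vectors come from pre-trained word embeddings
vocab = {"not": 0, "bad": 1}
E = rng.normal(scale=0.1, size=(len(vocab), d))

# composition parameters: W maps the concatenated children back to dimension d
W = rng.normal(scale=0.1, size=(d, 2 * d))
b = np.zeros(d)

def compose(c1, c2):
    """One recursive composition step: parent = tanh(W [c1; c2] + b)."""
    return np.tanh(W @ np.concatenate([c1, c2]) + b)

# phrase vector for "not bad", built from its two word vectors
p = compose(E[vocab["not"]], E[vocab["bad"]])
print(p.shape)  # (50,) -- same size as a word vector, so composition can recurse up the tree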
Socher et al. [19] subsequently introduced the Sentiment Treebank, the first corpus with fully labeled parse trees; when trained on this new Treebank, even baseline methods achieve improvements. However, the high cost of manually annotating training data for supervised learning imposes a significant burden on their usage.
In order to overcome the above problems, we propose a novel recursive autoencoder model. The major differences of our model are as follows:
(1) Rather than manually annotating sentiment labels for nonterminal nodes, we use the HowNet lexicon to compute every node's polarity (a minimal sketch of this labeling step follows this list). This significantly reduces the burden of manual labeling.
(2) Instead of constructing a binary tree with a greedy algorithm, we represent the structure of sentences using syntax trees. In this way, the feature representations can capture as much structural information as possible.
(3) The characteristics of Chinese make sentiment classification difficult, and many previous works only evaluate on English datasets. In our experiments, the evaluation datasets contain not only English but also Chinese.
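To make contribution (1) concrete, the following minimal sketch shows one possible form of lexicon-driven node labeling over a parse tree: leaves take their polarity from a HowNet-style lexicon and internal nodes aggregate their children's scores. The toy lexicon entries, the tree, and the sign-aggregation rule are illustrative assumptions, not the exact procedure of our model or of HowNet.

LEXICON = {"好": 1.0, "喜欢": 1.0, "差": -1.0, "讨厌": -1.0}   # toy polarity scores ("good", "like", "bad", "hate")

class Node:
    def __init__(self, word=None, children=()):
        self.word = word              # set for leaf nodes only
        self.children = list(children)
        self.label = None             # 0 = negative, 1 = positive

def label_tree(node):
    """Assign a polarity label to every node, computed bottom-up from the leaves."""
    if node.word is not None:                          # leaf: look the word up in the lexicon
        score = LEXICON.get(node.word, 0.0)
    else:                                              # internal node: aggregate child scores
        score = sum(label_tree(child) for child in node.children)
    node.label = 1 if score >= 0 else 0
    return score

# usage: a two-word phrase whose parent node inherits the dominant child polarity
tree = Node(children=[Node(word="喜欢"), Node(word="好")])
label_tree(tree)
print(tree.label)  # 1 (positive)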
The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 describes the model in detail. Experiments and evaluations are reported in Section 4, and Section 5 concludes the paper.
Permission to make digital or hard copies of all or part of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial
advantage and that copies bear this notice and the full citation on the
first page. Copyrights for components of this work owned by others
than ACM must be honored. Abstracting with credit is permitted.
To copy otherwise, or republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee. Request
permissions from Permissions@acm.org.
ASE BD&SI 2015, October 07-09, 2015, Kaohsiung, Taiwan
© 2015 ACM. ISBN 978-1-4503-3735-9/15/10…$15.00
DOI: http://dx.doi.org/10.1145/2818869.2818908