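Sentiment Analysis Combining a Recursive Autoencoder with the HowNet Lexicon

This research paper proposes a method called "Recursive Autoencoder with HowNet Lexicon" for sentence-level sentiment analysis. The method uses the HowNet lexicon to enrich semantic word representations and a recursive autoencoder to capture syntactic and semantic information, with the aim of improving the accuracy of sentiment analysis. Because labeled data are usually expensive to obtain in practice, the paper trains the model with supervised learning on fully labeled parse trees that require no manual annotation, which significantly reduces the labeling burden. The model is evaluated on a sentence-level sentiment classification task.

In sentiment analysis, traditional semantic word representations often ignore the syntactic relationships between words. A recursive autoencoder is a deep learning model that can process tree-structured data, such as the parse tree of a natural-language sentence, and can therefore capture the hierarchical relationships among the words of a sentence. Combining it with HowNet further enriches the semantic representation of words: HowNet is a large Chinese semantic lexicon that contains rich information on word meaning and sentiment polarity.

Training relies on supervised learning but, unlike conventional approaches, it uses fully labeled parse trees whose node labels are computed from the HowNet lexicon rather than annotated by hand. This reduces the dependence on large amounts of annotated data and lets the model learn more complex syntactic structure and semantic information without additional annotation work. When processing a sentence, the recursive autoencoder combines word representations bottom-up into higher-level representations, a process that reflects how the sentence is constructed.

The experimental results show that combining the HowNet lexicon with a recursive autoencoder performs well on sentence-level sentiment analysis, which supports the effectiveness of the model. For applications such as judging the sentiment of online reviews or monitoring sentiment on social media, such a model can provide more accurate predictions. The work offers a new approach to two problems in sentiment analysis: the underuse of syntactic information and the difficulty of obtaining labeled data.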
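The bottom-up composition described in the summary above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`init_params`, `encode`, `decode`, `compose`), the tanh nonlinearity, the squared reconstruction error, the random initialization, and the nested-tuple tree encoding are all assumptions chosen to keep the example self-contained.

```python
import math
import random

def init_params(d, seed=0):
    """Randomly initialise encoder/decoder weights for d-dimensional vectors (illustrative)."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W": mat(d, 2 * d), "b": [0.0] * d,
            "W_dec": mat(2 * d, d), "b_dec": [0.0] * (2 * d)}

def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def encode(params, left, right):
    """Parent vector p = tanh(W [left; right] + b)."""
    z = matvec(params["W"], left + right)
    return [math.tanh(zi + bi) for zi, bi in zip(z, params["b"])]

def decode(params, parent):
    """Reconstruct the concatenated children [left'; right'] from the parent."""
    z = matvec(params["W_dec"], parent)
    return [zi + bi for zi, bi in zip(z, params["b_dec"])]

def compose(tree, embeddings, params):
    """Return (vector, summed reconstruction error) for a binary parse tree.

    A tree is either a word (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return embeddings[tree], 0.0
    left, err_left = compose(tree[0], embeddings, params)
    right, err_right = compose(tree[1], embeddings, params)
    parent = encode(params, left, right)
    recon = decode(params, parent)
    # Squared error between the children and their reconstruction.
    err = sum((r - c) ** 2 for r, c in zip(recon, left + right))
    return parent, err_left + err_right + err

# Usage: a hand-written binary tree with random 4-dimensional word vectors.
d = 4
rng = random.Random(1)
words = ["the", "movie", "is", "great"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(d)] for w in words}
sentence_vec, total_err = compose((("the", "movie"), ("is", "great")),
                                  embeddings, init_params(d))
```

Each internal node reuses the same weights, so a sentence of any length maps to one vector of the same dimension as a word vector; a classifier can then be attached to any node's vector.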
Recursive Autoencoder with HowNet Lexicon for
Sentence-Level Sentiment Analysis
Xianghua Fu
College of Computer Science and Software Engineering
Shenzhen University, Shenzhen Guangdong
518060, China
fuxh@szu.edu.cn
Yingying Xu
College of Computer Science and Software Engineering
Shenzhen University, Shenzhen Guangdong
518060, China
yingyingyulia@foxmail.com
ABSTRACT
Semantic word representations have been very useful but usually
ignore syntactic relationships. In the task of sentiment analysis,
compositional vector representations require more structural
information from natural language text and richer supervised
training for more accurate predictions. However, labeled data are
generally expensive to acquire in practice. To remedy this, we
propose a new method that trains our model on fully labeled
parse trees using supervised learning without manual annotation.
Our method not only significantly reduces the burden of manual
labeling, but also allows the compositionality to capture syntactic
and semantic information jointly. We show the effectiveness of
this model on the task of sentence-level sentiment classification
and conduct preliminary experiments to investigate its
performance. The model accurately predicts the sentiment
distribution and outperforms other approaches.
CCS Concepts
• Information systems ➝ Information retrieval ➝ Retrieval tasks
and goals ➝ Sentiment analysis • Information systems ➝
Information systems applications ➝ Data mining • Computing
methodologies ➝ Artificial intelligence ➝ Natural language
processing.
Keywords
Sentiment Analysis; Deep Learning; HowNet Lexicon; Parse Tree;
Word Embedding; Data Mining; Sentiment Label.
1. INTRODUCTION
Sentiment analysis is the task of identifying the subjectivity,
polarity (positive or negative) and polarity strength of a piece of
text. The granularity of the analysis varies with the text under
study. In this research, we target the task of sentence-level
sentiment analysis, which aims to classify the sentiment polarity
(such as positive or negative) of a sentence based on its text.
Most previous studies follow the approach of Pang et al. [14] and
regard sentiment analysis as a special case of text categorization.
Traditional methods mainly adopt bag-of-words representations,
which suit longer documents because they rely on a few words with
strong sentiment such as ‘awesome’ or ‘exciting’, but may not be
optimal for short messages. As research on vector representations
has deepened in recent years, word embeddings for sentiment
analysis have attracted wide attention. Unlike primitive word
representations, a word embedding represents a single word as a
dense, low-dimensional vector in a meaning space [2]. However,
since embeddings represent only words, semantic composition must
be considered in order to represent phrases and sentences. Socher
et al. [18] exploit hierarchical structure and use compositional
semantics to understand sentiment. However, two problems remain.
(1) They construct the tree structure with a greedy approximation,
which does not necessarily follow standard syntactic constraints.
(2) The sentiment labels of internal nodes, which are needed to
compute the loss function (cross-entropy), are missing. Further
progress towards understanding compositionality in tasks such as
sentiment analysis requires richer supervised training. Socher et
al. [19] then introduced a Sentiment Treebank, the first corpus
with fully labeled parse trees. When trained on the new Treebank,
even baseline methods achieve improvements. However, the high cost
of manually annotating training data for supervised learning
imposes a significant burden on its use.
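To make problem (2) concrete, the cross-entropy loss at a tree node n can be written in its standard form as below. The symbols are illustrative assumptions rather than notation taken from this paper: K is the number of sentiment classes, t(n) is the target label distribution of node n, p(n) is the node's vector, and W^label is a classification matrix.

```latex
E_{\mathrm{ce}}(n) = -\sum_{k=1}^{K} t_k(n)\,\log d_k(n),
\qquad
d(n) = \operatorname{softmax}\!\bigl(W^{\mathrm{label}}\, p(n)\bigr)
```

Without a target t(n) for an internal node, this term cannot be evaluated there, so supervision reaches only the nodes that carry labels.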
In order to overcome the above problems, we propose a novel
recursive autoencoder model. The major differences of our model
are as follows:
(1) Rather than manually annotating sentiment labels for
nonterminal nodes, we use the HowNet lexicon to compute every
node’s polarity. This significantly reduces the burden of manual
labeling.
(2) Instead of constructing a binary tree with a greedy algorithm,
we represent the structure of sentences using syntax trees. In this
way, the feature representations can capture as much structural
information as possible.
(3) The characteristics of Chinese make sentiment classification
difficult, so many previous works consider only English datasets.
Our evaluation datasets contain not only English but also
Chinese.
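Point (1) can be illustrated with a short sketch that assigns a polarity label to every node of a parse tree from a word-level lexicon. The excerpt does not state how the polarity of a node is computed, so the rule used here (sum the lexicon scores of the words under the node and take the sign) is an assumption, as are the names `TOY_LEXICON`, `polarity` and `label_nodes`. The toy dictionary is a stand-in and does not reproduce HowNet data.

```python
# Toy stand-in for a sentiment lexicon; NOT real HowNet entries.
TOY_LEXICON = {"great": 1, "good": 1, "boring": -1, "bad": -1}

def polarity(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def label_nodes(tree, lexicon, out):
    """Post-order walk that appends (phrase, label) for every node to `out`.

    A tree is either a word (str) or a tuple of subtrees.
    Returns (words, score) for the subtree rooted at `tree`.
    Assumed rule: a node's score is the sum of its words' lexicon scores.
    """
    if isinstance(tree, str):
        words, score = [tree], lexicon.get(tree, 0)
    else:
        words, score = [], 0
        for child in tree:
            child_words, child_score = label_nodes(child, lexicon, out)
            words += child_words
            score += child_score
    out.append((" ".join(words), polarity(score)))
    return words, score

# Usage: label every node of a small hand-written syntax tree.
labels = []
label_nodes((("the", "movie"), ("is", "great")), TOY_LEXICON, labels)
for phrase, label in labels:
    print(f"{label:8s} {phrase}")
```

Every leaf and every internal node receives a label without any human annotation, which is what allows a per-node supervised loss to be computed over the whole tree.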
The remainder of this paper is organized as follows. Section 2
introduces related work. Section 3 describes the model in detail.
Experiments and evaluations are reported in Section 4. Section 5
concludes the paper.
Permission to make digital or hard copies of all or part of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial
advantage and that copies bear this notice and the full citation on the
first page. Copyrights for components of this work owned by others
than ACM must be honored. Abstracting with credit is permitted.
To copy otherwise, or republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee. Request
permissions from Permissions@acm.org.
ASE BD&SI 2015, October 07-09, 2015, Kaohsiung, Taiwan
© 2015 ACM. ISBN 978-1-4503-3735-9/15/10…$15.00
DOI: http://dx.doi.org/10.1145/2818869.2818908