A Joint Model for Chinese Microblog Sentiment Analysis


"A Joint Model for Chinese Microblog Sentiment Analysis - 研究论文" 这篇研究论文探讨了中文微博情感分析的联合模型，旨在提升对特定话题用户态度识别的准确性。在社交媒体时代，微博成为了人们表达观点和情绪的重要平台，情感分析在中文微博中的应用具有重大的实际意义，例如市场分析、舆情监测和社会热点追踪等。作者提出了一种结合支持向量机（SVM）和深度神经网络（DNN）的联合模型。首先，他们构建了一个基于N-gram、词性标注（N-POS）和情感词典特征的支持向量机分类器。N-gram用于捕捉文本序列中的词汇组合模式，词性标注则提供了词语的语法和语义信息，而情感词典则包含了预定义的正面和负面情感词汇，这些特征对于情感判断至关重要。同时，他们应用卷积神经网络（CNN）来学习段落表示特征，这些特征作为另一个SVM分类器的输入。CNN在自然语言处理中常用于提取文本的局部特征，通过卷积层和池化层的操作，可以有效地从文本中抽取出有意义的结构信息。两个分类器分别进行分类后，其结果会被融合为最终的情感判断。这种集成方法利用了两种不同模型的互补优势，一方面利用SVM的高效稳定，另一方面利用DNN的强大表示学习能力，以提高整体的情感分析性能。实验部分，该研究可能是在SIGHAN-8（第八届汉语计算处理研讨会）上进行的，这是自然语言处理领域的国际会议，通常会展示最新的研究成果。论文展示了联合模型在中文微博客情感分析任务上的表现，并与其他方法进行了比较，证明了其有效性和改进。这篇论文为中文微博客情感分析提供了一种新的思路，结合传统的机器学习方法与深度学习技术，为提高社交媒体情感分析的准确性和实用性做出了贡献。这种方法不仅适用于微博，还可以推广到其他类型的文本数据，如论坛讨论、新闻评论等，对于理解大规模社交媒体数据中的公众情绪有着广泛的应用前景。

Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing (SIGHAN-8), pages 61–67, Beijing, China, July 30-31, 2015.

2015 Association for Computational Linguistics and Asian Federation of Natural Language Processing

A Joint Model for Chinese Microblog Sentiment Analysis

Yuhui Cao, Zhao Chen, Ruifeng Xu∗, Tao Chen and Lin Gui

Shenzhen Engineering Laboratory of Performance Robots at Digital Stage,

Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China

caoyuhuiszu@gmail.com xuruifeng@hitsz.edu.cn

Abstract

Topic-based sentiment analysis for Chinese microblogs aims to identify user attitudes toward specified topics. In this paper, we propose a joint model that incorporates Support Vector Machines (SVM) and a deep neural network to improve the performance of sentiment analysis. First, an SVM classifier is constructed using N-gram, N-POS and sentiment lexicon features. Meanwhile, a convolutional neural network is applied to learn paragraph representation features as the input of another SVM classifier. The classification results output by these two classifiers are merged into the final classification results. Evaluations on the SIGHAN-8 topic-based Chinese microblog sentiment analysis task show that our proposed approach ranks second on micro-average F1 and fourth on macro-average F1 among a total of 13 submitted systems.
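The abstract does not say how the two classifiers' outputs are merged. One plausible scheme, shown here purely as an assumption, is a weighted average of per-class scores followed by an argmax. The function name `merge_predictions`, the three label names and the score values are hypothetical.

```python
from typing import Dict


def merge_predictions(scores_a: Dict[str, float],
                      scores_b: Dict[str, float],
                      weight_a: float = 0.5) -> str:
    """Combine per-class scores from two classifiers by weighted averaging
    and return the label with the highest merged score.
    This merging rule is an assumption, not the method from the paper."""
    if set(scores_a) != set(scores_b):
        raise ValueError("both classifiers must score the same labels")
    merged = {
        label: weight_a * scores_a[label] + (1.0 - weight_a) * scores_b[label]
        for label in scores_a
    }
    # Iterate over sorted labels so that ties are broken deterministically.
    return max(sorted(merged), key=lambda label: merged[label])


# Made-up scores from the feature-based SVM and the CNN-feature SVM.
feature_svm = {"positive": 0.6, "neutral": 0.3, "negative": 0.1}
cnn_svm = {"positive": 0.2, "neutral": 0.7, "negative": 0.1}

equal_vote = merge_predictions(feature_svm, cnn_svm)
trust_features = merge_predictions(feature_svm, cnn_svm, weight_a=0.9)
print(equal_vote, trust_features)  # neutral positive
```

The weight makes the trade-off explicit: with equal weights the CNN-based classifier's stronger preference wins, while a high `weight_a` defers to the feature-based classifier.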

1 Introduction

With the development of the Internet, microblogs have become a popular user-generated content platform where users share the latest events or their personal feelings with each other. Topic-based microblogs are the most common interactive way for users to share their opinions on a specified topic. To identify the opinions of users, sentiment analysis techniques are investigated to classify texts into different categories according to their sentiment polarities.

Most existing sentiment classification techniques are based on machine learning algorithms, such as Support Vector Machine, Naïve Bayes and Maximum Entropy. The machine learning based approach takes feature vectors as the input of the classifier to predict the classification results. Thus, feature engineering, a method for extracting effective features from texts, plays an important role. Some commonly used features in sentiment classification are unigrams, bigrams and sentiment words. However, these features do not work well for cross-domain sentiment classification because of the lack of domain knowledge.
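To make the feature types named above concrete, the sketch below builds a sparse feature dictionary from unigrams, bigrams and sentiment-word counts. It assumes the text has already been segmented into words (Chinese needs a word segmenter, which is not shown), and the function name `extract_features`, the feature-key prefixes and the tiny lexicons are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def extract_features(tokens: List[str],
                     positive_words: Set[str],
                     negative_words: Set[str]) -> Dict[str, int]:
    """Count unigrams, bigrams and sentiment-lexicon hits in a tokenized text."""
    feats: Counter = Counter()
    for tok in tokens:
        feats["uni=" + tok] += 1
    # Bigrams are pairs of adjacent tokens.
    for left, right in zip(tokens, tokens[1:]):
        feats["bi=" + left + "_" + right] += 1
    # Lexicon features: how many tokens appear in each sentiment word list.
    feats["lex_pos"] = sum(1 for tok in tokens if tok in positive_words)
    feats["lex_neg"] = sum(1 for tok in tokens if tok in negative_words)
    return dict(feats)


# Pre-segmented toy sentence ("this phone is very good") and toy lexicons.
tokens = ["这", "部", "手机", "很", "好"]
feats = extract_features(tokens, positive_words={"好"}, negative_words={"差"})
print(feats["uni=好"], feats["bi=很_好"], feats["lex_pos"], feats["lex_neg"])  # 1 1 1 0
```

A dictionary like this would normally be mapped to a fixed-length vector before training a classifier. The lexicon counts also show why such features transfer poorly across domains: a word missing from the lexicon contributes nothing.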

Danushka Bollegala et al. (2011) used multiple sources to construct a sentiment-sensitive thesaurus to overcome the lack of domain knowledge. Expanding the set of known sentiment words is another approach to improving the performance of sentiment analysis. Stefano Baccianella et al. (2010) constructed SentiWordNet by extending WordNet with sentiment information. It is now widely used in sentiment classification for English. As for Chinese sentiment analysis, Minlie Huang et al. (2014) proposed a new-word detection method that mines frequent sentiment word patterns. This method can discover new sentiment words from a large collection of unlabeled texts.

With the rapid development of pre-trained word embeddings and deep neural networks, a new way to represent texts and features has been developed. Mikolov et al. (2013) showed that word embeddings effectively represent words with meaningful syntactic and semantic information. The recursive neural network proposed by Socher et al. (2011a; 2011b; 2013) has been shown to construct sentence representations efficiently from word embeddings. Convolutional neural networks (CNN), another deep learning model that achieved success in the image recognition field, were applied to natural language processing with word embed-

