Sentiment Analysis of Massive Micro Blog Data Based on Semi-supervised Learning


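This paper studies a semi-supervised approach to sentiment analysis of Chinese micro blogs, applied to a Sina Weibo data set of about 269M in size. The authors use Bootstrapping (a self-training strategy) as the core technique and combine it with the Support Vector Machine (SVM) algorithm to handle both subjective/objective classification and polarity classification. The key idea is to start from a small amount of labeled data and learn automatically, which enlarges the set of seed samples and improves the SVM's sentiment classification performance over successive iterations.

The authors also introduce a weighting factor that controls the weight of newly added seed samples in subsequent training rounds, which further improves classification. According to the experiments, Bootstrapping-based sentiment analysis of Chinese micro blogs saves much of the time and labor that manual annotation requires while also achieving better performance. The best reported accuracy is 62.9% for subjective/objective classification and 57% for sentiment polarity classification.

Because it is semi-supervised, the method can make use of large amounts of unlabeled data, which makes it relevant to monitoring public sentiment in areas such as brand reputation management and market trend analysis. In short, the paper's main contribution is an economical framework for Chinese micro blog sentiment analysis that combines Bootstrapping with SVM and reduces the dependence on manually labeled data.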
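The bootstrapping loop summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear SVM trained by subgradient descent on a weighted hinge loss, the confidence threshold, the number of rounds, and the names `train_linear_svm`, `bootstrap_svm`, and `new_weight` are all hypothetical choices made for the example. The feature vectors are toy two-dimensional counts, not the paper's features.

```python
def score(w, b, x):
    """Signed decision value of a linear classifier."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    """Map the decision value to a label in {+1, -1}."""
    return 1 if score(w, b, x) > 0 else -1


def train_linear_svm(X, y, weights, epochs=100, eta=0.1, lam=0.01):
    """Linear SVM via subgradient descent on a per-sample weighted hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label, wt in zip(X, y, weights):
            violated = label * score(w, b, x) < 1
            # L2 regularization shrinks the weight vector at every step.
            w = [(1 - eta * lam) * wj for wj in w]
            if violated:
                # The sample weight scales how strongly this example pulls.
                w = [wj + eta * wt * label * xj for wj, xj in zip(w, x)]
                b += eta * wt * label
    return w, b


def bootstrap_svm(X_seed, y_seed, X_unlabeled,
                  new_weight=0.5, threshold=1.0, rounds=3):
    """Self-training: grow the seed set with confidently classified samples.

    Hand-labeled seeds keep weight 1.0; automatically labeled samples get
    `new_weight` (an assumed value) so that they influence later training
    rounds less than the original seeds.
    """
    X, y = list(X_seed), list(y_seed)
    weights = [1.0] * len(X)
    pool = list(X_unlabeled)
    for _ in range(rounds):
        w, b = train_linear_svm(X, y, weights)
        remaining = []
        added = 0
        for x in pool:
            s = score(w, b, x)
            if abs(s) >= threshold:      # confident: promote to seed sample
                X.append(x)
                y.append(1 if s > 0 else -1)
                weights.append(new_weight)
                added += 1
            else:                        # not confident: keep unlabeled
                remaining.append(x)
        pool = remaining
        if added == 0:
            break
    # Final model trained on the original seeds plus all promoted samples.
    w, b = train_linear_svm(X, y, weights)
    return w, b, X, y, weights


# Toy usage: features are [positive-word count, negative-word count].
seeds, labels = [[2, 0], [0, 2]], [1, -1]
unlabeled = [[3, 0], [0, 3], [1, 1]]
w, b, X, y, weights = bootstrap_svm(seeds, labels, unlabeled)
```

In this sketch the same routine could be applied twice, once for subjective/objective classification and once for polarity classification, with each task using its own labels. Lowering `new_weight` makes the model more conservative about its own automatic labels, which is the role the summary attributes to the weighting factor.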

Chinese Micro Blog Sentiment Analysis Based on Semi-supervised Learning

ZHU Shaojie, XU Bing, ZHENG Dequan, and ZHAO Tiejun

Harbin Institute of Technology, School of Computer Science and Technology, 150001 Harbin, China

{sjzhu,xb,dqzheng,tjzhao}@mtlab.hit.edu.cn

Abstract. This paper adopts a semi-supervised method based on Bootstrapping to analyze Sina micro blog data whose size is about 269M. The Support Vector Machine (SVM) method is used for both subjective/objective classification and polarity classification. Starting from a small labeled corpus, our method extends the set of seed samples by learning automatically, and it improves the sentiment classification ability of the SVM through iteration. A weighting factor that controls the weight of new seed samples during subsequent training further improves classification performance. The experimental results show that Bootstrapping-based sentiment analysis of Chinese micro blogs not only saves much manual annotation time but also achieves better performance. The best accuracy is 62.9% for subjective/objective classification and 57% for sentiment polarity classification.

Keywords: semi-supervised learning; Bootstrapping; Support Vector Machine; micro blog sentiment analysis

1 Introduction

With the emergence of micro blogs, large numbers of users have been organized into social networks that satisfy their needs for personalized publication of information, social transmission, and social communication. Micro blogs combine characteristics of social media and instant messaging. Many users freely express their personal views and emotions on hot events, public figures, products, and so on, and this information has great commercial and practical value. Facing the challenge of information explosion, people need to acquire this information faster and more effectively. Micro blog sentiment analysis arose in this context and has become a hot research problem in recent years.

Micro blog sentiment analysis mainly refers to analyzing the subjective information in micro blogs with existing sentiment analysis technology. At present, mature research results mainly focus on Twitter sentiment analysis, and the research covers subjective/objective classification and sentiment polarity classification. Chinese micro blog sentiment analysis is only at its beginning, and it is the subject of this paper.

