An LDA-Based Dynamic Topic Detection Model That Fuses Sentiment to Improve Topic Identification Accuracy


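This paper discusses the Dynamic Sentiment Topic (DST) model, a dynamic topic detection model that fuses sentiment polarity and extends traditional static topic models. Static topic models usually focus on the statistical correlation between words and ignore the effect that sentiment tendency and temporal properties have on topic detection results. In real-world data analysis, such as social media monitoring, news aggregation or user review analysis, these dynamic properties are essential for understanding the sentiment attached to a topic and how it changes over time.

The DST model builds on Latent Dirichlet Allocation (LDA), an unsupervised learning method widely used in text mining to discover the hidden topics in documents. DST adds a sentiment dimension to LDA, capturing the sentiment tendency of the text so that topics can be followed as they change. Specifically, the model uses maximum likelihood estimation and a sliding window to combine the sentiment information in time-series data with topic changes, so that it can detect and track how topics change and how public sentiment towards them shifts. Model parameters are estimated and updated with Gibbs sampling, a common probabilistic sampling technique for estimating parameters in high-dimensional spaces, and a stochastic EM algorithm is used for model inference.

In the experiments, the authors compare DST with existing algorithms on a real dataset. The reported results show that DST has clear advantages in capturing sentiment changes, identifying dynamic topics and assessing the sentiment tendency of topics, which supports its practical value in areas that need to follow shifts in user sentiment and the development of trending events in real time.

In summary, the paper contributes a model that combines sentiment analysis with LDA to address the time and sentiment factors that traditional topic models ignore. Potential applications include social media monitoring, public opinion analysis and market trend prediction.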
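To make the LDA and Gibbs sampling steps mentioned above more concrete, here is a minimal sketch of collapsed Gibbs sampling for plain LDA. It is not the DST sampler: the sentiment and time components are not reproduced here, and the function name `lda_gibbs`, the hyperparameter defaults and the toy corpus are assumptions chosen for illustration only.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (no sentiment or time component).

    docs is a list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): document-topic and topic-word probability estimates.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is in topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assign = []                                                 # topic of each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimates computed from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi


if __name__ == "__main__":
    # Hypothetical toy corpus: word ids 0-2 and 3-5 tend to co-occur.
    toy_docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 2, 0]]
    theta, phi = lda_gibbs(toy_docs, num_topics=2, vocab_size=6, iters=100)
    for row in theta:
        print([round(p, 3) for p in row])
```

Each row of `theta` and `phi` is a probability distribution, so it sums to one. A sentiment-aware model would need an additional latent sentiment variable per token; how DST defines it is not shown in this excerpt.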

Dynamic Topic Detection Model by Fusing Sentiment Polarity

Xi Ding, Lanshan Zhang, Ye Tian, Xiangyang Gong and Wendong Wang

State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications

Beijing, 100876, China

School of Digital Media and Design Arts, Beijing University of Posts and Telecommunications

Beijing, 100876, China

Email: dingxi515@163.com, zls326@sina.com, {yetian,xygong,wdwang}@bupt.edu.cn

Abstract

Traditional static topic models mainly focus on the statistical correlation between words, but ignore the sentiment tendency and temporal properties that may greatly affect topic detection results. This paper proposes an LDA-based dynamic sentiment-topic (DST) model, which can not only detect and track topics but also analyse how the general public's sentiment tendency towards a certain topic shifts. The model combines the data with sentiment and with the dynamic properties of time by means of maximum likelihood estimation and a sliding window. We use Gibbs sampling to estimate and update the model parameters, and a stochastic EM algorithm for model inference. Experiments on a real dataset demonstrate that the DST model outperforms existing algorithms.

Keywords: dynamic sentiment topic model, sentiment analysis, Gibbs sampling.
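As a rough illustration of the sliding window mentioned in the abstract, the sketch below groups time-stamped posts into overlapping time windows, each of which could then be passed to a topic model. The window length, the step size, the function name `sliding_windows` and the sample posts are all assumptions; how DST actually defines and uses its window is not given in this excerpt.

```python
from datetime import datetime, timedelta


def sliding_windows(timestamped_docs, window, step):
    """Yield (window_start, texts) for overlapping time windows.

    timestamped_docs is an iterable of (datetime, text) pairs. Each window
    covers the half-open interval [window_start, window_start + window).
    """
    items = sorted(timestamped_docs, key=lambda pair: pair[0])
    if not items:
        return
    start = items[0][0]
    last = items[-1][0]
    while start <= last:
        end = start + window
        batch = [text for ts, text in items if start <= ts < end]
        yield start, batch
        start += step


if __name__ == "__main__":
    # Hypothetical posts; the timestamps and texts are made up for illustration.
    posts = [
        (datetime(2014, 3, 8), "post a"),
        (datetime(2014, 3, 9), "post b"),
        (datetime(2014, 3, 11), "post c"),
    ]
    for start, batch in sliding_windows(posts, timedelta(days=2), timedelta(days=1)):
        print(start.date(), batch)
```

With a step smaller than the window length, consecutive windows share documents, which is one simple way to let estimates from one period carry over into the next.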

1 Introduction

With the rapid development of Web 2.0 technology, social media, represented by Facebook, Twitter, blogs and Weibo, has taken off in the last few years. More and more people choose social media as their main platform for publishing and accessing real-time information. Beyond everyday communication, the huge amount of user-generated content (UGC) often carries people's attitudes towards public events or products and services. Existing research shows that, compared with traditional media channels, social media data that conveys public sentiment has greater social and economic value for community management agencies, enterprises and the general public. As a result, sentiment analysis and topic detection on unstructured social media data have become a research hotspot in social network analysis.

Researchers working on sentiment analysis mainly study methods for classifying the sentiment polarity (positive, neutral, negative) of social media data. Although many results have been published [Pang, B. and Lee, L.][Aue, A. and Gamon, M.][Turney, P. D.], they are mostly based on supervised learning methods, and two limitations remain. First, a large number of manually annotated samples are needed for the parameter adjustment process. Second, a sentiment classifier trained on one topic area often does not apply to other topic areas, since sentiment distribution and topic content are closely related. In addition, a sentiment word dictionary trained on a traditional corpus cannot be applied to social data, which is expressed flexibly and makes much heavier use of emoticons and disjunctive questions; the language style of social media differs greatly from that of traditional media.

appeared at the Thirty-Eighth Australasian Computer Science Conference, ACSC 2015, Sydney, Australia, January 2015. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 159. David Parry, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

Topic detection derives from TDT (Topic Detection and Tracking) technology, which mainly focuses on detecting and organizing unknown topics in traditional, formally expressed text streams. The topic detection task has two branches: historical topic detection and online topic detection. The former aims to uncover the hidden topics in a given corpus by unsupervised clustering, where each cluster corresponds to a topic. Online topic detection determines, from historical information, whether a newly arrived text stream belongs to an existing topic or to a new one. Compared with newspapers, periodicals and academic reports, social media content consists mostly of unstructured short texts, expressed in non-standard language and produced in real time. These features make topic detection on social media more challenging.

Sentiment analysis and topic detection are treated as two independent research tasks in current social network analysis. However, "sentiment" and "topic" are two highly associated concepts. On the one hand, the generation and spread of sentiment must rely on a carrier, i.e., a specific topic. On the other hand, a change of sentiment acts back on its carrier, i.e., that specific topic, and consequently affects the evolution of the topic. Take the event "the loss of communication with flight MH370" as an example. In the initial stage, anxiety and trepidation circulated widely among the people who were concerned about the event. As time went on, the mood gradually evolved into sadness and discontent. Under the influence of this mood, the MH370 event evolved from "the loss of the aircraft" and "the search and rescue" into a new stage of "relatives' doubts about Malaysia Airlines' emergency response". Thus the probability distribution of topics is affected by sentiment, and the two concepts are highly related.

By fusing sentiment polarity, this paper proposes a dynamic sentiment-topic (DST) model based on Latent Dirichlet Allocation (LDA). The LDA model assumes that a document is a mixture of topics combined with different probabilities, and that each topic is itself a probability distribution over a set of words. For a given corpus, LDA detects all hidden

Proceedings of the 38th Australasian Computer Science Conference (ACSC 2015), Sydney, Australia, 27 - 30 January 2015
