Sentiment-Aware Topic Detection on Short Texts: The Time-User Sentiment/Topic LDA Model

Research paper

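Updated 2024-08-26. PDF, 655 KB.

"This research paper proposes a joint model called Time-User Sentiment/Topic Latent Dirichlet Allocation (TUS-LDA) for sentiment-aware topic detection on short social media texts. Conventional LDA-based sentiment/topic models run into the context sparsity problem on short texts such as tweets and E-commerce short reviews. TUS-LDA alleviates the problem by aggregating posts from the same timeslice or the same user into pseudo-documents. The paper also describes parameter inference and how to incorporate prior knowledge into TUS-LDA. Experiments on the Sentiment140 dataset and on tweets about electronic products support the effectiveness of the new model."

The paper addresses sentiment-aware topic detection on social media, an important research area because the large volume of user-generated content there (such as tweets and E-commerce short reviews) reflects public sentiment towards different topics. Conventional sentiment/topic models, such as those based on LDA (Latent Dirichlet Allocation), are usually suited to long review data but not to short texts, mainly because short texts lack rich context, which leads to the "context sparsity" problem.

To address this, the paper proposes the TUS-LDA model. Its distinguishing feature is that it combines the posts within the same time window, or the posts of the same user, into a virtual document, which increases the amount of available context and thereby alleviates context sparsity. This aggregation along the time and user dimensions lets the model better capture short-term trends and user-specific preferences.

TUS-LDA also comes with a parameter inference procedure, the key step through which the model learns and updates its parameters from data. The model additionally incorporates prior knowledge, which may come from domain experts or from existing sentiment analysis data, so that it can produce more accurate results when training data is limited.

In the experiments, the authors use the Sentiment140 dataset and tweets about electronic products. Sentiment140 is a dataset widely used in sentiment analysis research and contains a large number of tweets with sentiment labels. On these data, the authors show that TUS-LDA outperforms conventional methods at sentiment-aware topic detection. The paper's contribution is the TUS-LDA model, which has practical value for understanding and analyzing sentiment trends and topics on social media; it addresses the challenge of sentiment analysis on short texts and offers new ideas and methods for future research.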


A Joint Model for Sentiment-Aware Topic Detection on Social Media

Kang Xu and Guilin Qi and Junheng Huang and Tianxing Wu

Abstract. Joint sentiment/topic models are widely applied to detect sentiment-aware topics in lengthy review data, and they are typically built on Latent Dirichlet Allocation (LDA). Nowadays, plenty of user-generated posts, e.g., tweets and E-commerce short reviews, are published on social media, and these posts imply the public’s sentiments (i.e., positive and negative) towards various topics. However, existing sentiment/topic models are not applicable to detecting sentiment-aware topics in such posts, i.e., short texts, because applying the models to short texts directly suffers from the context sparsity problem. In this paper, we propose Time-User Sentiment/Topic Latent Dirichlet Allocation (TUS-LDA), which aggregates posts in the same timeslice or from the same user into a pseudo-document to alleviate the context sparsity problem. Moreover, we design approaches for parameter inference and for incorporating prior knowledge into TUS-LDA. Experiments on Sentiment140 and on tweets about electronic products from Twitter7 show that TUS-LDA outperforms previous models in the tasks of sentiment classification and sentiment-aware topic extraction. Finally, we visualize the sentiment-aware topics discovered by TUS-LDA.

1 Introduction

With the rapid growth of Web 2.0, a mass of user-generated posts, e.g., tweets and E-commerce short reviews, which capture people’s interests, thoughts, sentiments and actions, has been accumulating on social media with each passing day. Sentiment analysis attempts to find user preferences, likes and dislikes from posts on social media, such as reviews, blogs and microblogs [21], and topic modeling attempts to discover the topics or aspects from reviews, blogs, microblogs, etc. [3]. Topic modeling and sentiment analysis on the posts are two significant tasks which can benefit many people. For example, we can discover a topic about “Apple Inc.” and the overall sentiment of the topic. The sentiment of the topic about “Apple Inc.” is implicitly associated with the stock trading of “Apple Inc.”, because negative sentiments towards the company on social media can reduce sales and financial gains, whereas positive sentiments can improve sales [2]. Topic modeling [1] focuses on extracting word-level or document-level topics, while sentiment analysis [23] analyzes the sentiments of words or documents.

Topic modeling and sentiment analysis on social media are complementary: sentiments on social media often change across topics, and topics on social media are always related to public sentiments. Jointly modeling topics and sentiments on social media is therefore a feasible and meaningful task, and it can reflect people’s sentiments on different topics. However, unlike normal documents (e.g., news and long reviews), the short and informal character of posts on social media, e.g., tweets and short reviews, makes the tasks of topic modeling and sentiment analysis more challenging.

Southeast University, Nanjing, China
Email: {kxu,gqi,jhhuang,wutianxing}@seu.edu.cn

By jointly modeling topics and sentiments on social media, we want to obtain sentiment-aware topics from the posts, e.g., a topic about “Apple Inc.” (‘ipad’, ‘iphone’, ‘itouch’, ‘imac’, ‘beautiful’ and ‘popular’) with the overall sentiment polarity “positive”. Topic models, e.g., LDA [1] and pLSA [10], originally focus on mining topics from texts, but the models can also be extended to extract an extra aspect of texts, i.e., sentiment. Conventional sentiment-aware topic models, like the Joint Sentiment/Topic Model (JST) [15] and the Aspect/Sentiment Unification Model (ASUM) [11], are used to uncover the hidden topics and sentiments in a text corpus where each document is a mixture of sentiment/topics and each sentiment/topic is a mixture of words. In these models, each sentiment label is viewed as a special kind of topic: topics are unknown and data-driven, whereas sentiments are known and specified. However, because posts are short and informal, applying these models directly to short posts on social media suffers from the context sparsity problem, so the models fail to recognize the accurate sentiments and senses of words in the posts.
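One way to make the context sparsity problem concrete is to count how many distinct word pairs co-occur inside a single document, since LDA-style models learn from document-level co-occurrence. The sketch below is only an illustration under assumptions: the helper `cooccurrence_pairs`, the whitespace tokenization, and both example texts are hypothetical and do not come from the paper.

```python
from itertools import combinations


def cooccurrence_pairs(tokens):
    """Return the set of unordered pairs of distinct words found in one document."""
    return set(combinations(sorted(set(tokens)), 2))


# Toy texts (assumptions): one tweet-length post and one longer review.
short_post = "iphone se looks small".split()
long_review = (
    "the new iphone se looks small but the camera is great "
    "and the battery lasts all day"
).split()

short_pairs = cooccurrence_pairs(short_post)
long_pairs = cooccurrence_pairs(long_review)

print(len(short_pairs), len(long_pairs))
```

The short post contributes only a handful of co-occurring pairs, while the longer text contributes many more, which is the kind of evidence a topic model needs to relate words to each other.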

One simple and effective way to alleviate the sparsity problem is to aggregate short posts into lengthy pseudo-documents [5, 31]. Here we assume that the posts on social media are a mixture of two kinds of topics: temporal topics, which are related to current events (e.g., tweets about the topic “Announcement of iphone SE” in Fig. 1(a), which are produced in a timeslice), and stable topics, which are related to personal interests (e.g., tweets about the topic “Apple products” in Fig. 1(b), which are produced by a user). Temporal topics are sensitive to time. If posts belong to temporal topics, we aggregate the posts in the same timeslice as a single document. We assume each timeslice is a mixture of sentiment-aware topics, i.e., each sentiment in the timeslice corresponds to several topics. Similarly, stable topics are related to specific users, and each user is a mixture of sentiment-aware topics. If a post belongs to a temporal topic, the post is assigned to a sentiment-aware topic in its publishing timeslice; otherwise, it is assigned to a sentiment-aware topic of its publishing user.
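The aggregation step described above can be sketched as a plain grouping operation. This is a minimal sketch, not the authors' implementation: the `Post` record, the helpers `aggregate_posts`, `by_timeslice` and `by_user`, the one-day timeslice, and the whitespace tokenization are all assumptions made for illustration. The excerpt does not say how a post is decided to be temporal or stable, so the sketch simply builds both groupings.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    """Hypothetical post record; the field names are assumptions."""
    user: str
    timestamp: datetime
    text: str


def aggregate_posts(posts, key):
    """Concatenate the tokens of all posts that share a key into one pseudo-document."""
    pseudo_docs = defaultdict(list)
    for post in posts:
        pseudo_docs[key(post)].extend(post.text.lower().split())
    return dict(pseudo_docs)


def by_timeslice(post):
    # Assumption: one timeslice per calendar day.
    return post.timestamp.date().isoformat()


def by_user(post):
    return post.user


# Toy posts (assumptions), loosely following the paper's Apple example.
posts = [
    Post("alice", datetime(2016, 3, 21, 9, 0), "iphone se announced today"),
    Post("bob", datetime(2016, 3, 21, 18, 30), "new iphone se looks small"),
    Post("alice", datetime(2016, 4, 2, 12, 0), "love my ipad and imac"),
]

time_docs = aggregate_posts(posts, by_timeslice)  # one pseudo-document per day
user_docs = aggregate_posts(posts, by_user)       # one pseudo-document per user
```

Passing the grouping key as a function keeps the two aggregations symmetric, so a different timeslice width (for example, an hour or a week) only requires a different key function.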

Moreover, based on an analysis of the characteristics of topics and sentiments, we exploit an important observation about topics: a single post always talks about a single topic [31]. Although a post usually talks about a single topic, it may talk about multiple aspects of the topic with different sentiment polarities [12, 18].
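The two observations above (one topic per post, possibly several sentiment polarities within it) can be illustrated with a toy sampler. This is a sketch under stated assumptions, not the TUS-LDA generative process, which this excerpt does not specify: the function `sample_post`, the switch probability `p_temporal`, the uniform per-word sentiment draw, and all topic names and word lists are hypothetical.

```python
import random

SENTIMENTS = ("positive", "negative")


def sample_post(rng, timeslice_topics, user_topics, vocab, p_temporal=0.5, n_words=4):
    """Sample one toy post: a single topic for the post, one sentiment per word."""
    # Assumed switch: is the post about a temporal topic or a stable one?
    is_temporal = rng.random() < p_temporal
    mixture = timeslice_topics if is_temporal else user_topics
    topics = list(mixture)
    # One topic for the whole post, drawn from the chosen mixture.
    topic = rng.choices(topics, weights=[mixture[t] for t in topics], k=1)[0]
    words = []
    for _ in range(n_words):
        # Each word may carry its own polarity (assumed uniform here).
        sentiment = rng.choice(SENTIMENTS)
        words.append((rng.choice(vocab[(sentiment, topic)]), sentiment))
    return is_temporal, topic, words


# Toy topic mixtures and word lists (assumptions).
timeslice_topics = {"iphone_se_launch": 0.8, "apple_products": 0.2}
user_topics = {"iphone_se_launch": 0.1, "apple_products": 0.9}
vocab = {
    ("positive", "iphone_se_launch"): ["exciting", "affordable"],
    ("negative", "iphone_se_launch"): ["small", "boring"],
    ("positive", "apple_products"): ["beautiful", "popular"],
    ("negative", "apple_products"): ["expensive", "fragile"],
}

rng = random.Random(0)
is_temporal, topic, words = sample_post(rng, timeslice_topics, user_topics, vocab)
print(is_temporal, topic, words)
```

Every word in a sampled post comes from the same topic, while the sentiment label can differ from word to word, which mirrors the observation that a single-topic post may mix polarities.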

For example, while the following short review of a Canon camera from Amazon.com expresses the overall sentiment polarity of Camera, which corresponds to the part in italics, as positive, it addi-

ECAI 2016, G.A. Kaminka et al. (Eds.)
This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-672-9-338
