The remainder of this paper is organized as follows. Related work is discussed in Section 2. Section 3 introduces the proposed semi-supervised methods. In Section 4, we describe the experiments comparing the performance of the proposed models with that of baseline methods. Finally, we conclude the paper and highlight directions for future work in Section 5.
2. Related work
2.1. Aspect-based opinion mining
As the volume of product reviews grows rapidly, aspect-based opinion mining has become an active research topic [1,2,19,10,7,11,12,20]. Aspect-based opinion mining aims to produce a summary of customer opinion on each product aspect from reviews. The task is technically challenging because it is both context-aware and domain-dependent [21,22]. In reviews, users may describe the same product aspect using different expressions. For instance, in reviews of televisions, the expressions "screen" and "LED" refer to the same aspect, the television display. Additionally, the same opinion expression may convey opposite sentiment polarities in different domains. For example, the word "small" in the expression "the small MP3 is portable" carries a positive sentiment, while it carries a negative sentiment in the expression "the bed in hotel is small".
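The domain dependence described above can be made concrete with a small sketch (not from the paper): per-domain sentiment lexicons assign the same opinion word opposite polarities in different domains. The domain names and lexicon entries are illustrative.

```python
# Toy illustration of domain-dependent sentiment: the same opinion word
# can flip polarity across domains, so one global lexicon is not enough.
DOMAIN_LEXICONS = {
    "mp3_player": {"small": "positive"},  # "the small MP3 is portable"
    "hotel":      {"small": "negative"},  # "the bed in hotel is small"
}

def polarity(word, domain):
    """Look up the sentiment polarity of an opinion word within a domain."""
    return DOMAIN_LEXICONS.get(domain, {}).get(word, "neutral")

print(polarity("small", "mp3_player"))  # positive
print(polarity("small", "hotel"))       # negative
```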
To perform aspect-based opinion mining, the pioneering works of [1,2] proposed a framework that is now widely used. In this framework, the opinion-mining task is broken down into two major subtasks: aspect extraction and sentiment classification. First, aspect extraction identifies expressions that describe aspects of products (which we call aspect expressions in this paper) and groups semantically related expressions together. The second subtask, sentiment classification, recognizes the opinions associated with each aspect and then analyzes the sentiment polarity of each aspect [1,19,10,7]. Since the aspects extracted in the first subtask are the basis of analysis in the second, the quality of opinion mining is significantly influenced by the performance of aspect extraction.
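The two-stage framework can be sketched as follows. This is a minimal toy pipeline, not the method of [1,2]: the aspect vocabulary, opinion-word lists, and rule-based classifier are all hypothetical stand-ins for the real subtask components.

```python
# Minimal sketch of the two-subtask framework:
# (1) aspect extraction, (2) sentiment classification per aspect.
import re

POSITIVE = {"great", "portable", "sharp"}      # assumed opinion words
NEGATIVE = {"small", "noisy", "dim"}
ASPECT_TERMS = {"screen", "battery", "bed"}    # assumed aspect vocabulary

def extract_aspects(sentence):
    """Subtask 1: find aspect expressions mentioned in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t in ASPECT_TERMS]

def classify_sentiment(sentence):
    """Subtask 2: crude lexicon-based sentence polarity."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    score = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def mine_opinions(review_sentences):
    """Attach sentence-level polarity to each extracted aspect."""
    summary = {}
    for s in review_sentences:
        label = classify_sentiment(s)
        for a in extract_aspects(s):
            summary.setdefault(a, []).append(label)
    return summary

print(mine_opinions(["The screen is great", "The battery is noisy"]))
# {'screen': ['positive'], 'battery': ['negative']}
```

Because subtask 2 consumes the output of subtask 1, any aspect missed or mis-grouped in extraction is lost to the sentiment summary, which is the dependence the paragraph above points out.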
2.2. Traditional methods of aspect extraction
The key issues in aspect extraction are the identification of aspect expressions and the categorization of semantically related expressions. To address these issues, traditional frequent-term-based approaches extract frequent nouns and noun phrases as product aspects [1,9–12]. A well-known limitation of these methods is that they do not categorize related expressions according to their semantic content: different attributes of the same product aspect and domain-specific synonymous expressions are treated as distinct aspects. The aspects extracted by these methods are therefore too fine-grained; they lack organization and are thus of limited help in providing useful information to users of such systems.
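A frequent-term extractor of the kind cited above can be sketched in a few lines. This is an illustrative simplification: without a POS tagger, a hypothetical candidate-noun list stands in for noun detection, and the minimum-support threshold is arbitrary.

```python
# Sketch of frequent-term aspect extraction: count candidate nouns across
# reviews and keep those above a minimum-support threshold.
from collections import Counter
import re

CANDIDATE_NOUNS = {"screen", "led", "battery", "price"}  # assumed noun list

def frequent_aspects(reviews, min_support=2):
    counts = Counter(
        t for r in reviews for t in re.findall(r"[a-z]+", r.lower())
        if t in CANDIDATE_NOUNS
    )
    return {t for t, c in counts.items() if c >= min_support}

reviews = ["Great screen", "The screen is sharp", "Battery is ok"]
print(frequent_aspects(reviews))  # {'screen'}
```

Note that such a counter would keep "screen" and "led" as separate aspects even when they refer to the same display, which is exactly the fine-grainedness limitation discussed above.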
Some methods categorize fine-grained aspects based on lexical resources [19,23]. However, they have limited power to resolve the problem of domain dependence, since the coverage of expressions in lexical resources is often limited. In addition, some lexical synonyms may not describe the same aspect in certain domains. For example, the words "view" and "opinion" may be listed as synonyms in a dictionary, yet in reviews of cameras the word "view", meaning the extent or range of vision, is entirely unrelated to the word "opinion".
Other methods employ association rules and contextual information to cluster semantically related aspects [24,25]. However, the groups generated by these non-hierarchical clustering algorithms may not be uniform, because high-frequency terms produce larger clusters than low-frequency terms. In [22,26], Zhai et al. grouped pre-extracted aspect expressions using both lexical correlation and contextual similarity. However, these studies assumed that the pre-extracted aspect expressions were correct and thus did not treat aspect-expression identification as part of the method.
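Grouping by contextual similarity, in the spirit of [22,26], can be illustrated with bag-of-words context vectors and cosine similarity. The corpus, the whole-sentence context window, and the comparison pairs are all illustrative, not taken from those papers.

```python
# Sketch of contextual-similarity grouping: expressions that occur with
# similar surrounding words get high cosine similarity and can be merged.
from collections import Counter
import math
import re

def context_vector(expr, sentences):
    """Count the words co-occurring with expr across all sentences."""
    ctx = Counter()
    for s in sentences:
        toks = re.findall(r"[a-z]+", s.lower())
        if expr in toks:
            ctx.update(t for t in toks if t != expr)
    return ctx

def cosine(a, b):
    num = sum(a[k] * b[k] for k in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

sentences = ["the screen is bright and sharp",
             "the led is bright and sharp",
             "the price is too high"]
for x, y in [("screen", "led"), ("screen", "price")]:
    sim = cosine(context_vector(x, sentences), context_vector(y, sentences))
    print(x, y, round(sim, 2))
```

Here "screen" and "led" share identical contexts and would be merged into one aspect, while "price" would not, which is the behavior the lexical-resource methods above cannot achieve for domain-specific synonyms.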
2.3. Topic modeling
A topic model is a hierarchical Bayesian model. It introduces a latent variable, the topic, between the observed variables, documents and words, in order to analyze the semantic topic distribution of documents. In topic models, each document is represented as a random mixture over latent topics, where each topic is characterized by a distribution over words [13]. Topic models are now widely used to perform dimensionality reduction in information retrieval.
PLSA (Probabilistic Latent Semantic Analysis) [27] and LDA (Latent Dirichlet Allocation) [13] are two widely used models. In PLSA, each document is represented as a vector of topic proportions. However, PLSA has no probabilistic model for these proportions. Consequently, the number of parameters in PLSA grows linearly with the size of the corpus, which may lead to overfitting. Furthermore, PLSA provides no generative process for assigning probabilities to new documents outside the training set [13].
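The linear parameter growth of PLSA can be made concrete with a back-of-envelope count. The topic, vocabulary, and corpus sizes below are arbitrary illustrations, and the counts are the standard first-order accounting, not figures from [13].

```python
# Back-of-envelope comparison of parameter counts:
# PLSA fits a topic mixture per training document (D*K parameters on top
# of the K*V topic-word probabilities), so its total grows with corpus
# size D; LDA replaces the per-document mixtures with a K-dimensional
# Dirichlet prior, so its parameter count is independent of D.
def plsa_params(num_docs, num_topics, vocab_size):
    return num_topics * vocab_size + num_docs * num_topics

def lda_params(num_topics, vocab_size):
    return num_topics * vocab_size + num_topics

for d in (1_000, 10_000):
    print(d, plsa_params(d, 50, 20_000), lda_params(50, 20_000))
```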
To address these problems inherent in PLSA, Blei et al. propose the LDA model in [13]. They define a Dirichlet probabilistic generative process for the document-topic distribution. In each document, a latent aspect $z_i \in Z$ is chosen according to the multinomial distribution $\theta = P(z \mid \alpha)$, which is controlled by the Dirichlet prior $\alpha$. Given
Fig. 1. An example of product descriptions from Newegg.com.
88 T. Wang et al. / Knowledge-Based Systems 71 (2014) 86–100
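The LDA generative process described above can be sketched with toy dimensions: draw per-document topic proportions $\theta \sim \mathrm{Dir}(\alpha)$, then for each word position draw a topic $z_i$ from $\theta$ and a word from that topic's word distribution. All sizes and hyperparameter values here are arbitrary illustrations.

```python
# Sketch of the LDA generative process of Blei et al. [13]:
# theta ~ Dirichlet(alpha); z_i ~ Mult(theta); w_i ~ Mult(phi[z_i]).
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 8, 10                       # topics, vocab size, words/doc
alpha = np.full(K, 0.5)                        # Dirichlet prior over topics
phi = rng.dirichlet(np.full(V, 0.1), size=K)   # K topic-word distributions

def generate_document():
    theta = rng.dirichlet(alpha)                         # topic proportions
    topics = rng.choice(K, size=doc_len, p=theta)        # z_i ~ Mult(theta)
    words = [rng.choice(V, p=phi[z]) for z in topics]    # w_i ~ Mult(phi_z)
    return topics, words

topics, words = generate_document()
print(len(words))  # 10
```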