proposed to estimate each candidate's confidence on
the graph. In this process, we penalize high-degree
vertices to weaken their impact and to decrease the
probability of a random walk running into unrelated
regions of the graph. Meanwhile, we compute prior
knowledge of the candidates to indicate likely noise,
and we incorporate it into our ranking algorithm so
that candidate confidences are estimated collaboratively.
Finally, candidates whose confidence exceeds a
threshold are extracted. Compared with previous
methods based on a bootstrapping strategy, opinion
targets/words are no longer extracted step by step.
Instead, the confidence of each candidate is estimated
in a global graph co-ranking process, so error
propagation is effectively alleviated.
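For concreteness, the following Python sketch illustrates this kind of graph co-ranking under simplifying assumptions; it is not the exact formulation used in this paper. Candidate confidences are propagated over a bipartite target/word association matrix, high-degree vertices are damped by an exponent, priors enter through the restart term, and a threshold selects the final candidates. All names and parameter values (co_rank, penalty, alpha, the threshold) are illustrative.

import numpy as np

def co_rank(assoc, prior_t, prior_w, alpha=0.85, penalty=0.5, iters=200, tol=1e-8):
    # assoc[i, j]: opinion association between target candidate i and opinion
    # word candidate j; prior_t / prior_w: prior knowledge of each candidate
    # (likely noise receives a low prior).
    # Damp high-degree vertices so a random walk is less likely to drift into
    # unrelated regions of the graph, then build the two transition matrices.
    damp = np.outer(assoc.sum(1) ** penalty + 1e-12, assoc.sum(0) ** penalty + 1e-12)
    w = assoc / damp
    t2w = w / (w.sum(1, keepdims=True) + 1e-12)       # targets -> words
    w2t = w.T / (w.T.sum(1, keepdims=True) + 1e-12)   # words -> targets
    c_t, c_w = prior_t.astype(float), prior_w.astype(float)
    for _ in range(iters):
        # Each side's confidence is refreshed from the other side plus its prior.
        new_w = alpha * (t2w.T @ c_t) + (1 - alpha) * prior_w
        new_t = alpha * (w2t.T @ c_w) + (1 - alpha) * prior_t
        done = np.abs(new_t - c_t).sum() + np.abs(new_w - c_w).sum() < tol
        c_t, c_w = new_t, new_w
        if done:
            break
    return c_t, c_w

def extract(candidates, confidence, threshold=0.1):
    # Candidates whose estimated confidence exceeds the threshold are extracted.
    return [c for c, s in zip(candidates, confidence) if s > threshold]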
To illustrate the effectiveness of the proposed method, we
select real online reviews from different domains and lan-
guages as the evaluation datasets and compare our method
with several state-of-the-art methods on opinion target/word
extraction. The experimental results show that our approach
improves performance over traditional methods.
2 RELATED WORK
Opinion target and opinion word extraction are not new
tasks in opinion mining, and significant effort has been
devoted to them [1], [6], [12], [13], [14]. According to their
extraction aims, existing approaches can be divided into two
categories: sentence-level extraction and corpus-level extraction.
In sentence-level extraction, the task is to identify opinion
target mentions or opinion expressions in sentences. Thus,
these tasks are usually regarded as sequence-labeling problems
[13], [14], [15], [16]. Typically, contextual words are selected as
features to indicate opinion targets/words in sentences, and
classical sequence-labeling models, such as CRFs [13] and
HMMs [17], are used to build the extractor. Jin and Huang [17]
proposed a lexicalized HMM model to perform opinion mining.
Both [13] and [15] used CRFs to extract opinion targets from
reviews. However, these methods require labeled data to train
the model; if the labeled training data are insufficient or come
from domains different from the current texts, extraction
performance is unsatisfactory. Although [2] proposed a method
based on transfer learning to facilitate cross-domain extraction
of opinion targets/words, their method still needed labeled data
from the out-of-domain corpus, and the extraction performance
depended heavily on the relevance between the in-domain and
out-of-domain data.
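To make the sequence-labeling formulation concrete, the following minimal sketch uses the third-party sklearn-crfsuite package with simple contextual-word features and BIO tags; the feature set and the toy data are illustrative assumptions, not the actual features used in [13] or [15].

import sklearn_crfsuite  # third-party CRF wrapper; an illustrative choice

def word_features(sent, i):
    # Simple contextual-word features indicating opinion targets.
    w = sent[i]
    return {
        "word.lower": w.lower(),
        "is_title": w.istitle(),
        "is_digit": w.isdigit(),
        "prev_word": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": sent[i + 1].lower() if i + 1 < len(sent) else "<EOS>",
    }

def sent2features(sent):
    return [word_features(sent, i) for i in range(len(sent))]

# Toy labeled data in BIO form: B-/I- tags mark opinion target mentions.
train_sents = [["The", "battery", "life", "is", "great"],
               ["I", "love", "the", "screen"]]
train_tags = [["O", "B-TARGET", "I-TARGET", "O", "O"],
              ["O", "O", "O", "B-TARGET"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit([sent2features(s) for s in train_sents], train_tags)
print(crf.predict([sent2features(["The", "battery", "is", "good"])]))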
In addition, much research has focused on corpus-level
extraction. These methods did not identify individual opinion
target/word mentions in sentences, but aimed to extract a list of
opinion targets or generate a sentiment word lexicon from texts.
Most previous approaches adopted a collective unsuper-
vised extraction framework. As mentioned in our first sec-
tion, detecting opinion relations and calculating opinion
associations among words are the key components of this
type of method. Wang and Wang [8] adopted the co-occur-
rence frequency of opinion targets and opinion words to
indicate their opinion associations. Hu and Liu [5] exploited
nearest-neighbor rules to identify opinion relations among
words. Next, frequent and explicit product features were
extracted using a bootstrapping process. However, using only
co-occurrence information or nearest-neighbor rules could not
detect opinion relations among words precisely. Thus, [6]
exploited syntactic information to extract
opinion targets, and designed some syntactic patterns to
capture the opinion relations among words. The experimen-
tal results showed that their method performed better than
that of [5]. Moreover, [10] and [7] proposed a method
named Double Propagation, which exploited syntactic rela-
tions among words to expand sentiment words and opinion
targets iteratively. Their main limitation is that the patterns
based on the dependency parsing tree could not cover all
opinion relations. Therefore, Zhang et al. [3] extended the
work by [7]. Besides the patterns used in [7], Zhang et al. fur-
ther designed specific patterns to increase recall. Moreover,
they used the HITS algorithm [18] to compute opinion target
confidences to improve precision. Liu et al. [4] focused on
opinion target extraction based on the WAM. They used a
completely unsupervised WAM to capture opinion relations
in sentences. Next, opinion targets were extracted in a stan-
dard random walk framework. Their experimental results
showed that the WAM was effective for extracting opinion
targets. Nonetheless, they presented no evidence to demonstrate
the effectiveness of the WAM on opinion word extraction.
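The bootstrapping strategy shared by these corpus-level methods can be sketched schematically as follows. This is only an illustration of the Double Propagation style expansion loop [7], [10]; the function related stands in for the dependency-based syntactic patterns of [7] and is assumed to be supplied rather than reproduced here.

def double_propagation(sentences, seed_opinion_words, related, max_iter=10):
    # Schematic expansion loop: opinion words and opinion targets extract each
    # other through syntactic relations until nothing new is found.
    # related(a, b, sent) stands in for the dependency-pattern tests of [7]
    # (e.g. an adjective modifying a noun) and must be provided by the caller.
    opinion_words, targets = set(seed_opinion_words), set()
    for _ in range(max_iter):
        new_targets, new_words = set(), set()
        for sent in sentences:  # each sentence is a list of tokens
            for w in sent:
                # Known opinion words vote for new candidate targets ...
                if w not in targets and any(related(o, w, sent) for o in opinion_words):
                    new_targets.add(w)
                # ... and known targets vote for new opinion words.
                if w not in opinion_words and any(related(w, t, sent) for t in targets):
                    new_words.add(w)
        if not new_targets and not new_words:
            break  # fixed point: nothing new was extracted in this round
        targets |= new_targets
        opinion_words |= new_words
    return targets, opinion_words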
Furthermore, several studies employed topic modeling to iden-
tify implicit topics and sentiment words [19], [20], [21], [22].
These methods usually did not aim to extract an opinion target
list or an opinion word lexicon from reviews. Instead, they
aimed to cluster all words into corresponding
Fig. 2. Mining opinion relations between words using partially supervised alignment model.