Semantically guided iterative models improve relation extraction


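Relation extraction is an important natural language processing task: the goal is to identify the relations that hold between entities in text. Traditional relation extraction methods are often pattern-based and, in particular, rely on iterative bootstrapping. These models start from predefined trigger words or patterns to identify candidate relations, then gradually enlarge the candidate set. They typically face two problems. The first is semantic drift, where the learned patterns and instances move away from the original relation meaning as iterations accumulate. The second is low recall, where relations that the text actually expresses are missed.

The paper proposes a semantic bootstrapping framework that addresses both problems by combining the semantic information of patterns with a flexible matching strategy. First, it formalizes this class of bootstrapping models so that a semantic constraint can be applied at every iteration. This keeps learning precise and prevents the drift caused by inappropriate generalization. Second, it compares patterns with a flexible bottom-up kernel. The kernel supports finer-grained comparison than rigid matching, which improves matching accuracy and reduces missed relations, so recall improves. In the experiments, the authors apply the framework to the English Slot Filling (ESF) task of Knowledge Base Population (KBP) at the Text Analysis Conference (TAC), a standard relation extraction benchmark. The framework outperforms existing state-of-the-art relation extraction techniques there. By adding semantic guidance and flexible matching to bootstrapping, it targets both the precision and the recall of pattern-based relation extraction. Future work could extend it to more languages and more complex relation types.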

Construction of semantic bootstrapping models for relation extraction

Chunyun Zhang⇑, Weiran Xu, Zhanyu Ma, Sheng Gao, Qun Li, Jun Guo

Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing, China

School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, China

Article info

Article history:
Received 14 July 2014
Received in revised form 16 March 2015
Accepted 17 March 2015
Available online 25 March 2015

Keywords:
Relation extraction
Bootstrapping
Trigger word
Kernel
Pattern learning

Abstract

Traditionally, pattern-based relation extraction methods rely on iterative bootstrapping models, which typically suffer from semantic drift or low recall. In this paper, we present a novel semantic bootstrapping framework that uses the semantic information of patterns and a flexible matching method to address these problems. We introduce a formalization of this class of bootstrapping models. The formalization allows semantic constraints to guide the learning iterations, and we use a flexible bottom-up kernel to compare patterns. To assess the reliability and applicability of our framework, we applied it to the English Slot Filling (ESF) task of Knowledge Base Population (KBP) at the Text Analysis Conference (TAC). Experimental results show that our framework outperforms the state of the art.

1. Introduction

Relation extraction (RE) is an important but unsolved problem in information extraction (IE). It focuses on extracting structured relations from unstructured sources such as documents or the Web. RE can potentially benefit a wide range of natural language processing (NLP) tasks, such as question answering, ontology learning, and summarization [1].

A number of machine learning approaches have recently been applied to the RE problem. One common paradigm is the use of bootstrapping [2] to learn relation patterns. The framework is popular because it can learn a sufficient number of patterns and instances simply by iterating from a small number of seeds. Its central assumption is the pattern-relation duality principle [3]: good seed samples lead to good patterns, and good patterns help to extract good instances. Here, good patterns are patterns with high coverage (high recall) and a low error rate (high precision), and good instances are instances realized by good patterns.

Systems such as DIPRE [3], Snowball [4], and ExDisco [5] take a small set of domain-specific examples as seeds and an unannotated corpus as input. The seed examples can be either target relation instances or sample linguistic patterns whose linguistic arguments correspond to the target relation arguments. New instances or patterns are found in the documents where the seeds occur, and they are then used as new seeds for the next iteration. However, Komachi's analysis in [6] showed that semantic drift is an inherent property of iterative bootstrapping algorithms and therefore poses a fundamental problem. Systems that lack semantic constraints consequently suffer greatly from semantic drift.
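The seed, pattern and instance cycle described above can be made concrete with a small sketch. This is a generic DIPRE-style loop written for illustration, not the framework proposed in this paper. It relies on several assumptions:

- The corpus has already been reduced to hypothetical (arg1, context, arg2) triples.
- The context string itself serves as an exact-match pattern.
- A hypothetical `accept_pattern` predicate, driven by a made-up trigger-word list, stands in for a semantic constraint.

```python
from collections.abc import Callable

# A mention is an (arg1, context, arg2) triple; we assume an upstream NER and
# chunking step has already produced these from raw sentences.
Mention = tuple[str, str, str]
Pair = tuple[str, str]


def bootstrap(
    corpus: list[Mention],
    seeds: set[Pair],
    accept_pattern: Callable[[str], bool] = lambda ctx: True,
    max_iters: int = 10,
) -> tuple[set[Pair], set[str]]:
    """DIPRE-style bootstrapping with exact-match context patterns.

    accept_pattern is a hypothetical semantic constraint: a candidate pattern
    must pass it before it is allowed to extract new instances.
    """
    instances: set[Pair] = set(seeds)
    patterns: set[str] = set()
    for _ in range(max_iters):
        # Instances -> patterns: contexts in which a known pair occurs.
        new_patterns = {
            ctx for a1, ctx, a2 in corpus
            if (a1, a2) in instances and accept_pattern(ctx)
        }
        known = patterns | new_patterns
        # Patterns -> instances: pairs that occur in a known context.
        new_instances = {(a1, a2) for a1, ctx, a2 in corpus if ctx in known}
        if known == patterns and new_instances <= instances:
            break  # fixed point: this iteration learned nothing new
        patterns = known
        instances |= new_instances
    return instances, patterns


# Toy corpus for a founder-of relation (hypothetical data).
CORPUS: list[Mention] = [
    ("Bill Gates", "founded", "Microsoft"),
    ("Larry Page", "founded", "Google"),
    ("Larry Page", "works at", "Google"),
    ("Satya Nadella", "works at", "Microsoft"),
]
SEEDS: set[Pair] = {("Bill Gates", "Microsoft")}
TRIGGERS = {"founded", "co-founded"}  # hypothetical trigger-word list


def has_trigger(ctx: str) -> bool:
    """Accept a context only if one of its words is a trigger word."""
    return any(word in TRIGGERS for word in ctx.split())


drifted, _ = bootstrap(CORPUS, SEEDS)
guarded, guarded_patterns = bootstrap(CORPUS, SEEDS, accept_pattern=has_trigger)
print(sorted(drifted))
print(sorted(guarded), guarded_patterns)
```

On this toy data, the unconstrained run learns the generic pattern "works at" in its second iteration. It then extracts ("Satya Nadella", "Microsoft"), which is a semantic-drift error. With `has_trigger` as the constraint, only "founded" is accepted, and the instance set stays at the two correct founder pairs.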

Relation patterns are defined as the structured features of the context of an entity and its attribute value in a mention of the target relation [7]. For example, for the relation org:founded_by, the entity is the organization Microsoft and the attribute value is Bill Gates. Consequently, system performance depends largely on how well the patterns are represented. However, most existing patterns either have inflexible representations or lack semantic constraints. Patterns in [3,4,7–9] use shallow syntactic features, so they perform poorly on relations whose expression is ambiguous or whose arguments are lexically distant. Dependency patterns [10–15] have been shown to perform better, since they carry more information for relation extraction. Among dependency patterns, two are commonly used: the shortest dependency pattern (SDP) and the subject–verb–object (SVO) pattern [10,1,12,13]. However, because they impose few semantic constraints, they gain generality at the cost of specific information. As a result, they may produce semantic drift in bootstrapping iterations.
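The sketch below makes the two dependency patterns concrete. It derives an SDP and an SVO triple from a hand-written dependency parse of "Bill Gates founded Microsoft in 1975". The following details are illustrative assumptions, not the output of any particular parser or the exact pattern format used in this paper:

- the token indices;
- the Stanford-style labels;
- the arc rendering format, such as `-dobj->`.

```python
from collections import deque
from typing import Optional

# A dependency arc: (head index, dependent index, label).
Edge = tuple[int, int, str]


def shortest_dependency_path(
    tokens: list[str], edges: list[Edge], src: int, dst: int
) -> Optional[str]:
    """Render the shortest path between tokens src and dst. Arc direction is
    ignored during the search but recorded for display."""
    adj: dict[int, list[tuple[int, str]]] = {i: [] for i in range(len(tokens))}
    for head, dep, label in edges:
        adj[head].append((dep, f"-{label}->"))
        adj[dep].append((head, f"<-{label}-"))
    prev: dict[int, Optional[tuple[int, str]]] = {src: None}
    queue = deque([src])
    while queue:  # breadth-first search yields a path with the fewest arcs
        node = queue.popleft()
        if node == dst:
            break
        for nxt, arc in adj[node]:
            if nxt not in prev:
                prev[nxt] = (node, arc)
                queue.append(nxt)
    if dst not in prev:
        return None  # the two tokens are not connected
    parts = [tokens[dst]]
    node = dst
    while prev[node] is not None:  # walk back from dst to src
        parent, arc = prev[node]
        parts = [tokens[parent], arc] + parts
        node = parent
    return " ".join(parts)


def svo_triples(tokens: list[str], edges: list[Edge]) -> list[tuple[str, str, str]]:
    """Collect (subject, verb, object) for every head with both nsubj and dobj."""
    subj: dict[int, int] = {}
    obj: dict[int, int] = {}
    for head, dep, label in edges:
        if label == "nsubj":
            subj[head] = dep
        elif label == "dobj":
            obj[head] = dep
    return [(tokens[subj[v]], tokens[v], tokens[obj[v]]) for v in subj if v in obj]


# Hand-written parse of "Bill Gates founded Microsoft in 1975" (not parser output).
TOKENS = ["Bill", "Gates", "founded", "Microsoft", "in", "1975"]
EDGES: list[Edge] = [
    (1, 0, "compound"),
    (2, 1, "nsubj"),
    (2, 3, "dobj"),
    (2, 4, "prep"),
    (4, 5, "pobj"),
]
print(shortest_dependency_path(TOKENS, EDGES, 1, 3))  # Gates <-nsubj- founded -dobj-> Microsoft
print(svo_triples(TOKENS, EDGES))  # [('Gates', 'founded', 'Microsoft')]
```

Both patterns discard "Bill" and "in 1975". That abstraction is what makes them general. It is also why they carry little specific information with which to constrain bootstrapping.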

The similarity method is another key component of a bootstrapping model: it determines whether a pattern or instance derived from a new sentence expresses the target relation. Unfortunately, existing similarity methods are rigid or unsuitable for relations expressed in patterns with complex structure. They rely only on exact matching [3,7,8] or on cosine-like measures [12,4], so they cannot weigh the relative importance of different pattern features. Kernel methods [16,10,11,17,15] have been proven to be effective in measuring

http://dx.doi.org/10.1016/j.knosys.2015.03.017

⇑ Corresponding author.
E-mail address: zhangchunyun1009@126.com (C. Zhang).

Knowledge-Based Systems 83 (2015) 128–137
Contents lists available at ScienceDirect
Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
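The similarity paragraph above contrasts rigid measures with kernels. The difference can be illustrated on toy dependency paths. The sketch compares three measures:

- exact match;
- bag-of-features cosine;
- a simplified form of the shortest-path dependency kernel of Bunescu and Mooney (2005), which multiplies per-position feature overlaps.

That kernel is swapped in purely for illustration and is not the bottom-up kernel proposed in this paper. The feature sets (word, POS tag, entity class, direction-tagged arc) are hand-annotated assumptions. Z is a hypothetical sentence, "Microsoft hired Gates", whose argument roles are reversed.

```python
import math
from collections import Counter

# A dependency path: one feature set per position (word/POS/class features for
# tokens, a direction-tagged label for arcs).
Path = list[frozenset[str]]


def path(*positions: str) -> Path:
    """Build a path from space-separated feature strings, one per position."""
    return [frozenset(p.split()) for p in positions]


def exact_match(x: Path, y: Path) -> bool:
    """All-or-nothing comparison: the patterns are either identical or not."""
    return x == y


def cosine(x: Path, y: Path) -> float:
    """Cosine over bag-of-features counts; ignores where a feature occurs."""
    bx = Counter(f for feats in x for f in feats)
    by = Counter(f for feats in y for f in feats)
    dot = sum(count * by[f] for f, count in bx.items())
    norm = math.sqrt(sum(c * c for c in bx.values())) * math.sqrt(
        sum(c * c for c in by.values())
    )
    return dot / norm if norm else 0.0


def sp_kernel(x: Path, y: Path) -> int:
    """Shortest-path dependency kernel (Bunescu and Mooney, 2005): zero for
    paths of different length, else the product of per-position overlaps."""
    if len(x) != len(y):
        return 0
    return math.prod(len(a & b) for a, b in zip(x, y))


# Hand-annotated toy paths (hypothetical features).
X = path("Gates NNP PERSON", "<-nsubj-", "founded VBD", "-dobj->", "Microsoft NNP ORGANIZATION")
Y = path("Page NNP PERSON", "<-nsubj-", "co-founded VBD", "-dobj->", "Google NNP ORGANIZATION")
Z = path("Microsoft NNP ORGANIZATION", "<-nsubj-", "hired VBD", "-dobj->", "Gates NNP PERSON")

for name, other in (("Y", Y), ("Z", Z)):
    print(name, exact_match(X, other), round(cosine(X, other), 3), sp_kernel(X, other))
```

The three measures rank the toy paths differently:

- Exact match rejects both Y and Z.
- Cosine ignores position, so it scores the role-reversed Z higher than the paraphrase Y (11/12 versus 0.75).
- The kernel ranks Y first (4 versus 1), because it rewards only features that line up at the same position on the path.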


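Related resources:

A sentiment-semantics-enhanced deep learning model for Weibo sentiment analysis.pdf

Automatic construction of course dependency graphs based on an associated semantic link model

Semantics-based automatic extraction of reusable regions from 3D CAD models.pdf

A relation extraction model based on distantly supervised neural networks

A semantic-grid-based semantic association storage model and its management and communication platform 973语.pptx

A semantic relation system and knowledge base construction for Chinese basic compound noun phrases

Tree-kernel-driven semantic relation extraction: a new method fusing syntactic and semantic information

A new model for computing word semantic relatedness based on semantic relation graphs

CMACC: exploring the semantic network data model

Semantics-based automatic summarization: statistical topic and semantic knowledge models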
