For broader readers who may not be familiar with reinforcement learning, we briefly introduce its concepts via their counterparts or equivalent concepts in supervised models, with the RL terms in parentheses: our goal is to train an extractor (agent A) to label entities, event triggers and argument roles (actions a) in text (environment e); to commit correct labels, the extractor consumes features (state s) and follows the ground truth (expert E); a reward R is issued to the extractor according to whether its output differs from the ground truth and how serious the difference is – as shown in Figure 1, a repeated mistake is definitely more serious – and the extractor improves the extraction model (policy π) by pursuing maximized rewards.
Our framework can be briefly described as follows: given a sentence, our extractor scans the sentence and determines the boundaries and types of entities and event triggers using Q-Learning (Section 3.1); meanwhile, the extractor determines the relations between triggers and entities – argument roles – with policy gradient (Section 3.2). During the training epochs, GANs estimate rewards which stimulate the extractor to pursue the optimal joint model (Section 4).
3 Framework and Approach
3.1 Q-Learning for Entities and Triggers
Entity and trigger detection is often modeled as a sequence labeling problem, where long-term dependency is a core characteristic, and reinforcement learning is a well-suited method (Maes et al., 2007).
From the RL perspective, our extractor (agent A) explores the environment, i.e., unstructured natural language sentences, as it goes through the sequences and commits labels (actions a) for the tokens. When the extractor arrives at the t-th token in the sentence, it observes information from the environment and its previous action $a_{t-1}$ as its current state $s_t$; once the extractor commits a current action $a_t$ and moves to the next token, it has a new state $s_{t+1}$. The information from the environment is the token's context embedding $v_t$, which is usually acquired from Bi-LSTM (Hochreiter and Schmidhuber, 1997) outputs; the previous action $a_{t-1}$ may impose some constraint on the current action $a_t$, e.g., I-ORG does not follow B-PER.²

² In this work, we use the BIO scheme, e.g., "B-Meet" indicates that the token is the beginning of a Meet trigger, "I-ORG" means that the token is inside an organization phrase, and "O" denotes null.
With the aforementioned notations, we have

$$ s_t = \langle v_t, a_{t-1} \rangle. \quad (1) $$
To determine the current action $a_t$, we generate a series of Q-tables with

$$ Q_{sl}(s_t, a_t) = f_{sl}(s_t \mid s_{t-1}, s_{t-2}, \ldots, a_{t-1}, a_{t-2}, \ldots), \quad (2) $$
where $f_{sl}(\cdot)$ denotes a function that determines the Q-values using the current state as well as previous states and actions. Then we obtain

$$ \hat{a}_t = \arg\max_{a_t} Q_{sl}(s_t, a_t). \quad (3) $$
Equations 2 and 3 suggest that an RNN-based framework which consumes the current input as well as previous inputs and outputs can be adopted, and we use a unidirectional LSTM as in (Bakker, 2002). The full pipeline is illustrated in Figure 2.
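To make Equations 1-3 concrete, the following is a minimal sketch (not the authors' released code) of such a label-decoding network: a unidirectional LSTM consumes, at each step, the token's context embedding $v_t$ together with an embedding of the previous action $a_{t-1}$, emits $Q_{sl}(s_t, \cdot)$ over the label set, and takes the greedy label as $\hat{a}_t$. The class and parameter names (QLabeler, ctx_dim, action_dim, hidden_dim) and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QLabeler(nn.Module):
    def __init__(self, ctx_dim, n_labels, action_dim=25, hidden_dim=128):
        super().__init__()
        self.action_emb = nn.Embedding(n_labels + 1, action_dim)  # extra index = "sequence start"
        self.lstm = nn.LSTM(ctx_dim + action_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_labels)
        self.n_labels = n_labels

    def forward(self, ctx):
        # ctx: (1, T, ctx_dim) context embeddings, e.g. Bi-LSTM outputs over the sentence
        T = ctx.size(1)
        prev_action = torch.tensor([self.n_labels])    # placeholder "start" action
        hidden, q_values, actions = None, [], []
        for t in range(T):
            # s_t = <v_t, a_{t-1}>                                          (Eq. 1)
            step_in = torch.cat([ctx[:, t], self.action_emb(prev_action)], dim=-1)
            out, hidden = self.lstm(step_in.unsqueeze(1), hidden)
            q_t = self.q_head(out.squeeze(1))          # Q_sl(s_t, a) for every label (Eq. 2)
            a_t = q_t.argmax(dim=-1)                   # greedy label \hat{a}_t       (Eq. 3)
            q_values.append(q_t)
            actions.append(a_t)
            prev_action = a_t
        return torch.stack(q_values, dim=1), torch.stack(actions, dim=1)
```

During training the previous action could instead be sampled or taken from the gold sequence; the greedy choice here simply mirrors Equation 3.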
For each label (action $a_t$) with regard to $s_t$, a reward $r_t = r(s_t, a_t)$ is assigned to the extractor (agent). We use Q-learning to pursue the optimal sequence labeling model (policy π) by maximizing the expected value of the sum of future rewards $E(R_t)$, where $R_t$ represents the sum of discounted future rewards $r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \ldots$ with a discount factor $\gamma$, which determines the influence of future states on the current one.
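As a small worked example of the return just defined, the helper below (a hypothetical function, not part of the paper) accumulates $R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \ldots$ from a list of per-step rewards.

```python
def discounted_return(rewards, gamma=0.9):
    """rewards: per-step rewards r_t, ..., r_T; returns the discounted sum R_t."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

# e.g. discounted_return([1.0, -1.0, 1.0]) == 1.0 - 0.9 + 0.81 == 0.91
```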
We utilize the Bellman Equation to update the Q-value with regard to the currently assigned label, so as to approximate an optimal model (policy $\pi^*$):

$$ Q^{\pi^*}_{sl}(s_t, a_t) = r_t + \gamma \max_{a_{t+1}} Q_{sl}(s_{t+1}, a_{t+1}). \quad (4) $$
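A minimal sketch of this update is given below, assuming per-token Q-values of shape (T, n_labels) and a zero bootstrap at the end of the sentence; the function name bellman_targets and that terminal-state convention are our own assumptions rather than details stated in the paper.

```python
import torch

def bellman_targets(q_values, actions, rewards, gamma=0.9):
    # q_values: (T, n_labels) detached Q-values; actions: (T,) chosen labels;
    # rewards:  length-T per-token rewards r_t.
    updated = q_values.clone()
    T = q_values.size(0)
    for t in range(T):
        # r_t + gamma * max_{a_{t+1}} Q_sl(s_{t+1}, a_{t+1})   (Eq. 4)
        bootstrap = q_values[t + 1].max() if t + 1 < T else torch.tensor(0.0)
        updated[t, actions[t]] = rewards[t] + gamma * bootstrap
    return updated
```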
As illustrated in Figure 3, when the extractor assigns a wrong label to the "death" token because the Q-value of Die ranks first, Equation 4 will penalize the Q-value with regard to the wrong label; in later epochs, if the extractor commits the correct label of Execute, the Q-value will be boosted and the decision reinforced.
We minimize the loss in terms of mean squared error between the original and updated Q-values, notated as $Q'_{sl}(s_t, a_t)$:

$$ L_{sl} = \frac{1}{n} \sum_t^{n} \sum_a \left( Q'_{sl}(s_t, a_t) - Q_{sl}(s_t, a_t) \right)^2 \quad (5) $$

and apply back-propagation to optimize the parameters in the neural network.
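Putting Equations 4 and 5 together, one training step might look like the sketch below, which reuses the hypothetical QLabeler and bellman_targets sketches above; the choice of optimizer and the detaching of the targets are our own assumptions rather than details given in the paper.

```python
import torch.nn.functional as F

def q_learning_step(model, optimizer, ctx, rewards, gamma=0.9):
    q_values, actions = model(ctx)                # QLabeler sketch: (1, T, n_labels), (1, T)
    q_values, actions = q_values.squeeze(0), actions.squeeze(0)
    # Targets are held fixed (detached) so gradients flow only through Q_sl(s_t, a_t).
    targets = bellman_targets(q_values.detach(), actions, rewards, gamma)
    loss = F.mse_loss(q_values, targets)          # mean squared error of Eq. 5
    optimizer.zero_grad()
    loss.backward()                               # back-propagate to the network parameters
    optimizer.step()
    return loss.item()
```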