Double-Propagation-Driven Open-Domain Atomic Event Extraction for Chinese Text


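This paper addresses atomic event extraction from open-domain Chinese text. Structured atomic event information has recently proven important for representing semantics and for building more complex text-understanding models. Because the open domain is broad and diverse, however, event extraction methods designed for specific domains are hard to apply directly: they depend on domain knowledge and predefined event patterns and do not transfer to settings without a fixed schema. Earlier work has also usually treated atomic event extraction as a mere preprocessing step, and few studies analyze atomic events in the open domain. To address this, the authors propose an unsupervised method for extracting atomic events from open-domain Chinese text. The method is designed around the ellipsis and flexible sentence structure common in Chinese. It uses double propagation (DP) to couple event extraction with event pattern generation, requires no seed events or seed event patterns, and can filter out noise introduced by syntactic parsing. Because it needs no large annotated corpus, it also saves annotation effort. Experiments on a standard benchmark show that it outperforms the state-of-the-art algorithm, indicating that it handles unstructured, highly variable open-domain text effectively. Future work could explore techniques such as deep learning and transfer learning to improve the DP algorithm further in broader settings. The paper offers a new research direction and a practical tool for Chinese open-domain atomic event extraction, with implications for natural language processing and information extraction.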

Open Domain Atomic Event Extraction via Double Propagation for Chinese Text

Rui Sun

Computer School

Wuhan University

Wuhan, China

e-mail: ruisun@whu.edu.cn

Sheng Guo

Computer School

Wuhan University

Wuhan, China

e-mail: whucsgs@163.com

Donghong Ji

Computer School

Wuhan University

Wuhan, China

e-mail: dhji@whu.edu.cn

Abstract—Recent studies show that structured atomic event information is beneficial for representing discourse semantics. However, extracting useful structured representations of events from the open domain is a challenging problem. On the one hand, previous event extraction methods for specific domains cannot be used directly in the open domain because of their domain limitations and predefined event patterns. On the other hand, previous related work simply treats atomic event extraction as a preprocessing step, and few studies focus on atomic event extraction in the open domain. In this paper, we propose an unsupervised method for Chinese event extraction in the open domain. Targeting the ellipsis and flexible sentence structure of Chinese text, the proposed method exploits double propagation (DP) to combine event extraction and event pattern generation. It requires no seed events or seed event patterns and is able to eliminate noise from syntactic parsing. Experimental results on a standard benchmark show that our proposed method outperforms the state-of-the-art algorithm.

Keywords-atomic event; double propagation; event extraction; open domain;
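The abstract says double propagation couples event extraction with event pattern generation and needs no seed events or seed patterns. It does not spell out the update rules, however. What follows is only a generic mutual-reinforcement sketch of that idea, not the authors' algorithm. The function name `double_propagation`, the thresholds `min_support` and `min_overlap`, and the choice to bootstrap from frequent patterns in place of seeds are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def double_propagation(
    pairs: Iterable[Tuple[Hashable, str]],
    min_support: int = 2,
    min_overlap: int = 1,
) -> Tuple[Set[Hashable], Set[str]]:
    """Hypothetical sketch: alternately grow accepted patterns and accepted events.

    `pairs` holds (candidate_event, pattern) pairs, meaning the pattern produced the event.
    """
    events_of: Dict[str, Set[Hashable]] = defaultdict(set)
    for event, pattern in pairs:
        events_of[pattern].add(event)

    # No seeds (assumption): start from patterns that produce many distinct candidates.
    patterns = {p for p, evs in events_of.items() if len(evs) >= min_support}
    while True:
        # Patterns -> events: accept every event produced by an accepted pattern.
        events = set().union(*(events_of[p] for p in patterns))
        # Events -> patterns: accept patterns that re-derive enough accepted events.
        grown = {p for p, evs in events_of.items() if len(evs & events) >= min_overlap}
        if grown <= patterns:  # nothing new: both sets have converged
            return events, patterns
        patterns |= grown


# Toy candidates with hypothetical pattern labels A, B, C.
pairs = [("e1", "A"), ("e2", "A"), ("e3", "A"),  # A: frequent pattern
         ("e3", "B"), ("e4", "B"),               # B: shares e3 with A
         ("e5", "C")]                            # C: isolated, low support
events, patterns = double_propagation(pairs, min_support=3)
print(sorted(events), sorted(patterns))  # ['e1', 'e2', 'e3', 'e4'] ['A', 'B']
```

In this toy run, pattern B is accepted only because it shares event e3 with the frequent pattern A. Event e5, produced solely by the unsupported pattern C, is dropped. Filtering of this kind is one plausible way that low-support extractions from faulty parses could be treated as noise.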

I. INTRODUCTION

With the continuous growth of text resources, it is necessary to study how to extract knowledge from unstructured text. As a special structure, an event represents more complex semantic relations than an entity relation does. Most previous studies on event extraction (e.g., [1], [2], [3]) are conducted on news articles in specific domains, such as the ACE 2005 standard dataset. These methods cannot be used directly in the open domain because they rely on predefined event patterns. In recent years, the Subject + Predicate + Object event form has proven significantly effective for a range of natural language processing applications (e.g., [4], [5], [6]). These studies exploit structured atomic event information to represent discourse semantics. However, they simply regard event extraction as a preprocessing step, and few studies focus on event extraction in the open domain.

In this paper, we focus on atomic event extraction in the open domain. Methods of atomic event extraction fall mainly into two categories. The first is rule-based (e.g., [7], [8], [9]) and directly exploits syntactic rules such as dependency relations: the event trigger and its arguments are identified according to specific dependencies, such as nsubj and dobj. The main drawback of this approach is its reliance on the dependency parser. The second is ORE-based (e.g., [10], [11]) and extracts events through open relation extraction. This approach has so far achieved great success in extracting entities and entity relations from news and microblogs in the open domain, and most of these relations carry structured event information. However, these methods have low recall because they neglect the fact that an argument of an event may not be an entity. This is especially true of Chinese. As a paratactic language (e.g., discourse-driven and pro-drop), Chinese has widespread ellipsis and more open, flexible sentence structure [2]. Consider the following discourse as a sample:

“[Chinese original of the sample discourse, with events marked (E1) to (E9)]”

(According to the report of the Pu’er City Civil Affairs Bureau (E1), as of 8:30 on the 8th, the Yunnan earthquake had caused (E2) 12.46 million people to be affected (E3) in Jinggu County, Simao District, Zhenyuan County, Linxiang District, Shuangjiang County, etc.; one person was killed (E4), 324 people were wounded (E5), eight people were injured (E6), 56880 people were evacuated (E7), 6988 houses collapsed (E8), and 13017 houses were severely damaged (E9).)

In the above discourse, we can extract 9 atomic events by exploiting dependency relations, but only 6 events with the open relation extraction tool ZORE [12]. However, this sample also reveals several phenomena caused by the above-mentioned characteristics of Chinese.
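The rule-based approach described above reads triggers and arguments off nsubj and dobj arcs. It can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a parse is already available as (head, relation, dependent) index triples, uses the hypothetical names `AtomicEvent` and `extract_events`, and represents a missing argument (nil) as `None`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A dependency arc as (head_index, relation_label, dependent_index); tokens are 0-indexed.
Arc = Tuple[int, str, int]


@dataclass(frozen=True)
class AtomicEvent:
    """Subject + Predicate + Object; None stands for a missing argument (nil)."""
    subject: Optional[str]
    predicate: str
    obj: Optional[str]


def extract_events(tokens: List[str], arcs: List[Arc]) -> List[AtomicEvent]:
    """Treat each head of an nsubj or dobj arc as a trigger and fill its argument slots."""
    subj: Dict[int, int] = {}
    dobj: Dict[int, int] = {}
    for head, rel, dep in arcs:
        if rel == "nsubj":
            subj.setdefault(head, dep)  # keep the first subject found for this head
        elif rel == "dobj":
            dobj.setdefault(head, dep)  # keep the first direct object found for this head

    events = []
    for trigger in sorted(set(subj) | set(dobj)):
        s, o = subj.get(trigger), dobj.get(trigger)
        events.append(AtomicEvent(
            subject=tokens[s] if s is not None else None,
            predicate=tokens[trigger],
            obj=tokens[o] if o is not None else None,
        ))
    return events


# Hand-written toy parse over English glosses (not real parser output).
tokens = ["people", "wounded", "injured", "people"]
arcs = [(1, "nsubj", 0), (2, "dobj", 3)]
for event in extract_events(tokens, arcs):
    print(event)
# AtomicEvent(subject='people', predicate='wounded', obj=None)
# AtomicEvent(subject=None, predicate='injured', obj='people')
```

The toy output has the same shapes as E5 and E6: one event with only a subject and one with only an object. Any argument the parser fails to attach simply stays nil. This is the dependence on the dependency parser noted earlier as the main drawback of rule-based extraction.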

First, the forms of these atomic events are diversified because of the open sentence structure. For example, the event E5 “(people), (wounded), nil” is similar to the event E6 “nil, (injured), (people)”, but their syntactic structures are different. Intuitively, events of this kind need a unified form. Second, some events may lose their arguments because of ellipsis or the long distance between the arguments and the trigger. For example, the subject of E7 “nil, (injured), (people)” is lost because the dependency relation is missing. Discourse or cross-document information should be exploited to find the lost

2016 IEEE 28th International Conference on Tools with Artificial Intelligence

DOI 10.1109/ICTAI.2016.128

843

