Open Relation Extraction from Microblog Text: The MICRO-ORE Method


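"Open Relation Extraction from Chinese Microblog Text"

Microblogs have grown quickly into a major social media platform and a rich data source for information extraction and understanding. Their characteristics, however (short posts, dense information, and an informal language style), pose new challenges for traditional entity relation extraction (ERE). Open relation extraction (ORE) aims to identify and extract relations that are not predefined, without relying on annotated data.

MICRO-ORE (Microblog-based Open Relation Extraction), the unsupervised method introduced in the paper, first analyzes microblog text with the left-right information entropy method. Information entropy measures uncertainty; here it is used to identify key phrases automatically. Computing the left and right entropy of each word indicates how important it is in the sentence, so that phrases likely to carry key information can be extracted. These key phrases are then linked to external knowledge sources, such as encyclopedias or knowledge graphs, to normalize the text and enrich the phrases with semantic information, which helps improve extraction accuracy.

MICRO-ORE then formulates extraction rules specific to Chinese. Word order, idioms, elliptical sentences, and other features of Chinese make relation extraction harder, so MICRO-ORE designs rule templates based on Chinese grammatical structure to capture potential relations between entities and extract relation tuples.

In the experiments, the researchers validate MICRO-ORE on a Sina Weibo dataset. The results show that, compared with traditional relation extraction methods, MICRO-ORE extracts more information from microblog text while keeping accuracy at an acceptable level. Because it is unsupervised, MICRO-ORE also adapts and generalizes well, and can recognize previously unseen relation types to some extent.

Open relation extraction from Chinese microblog text combines information entropy analysis, semantic extension, and syntactic rules, and opens a new route to understanding and mining microblog data. According to the authors, MICRO-ORE is the first ORE method for Chinese microblogs.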
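The left-right information entropy step summarized above can be illustrated with a minimal sketch. The excerpt does not give the paper's exact formulation, so this uses the common boundary-entropy definition: for a candidate n-gram, compute the entropy of the distribution of tokens seen immediately to its left and to its right. The function names, the `<s>`/`</s>` padding symbols, `max_len`, and the `threshold` value are all assumptions made for illustration, and the input is assumed to be already word-segmented.

```python
import math
from collections import Counter, defaultdict


def left_right_entropy(sentences, max_len=2):
    """Return {ngram: (left_entropy, right_entropy)} for n-grams up to max_len.

    `sentences` is a list of token lists (assumed to be pre-segmented).
    """
    left = defaultdict(Counter)   # ngram -> counts of tokens seen on its left
    right = defaultdict(Counter)  # ngram -> counts of tokens seen on its right
    for tokens in sentences:
        # Pad so that sentence boundaries count as neighbors too.
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for n in range(1, max_len + 1):
            for i in range(1, len(padded) - n):
                gram = tuple(padded[i:i + n])
                left[gram][padded[i - 1]] += 1
                right[gram][padded[i + n]] += 1

    def entropy(counts):
        total = sum(counts.values())
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    return {g: (entropy(left[g]), entropy(right[g])) for g in left}


def key_phrases(sentences, max_len=2, threshold=0.9):
    """Keep n-grams whose left AND right entropy reach the (assumed) threshold."""
    scores = left_right_entropy(sentences, max_len)
    return sorted(g for g, (hl, hr) in scores.items() if min(hl, hr) >= threshold)
```

An n-gram that always appears next to the same neighbor gets entropy 0 on that side, which suggests it is a fragment of a longer unit; an n-gram with varied neighbors on both sides behaves like a free-standing phrase. A real system would tune the threshold on data and usually add a frequency cutoff.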


Open Relation Extraction from Chinese Microblog Text

Jing Xu, Liang Gan, Zhou Yan, Quanyuan Wu, Yan Jia

School of Computer Science

National University of Defense Technology

Changsha, China

{jing.xu, gl, zhou.y, qyw, jy}@nudt.edu.cn

Abstract—In recent years, the rapid development of microblogs has given entity relation extraction (ERE) a new carrier. However, the characteristics of microblogs also pose challenges for ERE research. Taking these characteristics into account, this paper proposes an unsupervised open relation extraction (ORE) method, MICRO-ORE. First, MICRO-ORE uses the left-right information entropy method to automatically extract key phrases from microblog texts, and links them to external knowledge sources to normalize the texts and add semantic information. Second, based on Chinese syntactic features, MICRO-ORE formulates extraction rules to extract relation tuples. We evaluate the proposed method on SINA microblog texts and show that it extracts more information than traditional relation extraction methods while meeting the accuracy requirement. To the best of our knowledge, MICRO-ORE is the first Chinese ORE method for microblog texts.

Keywords—microblog; semantic extension; open relation extraction; information extraction

I. INTRODUCTION

A microblog is a broadcast-style social network platform for sharing real-time information based on user relationships[1]. Compared with traditional blogs, microblogs are “short, flexible, fast”. With the rapid development of Web3.0 technologies, microblogs have become a mainstream medium of online social networks, through which users share, spread, and access information. Because of their convenience, efficiency, and large user base, microblogs release huge amounts of information every day, reflecting the latest trends and ideas of the public. Microblogs are therefore an important information resource, with great value for making marketing decisions, gathering intelligence, and monitoring public opinion.

Relation extraction means determining whether a relation holds between two entities and identifying its type. Traditional relation extraction methods extract relations between two entities in specific domains according to predefined relation types. Because they are limited by the set of relation types and the size of the corpus, traditional methods cannot handle fast-growing, changing data. In the big data era, massive network texts and diverse relation types have motivated research on open relation extraction (ORE) technologies. The goal of ORE is to break the restriction of closed relation types and training corpora, and to extract entities and the relation words between them from massive network texts[2]. Because ORE uses words in the text itself to express the relation between entities and does not restrict entity types, it can extract more information than traditional methods.

ORE technologies are mature for English texts and have produced some useful tools. TextRunner (Banko et al.), the first ORE system, uses heuristic rules to obtain a training corpus from the Penn Treebank and applies the CRF method to automatically extract relation phrases and arguments from text[3]. WOE (Wu and Weld) uses the infobox information in Wikipedia to label text and extracts entities and their relations[4]. ReVerb (Fader et al.) treats verbs and verb phrases as relation phrases, takes the nearest nouns on both sides as arguments, and then constrains the extraction with syntactic and lexical information to improve accuracy[5].
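To make the "verb phrase plus nearest nouns" idea concrete, here is a heavily simplified sketch in the spirit of the ReVerb description above. It is not the ReVerb implementation: ReVerb's actual syntactic and lexical constraints are richer than this. The function name, the tag sets, and the assumption that the input is already POS-tagged with coarse Universal-Dependencies-style tags are all illustrative choices.

```python
# Assumed coarse POS tags (Universal Dependencies style).
NOUN_TAGS = {"NOUN", "PROPN"}
REL_START = {"VERB", "AUX"}              # a relation phrase starts at a verb
REL_CONT = REL_START | {"ADP", "PART"}   # and may continue through particles/prepositions


def extract_triples(tagged):
    """Extract (arg1, relation, arg2) triples from a list of (token, pos) pairs."""
    triples = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1] in REL_START:
            # Grow the relation phrase over the contiguous run of allowed tags.
            j = i
            while j < n and tagged[j][1] in REL_CONT:
                j += 1
            # Nearest noun to the left of the phrase, and nearest to the right.
            left = next((k for k in range(i - 1, -1, -1)
                         if tagged[k][1] in NOUN_TAGS), None)
            right = next((k for k in range(j, n)
                          if tagged[k][1] in NOUN_TAGS), None)
            if left is not None and right is not None:
                relation = " ".join(tok for tok, _ in tagged[i:j])
                triples.append((tagged[left][0], relation, tagged[right][0]))
            i = j
        else:
            i += 1
    return triples
```

Because the relation label is taken directly from the words of the sentence, no predefined relation inventory is needed, which is the property that distinguishes open extraction from the closed-type methods discussed earlier.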

Compared with English, Chinese syntax is more complex, so there has been little ORE research on Chinese texts. CORE (Tseng et al.) applies a series of NLP techniques and extraction rules to extract entity-relation triples from Chinese free text[6]. UnCORE (Qin et al.) is an unsupervised Chinese ORE method for large-scale network texts that focuses on extracting relations among persons, organizations, and locations[7].

Current ORE methods mainly target standardized texts, such as Wikipedia, news, and Web pages; no method yet exists for microblog texts. Therefore, based on the characteristics of microblog texts, we propose an unsupervised ORE method, MICRO-ORE, to mine useful information from microblog texts. The contributions are as follows. (1) MICRO-ORE uses the left-right information entropy method to automatically extract key phrases from microblog texts, and links them to external knowledge sources to normalize the texts and add semantic information. (2) Based on Chinese syntactic features, MICRO-ORE formulates rules to extract relations and arguments. Arguments are phrases with behavior patterns or expressive ability that carry information value; they are not limited to entities. In addition, MICRO-ORE uses the words in the text to express the relation between the entities,
