tried to predict a tagging sequence. Therefore, they
still need to design tagging schemas for different
NER subtasks.
Span-level classification
When applying the sequence labelling method to the nested NER and discontinuous NER subtasks, the tagging becomes complex (Straková et al., 2019; Metke-Jimenez and Karimi, 2016) or multi-level (Ju et al., 2018; Fisher and Vlachos, 2019; Shibuya and Hovy, 2020).
Therefore, the second line of work directly conducts span-level classification. The main difference among publications in this line of work is how they obtain the spans. Finkel and Manning (2009) regarded parsing nodes as spans. Xu et al. (2017); Luan et al. (2019); Yamada et al. (2020); Li et al. (2020b); Yu et al. (2020); Wang et al. (2020a) tried to enumerate all spans. Following Lu and Roth (2015), hypergraph methods, which can effectively represent exponentially many possible nested mentions in a sentence, have been extensively studied for NER tasks (Katiyar and Cardie, 2018; Wang and Lu, 2018; Muis and Lu, 2016).
Combined token-level and span-level classification
To avoid enumerating all possible spans and to incorporate entity boundary information into the model, Wang and Lu (2019); Zheng et al. (2019); Lin et al. (2019); Wang et al. (2020b); Luo and Zhao (2020) proposed combining token-level classification and span-level classification.
2.2 Sequence-to-Sequence Models
The Seq2Seq framework has long been studied and adopted in NLP (Sutskever et al., 2014; Cho et al., 2014; Luong et al., 2015; Vaswani et al., 2017; Vinyals et al., 2015). Gillick et al. (2016) proposed a Seq2Seq model that predicts the entity's start position, span length and label for the NER task. Recently, the substantial performance gains achieved by PTMs (pre-trained models) (Qiu et al., 2020; Peters et al., 2018; Devlin et al., 2019; Dai et al., 2021; Yan et al., 2020) have attracted several attempts to pre-train Seq2Seq models (Song et al., 2019; Lewis et al., 2020; Raffel et al., 2020). We mainly focus on the newly proposed BART (Lewis et al., 2020) model because it achieves better performance than MASS (Song et al., 2019). Moreover, the SentencePiece tokenization used in T5 (Raffel et al., 2020) can produce different tokenizations for the same token, which makes it hard to generate pointer indexes for entity extraction.
BART is composed of several transformer encoder and decoder layers, like the transformer model used in machine translation (Vaswani et al., 2017). BART's pre-training task is to recover corrupted text into the original text: the encoder takes the corrupted sentence as input, and the decoder recovers the original sentence. BART has base and large versions. The base version has 6 encoder layers and 6 decoder layers, while the large version has 12 of each. Therefore, the number of parameters is similar to that of its equivalently sized BERT.⁵
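For concreteness, the following minimal sketch (not part of the paper; it assumes the Hugging Face Transformers library and the publicly released facebook/bart-base and facebook/bart-large checkpoints) shows how the two configurations can be loaded and their parameter counts inspected:

```python
# Minimal sketch, assuming the "transformers" package and the public
# "facebook/bart-base" / "facebook/bart-large" checkpoints.
from transformers import BartModel

base = BartModel.from_pretrained("facebook/bart-base")    # 6 encoder + 6 decoder layers
large = BartModel.from_pretrained("facebook/bart-large")  # 12 encoder + 12 decoder layers

# Parameter counts are roughly comparable to the equivalently sized BERT,
# though about 10% larger because of the encoder-decoder cross-attention.
for name, model in [("bart-base", base), ("bart-large", large)]:
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```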
3 Proposed Method
In this section, we first introduce the task formulation; then we describe how we use the Seq2Seq model with the pointer mechanism to generate the entity index sequences; after that, we present the detailed formulation of our model with BART.
3.1 NER Task
The three kinds of NER tasks can all be formulated as follows: given an input sentence of $n$ tokens $X = [x_1, x_2, \ldots, x_n]$, the target sequence is $Y = [s_{11}, e_{11}, \ldots, s_{1j}, e_{1j}, t_1, \ldots, s_{i1}, e_{i1}, \ldots, s_{ik}, e_{ik}, t_i]$, where $s, e$ are the start and end indexes of a span. Since an entity may contain one (for flat and nested NER) or more than one (for discontinuous NER) spans, each entity is represented as $[s_{i1}, e_{i1}, \ldots, s_{ij}, e_{ij}, t_i]$, where $t_i$ is the entity tag index. We use $G = [g_1, \ldots, g_l]$ to denote the entity tag tokens (such as “Person”, “Location”, etc.), where $l$ is the number of entity tags. We make $t_i \in (n, n+l]$; the shift by $n$ ensures that $t_i$ is not confused with the pointer indexes, which lie in the range $[1, n]$.
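As a concrete illustration of this formulation, the following sketch builds the target sequence $Y$ for a hypothetical sentence and entity set; the example sentence, entities, and helper function are ours, not the paper's:

```python
# Hypothetical example of the target-sequence construction described above.
tokens = ["Barack", "Obama", "visited", "New", "York"]   # n = 5
tags = ["Person", "Location"]                            # G = [g_1, ..., g_l], l = 2

def build_target(entities, n, tags):
    """Build Y = [s_11, e_11, ..., t_1, ..., s_ik, e_ik, t_i].

    Each entity is (spans, tag), where spans is a list of 1-based
    inclusive (start, end) pointer index pairs.  Tag indexes are shifted
    by n so they fall in (n, n + l] and never collide with the pointer
    indexes, which lie in [1, n].
    """
    y = []
    for spans, tag in entities:
        for start, end in spans:
            y.extend([start, end])
        y.append(n + 1 + tags.index(tag))
    return y

entities = [([(1, 2)], "Person"),      # "Barack Obama"
            ([(4, 5)], "Location")]    # "New York"
print(build_target(entities, len(tokens), tags))  # -> [1, 2, 6, 4, 5, 7]
```

A discontinuous entity would simply contribute more than one (start, end) pair before its tag index.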
3.2 Seq2Seq for Unified Decoding
Since we formulate the NER task in a generative way, we can view the NER task as the following equation:

$$P(Y \mid X) = \prod_{t=1}^{m} P(y_t \mid X, Y_{<t}) \qquad (1)$$

where $m$ is the length of the target sequence $Y$ and $y_0$ is the special “start of sentence” control token.
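To make the factorization in Eq. (1) concrete, here is a minimal sketch of how the probability of a target index sequence would be accumulated step by step; the model object and its next_token_distribution interface are assumptions for illustration, not the paper's actual API:

```python
# Sketch of Eq. (1): P(Y | X) = prod_{t=1}^{m} P(y_t | X, Y_<t).
# `model.next_token_distribution` is a hypothetical interface returning a
# list of probabilities over the next index (pointer positions and tags).
def sequence_probability(model, x, y, bos_index=0):
    prefix = [bos_index]            # y_0: the "start of sentence" control token
    prob = 1.0
    for y_t in y:                   # y = [y_1, ..., y_m]
        # Distribution over the next index, conditioned on the source
        # sentence X and the already-decoded prefix Y_<t.
        step_probs = model.next_token_distribution(x, prefix)
        prob *= step_probs[y_t]
        prefix.append(y_t)
    return prob
```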
We use the Seq2Seq framework with the pointer
mechanism to tackle this task. Therefore, our
model consists of two components:
⁵Because of the cross-attention between the encoder and decoder, the number of parameters of BART is about 10% larger than that of its equivalently sized BERT (Lewis et al., 2020).