where $A$ is a matrix of transition scores such that $A_{i,j}$ represents the score of a transition from tag $i$ to tag $j$. $y_0$ and $y_n$ are the start and end tags of a sentence, which we add to the set of possible tags. $A$ is therefore a square matrix of size $k+2$.
A softmax over all possible tag sequences yields a probability for the sequence $y$:
$$p(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}}.$$
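To make the normalization over all tag sequences concrete, the following is a minimal numpy sketch that computes $p(y \mid X)$ by brute-force enumeration for a toy sentence. It assumes the sentence score $s(X, y)$ is the sum of the transition scores $A$ (with start and end tags appended at indices $k$ and $k+1$) and the per-word tag scores $P_{i,y}$ used in this section; the function names and the exact start/end indexing are illustrative, not the authors' implementation.

```python
import itertools
import numpy as np

def sequence_score(P, A, tags):
    """s(X, y): sum of transition scores A and per-word tag scores P,
    with start (index k) and end (index k+1) tags added around the sequence."""
    k = P.shape[1]                       # number of real tags; A is (k+2) x (k+2)
    start, end = k, k + 1
    path = [start] + list(tags) + [end]
    trans = sum(A[path[i], path[i + 1]] for i in range(len(path) - 1))
    emit = sum(P[i, t] for i, t in enumerate(tags))
    return trans + emit

def brute_force_probability(P, A, tags):
    """p(y|X): softmax of s(X, y) over every possible tag sequence in Y_X."""
    n, k = P.shape
    scores = [sequence_score(P, A, y) for y in itertools.product(range(k), repeat=n)]
    log_Z = np.logaddexp.reduce(scores)  # log of the normalizer
    return np.exp(sequence_score(P, A, tags) - log_Z)

# Toy example: 3 words, 2 tags.
rng = np.random.default_rng(0)
P = rng.normal(size=(3, 2))              # P[i, y]: score of tag y for word i
A = rng.normal(size=(4, 4))              # transitions over k + 2 tags (with start/end)
print(brute_force_probability(P, A, (0, 1, 0)))
```

Enumeration is exponential in the sentence length and is shown only to make the definition explicit; the dynamic programs discussed below compute the same quantities efficiently.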
During training, we maximize the log-probability of
the correct tag sequence:
$$\log(p(y \mid X)) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})} = s(X, y) - \operatorname*{logadd}_{\tilde{y} \in Y_X} s(X, \tilde{y}), \qquad (1)$$
where $Y_X$ represents all possible tag sequences (even those that do not respect the IOB format) for a sentence $X$. From the formulation above, it is evident that we encourage our network to produce a valid sequence of output labels. While decoding, we predict the output sequence that obtains the maximum score, given by:
$$y^{*} = \operatorname*{argmax}_{\tilde{y} \in Y_X} s(X, \tilde{y}). \qquad (2)$$
Since we are only modeling bigram interactions between outputs, both the summation in Eq. 1 and the maximum a posteriori sequence $y^{*}$ in Eq. 2 can be computed using dynamic programming.
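Concretely, the logadd term in Eq. 1 is computed with a forward recursion and the argmax in Eq. 2 with Viterbi decoding, each in $O(nk^2)$ time for a sentence of length $n$ with $k$ tags. The numpy sketch below is an illustrative rendering of these dynamic programs, not the authors' code; it assumes per-word scores $P$ and a transition matrix $A$ with start and end tags appended at indices $k$ and $k+1$, as in the brute-force sketch above.

```python
import numpy as np

def forward_log_partition(P, A):
    """Dynamic program for the logadd term in Eq. 1:
    log of the sum over all tag sequences of exp(s(X, y))."""
    n, k = P.shape
    start, end = k, k + 1
    # alpha[y]: log-sum of scores of all prefixes ending in tag y
    alpha = A[start, :k] + P[0]
    for i in range(1, n):
        # previous alpha + transition into each next tag, then add emission scores
        alpha = np.logaddexp.reduce(alpha[:, None] + A[:k, :k], axis=0) + P[i]
    return np.logaddexp.reduce(alpha + A[:k, end])

def viterbi_decode(P, A):
    """Dynamic program for Eq. 2: the highest-scoring tag sequence y*."""
    n, k = P.shape
    start, end = k, k + 1
    delta = A[start, :k] + P[0]
    backptr = []
    for i in range(1, n):
        scores = delta[:, None] + A[:k, :k]      # (previous tag, next tag)
        backptr.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + P[i]
    best_last = int(np.argmax(delta + A[:k, end]))
    path = [best_last]
    for bp in reversed(backptr):                 # follow back-pointers to recover y*
        path.append(int(bp[path[-1]]))
    return list(reversed(path))
```

On small examples, forward_log_partition agrees with the log of the brute-force normalizer above, which is a convenient correctness check.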
2.3 Parameterization and Training
The scores associated with each tagging decision for each token (i.e., the $P_{i,y}$'s) are defined to be the dot product between the embedding of a word-in-context computed with a bidirectional LSTM (exactly as in the POS tagging model of Ling et al. (2015b)) and a learned weight vector for each tag, and these are combined with bigram compatibility scores (i.e., the $A_{y,y'}$'s). This architecture is shown in Figure 1. Circles represent observed variables, diamonds are deterministic functions of their parents, and double circles are random variables.
Figure 1: Main architecture of the network. Word embeddings are given to a bidirectional LSTM. $l_i$ represents the word $i$ and its left context, $r_i$ represents the word $i$ and its right context. Concatenating these two vectors yields a representation of the word $i$ in its context, $c_i$.
The parameters of this model are thus the matrix of bigram compatibility scores $A$, and the parameters that give rise to the matrix $P$, namely the parameters of the bidirectional LSTM, the linear feature weights, and the word embeddings. As in Section 2.2, let $x_1, \ldots, x_n$ denote the sequence of word embeddings for every word in a sentence, and $y_1, \ldots, y_n$ be their associated tags. We return to a discussion of how the embeddings $x_i$ are modeled in Section 4. The sequence of word embeddings is given as input to a bidirectional LSTM, which returns a representation of the left and right context for each word, as explained in Section 2.1.
These representations are concatenated ($c_i$) and linearly projected onto a layer whose size is equal to the number of distinct tags. Instead of using the softmax output from this layer, we use a CRF as previously described to take neighboring tags into account, yielding the final predictions $y_i$ for every word. Additionally, we observed that adding a hidden layer between $c_i$ and the CRF layer marginally improved our results. All results reported with this model incorporate this extra layer. The parameters are trained to maximize Eq. 1 over the observed sequences of NER tags in an annotated corpus, given the observed words.
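For concreteness, the sketch below shows the feature extractor described in this section in PyTorch: word embeddings are fed to a bidirectional LSTM, the left and right context vectors are concatenated into $c_i$, passed through the extra hidden layer, and linearly projected onto the per-word tag scores $P$ that the CRF layer consumes. The layer sizes, the tanh nonlinearity, and all names are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class BiLSTMTagScorer(nn.Module):
    """Produces the matrix P of per-word tag scores; a CRF with the
    transition matrix A (not shown here) is applied on top of these scores."""

    def __init__(self, vocab_size, emb_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # bidirectional LSTM: left context l_i and right context r_i for every word
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        # extra hidden layer between c_i and the CRF layer
        self.hidden = nn.Linear(2 * hidden_dim, hidden_dim)
        # linear projection onto a layer whose size equals the number of tags
        self.proj = nn.Linear(hidden_dim, num_tags)

    def forward(self, word_ids):                  # word_ids: (batch, sentence_length)
        x = self.embed(word_ids)                  # word embeddings x_i
        c, _ = self.lstm(x)                       # c_i = [l_i ; r_i]
        h = torch.tanh(self.hidden(c))            # extra hidden layer
        return self.proj(h)                       # P: (batch, n, num_tags)
```

Training then consists of feeding these scores, together with the transition matrix $A$, into the CRF objective of Eq. 1 and backpropagating through the whole network.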