Figure 2: A simplified syntactic GCN for the sentence "Lane disputed those estimates" (bias terms and gates are omitted); the syntactic graph of the sentence is shown with dashed lines at the bottom. Parameter matrices are sub-indexed with syntactic functions, and apostrophes (e.g., subj′) signify that information flows in the direction opposite to the dependency arcs (i.e., from dependents to heads).
As in standard convolutional networks (LeCun et al., 2001), by stacking GCN layers one can incorporate higher-degree neighborhoods:

$$h_v^{(k+1)} = \mathrm{ReLU}\Big( \sum_{u \in \mathcal{N}(v)} W^{(k)} h_u^{(k)} + b^{(k)} \Big),$$

where $k$ denotes the layer number and $h_v^{(1)} = x_v$.
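To make the recurrence concrete, here is a minimal sketch of one plain GCN layer over an undirected graph; the array layout and the names H, neighbors, W, and b are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def gcn_layer(H, neighbors, W, b):
    """One GCN layer: h_v^{(k+1)} = ReLU(sum_{u in N(v)} W h_u^{(k)} + b).

    H         : (n_nodes, m) array of current node representations
    neighbors : list of lists; neighbors[v] enumerates N(v)
                (include v itself if self-loops are desired)
    W         : (m, m) weight matrix W^{(k)}
    b         : (m,) bias vector b^{(k)}
    """
    n_nodes, m = H.shape
    H_new = np.zeros_like(H)
    for v in range(n_nodes):
        acc = np.zeros(m)
        for u in neighbors[v]:
            acc += W @ H[u]        # message from neighbor u
        H_new[v] = relu(acc + b)   # non-linearity over the aggregated messages
    return H_new
```

Stacking k such layers lets a node's representation depend on its k-hop neighborhood.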
3 Syntactic GCNs
As syntactic dependency trees are directed and labeled (we refer to the dependency labels as syntactic functions), we first need to modify the computation in order to incorporate label information (Section 3.1). In the subsequent section, we incorporate gates in GCNs, so that the model can decide which edges are more relevant to the task in question. Having gates is also important as we rely on automatically predicted syntactic representations, and the gates can detect and downweight potentially erroneous edges.
3.1 Incorporating directions and labels
We now introduce a generalization of GCNs appropriate for syntactic dependency trees and, more generally, for directed labeled graphs. First note that there is no reason to assume that information flows only along the syntactic dependency arcs (e.g., from makes to Sequa), so we allow it to flow in the opposite direction as well (i.e., from dependents to heads). We use a graph G = (V, E), where the edge set contains all pairs of nodes (i.e., words) adjacent in the dependency tree. In our example, both (Sequa, makes) and (makes, Sequa) belong to the edge set. The graph is labeled, and the label L(u, v) for (u, v) ∈ E encodes the syntactic function and indicates whether the edge points in the same or the opposite direction as the syntactic dependency arc. For example, the label for (makes, Sequa) is subj, whereas the label for (Sequa, makes) is subj′, with the apostrophe indicating that the edge runs in the direction opposite to the corresponding syntactic arc. Similarly, self-loops have the label self. Consequently, we can simply assume that the GCN parameters are label-specific, resulting in the following computation, also illustrated in Figure 2:
$$h_v^{(k+1)} = \mathrm{ReLU}\Big( \sum_{u \in \mathcal{N}(v)} W^{(k)}_{L(u,v)} h_u^{(k)} + b^{(k)}_{L(u,v)} \Big).$$
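For illustration, the labeled edge set described above could be assembled from a dependency parse as in the following sketch; the (head, dependent, function) input format and all names are assumptions made for the example, not part of the paper.

```python
def build_labeled_edges(n_words, arcs):
    """Build the labeled edge set of the syntactic graph.

    arcs : list of (head, dependent, function) triples from a
           dependency parser, e.g. [(1, 0, "subj"), ...] when
           word 1 (makes) heads word 0 (Sequa) with label subj.

    Returns a dict mapping each edge (u, v) to its label L(u, v).
    """
    labels = {}
    for head, dep, func in arcs:
        labels[(head, dep)] = func          # along the arc, e.g. subj
        labels[(dep, head)] = func + "'"    # opposite direction, e.g. subj'
    for v in range(n_words):
        labels[(v, v)] = "self"             # self-loop for every word
    return labels
```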
This model is over-parameterized,³ especially given that SRL datasets are moderately sized by deep-learning standards. So instead of learning the GCN parameters directly, we define them as

$$W^{(k)}_{L(u,v)} = V^{(k)}_{\mathrm{dir}(u,v)}, \qquad (2)$$
where dir(u, v) indicates whether the edge (u, v) is (1) directed along the syntactic dependency arc, (2) directed in the opposite direction, or (3) a self-loop; $V^{(k)}_{\mathrm{dir}(u,v)} \in \mathbb{R}^{m \times m}$. Our simplification captures the intuition that information should be propagated differently along edges depending on whether the edge goes from head to dependent or from dependent to head (i.e., along or opposite the corresponding syntactic arc) and whether it is a self-loop. We therefore do not share any parameters between these three very different edge types. Syntactic functions are important, but perhaps less crucial, so they are encoded only in the feature vectors $b_{L(u,v)}$.
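Putting Eq. (2) together with the layer computation above, a minimal sketch of one syntactic GCN layer (without the gates of Section 3.2) could look like the following; the dictionary-based parameter layout, the direction detection via the apostrophe suffix, and all names are assumptions made for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def syntactic_gcn_layer(H, edge_labels, V_dir, b_label):
    """One syntactic GCN layer: directions in the matrices V,
    syntactic functions in the biases b; edge-wise gates omitted.

    H           : (n_words, m) node representations h_u^{(k)}
    edge_labels : dict {(u, v): label}, e.g. {(1, 0): "subj",
                  (0, 1): "subj'", (0, 0): "self", ...}
    V_dir       : dict of three (m, m) matrices keyed by
                  "along", "opposite", "self"
    b_label     : dict {label: (m,) bias vector}, one per edge label
    """
    acc = np.zeros_like(H)
    for (u, v), label in edge_labels.items():
        if u == v:
            direction = "self"
        elif label.endswith("'"):
            direction = "opposite"   # dependent-to-head edge
        else:
            direction = "along"      # head-to-dependent edge
        acc[v] += V_dir[direction] @ H[u] + b_label[label]
    return relu(acc)
```

In this layout only three matrices per layer are learned (one per direction), while the syntactic functions enter through the per-label bias vectors, mirroring Eq. (2).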
3.2 Edge-wise gating
Uniformly accepting information from all neighboring nodes may not be appropriate for the SRL
³ The Chinese and English CoNLL-2009 datasets used 41 and 48 different syntactic functions, which would result in 83 and 97 different matrices in every layer, respectively (two matrices per function, one for each direction, plus one for self-loops).