provements for semantic compositionality include
matrix-vector interaction (Socher et al., 2012)
and tensor interaction (Socher et al., 2013). They are
more suitable for capturing logical information in
sentences, such as negation and exclamation.
One potential problem of RNNs is that the long
propagation paths—through which leaf nodes are
connected to the output layer—may lead to infor-
mation loss. Thus, RNNs bury illuminating in-
formation under a complicated neural architecture.
Further, during back-propagation over a long path,
gradients tend to vanish (or blow up), which makes
training difficult (Erhan et al., 2009). Long short-term memory (LSTM), first proposed for modeling time-series data (Hochreiter and Schmidhuber, 1997), has been integrated into RNNs to alleviate this problem (Tai et al., 2015; Le and Zuidema, 2015; Zhu et al., 2015).
Recurrent networks. A variant class of RNNs
is the recurrent neural network (Bengio et al.,
1994; Shang et al., 2015), whose architecture is
a rightmost tree. In such models, meaningful tree
structures are also lost, similar to CNNs.
3 Tree-based Convolution
This section introduces the proposed tree-based
convolutional neural networks (TBCNNs). Figure
1c depicts the convolution process on a tree.
First, a sentence is converted to a parse tree, ei-
ther a constituency or dependency tree. The corre-
sponding model variants are denoted as c-TBCNN
and d-TBCNN. Each node in the tree is repre-
sented as a distributed, real-valued vector.
Then, we design a set of fixed-depth subtree fea-
ture detectors, called the tree-based convolution
window. The window slides over the entire tree
to extract structural information of the sentence,
illustrated by a dashed triangle in Figure 1c. Formally, let us assume we have $t$ nodes in the convolution window, $\mathbf{x}_1, \cdots, \mathbf{x}_t$, each represented as an $n_e$-dimensional vector. Let $n_c$ be the number of feature detectors. The output of the tree-based convolution window, evaluated at the current subtree, is given by the following generic equation.
$$\mathbf{y} = f\!\left(\sum_{i=1}^{t} W_i \cdot \mathbf{x}_i + \mathbf{b}\right) \qquad (2)$$
where $W_i \in \mathbb{R}^{n_c \times n_e}$ is the weight parameter associated with node $\mathbf{x}_i$; $\mathbf{b} \in \mathbb{R}^{n_c}$ is the bias term.
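To make Equation 2 concrete, the following minimal NumPy sketch evaluates the convolution window at one subtree; the function name and calling convention are ours for illustration, not the paper's.

    import numpy as np

    def conv_window(node_vecs, weights, bias, f=np.tanh):
        """Evaluate the window of Eq. 2 at one subtree.

        node_vecs: list of t node vectors x_1..x_t, each of shape (n_e,)
        weights:   list of t matrices W_1..W_t, each of shape (n_c, n_e)
        bias:      bias vector b of shape (n_c,)
        Returns y of shape (n_c,), one value per feature detector.
        """
        z = bias.copy()
        for W_i, x_i in zip(weights, node_vecs):
            z += W_i @ x_i           # W_i . x_i, summed over the t nodes
        return f(z)                  # nonlinearity f, e.g., tanh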
Extracted features are thereafter packed into
one or more fixed-size vectors by max pooling,
that is, the maximum value in each dimension is
taken. Finally, we add a fully connected hidden layer and a softmax output layer.
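The pooling and output stages can be sketched as follows, assuming for simplicity a single global max-pooling slot (the actual pooling heuristics appear in Subsection 3.3); all names here are illustrative.

    import numpy as np

    def pool_and_classify(Y, W_h, b_h, W_o, b_o):
        """Y: (num_windows, n_c) matrix stacking the convolution outputs y
        for all window positions over one sentence's tree."""
        pooled = Y.max(axis=0)               # max pooling: dimension-wise maximum
        hidden = np.tanh(W_h @ pooled + b_h) # fully connected hidden layer
        logits = W_o @ hidden + b_o          # softmax output layer
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()               # class probabilities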
From the designed architecture (Figure 1c), we
see that our TBCNN models allow short propaga-
tion paths between the output layer and any posi-
tion in the tree. Therefore, structural features can be learned effectively.
Several main technical points in tree-based con-
volution include: (1) How can we represent hid-
den nodes as vectors in constituency trees? (2)
How can we determine weights, $W_i$, for dependency trees, where nodes may have different numbers of children? (3) How can we pool features of varying sizes and shapes into fixed-size vectors?
In the rest of this section, we explain model
variants in detail. Particularly, Subsections 3.1 and
3.2 address the first and second problems; Sub-
section 3.3 deals with the third problem by intro-
ducing several pooling heuristics. Subsection 3.4
presents our training objective.
3.1 c-TBCNN
Figure 2a illustrates an example of a constituency tree, where leaf nodes are words in the sentence, and each non-leaf node represents a grammatical constituent, e.g., a noun phrase. Sentences are parsed by the Stanford parser;³ further, constituency trees are binarized for simplicity.
One problem with constituency trees is that non-leaf nodes do not have vector representations analogous to word embeddings. Our strategy is to pretrain the constituency tree with an RNN using Equation 1 (Socher et al., 2011b). After pretraining, the vector representations of nodes are fixed.
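Equation 1 is not repeated here; purely as a rough illustration, and assuming the standard recursive composition of Socher et al. (2011b), a non-leaf vector would be computed bottom-up from its children roughly as below (all names are ours; the paper's actual Equation 1 may differ in detail).

    import numpy as np

    def compose(c_l, c_r, W, b, f=np.tanh):
        """One bottom-up composition step used for pretraining node vectors.
        Assumes the common recursive form p = f(W [c_l; c_r] + b);
        c_l, c_r: child vectors of shape (n_e,); W: (n_e, 2*n_e); b: (n_e,)."""
        return f(W @ np.concatenate([c_l, c_r]) + b)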
We now consider the tree-based convolution
process in c-TBCNN with a two-layer-subtree
convolution window, which operates on a parent
node $p$ and its direct children $c_l$ and $c_r$, their vector representations denoted as $\mathbf{p}$, $\mathbf{c}_l$, and $\mathbf{c}_r$. The convolution equation, specific for c-TBCNN, is
$$\mathbf{y} = f\!\left(W_p^{(c)} \cdot \mathbf{p} + W_l^{(c)} \cdot \mathbf{c}_l + W_r^{(c)} \cdot \mathbf{c}_r + \mathbf{b}^{(c)}\right)$$
where $W_p^{(c)}$, $W_l^{(c)}$, and $W_r^{(c)}$ are weights associated with the parent and its child nodes. Superscript $(c)$ indicates that the weights are for c-TBCNN. For leaf nodes, which do not have children, we set $\mathbf{c}_l$ and $\mathbf{c}_r$ to $\mathbf{0}$.
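A minimal sketch of this two-layer-subtree window follows (names are illustrative; zero vectors stand in for the absent children of leaf nodes, as stated above).

    import numpy as np

    def c_tbcnn_window(p, c_l, c_r, W_p, W_l, W_r, b, f=np.tanh):
        """c-TBCNN convolution over a parent p and its two children.
        p, c_l, c_r: node vectors of shape (n_e,); pass None for missing children.
        W_p, W_l, W_r: weight matrices of shape (n_c, n_e); b: bias of shape (n_c,)."""
        if c_l is None:                  # leaf node: left child set to 0
            c_l = np.zeros_like(p)
        if c_r is None:                  # leaf node: right child set to 0
            c_r = np.zeros_like(p)
        return f(W_p @ p + W_l @ c_l + W_r @ c_r + b)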
³ http://nlp.stanford.edu/software/lex-parser.shtml