Neural Network (GCNN) [7] first introduced the gating mechanism into CNNs for language modeling, which reduces the vanishing-gradient problem in deep architectures. GCNN [7] utilized half of the abstract features as gating weights to control the other half. However, since the weights and the abstract features are convolved at the same level, the information carried by the control weights is very homogeneous. In this paper, we also introduce a gating mechanism into CNNs, but the control weights, i.e., the attention weights, are generated by a variety of specialized convolution kernels. The contextual information of a particular context window is therefore integrated into the control weights.
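To make the contrast concrete, the GCNN-style gating can be sketched in a few lines of PyTorch; the class and argument names below are ours for illustration and do not come from [7]:

```python
import torch
import torch.nn as nn

class GLUConv(nn.Module):
    """GCNN-style gating [7]: one convolution yields 2*C channels;
    half serve as features, the sigmoid of the other half gates them."""
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, 2 * out_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x):                 # x: (batch, in_channels, seq_len)
        a, b = self.conv(x).chunk(2, dim=1)
        return a * torch.sigmoid(b)       # same-level features act as gates
```

Because the gate b is convolved from the same input at the same level as the features a, it carries no extra contextual information, which is precisely the limitation our attention weights address.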
Attention mechanisms attempt to mimic human perception, which selectively focuses on parts of a target area to obtain more detail about the target while suppressing other, useless information. Mnih et al. [33] first applied an attention mechanism in an RNN for image classification. Extensions of the attention-based RNN model have since been applied to various NLP tasks [2,28]. The attention mechanism has attracted much interest and has been applied in a variety of neural network architectures, including the encoder-decoder [55]. In these architectures, the process of focusing attention is mainly reflected in the calculation of the weight coefficients: the larger the weight, the more attention is focused on its corresponding value. That is, the weight represents the importance of a piece of information, and the value is the information itself. Recently, how to use attention mechanisms in CNNs has become a research hotspot [51].
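As a point of reference, the generic weight-value computation described above amounts to a softmax-weighted sum; the following minimal PyTorch sketch uses our own function name and tensor shapes, not those of any cited model:

```python
import torch
import torch.nn.functional as F

def attend(scores, values):
    """scores: (batch, seq_len); values: (batch, seq_len, dim).
    Larger weights focus more attention on the corresponding values."""
    weights = F.softmax(scores, dim=-1)                         # importance coefficients
    return torch.bmm(weights.unsqueeze(1), values).squeeze(1)   # weighted sum
```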
Activation functions have a crucial impact on a neural network's performance. Sigmoid [10], Rectified Linear Unit (ReLU) [38], Softplus [38], Leaky ReLU (LReLU) [34], Parametric ReLU (PReLU) [17], Exponential Linear Unit (ELU) [5], and Scaled Exponential Linear Unit (SELU) [26] are all well-known and widely used activation units. Activation functions make it possible to apply a non-linear transformation to the input in order to solve complex problems. However, they may also bring disadvantages, e.g., vanishing gradients and neuron death. It is therefore essential to choose an appropriate activation function for the neural network.
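For concreteness, all of the units listed above can be evaluated directly with PyTorch's standard API; this brief sketch is purely illustrative and not part of the proposed model:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=7)       # sample inputs around zero
print(torch.sigmoid(x))                      # Sigmoid [10]
print(F.relu(x))                             # ReLU [38]
print(F.softplus(x))                         # Softplus [38]
print(F.leaky_relu(x, negative_slope=0.01))  # LReLU [34]
print(F.elu(x, alpha=1.0))                   # ELU [5]
print(F.selu(x))                             # SELU [26], fixed scale and alpha
print(torch.nn.PReLU()(x))                   # PReLU [17], learnable slope
```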
3. The proposed model
CNNs are well suited to natural language processing: a CNN not only allows precise control over the length of dependencies, but also enables nearby input elements to interact at lower layers while distant elements interact at higher layers, and it can produce hierarchical abstract representations of the input text by stacking multiple convolution layers. Most current CNN-based methods for sentence classification rely on the pooling layer to find the most significant features. In this paper, we construct an attention-gated layer before the pooling layer to identify critical features, suppress the impact of unimportant features, and help the pooling layer find the genuinely crucial features, as sketched below.
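The following is a minimal sketch of this attention-gated-before-pooling idea under our own assumptions (sigmoid gating, length-preserving padding, a single attention convolution); the module and parameter names are hypothetical, and the actual configuration follows Fig. 1 as described in the next paragraph:

```python
import torch
import torch.nn as nn

class AttentionGatedPooling(nn.Module):
    """Attention weights from a separate convolution gate the feature
    maps before max-over-time pooling (shapes and names illustrative)."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        self.attn_conv = nn.Conv1d(channels, channels,
                                   kernel_size, padding=kernel_size // 2)

    def forward(self, feats):                        # feats: (batch, C, seq_len)
        gate = torch.sigmoid(self.attn_conv(feats))  # contextual attention weights
        gated = feats * gate                         # suppress unimportant features
        return gated.max(dim=2).values               # max-over-time pooling
```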
In this section, we describe our model in detail. As depicted in Fig. 1, our model consists of a convolutional layer operating on the input sentence matrix, an attention-gated layer, a max-over-time pooling layer, and a fully connected layer with dropout and softmax output. As an example, we choose a sentence of length n = 7 and word vectors of dimensionality d = 4. In the demo model shown in Fig. 1, the first convolutional layer uses convolution kernels with a window size h of 2 or 3 words, and the convolution layer in the attention-gated layer uses convolution kernels with the