Gated Convolutional Neural Networks in Sentence Classification: Applications and Advantages

Updated 2024-07-14 · 1.95 MB PDF

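This paper introduces a new deep learning model for sentence classification, the Attention-Gated Convolutional Neural Network (AGCNN). The model targets a central difficulty of sentence classification: a single sentence carries only limited contextual information. AGCNN uses specialized convolutional encoders to generate attention weights from feature context windows of different sizes. This strengthens the influence of key features on the predicted sentence category. Experiments show that AGCNN improves accuracy by 3.1% over a standard CNN model and outperforms the baselines on several tasks. The paper also proposes a new activation function, the Natural-Logarithm-Rescaled Rectified Linear Unit (NLReLU). NLReLU outperforms the conventional ReLU and performs comparably to other well-known activation functions in AGCNN.

The authors first point out what makes sentence classification hard: the context available inside a sentence is relatively limited, which complicates understanding and classification. AGCNN addresses this with a gating mechanism that generates attention weights from feature context windows of different sizes. The model can then focus on the parts most relevant to the classification task and make better use of the limited context. This improves the extraction of key features and increases their influence on the category prediction.

In the experiments, AGCNN is compared with a standard convolutional neural network (CNN). Its classification accuracy improves by 3.1%, meaning it classifies sentences more accurately on the same task. AGCNN also outperforms the baseline models on four of the six tasks, indicating that it is effective across a range of settings.

To further improve performance, the authors design the NLReLU activation function. The conventional ReLU has known weaknesses, such as the "dying ReLU" problem, in which some neurons can become permanently inactive. NLReLU rescales the rectified output with a natural logarithm. Experiments show that it outperforms ReLU in AGCNN and is competitive with other well-known activation functions. It is presented as providing more stable gradient propagation and a better non-linear transformation.

Together, the AGCNN model and the NLReLU activation function offer an effective and competitive approach to sentence classification. They show how attention mechanisms and improved activation functions can raise model performance when only limited information is available.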
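The summary describes NLReLU only as a ReLU whose output is rescaled with a natural logarithm. The sketch below assumes the form f(x) = ln(β·max(0, x) + 1), with a hypothetical scale hyperparameter `beta`. This excerpt does not give the exact formula, so treat this as one plausible reading rather than the authors' definition.

```python
import math


def nlrelu(x: float, beta: float = 1.0) -> float:
    """ASSUMED NLReLU form: ln(beta * max(0, x) + 1).

    `beta` is a hypothetical scale hyperparameter used for illustration.
    """
    return math.log(beta * max(0.0, x) + 1.0)


def nlrelu_grad(x: float, beta: float = 1.0) -> float:
    """Derivative of the assumed form: beta / (beta * x + 1) for x > 0, else 0."""
    return beta / (beta * x + 1.0) if x > 0 else 0.0


if __name__ == "__main__":
    # Compare ReLU and the assumed NLReLU on a few inputs.
    for x in (-2.0, 0.0, 1.0, 10.0, 100.0):
        print(f"x={x:>6}: relu={max(0.0, x):>6}, nlrelu={nlrelu(x):.4f}")
```

Under this assumed form, negative inputs still map to zero with zero gradient, just as in ReLU. Large positive activations are compressed logarithmically, because the gradient β/(βx + 1) shrinks as x grows.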




NeuralNetwork(GCNN)[7]firstlyintroducedthegatingmechanismintoCNNforthe

language modeling, which could reduce the vanishing gradient problem for deep

architectures. GCNN[7]utilized halfoftheabstract featuresasthegatingweightsto

control the other half abstract features. However, since the weights and the abstract

featuresareconvolvedatthesamelevel,theinformationcarriedbythecontrolweights

isverymonotonous.Inthispaper,wealsointroducethegatingmechanisminCNN,but

the control weights, i.e., attention weights, are generated by a variety of specialized

convolution kernels. Therefore, the contextual information of a particular context

windowisintegratedintothecontrolweights.

Attention mechanisms attempt to mimic the human’s perception, which focus

attention selectively on parts of the target areas to obtain more details of the targets

while suppressing other useless information. Mnih et al. [33] firstly applied attention

mechanisminRNNforimageclassification.Thentheextensionsoftheattention-based

RNN model are applied to various NLP tasks [2,28]. Attention mechanism in neural

networks has attracted much attention and has been applied in a variety of neural

network architectures including encoder-decoder [55]. The process of focusing

attention in these architectures mainly reflected in the calculation of the weight

coefficient.Thelargertheweight,themoretheattentionfocusedonitscorresponding

value, that is, weight represents the importance of information, and value is its

correspondinginformation.Recently,howtousetheattentionmechanisminCNNshas

becomearesearchhotspot[51].

Activation functions havea crucial impact on the neural networks’performance.

Sigmoid[10],RectifiedLinearUnit(ReLU)[38],Softplus[38],LeakyReLU(LReLU)

[34],ParametricReLU (PReLU)[17],ExponentialLinear Unit(ELU) [5]andScaled

ExponentialLinearUnit(SELU)[26]areallfairly-knownandwidely-usedactivation

units.Activationfunctionsmakeitpossibletocarryoutthenon-lineartransformation

of the input to solve the complex problems. However, it may also bring with

disadvantages,e.g.,vanishinggradientandneuronaldeath.Therefore,itisessentialto

choosetheappropriateactivationfunctionfortheneuralnetwork.

3. The proposed model

CNN is very suitable for natural language processing, because CNN not only

allows to precisely control the length of dependencies but also enables nearby input

elementstointeractatlowerlayerswhiledistantelementsinteractathigherlayers,and

CNNcanproducethehierarchicalabstractrepresentationsoftheinputtextbystacking

multipleconvolutionlayers.Mostcurrentmethodsforsentenceclassificationbasedon

CNN intend to utilize the pooling layer to find the most significant features. In this

paper, we construct an attention-gated layer before pooling layer to identify critical

features,suppresstheimpactofotherunimportantfeaturesandhelppoolinglayerfind

thegenuinelycrucialfeatures.

In this section, wedescribe our model in detail. As depicts in Fig. 1, our model

consistsofaconvolutionallayeroperatingontheinputsentencematrix,anattention-

gated layer,a max-over-timepooling layer, anda fullyconnectedlayer with dropout

andsoftmaxoutput.Wechooseasentencewiththelengthnof7andthewordvectors'

dimensionality d of 4 as an example. In the demo model shown in Fig. 1, the first

convolutionallayerusesconvolutionkernelswiththewindowsizehof2or3words,

andtheconvolutionlayerintheattentiongatedlayerusesconvolutionkernelswiththe
