基于预训练词向量的句子分类卷积神经网络研究

下载需积分: 50 | PDF格式 | 247KB | 更新于2024-09-08 | 32 浏览量 | 举报

有關CNN的文献资料本资源摘要信息是关于 Convolutional Neural Networks（CNN）的文献资料，主要关注国外最新的使用 CNN 进行语义分类的文章。 CNN 概述 Convolutional Neural Networks（CNN）是一种深度学习算法，主要用于图像识别和自然语言处理等领域。CNN 的核心思想是使用多层感知机来学习图像或文本的表示，通过卷积和池化操作来提取特征，然后使用全连接层来分类或回归。 CNN 在自然语言处理中的应用在自然语言处理领域，CNN 主要用于文本分类、命名实体识别、机器翻译等任务。CNN 可以学习文本的表示，从而捕捉文本中的语义信息。例如，在文本分类任务中，CNN 可以学习文本的表示，然后使用 softmax 输出层来分类。论文概述论文《Convolutional Neural Networks for Sentence Classification》介绍了一种使用 CNN 进行句子分类的方法。作者使用了预训练的词向量，并在其基础上训练了一个简单的 CNN 模型。实验结果表明，该方法在多个基准测试集上获得了优秀的结果。论文的贡献论文的贡献主要有两个方面：第一，作者证明了使用预训练的词向量和简单的 CNN 模型可以获得优秀的结果；第二，作者提出了一个简单的修改来允许使用任务特定向量和静态向量。论文的结论论文的结论是，使用 CNN 进行句子分类可以获得优秀的结果，并且可以通过 fine-tuning 任务特定向量来进一步提高性能。此外，作者还提出了一个简单的修改来允许使用任务特定向量和静态向量。 CNN 模型的架构论文中使用的 CNN 模型架构主要包括以下几个部分： * 输入层：将句子转换为词向量表示 * 卷积层：使用卷积操作来提取特征 * 池化层：使用池化操作来降低维数 * 全连接层：使用全连接层来分类 * 输出层：使用 softmax 输出层来获得分类结果结论本资源摘要信息提供了关于 CNN 在自然语言处理领域的应用，特别是句子分类任务的研究成果。论文证明了使用 CNN 进行句子分类可以获得优秀的结果，并且可以通过 fine-tuning 任务特定向量来进一步提高性能。

Convolutional Neural Networks for Sentence Classiﬁcation

Yoon Kim

New York University

yhk255@nyu.edu

Abstract

We report on a series of experiments with

convolutional neural networks (CNN)

trained on top of pre-trained word vec-

tors for sentence-level classiﬁcation tasks.

We show that a simple CNN with lit-

tle hyperparameter tuning and static vec-

tors achieves excellent results on multi-

ple benchmarks. Learning task-speciﬁc

vectors through ﬁne-tuning offers further

gains in performance. We additionally

propose a simple modiﬁcation to the ar-

chitecture to allow for the use of both

task-speciﬁc and static vectors. The CNN

models discussed herein improve upon the

state of the art on 4 out of 7 tasks, which

include sentiment analysis and question

classiﬁcation.

1 Introduction

Deep learning models have achieved remarkable

results in computer vision (Krizhevsky et al.,

2012) and speech recognition (Graves et al., 2013)

in recent years. Within natural language process-

ing, much of the work with deep learning meth-

ods has involved learning word vector representa-

tions through neural language models (Bengio et

al., 2003; Yih et al., 2011; Mikolov et al., 2013)

and performing composition over the learned word

vectors for classiﬁcation (Collobert et al., 2011).

Word vectors, wherein words are projected from a

sparse, 1-of-V encoding (here V is the vocabulary

size) onto a lower dimensional vector space via a

hidden layer, are essentially feature extractors that

encode semantic features of words in their dimen-

sions. In such dense representations, semantically

close words are likewise close—in euclidean or

cosine distance—in the lower dimensional vector

space.

Convolutional neural networks (CNN) utilize

layers with convolving ﬁlters that are applied to

local features (LeCun et al., 1998). Originally

invented for computer vision, CNN models have

subsequently been shown to be effective for NLP

and have achieved excellent results in semantic

parsing (Yih et al., 2014), search query retrieval

(Shen et al., 2014), sentence modeling (Kalch-

brenner et al., 2014), and other traditional NLP

tasks (Collobert et al., 2011).

In the present work, we train a simple CNN with

one layer of convolution on top of word vectors

obtained from an unsupervised neural language

model. These vectors were trained by Mikolov et

al. (2013) on 100 billion words of Google News,

and are publicly available.

We initially keep the

word vectors static and learn only the other param-

eters of the model. Despite little tuning of hyper-

parameters, this simple model achieves excellent

results on multiple benchmarks, suggesting that

the pre-trained vectors are ‘universal’ feature ex-

tractors that can be utilized for various classiﬁca-

tion tasks. Learning task-speciﬁc vectors through

ﬁne-tuning results in further improvements. We

ﬁnally describe a simple modiﬁcation to the archi-

tecture to allow for the use of both pre-trained and

task-speciﬁc vectors by having multiple channels.

Our work is philosophically similar to Razavian

et al. (2014) which showed that for image clas-

siﬁcation, feature extractors obtained from a pre-

trained deep learning model perform well on a va-

riety of tasks—including tasks that are very dif-

ferent from the original task for which the feature

extractors were trained.

2 Model

The model architecture, shown in ﬁgure 1, is a

slight variant of the CNN architecture of Collobert

et al. (2011). Let x

∈ R

be the k-dimensional

word vector corresponding to the i-th word in the

sentence. A sentence of length n (padded where

https://code.google.com/p/word2vec/

arXiv:1408.5882v2 [cs.CL] 3 Sep 2014

下载后可阅读完整内容，剩余5页未读，立即下载

wangjun19007

粉丝: 1

基于预训练词向量的句子分类卷积神经网络研究

卷积神经网络相关文献整理

深度学习卷积神经网络图像参考文献

R-CNN系列三篇论文英文原文

一些图像匹配算法的相关文献资料

Deep learning: basics and CNN很好的深度学习DNN英文文献

深入学习卷积神经网络（CNN）外文文献精读

创新融合：SCSSA+CNN-BiLSTM无文献优化算法实操与应用

写一篇有关CNN预测的文献综述

关于cnn的参考文献

cnn-lstm参考文献

最新资源