3.1 Transforming Words into Feature Vectors
One of the key points of our architecture is its ability to perform well with the use of (almost$^8$) raw words. The ability of our method to learn good word representations is thus crucial to our approach. For efficiency, words are fed to our architecture as indices taken from a finite dictionary $\mathcal{D}$. Obviously, a simple index does not carry much useful information about the word. However, the first layer of our network maps each of these word indices into a feature vector, by a lookup table operation. Given a task of interest, a relevant representation of each word is then given by the corresponding lookup table feature vector, which is trained by backpropagation.
More formally, for each word $w \in \mathcal{D}$, an internal $d_{wrd}$-dimensional feature vector representation is given by the lookup table layer $LT_W(\cdot)$:
$$LT_W(w) = \langle W \rangle^1_w \,,$$
where $W \in \mathbb{R}^{d_{wrd} \times |\mathcal{D}|}$ is a matrix of parameters to be learnt, $\langle W \rangle^1_w \in \mathbb{R}^{d_{wrd}}$ is the $w^{th}$ column of $W$ and $d_{wrd}$ is the word vector size (a hyper-parameter to be chosen by the user).
Given a sentence or any sequence of $T$ words $[w]_1^T$ in $\mathcal{D}$, the lookup table layer applies the same operation for each word in the sequence, producing the following output matrix:
$$LT_W([w]_1^T) = \begin{pmatrix} \langle W \rangle^1_{[w]_1} & \langle W \rangle^1_{[w]_2} & \dots & \langle W \rangle^1_{[w]_T} \end{pmatrix}. \tag{1}$$
This matrix can then be fed to further neural network layers, as we will see below.
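As an illustration, the lookup table layer of Eq. (1) simply selects columns of $W$ by word index. The following NumPy sketch (with made-up dictionary and vector sizes and a hypothetical `lookup_table` helper, not the implementation used in this work) makes the operation concrete:

```python
import numpy as np

# Minimal sketch of the lookup table layer of Eq. (1); sizes and names are
# illustrative, not those of the actual system.
rng = np.random.default_rng(0)

d_wrd = 50          # word vector size (hyper-parameter)
dict_size = 10000   # |D|, size of the finite word dictionary

# W: parameters to be learnt by backpropagation (random initialization here).
W = rng.normal(scale=0.01, size=(d_wrd, dict_size))

def lookup_table(W, word_indices):
    """LT_W([w]_1^T): d_wrd x T matrix whose t-th column is the column
    of W indexed by the t-th word of the sequence."""
    return W[:, word_indices]

# A sentence of T = 4 word indices taken from the dictionary.
sentence = np.array([42, 7, 1337, 42])
features = lookup_table(W, sentence)   # shape (50, 4), fed to further layers
```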
3.1.1 Extending to Any Discrete Features
One might want to provide features other than words if one suspects that these features are helpful for the task of interest. For example, for the NER task, one could provide a feature indicating whether a word is in a gazetteer or not. Another common practice is to introduce some basic pre-processing, such as word-stemming or dealing with upper and lower case. In the latter option, a word would then be represented by three discrete features: its lower case stemmed root, its lower case ending, and a capitalization feature.
Generally speaking, we can consider a word as represented by $K$ discrete features $w \in \mathcal{D}^1 \times \dots \times \mathcal{D}^K$, where $\mathcal{D}^k$ is the dictionary for the $k^{th}$ feature. We associate to each feature a lookup table $LT_{W^k}(\cdot)$, with parameters $W^k \in \mathbb{R}^{d^k_{wrd} \times |\mathcal{D}^k|}$, where $d^k_{wrd} \in \mathbb{N}$ is a user-specified vector size. Given a word $w$, a feature vector of dimension $d_{wrd} = \sum_k d^k_{wrd}$ is then obtained by concatenating all lookup table outputs:
$$LT_{W^1,\dots,W^K}(w) = \begin{pmatrix} LT_{W^1}(w_1) \\ \vdots \\ LT_{W^K}(w_K) \end{pmatrix} = \begin{pmatrix} \langle W^1 \rangle^1_{w_1} \\ \vdots \\ \langle W^K \rangle^1_{w_K} \end{pmatrix}.$$
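Again purely as an illustration, the concatenation of per-feature lookup tables can be sketched as follows (the dictionary sizes, vector sizes and the `lookup_concat` helper are hypothetical, not those of our system):

```python
import numpy as np

# Minimal sketch of K = 3 per-feature lookup tables whose outputs are
# concatenated; dictionary sizes and vector sizes are illustrative only.
rng = np.random.default_rng(0)

dict_sizes = [10000, 500, 4]   # |D^1|, |D^2|, |D^3| (e.g. stem, ending, caps)
d_wrd_k    = [45, 4, 1]        # d^k_wrd, user-specified vector sizes

tables = [rng.normal(scale=0.01, size=(d, n))
          for d, n in zip(d_wrd_k, dict_sizes)]

def lookup_concat(tables, feature_indices):
    """Concatenate the K lookup-table columns into one vector of
    dimension d_wrd = sum_k d^k_wrd."""
    return np.concatenate([Wk[:, i] for Wk, i in zip(tables, feature_indices)])

# A word described by its K = 3 discrete feature indices.
word = (1234, 17, 2)
vector = lookup_concat(tables, word)   # shape (50,) = 45 + 4 + 1
```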
8. We did some pre-processing, namely lowercasing and encoding capitalization as another feature. With enough (unlabeled) training data, we could presumably learn a model without this processing. Ideally, an even more raw input would be to learn from letter sequences rather than words; however, we felt that this was beyond the scope of this work.