On the PTB, the model produced results on par with the
existing state of the art [62], despite having only
19 million trainable parameters, considerably fewer than
competing models. Since the network captured morphological
similarities through character-level analysis, it handled
rare words better than previous models. Analysis
showed that without the use of highway layers, many words
had nearest neighbors that were orthographically similar but
not necessarily semantically similar. In addition, the network
was capable of recognizing misspelled words and nonstandard
spellings (e.g., looooook instead of look), as well as
out-of-vocabulary words. The analysis also
showed that the network was capable of identifying prefixes,
roots, and suffixes, as well as understanding hyphenated words,
making it a robust model.
Jozefowicz et al. [63] tested a number of architectures
producing character-level outputs [55], [64]–[66]. While many
of these models had only been tested on small-scale language
modeling, this study tested them at a large scale using
the Billion Word Benchmark. The most effective model,
achieving a state-of-the-art (for single models) perplexity
of 30.0 with 1.04 billion trainable parameters (compared to
a previous best by a single model of 51.3 with 20 billion
parameters [55]), was a large LSTM using a character-level
CNN as an input network. The best performance, however,
was achieved using an ensemble of ten LSTMs. This ensemble,
with a perplexity of 23.7, far surpassed the previous state-of-
the-art ensemble [65], which had a perplexity of 41.0.
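For reference, the perplexity figures quoted here follow the standard definition over a held-out corpus of N tokens,

\[
\mathrm{PP} \;=\; \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N}\log p(w_i \mid w_1,\ldots,w_{i-1})\Big),
\]

so lower values indicate that the model assigns higher probability to the test text.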
6) Development of Word Embeddings: Not only do neural
language models allow for the prediction of unseen synony-
mous words, but they also allow for modeling the relationships
between words [67], [68]. Vectors with numeric compo-
nents, representing individual words, obtained by language
modeling techniques are called embeddings. This is usually
done either by the use of principal component analysis or by
capturing internal states in a neural language model. (Note
that these are not standard language models, but rather are
language models constructed specifically for this purpose.)
Typically, word embeddings have between 50 and 300 dimen-
sions. An overused example is that of the distributed represen-
tations of the words king, queen, man, and woman. If one takes
the embedding vectors for each of these words, computation
can be performed to obtain highly sensible results. If the
vectors representing these words are, respectively, represented
as k, q, m, and w, it can be observed that

k − q ≈ m − w,

which is extremely intuitive to human reasoning. In recent
years, word embeddings have been the standard form of input
to NLP systems.
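For concreteness, the following sketch (a minimal illustration, not tied to any particular embedding package) ranks candidate words for such an analogy by cosine similarity; the randomly initialized vectors stand in for embeddings that would, in practice, come from a trained model such as word2vec or GloVe.

```python
import numpy as np

# Toy embedding table: random vectors are used here purely to show the
# mechanics of the analogy computation; real embeddings would be learned.
rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "prince", "princess"]
emb = {w: rng.normal(size=300) for w in vocab}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, embeddings):
    """Rank candidate words by similarity to vec(a) - vec(b) + vec(c),
    e.g., king - queen + woman, which should land near man."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return sorted(candidates,
                  key=lambda w: cosine(target, embeddings[w]),
                  reverse=True)

print(analogy("king", "queen", "woman", emb))
```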
7) Recent Advances and Challenges: Language model-
ing has been evolving on a weekly basis, beginning with
the works of Radford et al. [69] and Peters et al. [70].
Radford et al. [69] introduced generative pretraining (GPT),
which pretrained a language model based on the transformer
model [42] (Section IV-G), learning dependencies of words
in sentences and longer segments of text, rather than just
the immediately surrounding words. Peters et al. [70] incorpo-
rated bidirectionalism to capture backward context in addition
to the forward context, in their Embeddings from Language
Models (ELMo). In addition, they captured the vectorizations
at multiple levels, rather than just the final layer. This allowed
for multiple encodings of the same information to be captured,
which was empirically shown to significantly boost the per-
formance.
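Concretely, in Peters et al.'s formulation [70], the representations from all layers of the bidirectional language model are combined through a learned, softmax-normalized weighting rather than taking only the top layer,

\[
\mathrm{ELMo}_k \;=\; \gamma \sum_{j=0}^{L} s_j \, \mathbf{h}_{k,j},
\]

where \(\mathbf{h}_{k,j}\) is the layer-\(j\) hidden state for token \(k\) (with \(j=0\) the token embedding), the weights \(s_j\) sum to one, and \(\gamma\) is a task-specific scaling factor.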
Devlin et al. [71] added the unsupervised training
tasks of random masked word prediction and next-
sentence prediction (NSP), in which, given a sentence (or other
continuous segment of text), another sentence was predicted
to either be the next sentence or not. These Bidirectional
Encoder Representations from Transformers (BERT) were
further built upon by Liu et al. [72] to create multitask DNN
(MT-DNN) representations, which are the current state of the
art in language modeling. The model used a stochastic answer
network (SAN) [73], [74] on top of a BERT-like model. After
pretraining, the model was trained on a number of different
tasks before being fine-tuned to the task at hand. Using
MT-DNN as the language model, they achieved state-of-
the-art results on ten of the eleven attempted tasks.
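A minimal sketch of how such pretraining examples might be constructed is given below; the 15% masking rate and the [CLS]/[SEP]/[MASK] markers follow Devlin et al. [71], while the tokenization, corpus handling, and the masking recipe itself (the full procedure also replaces some selected tokens with random words or leaves them unchanged) are simplified assumptions.

```python
import random

CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"

def mask_tokens(tokens, mask_prob=0.15):
    """Randomly hide ~15% of the non-special tokens; the model is then
    trained to predict the original token at each masked position."""
    inputs, labels = [], []
    for tok in tokens:
        if tok not in (CLS, SEP) and random.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)      # target for masked prediction
        else:
            inputs.append(tok)
            labels.append(None)     # position not scored
    return inputs, labels

def nsp_example(sentences, idx):
    """Pair sentence idx with its true successor half the time and with
    a random sentence otherwise (ideally from a different document),
    returning the token sequence and the binary is-next label."""
    first = sentences[idx]
    if random.random() < 0.5 and idx + 1 < len(sentences):
        second, is_next = sentences[idx + 1], 1
    else:
        second, is_next = random.choice(sentences), 0
    return [CLS] + first + [SEP] + second + [SEP], is_next

corpus = [["the", "cat", "sat"], ["on", "the", "mat"], ["dogs", "bark"]]
pair, is_next = nsp_example(corpus, 0)
print(mask_tokens(pair), is_next)
```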
While these pretrained models have made excellent head-
way in “understanding” language, as is required for some tasks
such as entailment inference, it has been hypothesized by some
that these models are learning templates or syntactic patterns
present within the data sets, unrelated to logic or inference.
When new data sets are carefully constructed to remove such
patterns, the models do not perform well [75]. In addition,
while there has been recent work on cross-language and
universal language modeling, considerably more work is
needed to address low-resource languages.
B. Morphology
Morphology is concerned with finding segments within
single words, including roots and stems, prefixes, suffixes,
and—in some languages—infixes. Affixes (prefixes, suffixes,
and infixes) are used to overtly modify stems for gender,
number, person, and so on.
Luong et al. [76] constructed a morphologically aware lan-
guage model. An RvNN was used to model the morpho-
logical structure, and a neural language model was then placed
on top of the RvNN. The model was trained on the WordSim-
353 data set [77], and segmentation was performed using Mor-
fessor [78]. Two models were constructed—one using context
and one not. It was found that the model that was insensitive
to context overaccounted for certain morphological structures.
In particular, words with the same stem were clustered together
even if they were antonyms. The context-sensitive model
performed better, noting the relationships between the stems
but also accounting for other features such as the prefix “un.”
The model was also tested on several other popular data
sets [79]–[81], significantly outperforming previous embed-
ding models on all.
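To illustrate the flavor of such a recursive composition over morphemes (a simplified sketch: the segmentation, the single weight matrix, and the random initialization below are illustrative assumptions rather than Luong et al.'s exact formulation), a word vector can be built bottom-up from its stem and affixes:

```python
import numpy as np

DIM = 50
rng = np.random.default_rng(1)

# Toy morpheme embeddings and composition weights; in the actual model
# these would be learned jointly with the language model placed on top.
morpheme_emb = {m: rng.normal(scale=0.1, size=DIM) for m in ("un", "kind", "ness")}
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))
b = np.zeros(DIM)

def compose(parent, child):
    """One RvNN step: merge two representations through an affine map
    followed by a tanh nonlinearity."""
    return np.tanh(W @ np.concatenate([parent, child]) + b)

def word_vector(morphemes):
    """Fold left-to-right over a segmentation such as ['un', 'kind', 'ness']."""
    vec = morpheme_emb[morphemes[0]]
    for m in morphemes[1:]:
        vec = compose(vec, morpheme_emb[m])
    return vec

print(word_vector(["un", "kind", "ness"]).shape)  # (50,)
```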
A good morphological analyzer is often important for many
NLP tasks. As such, one recent study by Belinkov et al. [82]
examined the extent to which morphology was learned and
used by a variety of neural machine translation (NMT)
models. A number of translation models were constructed,
all translating from English to French, German, Czech,
Arabic, or Hebrew. Encoders and decoders were LSTM-based
models (some with attention mechanisms) or character-
aware CNNs, and the models were trained on the WIT3
corpus [83], [84]. The decoders were then replaced with POS
taggers and morphological taggers, fixing the weights of the
encoders to preserve the internal representations. The effects
of the encoders were examined as were the effects of the