computer vision researchers immediately realized the limitations of the knowledge-based paradigm, given the need for machine learning methods with uncertainty-handling and generalization capabilities.
The empiricism in NLP and speech recognition in this second wave was based on data-intensive machine learning, which we now call “shallow” due to the general lack of abstractions constructed by many-layer or “deep” representations of data, which would arrive in the third wave described in the next section. In machine learning, researchers do not need to concern themselves with constructing the precise and exact rules required for the knowledge-based NLP and speech systems of the first wave.
Rather, they focus on statistical models (Bishop 2006; Murphy 2012) or simple neural networks (Bishop 1995) as the underlying engine. They then automatically learn, or “tune,” the parameters of that engine using ample training data, so that it handles uncertainty and attempts to generalize from one condition to another and from one domain to another. The key algorithms and methods for such machine learning include EM (expectation-maximization), Bayesian networks, support vector machines, decision trees, and, for neural networks, the backpropagation algorithm.
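To make this concrete, the following is a minimal sketch (not drawn from this book) of how such parameter tuning works: a single-layer, “shallow” model whose weights are estimated from labeled training data by gradient descent, the simplest form of backpropagation. The toy data and all variable names are purely illustrative.

```python
# Minimal sketch: tune the parameters of a shallow (single-layer) model
# from labeled training data rather than from hand-written rules.
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: 2-dimensional feature vectors with binary labels.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # a simple linear concept

w = np.zeros(2)   # model parameters ("the engine") to be tuned
b = 0.0
lr = 0.1          # learning rate

for _ in range(500):
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))      # predicted probabilities
    grad_z = (p - y) / len(y)         # gradient of the log-loss w.r.t. z
    w -= lr * (X.T @ grad_z)          # parameter updates driven by data
    b -= lr * grad_z.sum()

accuracy = ((X @ w + b > 0) == y.astype(bool)).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The same pattern, estimating parameters from data rather than writing rules, underlies the statistical models cited above; only the model family and the estimation algorithm change.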
Generally speaking, machine-learning-based NLP, speech, and other artificial intelligence systems perform much better than their earlier, knowledge-based counterparts. Successful examples include almost all artificial intelligence tasks in machine
perception—speech recognition (Jelinek 1998), face recognition (Viola and Jones
2004), visual object recognition (Fei-Fei and Perona 2005), handwriting recognition
(Plamondon and Srihari 2000), and machine translation (Och 2003).
More specifically, in machine translation, a core NLP application area to be described in detail in Chap. 6 of this book as well as in Church and Mercer (1993), the field switched rather abruptly around 1990 from the rationalistic methods outlined in Sect. 1.2 to empirical, largely statistical methods. The availability of sentence-level
alignments in the bilingual training data made it possible to acquire surface-level
translation knowledge not by rules but from data directly, at the expense of discarding
or discounting structured information in natural languages. The most representative
work during this wave is that empowered by various versions of IBM translation
models (Brown et al. 1993). Subsequent developments during this empiricist era of
machine translation further significantly improved the quality of translation systems
(Och and Ney 2002; Och 2003; Chiang 2007; He and Deng 2012), but not to the level of massive real-world deployment (which would come after the next, deep learning wave).
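To illustrate how surface-level translation knowledge can be acquired from sentence-aligned data alone, below is a minimal sketch of the EM training loop of IBM Model 1, the simplest of the IBM translation models. The toy corpus, the omission of the NULL source word, and all variable names are illustrative simplifications rather than the models' full formulation.

```python
# Minimal sketch of IBM Model 1 EM training on a toy sentence-aligned corpus.
# Real systems of this era used millions of sentence pairs and several
# further model refinements (fertility, distortion, the NULL word, etc.).
from collections import defaultdict

# Tiny sentence-aligned bilingual corpus: (foreign sentence, English sentence).
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("une maison".split(), "a house".split()),
]

f_vocab = {f for fs, _ in corpus for f in fs}
e_vocab = {e for _, es in corpus for e in es}

# t[f][e]: translation probability P(f | e), initialized uniformly.
t = {f: {e: 1.0 / len(f_vocab) for e in e_vocab} for f in f_vocab}

for _ in range(10):                      # EM iterations
    count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    # E-step: collect expected word-alignment counts from the data.
    for fs, es in corpus:
        for f in fs:
            norm = sum(t[f][e] for e in es)
            for e in es:
                frac = t[f][e] / norm
                count[e][f] += frac
                total[e] += frac
    # M-step: re-estimate translation probabilities from the counts.
    for e in e_vocab:
        for f in count[e]:
            t[f][e] = count[e][f] / total[e]

print(f"P(maison | house) = {t['maison']['house']:.2f}")
```

After a few iterations the probability mass concentrates on the co-occurring word pairs, which is precisely the surface-level, rule-free translation knowledge described above.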
In the dialogue and spoken language understanding areas of NLP, this empiri-
cist era was also marked prominently by data-driven machine learning approaches.
These approaches were well suited to meet the requirement for quantitative evalua-
tion and concrete deliverables. They focused on broader but shallow, surface-level
coverage of text and domains instead of detailed analyses of highly restricted text
and domains. The training data were used not to design rules for language understanding and response actions in the dialogue systems but to automatically learn the parameters of (shallow) statistical or neural models. Such learning helped reduce the cost of hand-crafting complex dialogue managers, and helped improve robustness against speech recognition errors in the overall spoken language