representations such as syntax trees have not yet gone the way of the visual edge detector or the auditory triphone. Linguists have argued for the existence of a “language faculty” in all human beings, which encodes a set of abstractions specially designed to facilitate the understanding and production of language. The argument for the existence of such a language faculty is based on the observation that children learn language faster and from fewer examples than would reasonably be possible if language were learned from experience alone.[3] Regardless of the cognitive validity of these arguments, it seems that linguistic structures are particularly important in scenarios where training data is limited.
Moving away from the extreme ends of the continuum, there are a number of ways in which knowledge and learning can be combined in natural language processing. Many supervised learning systems make use of carefully engineered features, which transform the data into a representation that can facilitate learning. For example, in a task like document classification, it may be useful to identify each word’s stem, so that a learning system can more easily generalize across related terms such as whale, whales, whalers, and whaling. This is particularly important in the many languages that exceed English in the complexity of the system of affixes that can attach to words. Such features could be obtained from a hand-crafted resource, like a dictionary that maps each word to a single root form. Alternatively, features can be obtained from the output of a general-purpose language processing system, such as a parser or part-of-speech tagger, which may itself be built on supervised machine learning.
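As a small, concrete illustration of this kind of feature engineering, the Python snippet below maps inflected forms toward a common stem before they reach a learning system. The use of NLTK’s Porter stemmer here is just one possible choice, not the only way to obtain such features; a dictionary-based lemmatizer, as described above, would be an alternative.

# Stemming as a hand-engineered feature: map related surface forms toward a
# shared stem so that a classifier can generalize across them.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["whale", "whales", "whalers", "whaling"]:
    print(word, "->", stemmer.stem(word))

# Most of these forms collapse to the same stem; forms that do not are one
# reason hand-crafted lexical resources (e.g., a dictionary of root forms)
# remain useful alongside rule-based stemmers.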
Another synthesis of learning and knowledge is in model structure: building machine learning models whose architectures are inspired by linguistic theories. For example, the organization of sentences is often described as compositional, with the meaning of larger units gradually constructed from the meaning of their smaller constituents. This idea can be built into the architecture of a deep neural network, which is then trained using contemporary deep learning techniques (Dyer et al., 2016).
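To make the idea of building compositionality into a model architecture slightly more concrete, the sketch below implements a toy recursive composition: each phrase vector is computed from the vectors of its two children by a single shared transformation. This is a deliberate simplification for illustration only; the embeddings and weights are random placeholders, the variable names are invented for this sketch, and the architecture of Dyer et al. (2016) is considerably more sophisticated.

# A toy recursive composition: the representation of a phrase is built from the
# representations of its constituents, mirroring the compositional view of
# sentence structure. Weights and embeddings are random placeholders; in a real
# system they would be trained with deep learning techniques.
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # illustrative embedding size

# hypothetical word vectors
embeddings = {w: rng.normal(size=DIM) for w in ["the", "whalers", "sailed"]}
W = rng.normal(size=(DIM, 2 * DIM))  # shared composition weights

def compose(tree):
    # a tree is either a word (string) or a pair of subtrees (tuple)
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    # combine the two child representations with one learned transformation
    return np.tanh(W @ np.concatenate([compose(left), compose(right)]))

# "the whalers" forms a constituent, which then combines with "sailed"
sentence_vector = compose((("the", "whalers"), "sailed"))
print(sentence_vector.shape)  # (4,)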
The debate about the relative importance of machine learning and linguistic knowledge sometimes becomes heated. No machine learning specialist likes to be told that their engineering methodology is unscientific alchemy;[4] nor does a linguist want to hear that the search for general linguistic principles and structures has been made irrelevant by big data. Yet there is clearly room for both types of research: we need to know how far we can go with end-to-end learning alone, while at the same time, we continue the search for linguistic representations that generalize across applications, scenarios, and languages.
For more on the history of this debate, see Church (2011); for an optimistic view of the potential symbiosis between computational linguistics and deep learning, see Manning (2015).
[3] The Language Instinct (Pinker, 2003) articulates these arguments in an engaging and popular style. For arguments against the innateness of language, see Elman et al. (1998).
[4] Ali Rahimi argued that much of deep learning research was similar to “alchemy” in a presentation at the 2017 conference on Neural Information Processing Systems. He was advocating for more learning theory, not more linguistics.