Getting Started with Python: A Practical Guide to Natural Language Processing


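Natural Language Processing with Python is an introductory textbook on natural language processing (NLP) written for beginners. Co-authored by Steven Bird, Ewan Klein, and Edward Loper, it explains in accessible terms how to do practical NLP work with the Python programming language and the open source Natural Language Toolkit (NLTK) library. The book covers a broad range of topics and aims to help readers acquire key skills, including:

1. Text information extraction: writing Python programs that extract useful information from large amounts of unstructured text, such as identifying topics or recognizing named entities, which matters for web application development and multilingual news analysis.
2. Linguistic structure analysis: understanding the syntactic and semantic structure of text, including parsing sentences and analyzing their components, which underpins intelligent dialogue systems and machine translation.
3. Access to linguistic databases: working with popular linguistic resources such as WordNet, and using treebanks for lexical and syntactic research.
4. Interdisciplinary integration: combining techniques from linguistics, artificial intelligence, and other fields in NLP projects, making solutions more varied and more practical.
5. Teaching and application: whether used for self-study, classroom teaching, or workshops, the book provides many examples and exercises that build skills through practice and help readers explore how human language works.

The copyright information states that the book is copyrighted by Steven Bird, Ewan Klein, and Edward Loper, was published in 2009, and is available for educational, business, or sales promotional use. An electronic edition is also available through the O'Reilly Media online platform. The editor, production editor, proofreader, and cover and interior designers are credited with their respective roles. Natural Language Processing with Python is a practical and comprehensive guide for readers who want to use Python to innovate and solve problems in NLP. Whether you are a software developer, a linguist, or simply interested in natural language processing, the book offers a way into this complex field.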

NLTK-Data

This contains the linguistic corpora that are analyzed and processed in the book.

NumPy (recommended)

This is a scientific computing library with support for multidimensional arrays and linear algebra, required for certain probability, tagging, clustering, and classification tasks.

Matplotlib (recommended)

This is a 2D plotting library for data visualization, and is used in some of the book’s code samples that produce line graphs and bar charts.

NetworkX (optional)

This is a library for storing and manipulating network structures consisting of nodes and edges. For visualizing semantic networks, also install the Graphviz library.

Prover9 (optional)

This is an automated theorem prover for first-order and equational logic, used to support inference in language processing.

Natural Language Toolkit (NLTK)

NLTK was originally created in 2001 as part of a computational linguistics course in the Department of Computer and Information Science at the University of Pennsylvania. Since then it has been developed and expanded with the help of dozens of contributors. It has now been adopted in courses in dozens of universities, and serves as the basis of many research projects. Table P-2 lists the most important NLTK modules.

Table P-2. Language processing tasks and corresponding NLTK modules with examples of functionality

| Language processing task | NLTK modules | Functionality |
|---|---|---|
| Accessing corpora | nltk.corpus | Standardized interfaces to corpora and lexicons |
| String processing | nltk.tokenize, nltk.stem | Tokenizers, sentence tokenizers, stemmers |
| Collocation discovery | nltk.collocations | t-test, chi-squared, point-wise mutual information |
| Part-of-speech tagging | nltk.tag | n-gram, backoff, Brill, HMM, TnT |
| Classification | nltk.classify, nltk.cluster | Decision tree, maximum entropy, naive Bayes, EM, k-means |
| Chunking | nltk.chunk | Regular expression, n-gram, named entity |
| Parsing | nltk.parse | Chart, feature-based, unification, probabilistic, dependency |
| Semantic interpretation | nltk.sem, nltk.inference | Lambda calculus, first-order logic, model checking |
| Evaluation metrics | nltk.metrics | Precision, recall, agreement coefficients |
| Probability and estimation | nltk.probability | Frequency distributions, smoothed probability distributions |
| Applications | nltk.app, nltk.chat | Graphical concordancer, parsers, WordNet browser, chatbots |
| Linguistic fieldwork | nltk.toolbox | Manipulate data in SIL Toolbox format |
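To make two rows of the table concrete ("Probability and estimation" and "Evaluation metrics"), here is a minimal sketch written with the Python standard library only. It does not use NLTK and should not be read as NLTK's implementation; the function names `frequency_distribution` and `precision_recall` are hypothetical and chosen for this illustration.

```python
from collections import Counter


def frequency_distribution(tokens):
    """Count how often each token occurs, ignoring case."""
    return Counter(token.lower() for token in tokens)


def precision_recall(reference, predicted):
    """Return (precision, recall) of predicted items against reference items."""
    reference, predicted = set(reference), set(predicted)
    true_positives = len(reference & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# A toy "corpus": a list of strings produced by whitespace splitting.
tokens = "the cat sat on the mat because the mat was warm".split()
freq = frequency_distribution(tokens)
print(freq.most_common(2))

# Hypothetical gold-standard items versus items proposed by some system.
gold = {"cat", "mat", "dog"}
guess = {"cat", "mat", "hat", "bat"}
print(precision_recall(gold, guess))
```

The modules listed in the table offer much richer versions of these ideas, such as the smoothed probability distributions and agreement coefficients it mentions. The sketch only shows the underlying arithmetic.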

NLTK was designed with four primary goals in mind:

Simplicity

To provide an intuitive framework along with substantial building blocks, giving users a practical knowledge of NLP without getting bogged down in the tedious housekeeping usually associated with processing annotated language data

Consistency

To provide a uniform framework with consistent interfaces and data structures, and easily guessable method names

Extensibility

To provide a structure into which new software modules can be easily accommodated, including alternative implementations and competing approaches to the same task

Modularity

To provide components that can be used independently without needing to understand the rest of the toolkit

Contrasting with these goals are three non-requirements: potentially useful qualities that we have deliberately avoided. First, while the toolkit provides a wide range of functions, it is not encyclopedic; it is a toolkit, not a system, and it will continue to evolve with the field of NLP. Second, while the toolkit is efficient enough to support meaningful tasks, it is not highly optimized for runtime performance; such optimizations often involve more complex algorithms, or implementations in lower-level programming languages such as C or C++. This would make the software less readable and more difficult to install. Third, we have tried to avoid clever programming tricks, since we believe that clear implementations are preferable to ingenious yet indecipherable ones.
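The consistency, extensibility, and modularity goals can be illustrated with a small standard-library sketch: two alternative taggers share one `tag(tokens)` method, so either can be used alone or chained as a backoff for the other. The class names `ConstantTagger` and `LookupTagger`, the toy training data, and the tag labels are all assumptions made for this example. This is not NLTK code, although the table above lists n-gram and backoff tagging under nltk.tag.

```python
from collections import Counter, defaultdict


class ConstantTagger:
    """Assign the same tag to every token."""

    def __init__(self, tag):
        self.constant_tag = tag

    def tag(self, tokens):
        return [(token, self.constant_tag) for token in tokens]


class LookupTagger:
    """Tag each token with its most frequent training tag.

    Tokens never seen in training are handed to an optional backoff tagger.
    """

    def __init__(self, tagged_sentences, backoff=None):
        counts = defaultdict(Counter)
        for sentence in tagged_sentences:
            for word, tag in sentence:
                counts[word.lower()][tag] += 1
        self.best_tag = {
            word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()
        }
        self.backoff = backoff

    def tag(self, tokens):
        if self.backoff is not None:
            fallback_tags = [tag for _, tag in self.backoff.tag(tokens)]
        else:
            fallback_tags = [None] * len(tokens)
        return [
            (token, self.best_tag.get(token.lower(), fallback))
            for token, fallback in zip(tokens, fallback_tags)
        ]


# Toy training data invented for this sketch.
training = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
tagger = LookupTagger(training, backoff=ConstantTagger("NN"))
tagged = tagger.tag(["The", "cat", "barks", "loudly"])
print(tagged)
```

Because both classes expose the same method name and return the same list-of-pairs structure, a new tagging strategy can be added without changing any calling code.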

For Instructors

Natural Language Processing is often taught within the confines of a single-semester course at the advanced undergraduate level or postgraduate level. Many instructors have found that it is difficult to cover both the theoretical and practical sides of the subject in such a short span of time. Some courses focus on theory to the exclusion of practical exercises, and deprive students of the challenge and excitement of writing programs to automatically process language. Other courses are simply designed to teach programming for linguists, and do not manage to cover any significant NLP content. NLTK was originally developed to address this problem, making it feasible to cover a substantial amount of theory and practice within a single-semester course, even if students have no prior programming experience.


A significant fraction of any NLP syllabus deals with algorithms and data structures. On their own these can be rather dry, but NLTK brings them to life with the help of interactive graphical user interfaces that make it possible to view algorithms step-by-step. Most NLTK components include a demonstration that performs an interesting task without requiring any special input from the user. An effective way to deliver the materials is through interactive presentation of the examples in this book, entering them in a Python session, observing what they do, and modifying them to explore some empirical or theoretical issue.

This book contains hundreds of exercises that can be used as the basis for student assignments. The simplest exercises involve modifying a supplied program fragment in a specified way in order to answer a concrete question. At the other end of the spectrum, NLTK provides a flexible framework for graduate-level research projects, with standard implementations of all the basic data structures and algorithms, interfaces to dozens of widely used datasets (corpora), and a flexible and extensible architecture. Additional support for teaching using NLTK is available on the NLTK website.

We believe this book is unique in providing a comprehensive framework for students to learn about NLP in the context of learning to program. What sets these materials apart is the tight coupling of the chapters and exercises with NLTK, giving students, even those with no prior programming experience, a practical introduction to NLP. After completing these materials, students will be ready to attempt one of the more advanced textbooks, such as Speech and Language Processing, by Jurafsky and Martin (Prentice Hall, 2008).

This book presents programming concepts in an unusual order, beginning with a non-trivial data type (lists of strings), then introducing non-trivial control structures such as comprehensions and conditionals. These idioms permit us to do useful language processing from the start. Once this motivation is in place, we return to a systematic presentation of fundamental concepts such as strings, loops, files, and so forth. In this way, we cover the same ground as more conventional approaches, without expecting readers to be interested in the programming language for its own sake.
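As a rough illustration of that starting point, the sketch below treats a sentence as a list of strings and applies comprehensions with conditionals to it. The sample sentence is borrowed from the NLTK history paragraph earlier in this preface, and the variable names are chosen for this example rather than taken from the book.

```python
# A sentence represented as a list of strings.
sentence = [
    "NLTK", "was", "originally", "created", "in", "2001", "as", "part",
    "of", "a", "computational", "linguistics", "course",
]

# Comprehension with a conditional: keep only the words longer than 7 characters.
long_words = [word for word in sentence if len(word) > 7]

# Normalize case and drop tokens that are not purely alphabetic (here, "2001").
lowercase_alpha = [word.lower() for word in sentence if word.isalpha()]

# The vocabulary: the distinct word forms in sorted order.
vocabulary = sorted(set(lowercase_alpha))

print(long_words)
print(len(lowercase_alpha), "alphabetic tokens,", len(vocabulary), "distinct")
```

Each line answers a small question about the text, which is the kind of immediate payoff the ordering described above aims for.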

Two possible course plans are illustrated in Table P-3. The first one presumes an arts/humanities audience, whereas the second one presumes a science/engineering audience. Other course plans could cover the first five chapters, then devote the remaining time to a single area, such as text classification (Chapters 6 and 7), syntax (Chapters 8 and 9), semantics (Chapter 10), or linguistic data management (Chapter 11).

Table P-3. Suggested course plans; approximate number of lectures per chapter

| Chapter | Arts and Humanities | Science and Engineering |
|---|---|---|
| Chapter 1, Language Processing and Python | 2–4 | 2 |
| Chapter 2, Accessing Text Corpora and Lexical Resources | 2–4 | 2 |
| Chapter 3, Processing Raw Text | 2–4 | 2 |
| Chapter 4, Writing Structured Programs | 2–4 | 1–2 |
