A significant fraction of any NLP syllabus deals with algorithms and data structures.
On their own these can be rather dry, but NLTK brings them to life with the help of
interactive graphical user interfaces that make it possible to view algorithms
step-by-step. Most NLTK components include a demonstration that performs an interesting
task without requiring any special input from the user. An effective way to deliver the
materials is to present the examples in this book interactively: enter them in a Python
session, observe what they do, and modify them to explore some empirical or theoretical
issue.
This book contains hundreds of exercises that can be used as the basis for student
assignments. The simplest exercises involve modifying a supplied program fragment in
a specified way in order to answer a concrete question. At the other end of the spectrum,
NLTK provides a flexible framework for graduate-level research projects, with standard
implementations of all the basic data structures and algorithms, interfaces to dozens
of widely used datasets (corpora), and an extensible architecture. Additional
support for teaching using NLTK is available on the NLTK website.
We believe this book is unique in providing a comprehensive framework for students
to learn about NLP in the context of learning to program. What sets these materials
apart is the tight coupling of the chapters and exercises with NLTK, giving students—
even those with no prior programming experience—a practical introduction to NLP.
After completing these materials, students will be ready to attempt one of the more
advanced textbooks, such as Speech and Language Processing, by Jurafsky and Martin
(Prentice Hall, 2008).
This book presents programming concepts in an unusual order, beginning with a
non-trivial data type (lists of strings), then introducing non-trivial control structures such
as comprehensions and conditionals. These idioms permit us to do useful language
processing from the start. Once this motivation is in place, we return to a systematic
presentation of fundamental concepts such as strings, loops, files, and so forth. In this
way, we cover the same ground as more conventional approaches, without expecting
readers to be interested in the programming language for its own sake.
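To make this ordering concrete, here is a minimal sketch in plain Python of the kind of idiom meant: a text held as a list of strings, processed with comprehensions and conditionals before any loops or file handling are introduced. It deliberately avoids importing NLTK so it runs without downloading corpora; the token list is a made-up sample, whereas the book's own examples would typically draw their text from an NLTK corpus.

```python
# A hypothetical sample text, already split into tokens: a list of strings.
text = ["The", "monstrous", "size", "of", "this", "creature",
        "astonished", "the", "sailors", "."]

# Comprehension plus conditional: keep only tokens longer than 6 characters.
long_words = [w for w in text if len(w) > 6]

# Lowercase the tokens and keep only purely alphabetic ones (drops the ".").
words = [w.lower() for w in text if w.isalpha()]

# The vocabulary: each distinct word once, in sorted order.
vocab = sorted(set(words))

print(long_words)              # ['monstrous', 'creature', 'astonished', 'sailors']
print(len(words), len(vocab))  # 9 8
```

Nothing here requires knowing how a `for` loop or a file works in detail, which is the point of the ordering: a one-line comprehension already answers a linguistically meaningful question (which words are long, how many distinct words a text uses).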
Two possible course plans are illustrated in Table P-3. The first one presumes an
arts/humanities audience, whereas the second one presumes a science/engineering
audience. Other course plans could cover the first five chapters, then devote the remaining
time to a single area, such as text classification (Chapters 6 and 7), syntax (Chapters
8 and 9), semantics (Chapter 10), or linguistic data management (Chapter 11).
Table P-3. Suggested course plans; approximate number of lectures per chapter

Chapter                                                  Arts and Humanities   Science and Engineering
Chapter 1, Language Processing and Python                2–4                   2
Chapter 2, Accessing Text Corpora and Lexical Resources  2–4                   2
Chapter 3, Processing Raw Text                           2–4                   2
Chapter 4, Writing Structured Programs                   2–4                   1–2