Chapter 4, Network Analysis, is a tour through the basics of network or graph analysis,
as used to describe the relationships between various interconnected groups of
entities. We investigate the various types of network and learn how to describe and
measure them. Then we put our learning into practice to describe how a network of
software developers has changed over time.
Chapter 5, Sentiment Analysis in Text, is the first of four text mining chapters in this
book. This chapter serves as an introduction to the growing field of sentiment, or
mood, analysis in text. After comparing various approaches to sentiment mining and
learning how to evaluate the results, we practice using a machine learning classifier
to determine the sentiment of a set of software developer chat logs and e-mail logs.
Chapter 6, Named Entity Recognition in Text, is about finding proper nouns and proper
names in text. We spend some time learning why this task is useful, and why finding
named entities can sometimes be more difficult than it sounds. At the end of the
chapter we implement a named entity recognition system on several different types
of real-world text data including e-mail, chat logs, and board meeting minutes.
Along the way we apply different techniques for quantifying the success or failure
of our results.
Chapter 7, Automatic Text Summarization, presents several strategies for automatically
creating condensed summaries of text. This chapter emphasizes extractive
summarization tools, which are designed to find the most important sentences in a
text sample. To this end, we experiment with three different tools, testing their
summarization methods and learning how they differ.
Following the introduction of each tool, we attempt to summarize a common
set of text documents and compare the results.
Chapter 8, Topic Modeling in Text, shows how to use software tools to reveal what
topics or concepts are present in a given text. Can we train a computer program to
infer the themes that are present in large amounts of text? In a series of experiments,
we learn how to use common topic modeling libraries to reveal the topics present in
software developer e-mails, and how those topics change over time.
Chapter 9, Mining for Data Anomalies, is where we learn how to use data mining and
statistical techniques to improve our own data mining process. While all of the other
chapters in this book deal with finding different types of patterns in data, here we
focus on finding data that is anomalous or that does not match a particular pattern.
Whether it is because the data is empty, missing, or just plain weird, this chapter
presents strategies for finding or fixing this type of data so that the rest of your data
can be mined more effectively.