Foreword 2
A November 2008 search on Amazon.com for “data mining” books yielded over 15,000
hits—including 72 to be published in 2009. Most of these books either describe data mining
in very technical and mathematical terms, beyond the reach of most individuals, or
approach data mining at an introductory level without sufficient detail to be useful to the
practitioner. The Handbook of Statistical Analysis and Data Mining Applications is the book that
strikes the right balance between these two treatments of data mining.
This volume is not a theoretical treatment of the subject (the authors themselves recommend
other books for this) but rather contains a description of data mining principles and
techniques in a series of “knowledge-transfer” sessions, where examples from real data
mining projects illustrate the main ideas. This aspect of the book makes it most valuable
for practitioners, whether novice or more experienced.
While it would be easier for everyone if data mining were merely a matter of finding and
applying the correct mathematical equation or approach for any given problem, the reality
is that both “art” and “science” are necessary. The “art” in data mining requires experience:
when one has seen and overcome the difficulties in finding solutions from among the many
possible approaches, one can apply newfound wisdom to the next project. However, this
process takes considerable time and, particularly for data mining novices, the iterative process
inevitable in data mining can lead to discouragement when a “textbook” approach doesn’t
yield a good solution.
This book is different; it is organized with the practitioner in mind. The volume is
divided into four parts. Part I provides an overview of analytics from a historical perspective
and frameworks from which to approach data mining, including CRISP-DM and
SEMMA. These chapters will provide a novice analyst an excellent overview by defining
terms and methods to use, and will provide program managers a framework from which
to approach a wide variety of data mining problems. Part II describes algorithms, though
without extensive mathematics. These will appeal to practitioners who are or will be
involved with day-to-day analytics and need to understand the qualitative aspects of the
algorithms. The inclusion of a chapter on text mining is particularly timely, as text mining
has shown tremendous growth in recent years.
Part III provides a series of tutorials that are both domain-specific and software-
specific. Any instructor knows that examples make the abstract concept more concrete, and
these tutorials accomplish exactly that. In addition, each tutorial shows how the solutions
were developed using popular data mining software tools, such as Clementine, Enterprise
Miner, Weka, and STATISTICA. The step-by-step specifics will assist practitioners in learning
not only how to approach a wide variety of problems, but also how to use these software