sets have been sampled; and apply an error estimation scheme based on resampling
the data, typically cross-validation. With regard to the third step, we are given no
characterization of the accuracy of the error estimator, nor any reason why it should
provide a reasonably good estimate. Most strikingly, as we show in this book, we can expect
it to be inaccurate in small-sample cases. Nevertheless, the claim is made that the
proposed algorithm has been “validated.” Very little is said about the accuracy of the
error estimation step, except perhaps that cross-validation is close to being unbiased
if not too many points are held out. But this kind of comment is misleading, because
a small bias offers little comfort if the variance is large, which it usually
is for small samples and large feature sets. In addition, the classical cross-validation
unbiasedness theorem holds if sampling is random over the mixture of the populations.
In situations where this is not the case, for example, when the populations are sampled
separately, bias is introduced, as shown in Chapter 5. These kinds of problems
are especially detrimental in the current era of high-throughput measurement devices,
for which it is now commonplace to be confronted with tens of thousands of features
and very small sample sizes.
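To make this variance issue concrete, here is a minimal simulation sketch, not taken from the book, assuming Python with NumPy and scikit-learn; the sample size, feature count, classification rule, and class separation are illustrative choices. It repeatedly draws small high-dimensional samples from the same pair of populations and measures how much the 5-fold cross-validation error estimate varies from one data set to the next.

    # Illustrative sketch (assumed setup, not from the book): variance of the
    # cross-validation error estimate when the sample size n is small and the
    # number of features d is large.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, d = 30, 1000                      # small sample, many features
    y = np.repeat([0, 1], n // 2)        # balanced class labels

    cv_errors = []
    for _ in range(200):                 # 200 independently drawn data sets
        X = rng.normal(size=(n, d))      # pure-noise features...
        X[:, 0] += 0.8 * y               # ...except one informative feature
        acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
        cv_errors.append(1.0 - acc.mean())   # 5-fold CV error estimate

    cv_errors = np.asarray(cv_errors)
    print(f"mean CV error estimate: {cv_errors.mean():.3f}")
    print(f"std across data sets:   {cv_errors.std():.3f}")

In such settings the estimates are typically spread over a wide range, so an estimator that is nearly unbiased on average can still be far off on any particular small sample.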
The subject of error estimation has in fact a long history and has produced a large
body of literature; four main review papers summarize the major advances in the
field up to 2000 (Hand, 1986; McLachlan, 1987; Schiavo and Hand, 2000; Toussaint,
1974); recent advances in error estimation since 2000 include work on model selection
(Bartlett et al., 2002), bolstering (Braga-Neto and Dougherty, 2004a; Sima et al.,
2005b), feature selection (Hanczar et al., 2007; Sima et al., 2005a; Xiao et al.,
2007; Zhou and Mao, 2006), confidence intervals (Kaariainen, 2005; Kaariainen and
Langford, 2005; Xu et al., 2006), model-based second-order properties (Zollanvari
et al., 2011, 2012), and Bayesian error estimators (Dalton and Dougherty, 2011b,c).
This book covers the classical studies as well as the recent developments. It discusses
in detail nonparametric approaches, but gives special consideration, especially in the
latter part of the book, to parametric, model-based approaches.
Pattern recognition plays a key role in many disciplines, including engineer-
ing, physics, statistics, computer science, social science, manufacturing, materials,
medicine, biology, and more, so this book will be useful for researchers and prac-
titioners in all these areas. It can serve as a text at the graduate level, can
be used as a supplement for general courses on pattern recognition and machine
learning, or can serve as a reference for researchers across all technical disciplines
where classification plays a major role, which may in fact be all technical disciplines.
The book is organized into eight chapters. Chapters 1 and 2 provide the foundation
for the rest of the book and must be read first. Chapters 3, 4, and 8 stand on their own
and can be studied separately. Chapter 5 provides the foundation for Chapters 6 and
7, so these chapters should be read in this sequence. For example, chapter sequences
1-2-3-4, 1-2-5-6-7, and 1-2-8 are all possible ways of reading the book. Naturally,
the book is best read from beginning to end. Short descriptions of each chapter are
provided next.
Chapter 1. Classification. To make the book self-contained, the first chapter
covers basic topics in classification required for the remainder of the text: classifiers,
population-based and sample-based discriminants, and classification rules. It defines a