1.1 Classification Problems and Machine Learning
We refer to the objects to be classified as instances. Thus, an instance is a description
of some kind which is used to derive a predicted classification. In the OCR example, the
instances are the images of letters. In the medical-diagnosis example, the instances are
descriptions of a patient’s symptoms. The space of all possible instances is called the in-
stance space or domain, and is denoted by X. A (labeled) example is an instance together
with an associated label indicating its correct classification. Instances are also sometimes
referred to as (unlabeled) examples.
During training, a learning algorithm receives as input a training set of labeled examples
called the training examples. The output of the learning algorithm is a prediction rule called
a classifier or hypothesis. A classifier can itself be thought of as a computer program which
takes as input a new unlabeled instance and outputs a predicted classification; so, in math-
ematical terms, a classifier is a function that maps instances to labels. In this book, we use
the terms classifier and hypothesis fairly interchangeably, with the former emphasizing a
prediction rule’s use in classifying new examples, and the latter emphasizing the fact that
the rule has been (or could be) generated as the result of some learning process. Other
terms that have been used in the literature include rule, prediction rule, classification rule,
predictor, and model.
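Viewed mathematically, a classifier is nothing more than a function from instances to labels. The following sketch makes this concrete; the function name, feature names, and threshold are all illustrative inventions, not taken from the text.

```python
# A classifier is a function mapping instances to labels.
# Here an instance is a dictionary of features, and the label space is {-1, +1}.
# The threshold of 240 is a made-up value for illustration only.

def classify(instance):
    """Predict +1 ("positive") if cholesterol exceeds a threshold, else -1."""
    return +1 if instance["cholesterol"] > 240 else -1

print(classify({"cholesterol": 260}))  # +1
print(classify({"cholesterol": 180}))  # -1
```

A learning algorithm, in these terms, is a procedure that takes a training set of labeled examples as input and outputs such a function.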
To assess the quality of a given classifier, we measure its error rate, that is, the frequency
with which it makes incorrect classifications. To do this, we need a test set, a separate set of
test examples. The classifier is evaluated on each of the test instances, and its predictions are
compared against the correct classifications of the test examples. The fraction of examples on
which incorrect classifications were made is called the test error of the classifier. Similarly,
the fraction of mistakes on the training set is called the training error. The fraction of correct
predictions is called the (test or training) accuracy.
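The error and accuracy measures above can be computed directly from a set of labeled examples; a minimal sketch follows, in which the function and variable names are assumptions for illustration.

```python
# Test (or training) error: the fraction of examples on which the
# classifier's prediction disagrees with the correct label.

def error_rate(classifier, examples):
    """examples is a list of (instance, label) pairs."""
    mistakes = sum(1 for x, y in examples if classifier(x) != y)
    return mistakes / len(examples)

def accuracy(classifier, examples):
    """Fraction of correct predictions: the complement of the error rate."""
    return 1.0 - error_rate(classifier, examples)

# A trivial classifier that always predicts +1, evaluated on four examples:
always_plus = lambda x: +1
examples = [(0, +1), (1, -1), (2, +1), (3, -1)]
print(error_rate(always_plus, examples))  # 0.5
print(accuracy(always_plus, examples))    # 0.5
```

Applied to the training set, the same function gives the training error; applied to a held-out test set, it gives the test error.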
Of course, the classifier’s performance on the training set is not of much interest since our
purpose is to build a classifier that works well on unseen data. On the other hand, if there is
no relationship at all between the training set and the test set, then the learning problem is
unsolvable; the future can be predicted only if it resembles the past. Therefore, in designing
and studying learning algorithms, we generally assume that the training and test examples
are taken from the same random source. That is, we assume that the examples are chosen
randomly from some fixed but unknown distribution D over the space of labeled examples
and, moreover, that the training and test examples are generated by the same distribution.
The generalization error of a classifier measures the probability of misclassifying a random
example from this distribution D; equivalently, the generalization error is the expected test
error of the classifier on any test set generated by D. The goal of learning can now be stated
succinctly as producing a classifier with low generalization error.
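When the distribution D is known, the generalization error can be estimated by drawing a large sample from D and measuring the error rate on it. The sketch below uses a toy distribution invented for illustration: instances are uniform on [0, 1], the label is determined by a threshold at 0.5, and 10% of labels are flipped at random, so even the best classifier has generalization error near 0.10.

```python
import random

def draw_example(rng):
    """A toy distribution D: x uniform on [0, 1]; label +1 iff x > 0.5,
    with the label flipped with probability 0.1 (label noise)."""
    x = rng.random()
    y = +1 if x > 0.5 else -1
    if rng.random() < 0.1:
        y = -y
    return x, y

def threshold_classifier(x):
    """The classifier that matches the noiseless labeling rule of D."""
    return +1 if x > 0.5 else -1

# Monte Carlo estimate of the generalization error: the fraction of
# mistakes on a large sample drawn from D.
rng = random.Random(0)
sample = [draw_example(rng) for _ in range(100_000)]
gen_error = sum(1 for x, y in sample if threshold_classifier(x) != y) / len(sample)
print(round(gen_error, 2))  # approximately 0.10, the noise rate
```

In practice, of course, D is unknown, which is precisely why the test error on a held-out set serves as the estimate of the generalization error.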
To illustrate these concepts, consider the problem of diagnosing a patient with coronary
artery disease. For this problem, an instance consists of a description of the patient including
items such as sex, age, cholesterol level, chest pain type (if any), blood pressure, and results
of various medical tests. The label or class associated with each instance is a diagnosis
provided by a doctor as to whether or not the patient described actually suffers from the