Adaptive Learning Algorithms
and
Data Cloning
Thesis by
Amrit Pratap
In Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
California Institute of Technology
Pasadena, California
2008
(Defended February 11, 2008)

© 2008
Amrit Pratap
All Rights Reserved

Acknowledgements
I would like to thank all the people who, through their valuable advice and support,
made this work possible. First and foremost, I am grateful to my advisor, Dr.
Yaser Abu-Mostafa, for his support, assistance and guidance throughout my time at
Caltech.
I would also like to thank my colleagues at the Learning Systems Group, Ling
Li and Hsuan-Tien Lin, for many stimulating discussions and for their constructive
input and feedback. I also thank Dr. Malik Magdon-Ismail, Dr. Amir
Atiya and Dr. Alexander Nicholson for their helpful suggestions.
I would like to thank the members of my thesis committee, Dr. Yaser Abu-
Mostafa, Dr. Alain Martin, Dr. Pietro Perona and Dr. Jehoshua Bruck, for taking
the time to review this thesis and for their helpful suggestions and guidance.
Finally, I would like to thank my family and friends for their continuing love and support.

Abstract
This thesis is in the field of machine learning: the use of data to automatically learn
a hypothesis to predict the future behavior of a system. It summarizes three of my
research projects.
We first investigate the role of margins in the phenomenal success of boosting
algorithms. AdaBoost (Adaptive Boosting) is an algorithm for generating an ensem-
ble of hypotheses for classification. The superior out-of-sample performance of Ad-
aBoost has been attributed to the fact that it can generate a classifier which classifies
the points with a large margin of confidence. This led to the development of many
new algorithms focusing on optimizing the margin of confidence. It was then observed
that directly optimizing the margins leads to poor performance. This apparent
contradiction has been the topic of a long unresolved debate in the machine-learning
community. We introduce new algorithms which are expressly designed to test the
margin hypothesis, and provide concrete evidence which refutes the margin argument.
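To fix notation: AdaBoost maintains weights over the training examples, trains a weak hypothesis h_t on the reweighted data at each round, and outputs f(x) = sum_t alpha_t h_t(x); the normalized margin of an example (x, y) is y f(x) / sum_t alpha_t, a value in [-1, 1] whose sign indicates correctness. The Python sketch below illustrates this margin computation using one-dimensional decision stumps as the weak learners; the stump learner and the data shape are assumptions made for brevity, and this is not one of the thesis's test algorithms.

import numpy as np

def train_stump(X, y, w):
    # Exhaustively pick the (threshold, sign) pair minimizing the
    # weighted error on 1-D data with labels in {-1, +1}.
    best = None
    for thr in np.unique(X):
        for sign in (+1, -1):
            pred = sign * np.where(X >= thr, 1, -1)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best  # (weighted error, threshold, sign)

def adaboost_margins(X, y, rounds=50):
    n = len(X)
    w = np.full(n, 1.0 / n)   # example weights, uniform at the start
    score = np.zeros(n)       # running weighted vote sum_t alpha_t h_t(x)
    alpha_sum = 0.0
    for _ in range(rounds):
        err, thr, sign = train_stump(X, y, w)
        err = max(err, 1e-10)                  # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # hypothesis weight
        pred = sign * np.where(X >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)         # up-weight mistakes
        w /= w.sum()
        score += alpha * pred
        alpha_sum += alpha
    return y * score / alpha_sum  # normalized margins in [-1, 1]

Plotting the distribution of these margins across rounds is the standard way the margin argument is examined: AdaBoost tends to keep pushing the minimum margin up even after the training error reaches zero.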
We then propose a novel algorithm for adaptive sampling under a monotonicity
constraint. The typical learning problem takes examples of the target function as
input and produces a hypothesis that approximates the target as output.
We consider a generalization of this paradigm that takes different types of
information as input and produces only specific properties of the target as output.
This setup is common in many real-life settings where
the samples are expensive to obtain. We show experimentally that our algorithm
outperforms existing methods such as the Staircase procedure
and PEST.
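The Staircase procedure mentioned above is a classic adaptive sampling baseline from psychophysics: it raises the stimulus level after a miss, lowers it after a hit, and reads the threshold off the reversal points. A minimal sketch follows; the response oracle, step size, and trial count are assumptions made for the illustration, not the thesis's algorithm.

import numpy as np

def staircase(respond, start, step, n_trials=60):
    # Simple 1-up/1-down staircase: converges to the level at which
    # the (assumed monotone) response probability is 50%.
    # respond(level) -> bool is one expensive sample from the system.
    level, levels, hits = start, [], []
    for _ in range(n_trials):
        hit = respond(level)
        levels.append(level)
        hits.append(hit)
        level = level - step if hit else level + step
    # A reversal is a trial where the direction of movement flips;
    # the mean reversal level estimates the threshold.
    reversals = [levels[i] for i in range(1, n_trials)
                 if hits[i] != hits[i - 1]]
    return np.mean(reversals)

# Usage with a synthetic logistic response centered at 2.0:
rng = np.random.default_rng(0)
oracle = lambda x: rng.random() < 1.0 / (1.0 + np.exp(-(x - 2.0)))
print(staircase(oracle, start=0.0, step=0.25))  # estimate near 2.0

PEST (Parameter Estimation by Sequential Testing) refines this idea with adaptive step sizes; both serve as baselines against which the proposed algorithm is compared.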
One of the major pitfalls in machine learning research is that of selection bias.
This bias is mostly introduced unconsciously through the choices made during the learning
process, and it often leads to over-optimistic estimates of performance. In the third
project, we introduce a new methodology for systematically reducing selection bias.
Experiments show that using cloned datasets for model selection can lead to better
performance and reduced selection bias.
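To make the bias concrete, the toy experiment below (a deliberately simple demonstration, not the data-cloning methodology itself) selects the best of fifty chance-level models on a validation set: the winner's validation accuracy is inflated well above its true 50%, while an independent test set reveals the unbiased estimate.

import numpy as np

rng = np.random.default_rng(1)
n_val, n_test, n_models = 200, 200, 50

# Fifty "models" whose predictions are coin flips: every model's
# true accuracy on the +/-1 labels is exactly 50%.
y_val = rng.choice([-1, 1], size=n_val)
y_test = rng.choice([-1, 1], size=n_test)
preds_val = rng.choice([-1, 1], size=(n_models, n_val))
preds_test = rng.choice([-1, 1], size=(n_models, n_test))

val_acc = (preds_val == y_val).mean(axis=1)
best = np.argmax(val_acc)  # model selection on the validation set

print(f"selected model, validation accuracy: {val_acc[best]:.3f}")
print(f"selected model, test accuracy:       "
      f"{(preds_test[best] == y_test).mean():.3f}")
# The validation number is over-optimistic purely because of the
# selection step; the held-out test estimate stays near chance.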