Boosting Chain Learning for Object Detection
Rong Xiao, Long Zhu, Hong-Jiang Zhang
Microsoft Research Asia
49 Zhichun Road, Beijing 100080, P.R. China
{t-rxiao, hjzhang}@microsoft.com
Abstract
A general classification framework, called boosting
chain, is proposed for learning a boosting cascade. In this
framework, a “chain” structure is introduced to integrate
historical knowledge into successive boosting learning.
Moreover, a linear optimization scheme is proposed to
address the problems of redundancy in boosting learning
and threshold adjusting in cascade coupling. By this
means, the resulting classifier consists of fewer weak
classifiers yet achieves lower error rates than boosting
cascade in both training and testing. Experimental
comparisons of boosting chain and boosting cascade are
provided through a face detection problem. The
promising results demonstrate the effectiveness
of boosting chain.
1. Introduction
Unlike the traditional pattern classification
problem, where a decision is made between well-defined
classes, the detection problem requires discriminating
between the object class and the rest of the world.
As a result, the detection algorithm must accommodate
intra-class variance without compromising its
ability to locate objects within cluttered scenes.
In addition, representative negative samples are usually
unavailable for building a training set because of the large
variance of the negative class. Moreover, as the location and
scale of the target are unknown, the computational cost
of exhaustive search can hardly be avoided. In summary,
three issues are critical for a detection
system: the training strategy for negative sample collection,
a robust learning algorithm, and the computational cost of
evaluation.
Sung and Poggio [10] proposed a training scheme, called
bootstrap, for collecting negative samples.
During the bootstrap procedure, false detections are collected
iteratively into the training set, and a very low false
positive rate is achieved after several iterations of
learning.
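The bootstrap idea can be illustrated with a minimal sketch. It is not the procedure of [10]: the one-dimensional scores, the midpoint-threshold learner, and the names `train_threshold` and `bootstrap` are all assumptions made for illustration. The sketch only shows the loop structure, in which the current detector is run on background data and its false detections are added to the negative set before retraining.

```python
def train_threshold(positives, negatives):
    """Toy learner (an assumption, not a real detector): place the decision
    threshold midway between the largest negative and the smallest positive."""
    return (max(negatives) + min(positives)) / 2.0


def bootstrap(positives, background_pool, initial_negatives, rounds=5):
    """Iteratively collect false detections from the background pool."""
    negatives = list(initial_negatives)
    threshold = train_threshold(positives, negatives)
    for _ in range(rounds):
        seen = set(negatives)
        # Background samples the current detector wrongly accepts.
        false_detections = [x for x in background_pool
                            if x >= threshold and x not in seen]
        if not false_detections:
            break  # no false positives left on the pool
        negatives.extend(false_detections)
        threshold = train_threshold(positives, negatives)
    return threshold, negatives


positives = [0.8, 0.9, 1.0]
background_pool = [i / 100 for i in range(70)]  # scores of non-object windows
threshold, negatives = bootstrap(positives, background_pool, [0.1, 0.2])
```

In this toy run the initial negatives alone give a loose threshold, and the collected false detections tighten it until no background sample in the pool is accepted.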
Various learning algorithms have also been applied to
the detection problem. Papageorgiou [1] built a detector
by training a Support Vector Machine (SVM) [12] on an
over-complete wavelet representation of object classes.
Rowley [3] presented a neural network-based face
detection system. Roth [2] used a network of linear units,
called the SNoW learning architecture, which is specifically
tailored for learning in the presence of a very large
number of features. Schneiderman [4] used a naive Bayesian
classifier on multi-resolution features from different levels
of a wavelet transform.
Although some works, such as [2] and [4], have
achieved the best detection accuracy in the literature, both
are too slow for real-time applications because of their
computational complexity. Therefore, the
hierarchical classification framework is widely adopted to
build rapid detectors. Serra [11] implemented a two-layer
detector. The first layer consists of a fast linear SVM that
removes large parts of the background. The second layer
consists of a more accurate polynomial SVM that performs the
final face detection. Viola and Jones [7] built a cascade of
boosting classifiers on an over-complete set of Haar-like
features. In each layer of the cascade, AdaBoost [13] is
adapted to integrate the feature selection and classifier
design in one boosting procedure. By adopting a
simple-to-complex strategy, most non-face candidates are
rejected in the earlier layers of the cascade at little
computational cost. This structure results in an extremely rapid
object detector. However, AdaBoost is a sequential forward
search procedure that uses a greedy selection strategy. Its
heuristic assumption is monotonicity, and the premise
offered by the sequential procedure breaks down
when this assumption is violated. Stan Li [8] proposed the
FloatBoost algorithm by incorporating the idea of Floating
Search into AdaBoost. Based on FloatBoost, a detector
for multi-view face detection [9] was implemented.
Although the new detector achieves better
performance with fewer features, FloatBoost is
unstable and computationally intensive when learning
complicated problems.
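The early-rejection behaviour of a cascade can be sketched as follows. This is a minimal illustration, not the detector of [7]: the layer scoring functions (mean intensity and intensity range), their thresholds, and the name `cascade_classify` are hypothetical stand-ins for boosted classifiers over Haar-like features. The point is only that a candidate stops being evaluated at the first layer that rejects it.

```python
def cascade_classify(window, layers):
    """Run a candidate through the layers in order.

    Each layer is a (score_fn, threshold) pair. Returns (accepted,
    layers_evaluated), so the cost spent on each candidate is visible.
    """
    for depth, (score_fn, threshold) in enumerate(layers, start=1):
        if score_fn(window) < threshold:
            return False, depth  # rejected early; later layers are skipped
    return True, len(layers)


# Hypothetical layers, ordered from simple to complex.
layers = [
    (lambda w: sum(w) / len(w), 0.5),   # layer 1: mean intensity
    (lambda w: max(w) - min(w), 0.3),   # layer 2: intensity range
]

background = [0.1, 0.1, 0.2, 0.1]   # fails layer 1
flat_patch = [0.6, 0.6, 0.6, 0.6]   # passes layer 1, fails layer 2
candidate = [0.4, 0.9, 0.8, 0.5]    # passes both layers

results = [cascade_classify(w, layers)
           for w in (background, flat_patch, candidate)]
```

Because most windows in an image resemble `background`, the average number of layers evaluated per window stays close to one, which is where the speed of the cascade comes from.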
Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV 2003) 2-Volume Set
0-7695-1950-4/03 $17.00 © 2003 IEEE