Rapid Object Detection using a Boosted Cascade of Simple Features
Paul Viola
viola@merl.com
Mitsubishi Electric Research Labs
201 Broadway, 8th FL
Cambridge, MA 02139
Michael Jones
michael.jones@compaq.com
Compaq Cambridge Research Lab
One Cambridge Center
Cambridge, MA 02142
Abstract
This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers [5]. The third contribution is a method for combining increasingly more complex classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which, unlike previous approaches, provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
1. Introduction
This paper brings together new algorithms and insights to construct a framework for robust and extremely rapid object detection. This framework is demonstrated on, and in part motivated by, the task of face detection. Toward this end we have constructed a frontal face detection system which achieves detection and false positive rates equivalent to the best published results [14, 11, 13, 10, 1]. This face detection system is most clearly distinguished from previous approaches in its ability to detect faces extremely rapidly. Operating on 384 by 288 pixel images, faces are detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences or pixel color in color images, has been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.
There are three main contributions of our object detection framework. We will introduce each of these ideas briefly below and then describe them in detail in subsequent sections.
The first contribution of this paper is a new image representation called an integral image that allows for very fast feature evaluation. Motivated in part by the work of Papageorgiou et al., our detection system does not work directly with image intensities [9]. Like these authors we use a set of features which are reminiscent of Haar basis functions (though we will also use related filters which are more complex than Haar filters). In order to compute these features very rapidly at many scales we introduce the integral image representation for images. The integral image can be computed from an image using a few operations per pixel. Once computed, any one of these Haar-like features can be computed at any scale or location in constant time.
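The constant-time claim can be illustrated with a minimal sketch in Python. It assumes a zero-padded integral image (one extra row and column of zeros), which is a common indexing convention chosen here for convenience and not necessarily the one used later in this paper. The function names `integral_image`, `rect_sum`, and `two_rect_feature` are hypothetical, and the two-rectangle feature shown is only one simple example of a Haar-like feature.

```python
def integral_image(img):
    """Build a zero-padded integral image (assumed convention).

    ii[y][x] holds the sum of img over all rows < y and columns < x,
    so ii has one more row and one more column than img.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current image row
        for x in range(w):
            row_sum += img[y][x]
            # one addition for the row sum, one for the column above
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Pixel sum of the rectangle with top-left (x, y), width w, height h.

    Uses four array references regardless of the rectangle size.
    """
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Illustrative Haar-like feature: left half minus right half.

    Assumes w is even so the two halves have equal area.
    """
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    ii = integral_image(image)
    print(rect_sum(ii, 1, 1, 2, 2))          # sum of the lower-right 2x2 block
    print(two_rect_feature(ii, 0, 0, 2, 3))  # left column minus middle column
```

Because `rect_sum` touches only four entries of the table, evaluating the same feature on a window ten times larger costs the same as on the smallest window, which is what makes multi-scale evaluation cheap.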
The second contribution of this paper is a method for constructing a classifier by selecting a small number of important features using AdaBoost [5]. Within any image sub-window the total number of Haar-like features is very large, far larger than the number of pixels. In order to ensure fast classification, the learning process must exclude a large majority of the available features, and focus on a small set of critical features. Motivated by the work of Tieu and Viola, feature selection is achieved through a simple modification of the AdaBoost procedure: the weak learner is constrained so that each weak classifier returned can depend on only a single feature [15]. As a result each stage of the boosting process, which selects a new weak classifier, can be viewed as a feature selection process. AdaBoost provides an effective learning algorithm and strong bounds on generalization performance [12, 8, 9].
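The idea of boosting with single-feature weak classifiers can be sketched as follows. This is a minimal illustration of one common formulation of discrete AdaBoost with threshold classifiers ("stumps"), not necessarily the exact procedure used by the authors. The names `stump_predict`, `best_stump`, `adaboost`, and `strong_predict` are hypothetical, the exhaustive threshold search is chosen for clarity rather than speed, and the clamping of the error away from 0 and 1 is an added assumption to keep the logarithm finite.

```python
import math


def stump_predict(x, j, theta, p):
    """Weak classifier on a single feature j with threshold theta, polarity p."""
    return 1 if p * x[j] < p * theta else 0


def best_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with lowest weighted error.

    Each candidate depends on exactly one feature, so picking the best
    candidate is also picking a feature.
    """
    best = None
    for j in range(len(X[0])):
        for theta in sorted({x[j] for x in X}):
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, theta, p) != yi)
                if best is None or err < best[0]:
                    best = (err, j, theta, p)
    return best


def adaboost(X, y, rounds):
    """Select `rounds` single-feature classifiers; labels y are in {0, 1}."""
    n = len(X)
    w = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]              # normalize the weights
        err, j, theta, p = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)     # assumption: avoid log(0)
        beta = err / (1.0 - err)
        strong.append((math.log(1.0 / beta), j, theta, p))
        # shrink the weight of correctly classified examples
        w = [wi * (beta if stump_predict(xi, j, theta, p) == yi else 1.0)
             for xi, yi, wi in zip(X, y, w)]
    return strong


def strong_predict(strong, x):
    """Weighted vote of the selected weak classifiers."""
    score = sum(a * stump_predict(x, j, t, p) for a, j, t, p in strong)
    return 1 if score >= 0.5 * sum(a for a, _, _, _ in strong) else 0


if __name__ == "__main__":
    # toy data: feature 0 separates the classes, feature 1 is noise
    X = [[1, 5], [2, 1], [3, 7], [4, 2]]
    y = [0, 0, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([j for _, j, _, _ in model])            # indices of selected features
    print([strong_predict(model, x) for x in X])
```

In this sketch every boosting round returns a feature index along with its threshold, so the list of selected indices is the small set of critical features the paragraph describes.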
The third major contribution of this paper is a method for combining successively more complex classifiers in a cascade structure which dramatically increases the speed of
0-7695-1272-0/01 $10.00 © 2001 IEEE