ACCEPTED CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2001
Rapid Object Detection using a Boosted Cascade of Simple
Features
Paul Viola, viola@merl.com, Mitsubishi Electric Research Labs, 201 Broadway, 8th FL, Cambridge, MA 02139
Michael Jones, mjones@crl.dec.com, Compaq CRL, One Cambridge Center, Cambridge, MA 02142
Abstract
This paper describes a machine learning approach for vi-
sual object detection which is capable of processing images
extremely rapidly and achieving high detection rates. This
work is distinguished by three key contributions. The first
is the introduction of a new image representation called the
“Integral Image” which allows the features used by our de-
tector to be computed very quickly. The second is a learning
algorithm, based on AdaBoost, which selects a small num-
ber of critical visual features from a larger set and yields
extremely efficient classifiers [6]. The third contribution is
a method for combining increasingly more complex classi-
fiers in a “cascade” which allows background regions of the
image to be quickly discarded while spending more compu-
tation on promising object-like regions. The cascade can be
viewed as an object-specific focus-of-attention mechanism
which, unlike previous approaches, provides statistical guarantees
that discarded regions are unlikely to contain the object
of interest. In the domain of face detection, the system
yields detection rates comparable to the best previous sys-
tems. Used in real-time applications, the detector runs at
15 frames per second without resorting to image differenc-
ing or skin color detection.
1. Introduction
This paper brings together new algorithms and insights to
construct a framework for robust and extremely rapid object
detection. This framework is demonstrated on, and in part
motivated by, the task of face detection. Toward this end,
we have constructed a frontal face detection system that
achieves detection and false positive rates equivalent
to the best published results [16, 12, 15, 11, 1]. This
face detection system is most clearly distinguished from
previous approaches in its ability to detect faces extremely
rapidly. Operating on 384 by 288 pixel images, the system
detects faces at 15 frames per second on a conventional 700 MHz
Intel Pentium III. In other face detection systems, auxiliary
information, such as image differences in video sequences
or pixel color in color images, has been used to achieve
high frame rates. Our system achieves high frame rates
working only with the information present in a single
grey-scale image. These alternative sources of information can
also be integrated with our system to achieve even higher
frame rates.
There are three main contributions of our object detec-
tion framework. We will introduce each of these ideas
briefly below and then describe them in detail in subsequent
sections.
The first contribution of this paper is a new image repre-
sentation called an integral image that allows for very fast
feature evaluation. Motivated in part by the work of Papageorgiou
et al., our detection system does not work directly
with image intensities [10]. Like these authors, we use a
set of features which are reminiscent of Haar basis functions
(though we will also use related filters which are more
complex than Haar filters). In order to compute these features
very rapidly at many scales we introduce the integral
image representation for images. The integral image can be
computed from an image using a few operations per pixel.
Once computed, any one of these Haar-like features can be
computed at any scale or location in constant time.
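The constant-time property can be illustrated with a short Python sketch. This is a minimal illustration and not the authors' implementation: the function names, the zero-padded table layout, and the left-minus-right sign convention of the example two-rectangle feature are all assumptions made here for clarity.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of all
    pixels img[r][c] with r < y and c < x (zero-padded layout, an
    assumption of this sketch)."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            # sum above (previous table row) plus the current row so far
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y).
    Uses four table lookups regardless of the rectangle's size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical two-rectangle feature: left half minus right half.
    Assumes w is even; the sign convention is chosen for illustration."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
total = rect_sum(ii, 0, 0, 4, 3)            # whole image: 78
inner = rect_sum(ii, 1, 1, 2, 2)            # 6 + 7 + 10 + 11 = 34
feature = two_rect_feature(ii, 0, 0, 4, 3)  # 33 - 45 = -12
```

Building the table takes one pass over the image with a constant amount of work per pixel. After that, every rectangle sum costs four lookups, so the cost of evaluating a feature does not depend on its scale or position.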
The second contribution of this paper is a method for
constructing a classifier by selecting a small number of important
features using AdaBoost [6]. Within any image sub-window,
the total number of Haar-like features is very large,
far larger than the number of pixels. In order to ensure fast
classification, the learning process must exclude a large ma-
jority of the available features, and focus on a small set of
critical features. Motivated by the work of Tieu and Viola,
feature selection is achieved through a simple modification
of the AdaBoost procedure: the weak learner is constrained
so that each weak classifier returned can depend on only a