ABSTRACT
Recently Viola et al. [5] have introduced a rapid object detection
scheme based on a boosted cascade of simple features. In this paper
we introduce a novel set of rotated haar-like features, which
significantly enrich this basic set of simple haar-like features and
which can also be calculated very efficiently. At a given hit rate, our
sample face detector shows on average a 10% lower false alarm
rate when these additional rotated features are used. We also
present a novel post-optimization procedure for a given boosted
cascade that further improves the false alarm rate by 12.5% on average.
Using both enhancements the number of false detections is only 24
at a hit rate of 82.3% on the CMU face set [7].
1 Introduction
Recently Viola et al. have proposed a multi-stage classification
procedure that reduces the processing time substantially while
achieving almost the same accuracy as compared to a much slower
and more complex single stage classifier [5]. This paper extends
their rapid object detection framework in two important ways:
First, their basic and over-complete set of haar-like features is
extended by an efficient set of 45° rotated features, which add
domain knowledge to the learning framework that is otherwise
hard to learn. These novel features can be computed
rapidly at all scales in constant time. Second, we derive a new
post-optimization procedure for a given boosted classifier that improves
its performance significantly.
2 Feature Pool
The main purpose of using features instead of raw pixel values as the
input to a learning algorithm is to reduce the in-class variability while
increasing the out-of-class variability compared to the raw data,
thereby making classification easier. Features usually encode
knowledge about the domain, which is difficult to learn from the raw
and finite set of input data. A very large and general pool of simple
haar-like features combined with feature selection therefore can
increase the capacity of the learning algorithm.
The speed of feature evaluation is also a very important aspect since
almost all object detection algorithms slide a fixed-size window at
all scales over the input image. As we will see, our features can be
computed at any position and any scale in the same constant time.
Only 8 table lookups are needed.
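The constant-time claim can be illustrated for upright rectangles with a minimal sketch, assuming the lookup table is a summed area table (integral image) as in the computation scheme of Viola et al. [5]. Each rectangle sum then costs four lookups, so a two-rectangle feature costs eight. The function names `integral_image` and `rec_sum_upright` are hypothetical, and the 45° rotated case, which needs a separate table, is not covered here.

```python
def integral_image(img):
    """Summed area table with one row and one column of zero padding.

    sat[y][x] holds the sum of img[j][i] for all j < y and i < x.
    """
    height, width = len(img), len(img[0])
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def rec_sum_upright(sat, x, y, w, h):
    """Pixel sum of the upright rectangle with top-left corner (x, y).

    Uses exactly four table lookups, independent of w and h.
    """
    return sat[y + h][x + w] - sat[y][x + w] - sat[y + h][x] + sat[y][x]


# Tiny hypothetical image used only for illustration.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
sat = integral_image(img)
total = rec_sum_upright(sat, 0, 0, 3, 3)        # whole image
lower_right = rec_sum_upright(sat, 1, 1, 2, 2)  # pixels 5, 6, 8, 9
```

Because the table is built once per image, the cost of a rectangle sum does not grow with the rectangle size, which is what makes evaluation at any scale equally cheap.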
2.1 Feature Family
Our feature pool was inspired by the over-complete haar-like
features used by Papageorgiou et al. in [4,3] and their very fast
computation scheme proposed by Viola et al. in [5], and is a
generalization of their work.
Let us assume that the basic unit for testing for the presence of an
object is a window of $W \times H$ pixels. Also assume that we have a very
fast way of computing the sum of pixels of any upright and 45°
rotated rectangle inside the window. A rectangle is specified by the
tuple $r = (x, y, w, h, \alpha)$ with $0 \le x$, $x + w \le W$, $0 \le y$, $y + h \le H$,
$w, h > 0$, and $\alpha \in \{0^\circ, 45^\circ\}$, and its pixel sum is denoted by
$\mathrm{RecSum}(r)$. Two examples of such rectangles are given in Figure 1.
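Our raw feature set is then the set of all possible features of the form
$$\mathrm{feature}_I = \sum_{i \in I = \{1, \ldots, N\}} \omega_i \cdot \mathrm{RecSum}(r_i),$$
where the weights $\omega_i \in \mathbb{R}$, the rectangles $r_i$, and $N$ are arbitrarily
chosen.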
This raw feature set is (almost) infinitely large. For practical reasons,
it is reduced as follows:
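1. Only weighted combinations of pixel sums of two rectangles are
considered (i.e., $N = 2$).
2. The weights have opposite signs, and are used to compensate for
the difference in area size between the two rectangles. Thus, for
non-overlapping rectangles we have
$-\omega_0 \cdot \mathrm{Area}(r_0) = \omega_1 \cdot \mathrm{Area}(r_1)$. Without loss of generality we can set
$\omega_0 = -1$ and get $\omega_1 = \mathrm{Area}(r_0) / \mathrm{Area}(r_1)$.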
3. The features mimic haar-like features and early features of the
human visual pathway such as center surround and directional
responses.
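The area compensation in restriction 2 can be sketched as follows. This is a minimal illustration for upright rectangles only, assuming the area of a rectangle is its pixel count w * h. The tuple layout `(x, y, w, h)`, the helper names, and the example rectangles are hypothetical.

```python
def area(rect):
    """Pixel count of an upright rectangle given as (x, y, w, h)."""
    _, _, w, h = rect
    return w * h


def compensating_weights(r0, r1):
    """Return (w0, w1) with w0 = -1 and w1 = Area(r0) / Area(r1)."""
    w0 = -1.0
    w1 = area(r0) / area(r1)
    return w0, w1


# Two hypothetical non-overlapping upright rectangles of different size.
r0 = (0, 0, 4, 2)  # area 8
r1 = (4, 0, 2, 2)  # area 4
w0, w1 = compensating_weights(r0, r1)

# The area-weighted contributions cancel, so an image of constant
# brightness produces a zero feature response.
balance = w0 * area(r0) + w1 * area(r1)
```

With these weights the feature responds to differences in mean brightness between the two rectangles, not to differences in their size.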
These restrictions lead us to the 14 feature prototypes shown in
Figure 2:
• Four edge features,
• Eight line features, and
• Two center-surround features.
These prototypes are scaled independently in the vertical and horizontal
directions in order to generate a rich, over-complete set of features.
Note that the line features can be calculated from two rectangles only.
To this end, it is assumed that the first rectangle $r_0$ encompasses the
black and white rectangle and the second rectangle $r_1$ represents the
black area. For instance, line feature (2a) with a total height of 2 and
a width of 6 at the top left corner (5,3) can be written as
$$\mathrm{feature}_I = -1 \cdot \mathrm{RecSum}(5, 3, 6, 2, 0^\circ) + 3 \cdot \mathrm{RecSum}(7, 3, 2, 2, 0^\circ).$$
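The line feature (2a) example, -1 · RecSum(5, 3, 6, 2, 0°) + 3 · RecSum(7, 3, 2, 2, 0°), can be checked numerically with a small sketch. It uses brute-force pixel sums for clarity instead of table lookups, and the 16 x 8 test images and function names are hypothetical.

```python
def rec_sum(img, x, y, w, h):
    """Brute-force pixel sum of the upright rectangle (x, y, w, h)."""
    return sum(img[j][i] for j in range(y, y + h) for i in range(x, x + w))


def line_feature_2a(img):
    """Evaluate -1 * RecSum(5,3,6,2) + 3 * RecSum(7,3,2,2) on img."""
    return -1 * rec_sum(img, 5, 3, 6, 2) + 3 * rec_sum(img, 7, 3, 2, 2)


# Hypothetical test images, 16 pixels wide and 8 pixels high.
flat = [[10] * 16 for _ in range(8)]

stripe = [row[:] for row in flat]
for j in range(3, 5):
    for i in range(7, 9):
        stripe[j][i] = 0  # darken the area covered by the second rectangle

flat_response = line_feature_2a(flat)      # uniform brightness
stripe_response = line_feature_2a(stripe)  # dark middle third
```

The weight 3 equals Area(r0) / Area(r1) = 12 / 4, so the uniform image yields a zero response, while the image with a dark middle third yields a response of -80.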
Only features (1a), (1b), (2a), (2c) and (4a) of Figure 2 have been
used by [3,4,5]. In our experiments the additional features
significantly enhanced the expressive power of the learning
system and consequently improved the performance of the object
detection system. We did not use feature (4a), since it is well
approximated by features (2g) and (2e).
NUMBER OF FEATURES. The number of features derived from each
prototype is quite large and differs from prototype to prototype and
Figure 1. Examples of an upright and 45° rotated rectangle. [Labels: window of size W × H; upright rectangle and 45° rotated rectangle, each with width w and height h.]
An Extended Set of Haar-like Features for Rapid Object Detection
Rainer Lienhart and Jochen Maydt
Intel Labs, Intel Corporation, Santa Clara, CA 95052, USA
Rainer.Lienhart@intel.com