
An Extended Set of Haar-like Features for Rapid Object Detection
Rainer Lienhart and Jochen Maydt
Intel Labs, Intel Corporation, Santa Clara, CA 95052, USA
Rainer.Lienhart@intel.com
ABSTRACT
Recently Viola et al. [5] have introduced a rapid object detection scheme based on a boosted cascade of simple feature classifiers. In this paper we introduce a novel set of rotated Haar-like features. These novel features significantly enrich the simple features of [5] and can also be calculated efficiently. With these new rotated features our sample face detector shows on average a 10% lower false alarm rate at a given hit rate. We also present a novel post-optimization procedure for a given boosted cascade, improving on average the false alarm rate further by 12.5%.
1 Introduction
Recently Viola et al. have proposed a multi-stage classification procedure that reduces the processing time substantially while achieving almost the same accuracy as a much slower and more complex single-stage classifier [5]. This paper extends their rapid object detection framework in two important ways: Firstly, their basic and over-complete set of Haar-like features is extended by an efficient set of 45° rotated features, which add domain knowledge to the learning framework that is otherwise hard to learn. These novel features can be computed rapidly at all scales in constant time. Secondly, we derive a new post-optimization procedure for a given boosted classifier that improves its performance significantly.
2 Features
The main purpose of using features instead of raw pixel values as the input to a learning algorithm is to reduce the in-class variability and increase the out-of-class variability compared to the raw input data, thus making classification easier. Features usually encode knowledge about the domain, which is difficult to learn from a raw and finite set of input data.
The complexity of feature evaluation is also a very important aspect, since almost all object detection algorithms slide a fixed-size window at all scales over the input image. As we will see, our features can be computed at any position and any scale in the same constant time. Only 8 table lookups are needed.
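To illustrate how a fixed number of table lookups can yield a rectangle sum, here is a minimal sketch for upright rectangles only. It assumes the standard summed-area table (integral image), which this section has not yet described; the function names `summed_area_table` and `rec_sum_upright` are hypothetical, and the 45° rotated case would need a different table that is not sketched here.

```python
def summed_area_table(image):
    """Return sat where sat[y][x] is the sum of all pixels above and left of (x, y).

    The table has one extra row and column of zeros so that lookups at the
    image border need no special cases.
    """
    height, width = len(image), len(image[0])
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def rec_sum_upright(sat, x, y, w, h):
    """Pixel sum of the upright rectangle with top-left corner (x, y).

    Exactly four table lookups, independent of the rectangle size.
    """
    return sat[y + h][x + w] - sat[y][x + w] - sat[y + h][x] + sat[y][x]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
sat = summed_area_table(image)
total = rec_sum_upright(sat, 0, 0, 4, 3)   # whole image
inner = rec_sum_upright(sat, 1, 1, 2, 2)   # pixels 6, 7, 10, 11
```

Under this assumption each rectangle costs four lookups, so a feature built from two rectangles would cost eight, which is consistent with the count quoted above.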
2.1 Feature Pool
Our feature pool was inspired by the over-complete Haar-like features used by Papageorgiou et al. in [4,3] and their very fast computation scheme proposed by Viola et al. in [5], and is a generalization of their work.
Let us assume that the basic unit for testing for the presence of an object is a window of $W \times H$ pixels. Also assume that we have a very fast way of computing the sum of pixels of any upright and 45° rotated rectangle inside the window. A rectangle is specified by the tuple $r = (x, y, w, h, \alpha)$ with $0 \le x,\ x + w \le W$, $0 \le y,\ y + h \le H$, $x, y \ge 0$, $w, h > 0$, and $\alpha \in \{0^\circ, 45^\circ\}$, and its pixel sum is denoted by RecSum(r). Two examples of such rectangles are given in Figure 1.
[Figure 1: Example of an upright and 45° rotated rectangle inside the window.]
Our raw feature set is then the set of all possible features of the form
$$\mathrm{feature}_I = \sum_{i \in I = \{1, \ldots, N\}} \omega_i \cdot \mathrm{RecSum}(r_i),$$
where the weights $\omega_i \in \mathbb{R}$, the rectangles $r_i$, and $N$ are arbitrarily chosen.
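The raw feature definition is a plain weighted sum of rectangle pixel sums, which the following minimal sketch spells out. The names `Rect`, `rec_sum`, and `feature_value` are hypothetical, the pixel sum is computed by brute force rather than by the fast scheme the paper relies on, and only upright rectangles are handled.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Rectangle tuple r = (x, y, w, h, angle); angle is in degrees."""
    x: int
    y: int
    w: int
    h: int
    angle: int = 0


def rec_sum(image, r):
    """Brute-force pixel sum of an upright rectangle (illustration only)."""
    if r.angle != 0:
        raise NotImplementedError("rotated rectangles are not covered by this sketch")
    return sum(
        image[row][col]
        for row in range(r.y, r.y + r.h)
        for col in range(r.x, r.x + r.w)
    )


def feature_value(image, terms):
    """Weighted sum over (weight, rectangle) pairs."""
    return sum(weight * rec_sum(image, rect) for weight, rect in terms)


# A 4x4 image that is dark on the left half and bright on the right half.
image = [[1, 1, 3, 3] for _ in range(4)]

# Left half minus right half: an arbitrary choice of N = 2 weighted rectangles.
terms = [(1.0, Rect(0, 0, 2, 4)), (-1.0, Rect(2, 0, 2, 4))]
value = feature_value(image, terms)
```

Here the left half sums to 8 and the right half to 24, so `value` is -16.0.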
This raw feature set is (almost) infinitely large. For practical reasons,
it is reduced as follows:
1. Only weighted combinations of pixel sums of two rectangles are considered (i.e., $N = 2$).
2. The weights have opposite signs, and are used to compensate for the difference in area size between the two rectangles. Thus, for non-overlapping rectangles we have $-\omega_0 \cdot \mathrm{Area}(r_0) = \omega_1 \cdot \mathrm{Area}(r_1)$. Without loss of generality we can set $\omega_0 = -1$ and get $\omega_1 = \mathrm{Area}(r_0)/\mathrm{Area}(r_1)$.
3. The features mimic Haar-like features and early features of the human visual pathway such as center-surround and directional responses.
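The area compensation in the second restriction can be checked with a few lines of arithmetic. This is a minimal sketch; `compensating_weights` is a hypothetical helper name, and the rectangle sizes and the intensity value are made up for illustration.

```python
def compensating_weights(area0, area1):
    """Fix the first weight at -1 and scale the second by the area ratio."""
    return -1.0, area0 / area1


# Two non-overlapping rectangles of 4x2 and 2x2 pixels (illustrative sizes).
area0, area1 = 4 * 2, 2 * 2
w0, w1 = compensating_weights(area0, area1)

# On a uniform image of intensity c, each pixel sum is c times the area,
# so the two weighted sums cancel.
c = 7
response = w0 * (c * area0) + w1 * (c * area1)
```

With these sizes the second weight is 2.0 and the response on the uniform image is 0.0, which is the sense in which the weights compensate for the difference in area.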
These restrictions lead us to the 14 feature prototypes shown in Figure 2:
* Four edge features,
* Eight line features, and
* Two center-surround features.
These prototypes are scaled independently in vertical and horizontal direction in order to generate a rich, over-complete set of features. Note that the line features can be calculated by two rectangles only. Here it is assumed that the first rectangle $r_0$ encompasses the black and white rectangle and the second rectangle $r_1$ represents the black area. For instance, line feature (2a) with total height of 2 and width of 6 at the top left corner (5,3) can be written as
$$\mathrm{feature}_I = -1 \cdot \mathrm{RecSum}(5, 3, 6, 2, 0^\circ) + 3 \cdot \mathrm{RecSum}(7, 3, 2, 2, 0^\circ).$$
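It may help to read this two-rectangle form in terms of the black and white regions. The symbols $S_b$ (pixel sum over the black 2x2 region at (7,3)) and $S_w$ (pixel sum over the remaining white pixels of the 6x2 rectangle) are introduced here for illustration only:

```latex
\begin{aligned}
\mathrm{feature}_I
  &= -1 \cdot \mathrm{RecSum}(5, 3, 6, 2, 0^\circ) + 3 \cdot \mathrm{RecSum}(7, 3, 2, 2, 0^\circ) \\
  &= -(S_w + S_b) + 3\, S_b \\
  &= 2\, S_b - S_w
\end{aligned}
```

The white region covers 8 pixels and the black region 4, so on a uniform image of intensity $c$ the response is $2 \cdot 4c - 8c = 0$. The weight 3 is the area ratio $12/4$ of the enclosing rectangle to the black rectangle.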
Only features (1a), (1b), (2a), (2c) and (4a) of Figure 2 have been used by [3,4,5]. In our experiments the additional features significantly enhanced the expressive power of the learning system and consequently improved the performance of the object detection system. Feature (4a) was not used since it is well approximated by features (2g) and (2e).
NUMBER OF FEATURES. The number of features derived from each prototype is quite large and differs from prototype to prototype; it can be calculated as follows. Let $X = \lfloor W/w \rfloor$ and $Y = \lfloor H/h \rfloor$ be the maximum scaling factors in the x and y directions. An upright feature of size $w \times h$ then generates
0-7803-7622-6/02/$17.00 ©2002 IEEE    I-900    IEEE ICIP 2002