using integral images. The calculation time therefore is
independent of the filter size. As shown in Section 5
and Fig. 3, the performance is comparable or better
than with the discretised and cropped Gaussians.
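
The constant-time property can be illustrated with a minimal sketch. It assumes a grey-value image stored as a list of rows; the helper names `integral_image` and `box_sum` are hypothetical and are not taken from our implementation. Once the integral image is built, the sum over any upright rectangle costs four look-ups, whatever the rectangle's size.

```python
from typing import List

Image = List[List[float]]


def integral_image(img: Image) -> Image:
    """Summed-area table with an extra zero row and column at the top/left."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            # Entry (y+1, x+1) holds the sum of all pixels in img[0..y][0..x].
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: Image, top: int, left: int, height: int, width: int) -> float:
    """Sum of the rectangle with the given top-left corner and size.

    Always four look-ups, independent of height and width.
    """
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy 12 x 12 image (illustrative data only).
img = [[float(x + y) for x in range(12)] for y in range(12)]
ii = integral_image(img)
small = box_sum(ii, 0, 0, 3, 3)  # 3 x 3 region
large = box_sum(ii, 2, 1, 9, 9)  # 9 x 9 region, same number of look-ups
```

A box filter is then a weighted combination of a few such rectangle sums, so its cost depends on the number of rectangles and not on their area.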
The 9 × 9 box filters in Fig. 2 are approximations of a
Gaussian with $\sigma = 1.2$ and represent the lowest scale (i.e.
highest spatial resolution) for computing the blob response
maps. We will denote them by $D_{xx}$, $D_{yy}$, and $D_{xy}$. The
weights applied to the rectangular regions are kept simple
for computational efficiency. This yields

$$\det(H_{\mathrm{approx}}) = D_{xx} D_{yy} - (w D_{xy})^2. \tag{3}$$

The relative weight w of the filter responses is used to bal-
ance the expression for the Hessian’s determinant. This is
needed for the energy conservation between the Gaussian
kernels and the approximated Gaussian kernels,

$$w = \frac{|L_{xy}(1.2)|_F \, |D_{yy}(9)|_F}{|L_{yy}(1.2)|_F \, |D_{xy}(9)|_F} = 0.912\ldots \simeq 0.9, \tag{4}$$

where $|x|_F$ is the Frobenius norm. Notice that for theoretical
correctness, the weighting changes depending on the
scale. In practice, we keep this factor constant, as this did
not have a significant impact on the results in our
experiments.
Furthermore, the filter responses are normalised with
respect to their size. This guarantees a constant Frobenius
norm for any filter size, an important aspect for the scale
space analysis as discussed in the next section.
The approximated determinant of the Hessian repre-
sents the blob response in the image at location x. These
responses are stored in a blob response map over different
scales, and local maxima are detected as explained in Sec-
tion 3.4.
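
As a minimal sketch of how the blob response at one location could be assembled from the three box-filter outputs: the function names are hypothetical, and dividing each raw response by the filter area is an assumed form of the size normalisation, which the text does not spell out. The constant relative weight of 0.9 is the value used in practice above.

```python
def normalised_response(raw: float, filter_size: int) -> float:
    """Normalise a raw box-filter response by the filter area.

    Assumption: 'normalised with respect to their size' is taken here
    to mean division by filter_size ** 2.
    """
    return raw / float(filter_size * filter_size)


def det_hessian_approx(dxx: float, dyy: float, dxy: float, w: float = 0.9) -> float:
    """Approximated determinant of the Hessian: Dxx * Dyy - (w * Dxy)^2."""
    return dxx * dyy - (w * dxy) ** 2


def blob_response(raw_dxx: float, raw_dyy: float, raw_dxy: float,
                  filter_size: int, w: float = 0.9) -> float:
    """Blob response at one location from raw filter outputs of one size."""
    dxx = normalised_response(raw_dxx, filter_size)
    dyy = normalised_response(raw_dyy, filter_size)
    dxy = normalised_response(raw_dxy, filter_size)
    return det_hessian_approx(dxx, dyy, dxy, w)


# Illustrative raw responses of a 9 x 9 filter (made-up numbers).
response = blob_response(81.0, 162.0, 40.5, filter_size=9)
```

Evaluating `blob_response` at every pixel and for every filter size fills the blob response map in which the local maxima are then sought.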
3.3. Scale space representation
Interest points need to be found at different scales, not
least because the search for correspondences often requires
their comparison in images where they are seen at different
scales. Scale spaces are usually implemented as an image
pyramid. The images are repeatedly smoothed with a
Gaussian and then sub-sampled in order to achieve a
higher level of the pyramid. Lowe [24] subtracts these pyr-
amid layers in order to get the DoG (Difference of Gaussi-
ans) images where edges and blobs can be found.
Due to the use of box filters and integral images, we do
not have to iteratively apply the same filter to the output of
a previously filtered layer, but instead can apply box filters
of any size at exactly the same speed directly on the original
image and even in parallel (although the latter is not
exploited here). Therefore, the scale space is analysed by
up-scaling the filter size rather than iteratively reducing
the image size (Fig. 4). The output of the 9 × 9 filter, introduced
in the previous section, is considered as the initial scale
layer, to which we will refer as scale $s = 1.2$ (approximating
Gaussian derivatives with $\sigma = 1.2$). The following layers
are obtained by filtering the image with gradually bigger
masks, taking into account the discrete nature of integral
images and the specific structure of our filters.
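
The correspondence between filter size and scale can be sketched as follows. The sketch assumes that the scale grows in proportion to the filter side length, anchored at the 9 × 9 filter with s = 1.2; the function name and the example sizes are illustrative only.

```python
BASE_FILTER_SIZE = 9  # side length of the initial box filter
BASE_SCALE = 1.2      # scale s assigned to the 9 x 9 filter


def approx_scale(filter_size: int) -> float:
    """Scale associated with a box filter of the given side length.

    Assumption: the scale is proportional to the filter size.
    """
    if filter_size <= 0:
        raise ValueError("filter_size must be positive")
    return BASE_SCALE * (filter_size / BASE_FILTER_SIZE)


# Illustrative filter sizes; the image itself is never resized.
for size in (9, 18, 27):
    print(size, round(approx_scale(size), 2))
```

Under this assumption, doubling the filter side length doubles the associated scale, while the cost of evaluating the filter stays the same.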
Note that our main motivation for this type of sampling
is its computational efficiency. Furthermore, as we do not
have to downsample the image, there is no aliasing. On
the downside, box filters preserve high-frequency compo-
nents that can get lost in zoomed-out variants of the same
scene, which can limit scale-invariance. This was however
not noticeable in our experiments.
The scale space is divided into octaves. An octave repre-
sents a series of filter response maps obtained by convolv-
ing the same input image with a filter of increasing size. In
total, an octave encompasses a scaling factor of 2 (which
implies that one needs to more than double the filter size,
see below). Each octave is subdivided into a constant num-
ber of scale levels. Due to the discrete nature of integral
images, the minimum scale difference between two subsequent
scales depends on the length $l_0$ of the positive or negative
lobes of the partial second order derivative in the
direction of derivation (x or y), which is set to a third of
the filter size length. For the 9 × 9 filter, this length $l_0$ is
3. For two successive levels, we must increase this size by
Fig. 3. Top: Repeatability score for image rotation of up to 180°. Hessian-based
detectors have in general a lower repeatability score for angles
around odd multiples of $\pi/4$. Bottom: Sample images from the sequence that
was used. Fast-Hessian is the more accurate version of our detector (FH-15),
as explained in Section 3.3.
Fig. 4. Instead of iteratively reducing the image size (left), the use of
integral images allows the up-scaling of the filter at constant cost (right).
H. Bay et al. / Computer Vision and Image Understanding 110 (2008) 346–359 349