Fig. 1. Random dots example. A shape moves sideways; both the shape
and the background are covered by a random pattern of black and white
dots. It is impossible to identify the moving object from either of the
two frames (a) and (b) (a stereo pair) alone. The occlusion detector (c)
(higher values of λ are darker) shows the outline of the object very
clearly. Compare to the ground truth (d).
Velocity-adapted detector: Although rotational invariance is
desirable in the spatial domain, non-spatial rotations in the spatio-
temporal domain have no physical meaning. It is preferable to
have invariance to spatially-fixed shear transformations, which
correspond to 2D relative translational motion between the camera
and the scene. As suggested in [15] by the reference to Galilean
diagonalization, one can use the velocity-adapted matrix
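$\tilde{G}$ given by
$$
\tilde{G} =
\begin{pmatrix}
G_{11} & G_{12} & 0 \\
G_{21} & G_{22} & 0 \\
0 & 0 & \lambda_T
\end{pmatrix},
\qquad \text{where} \quad
\lambda_T = \frac{\det(G)}{\det(G^*)}
\tag{3}
$$
($G_{ij}$ denote the entries of $G$, and $G^*$ denotes the 2 × 2 upper-left
submatrix of $G$ containing only spatial information).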
Definition 2: The operator $\lambda_T$ is the velocity-adapted occlusion
detector.
To justify this definition, observe that $\tilde{G}$ is also invariant to
translation and spatial rotation. The entry $\lambda_T$ is an eigenvalue of
$\tilde{G}$, and it has been suggested that it encodes the temporal
variation, being the “residue” unexplained by pure-spatial information.
In practice, $\lambda_T$ gives results similar to λ, though it has certain
advantages, as discussed in Section 4. Throughout this paper we
use λ to denote either operator, unless stated otherwise.
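As a minimal numerical sketch of Eq. (3), the following Python snippet (NumPy) computes $\lambda_T$ from a given 3 × 3 matrix G and assembles $\tilde{G}$. It assumes that G is symmetric positive semidefinite with rows and columns ordered (x, y, t). The function names, the random test matrix, and the choice to return NaN when det(G*) vanishes are illustrative assumptions, not part of the definition above. For invertible G*, det(G)/det(G*) equals the Schur complement $G_{33} - g^\top (G^*)^{-1} g$ with $g = (G_{13}, G_{23})^\top$, which the example checks.

```python
import numpy as np

def velocity_adapted_lambda(G, eps=1e-12):
    """lambda_T = det(G) / det(G*), with G* the spatial 2x2 upper-left block."""
    G = np.asarray(G, dtype=float)
    det_spatial = np.linalg.det(G[:2, :2])
    if abs(det_spatial) < eps:
        # Assumption: a spatially degenerate neighborhood is reported as undefined.
        return float("nan")
    return float(np.linalg.det(G) / det_spatial)

def velocity_adapted_matrix(G):
    """Assemble G-tilde: spatial block of G, zero coupling, lambda_T in the corner."""
    G = np.asarray(G, dtype=float)
    G_tilde = np.zeros((3, 3))
    G_tilde[:2, :2] = G[:2, :2]
    G_tilde[2, 2] = velocity_adapted_lambda(G)
    return G_tilde

# Hypothetical test matrix: average outer product of random (Ix, Iy, It) samples.
rng = np.random.default_rng(0)
samples = rng.normal(size=(50, 3))
G = samples.T @ samples / len(samples)

lam_T = velocity_adapted_lambda(G)

# Schur complement of the spatial block; should agree with det(G) / det(G*).
schur = G[2, 2] - G[2, :2] @ np.linalg.solve(G[:2, :2], G[:2, 2])
print(np.isclose(lam_T, schur))

# lambda_T appears among the eigenvalues of the block-diagonal G-tilde.
eigenvalues = np.linalg.eigvalsh(velocity_adapted_matrix(G))
print(np.any(np.isclose(eigenvalues, lam_T)))
```

For positive definite G, this Schur complement is never smaller than the smallest eigenvalue of G, so $\lambda_T$ is bounded below by that eigenvalue.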
Detector effectiveness: High values of λ indicate significant
deviation from (2), which is often due to the existence of a motion
boundary. Other sources of large deviations include changes in
illumination (violation of the brightness constancy assumption)
and spatially varying motion (motion that is not constant in ω).
However, these events often lead to smaller λ values than motion
boundaries do (see Fig. 2), in which case the boundary response
can be distinguished from a false response (e.g., by
thresholding).
Low values of λ do not necessarily indicate that the motion
in ω is uniform. The rank of G is affected by spatial structure
as well as temporal structure, so λ may be low even at motion
boundaries, when certain spatial degeneracies exist. Specifically,
this occurs when there is local ambiguity, i.e., when the existence
of a motion boundary cannot be determined locally. This includes
areas where the occluding object and its background are of the
same color, areas where the background is uniform in color, and
areas where the background texture is uniform in the direction of
the motion (Fig. 3). In the first case the rank of G is 0, and in
the other cases the rank of G may be 1 or 2, depending on the
appearance of the occluding object (recall that the λ detector is
high when the rank of G is 3). In these cases, the background
may be interpreted as part of the moving object, since no features
in the background appear to vanish due to occlusion.
Fig. 2. False λ response. The same example as in Fig. 1: (a) with 20%
white noise; (b) with illumination change of 5%; (c) with the object rotating
by 20°; (d) with both object and background patterns deformed smoothly.
Fig. 3. Areas where the λ detector is likely to give low values despite the
existence of a local motion boundary. (Panel labels: linear background,
uniform background, same-color background.)
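The rank cases described above can be reproduced on synthetic gradient samples. The sketch below is an illustration under stated assumptions: G is taken to be the average outer product of the gradient vectors (I_x, I_y, I_t) over ω, brightness constancy I_t = -(u I_x + v I_y) holds within each moving region, and all function names, sample counts, and velocities are hypothetical.

```python
import numpy as np

def structure_tensor(grads):
    """Average outer product of (Ix, Iy, It) samples over a neighborhood."""
    grads = np.asarray(grads, dtype=float)
    return grads.T @ grads / len(grads)

def gradients(spatial, velocity):
    """Append It = -(u*Ix + v*Iy), i.e. brightness constancy for one motion."""
    It = -(spatial @ np.asarray(velocity, dtype=float))
    return np.column_stack([spatial, It])

rng = np.random.default_rng(0)
textured = rng.normal(size=(200, 2))                 # 2D texture: gradients in all directions
linear = np.outer(rng.normal(size=200), [1.0, 0.0])  # 1D texture: gradients along x only
flat = np.zeros((200, 2))                            # uniform color: no gradients

v_object, v_background = (1.0, 0.0), (-1.0, 0.5)     # hypothetical velocities

cases = {
    # No texture anywhere in the neighborhood.
    "uniform color everywhere": gradients(flat, v_object),
    # One motion, texture varying in a single direction.
    "single motion, linear texture": gradients(linear, v_object),
    # One motion, fully textured.
    "single motion, 2D texture": gradients(textured, v_object),
    # Motion boundary, but the background contributes no gradients.
    "boundary, uniform background": np.vstack(
        [gradients(textured[:100], v_object), gradients(flat[:100], v_background)]
    ),
    # Motion boundary with texture on both sides.
    "boundary, 2D texture on both sides": np.vstack(
        [gradients(textured[:100], v_object), gradients(textured[100:], v_background)]
    ),
}

ranks = {}
for name, grads in cases.items():
    G = structure_tensor(grads)
    ranks[name] = int(np.linalg.matrix_rank(G, tol=1e-9))
    smallest = np.linalg.eigvalsh(G)[0]
    print(f"{name}: rank {ranks[name]}, smallest eigenvalue {smallest:.3e}")
```

Only the last case reaches rank 3. The boundary over a uniform background stays at rank 2, so its smallest eigenvalue is numerically zero even though two motions are present.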
2.2. Extraction of Motion Boundaries and Scale Space Structure
The response of λ to occlusion occurs only where some
background features become occluded. Clearly, the boundary location
cannot always be inferred on the basis of local information alone.
However, while there may be no cues to indicate the location of
the boundary at a fine scale, there may be enough information at a
coarser scale (i.e., in a larger neighborhood) and λ may respond.
Thus we incorporate a multi-scale element in our algorithm, in
order to detect motion boundaries that are not detectable at fine
scales.
Defining scale: In order to define the notion of scale in
our algorithm, note that the evaluation of λ involves Gaussian
convolutions in two different stages – during the estimation of
the partial derivatives, and when taking the average over the
neighborhood ω. In both cases, larger Gaussians lead to coarser
structures, and we refer to the size of the Gaussian as the scale.
In this work we only consider the spatial scale. As we show in
Appendix I, these two scales are related, and we define a unified
scale dimension, and a scaling-invariant operator $\lambda^{(s)}$ at any
scale $s > 0$, using scale-normalization.
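The two Gaussian stages mentioned above can be sketched as follows. This is a minimal illustration, not the scale-normalized operator of Appendix I: it assumes that G is the Gaussian-weighted average of the outer products of (I_x, I_y, I_t) and that the detector response is the smallest eigenvalue of G. The function names, the two-frame temporal derivative, and the separate parameters `sigma_d` (derivative scale) and `sigma_i` (neighborhood scale) are hypothetical, and no scale-normalization is applied.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian, truncated at three standard deviations."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    return kernel / kernel.sum()

def smooth(img, sigma):
    """Separable Gaussian blur with zero padding (image must exceed the kernel size)."""
    kernel = gaussian_kernel(sigma)
    for axis in (0, 1):
        img = np.apply_along_axis(np.convolve, axis, img, kernel, mode="same")
    return img

def lambda_map(frame0, frame1, sigma_d, sigma_i):
    """Smallest eigenvalue of the locally averaged 3x3 gradient matrix, per pixel."""
    # Stage 1: Gaussian smoothing before differentiation (derivative scale).
    f0, f1 = smooth(frame0, sigma_d), smooth(frame1, sigma_d)
    Iy, Ix = np.gradient(0.5 * (f0 + f1))   # axis 0 is y (rows), axis 1 is x (columns)
    It = f1 - f0                             # two-frame temporal difference
    d = np.stack([Ix, Iy, It], axis=-1)      # shape (H, W, 3)
    G = d[..., :, None] * d[..., None, :]    # shape (H, W, 3, 3)
    # Stage 2: Gaussian averaging over the neighborhood (integration scale).
    for i in range(3):
        for j in range(3):
            G[..., i, j] = smooth(G[..., i, j], sigma_i)
    return np.linalg.eigvalsh(G)[..., 0]     # eigenvalues are returned in ascending order

# Hypothetical random-dot pair: a 16x16 patch shifts one pixel to the right.
rng = np.random.default_rng(0)
background = rng.integers(0, 2, size=(48, 48)).astype(float)
pattern = rng.integers(0, 2, size=(16, 16)).astype(float)
frame0, frame1 = background.copy(), background.copy()
frame0[16:32, 16:32] = pattern
frame1[16:32, 17:33] = pattern

lam = lambda_map(frame0, frame1, sigma_d=1.0, sigma_i=2.0)
static = lambda_map(frame0, frame0, sigma_d=1.0, sigma_i=2.0)
print(lam.shape, float(lam.max()), float(np.abs(static).max()))
```

Increasing either `sigma_d` or `sigma_i` makes the response depend on a larger neighborhood, which is the sense in which the size of the Gaussian acts as the scale. For an identical frame pair the temporal derivative vanishes, so the response is zero up to rounding.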