The Earth Mover’s Distance 103
Quadratic-form distance: this distance was suggested
in Niblack et al. (1993) for color based retrieval:
d
A
(H, K) =
p
(h − k)
T
A(h − k),
where h and k are vectors that list all the entries
in H and K. Cross-bin information is incorporated
via a similarity matrix A = [a
ij
] where a
ij
denote
similarity between bins i and j. Here i and j are
sequential (scalar) indices into the bins.
For our experiments, we followed the recom-
mendation in Niblack et al. (1993) and used
a
ij
=1 − d
ij
/d
max
where d
ij
is the ground distance
between bins i and j of the histogram, and d
max
=
max(d
ij
). Although in general the quadratic-form is
not a metric, it can be shown that with this choice of
A the quadratic-form is indeed a metric.
The quadratic-form distance does not enforce a
one-to-one correspondence between mass elements
in the two histograms: The same mass in a given
bin of the first histogram is simultaneously made to
correspond to masses contained in different bins of
the other histogram. This is illustrated in Fig. 1(b)
where the quadratic-form distance between the two
histograms on the left is larger than the distance be-
tween the two histograms on the right. Again, this
is clearly at odds with perceptual dissimilarity. The
desired distance here should be based on the corre-
spondences shown in part (d) of the figure.
Similar conclusions were obtained in Stricker and
Orengo (1995) where it was shown that using the
quadratic-form distance in image retrieval results
in false positives, because it tends to overestimate
the mutual similarity of color distributions without
a pronounced mode.
Match distance:
d
M
(H, K) =
X
i
|
ˆ
h
i
−
ˆ
k
i
| ,
where
ˆ
h
i
=
P
j≤i
h
j
is the cumulative histogram of
{h
i
}, and similarly for {k
i
}.
The match distance (Shen and Wong, 1983;
Werman et al., 1985) between two one-dimensional
histograms is defined as the L
1
distance between
their corresponding cumulative histograms. For one-
dimensional histograms with equal areas, this dis-
tance is a special case of the EMD which we present
in Section 4 with the important differences that the
match distance cannot handle partial matches, or
handle other ground distances. The match distance
does not extend to higher dimensions because the
relation j ≤ i is not a total ordering in more than
one dimension, and the resultingarbitrariness causes
problems.
Kolmogorov-Smirnov distance:
d
KS
(H, K) = max
i
(|
ˆ
h
i
−
ˆ
k
i
|).
Again,
ˆ
h
i
and
ˆ
k
i
are cumulative histograms.
The Kolmogorov-Smirnov distance is a common
statistical measure for unbinned distributions. Simi-
larly to the match distance, it is defined only for one
dimension.
2.3. Parameter-Based Dissimilarity Measures
These methods first compute a small set of parame-
ters from the histograms, either explicitly or implic-
itly, and then compare these parameters. For instance,
in Stricker and Orengo (1995) the distance between
distributions is computed as the sum of the weighted
distances of the distributions’ first three moments. In
Das et al. (1997), only the peaks of color histograms
are used for color image retrieval. In Liu and Picard
(1996), textures are compared based on measures of
their periodicity, directionality, and randomness, while
in Manjunath and Ma (1996) texture distances are de-
finedby comparing theirmeansandstandard deviations
in a weighted-L
1
sense.
Additionaldissimilarity measuresforimage retrieval
are evaluated and compared in Smith (1997) and
Puzicha et al. (1997).
3. Histograms vs Signatures
In Section 2 we defined a histogram as deriving from
a fixed partitioning of the domain of a distribution. Of
course, even if bin sizes are fixed, they can be different
in different parts of the underlying feature space. Even
so, however, for some images often only a small frac-
tion of the bins contain significant information, while
most others are hardly populated. A finely quantized
histogram is highly inefficient in this case. On the
other hand, for images that contain a large amount of
information, a coarsely quantized histogram would be
inadequate. In brief, because histograms are fixed-size
structures, theycannot achieve a good balance between
expressiveness and efficiency.
A signature {s
j
= (m
j
,w
m
j
)}, on the other hand,
represents a set of feature clusters. Each cluster is