14 Scharstein and Szeliski
estimates (Birchfield and Tomasi, 1998b; Scharstein,
1999). In our implementation we are not performing
such clean-up steps since we want to measure the per-
formance of the raw algorithm components.
3.5. Other Methods
Not all dense two-frame stereo correspondence algo-
rithms can be described in terms of our basic taxonomy
and representations. Here we briefly mention some ad-
ditional algorithms and representations that are not cov-
ered by our framework.
The algorithms described in this paper first enumer-
ate all possible matches at all possible disparities, then
select the best set of matches in some way. This is a use-
ful approach when a large amount of ambiguity may ex-
ist in the computed disparities. An alternative approach
is to use methods inspired by classic (infinitesimal) op-
tic flow computation. Here, images are successively
warped and motion estimates incrementally updated
until a satisfactory registration is achieved. These tech-
niques are most often implemented within a coarse-to-
fine hierarchical refinement framework (Quam, 1984;
Bergen et al., 1992; Barron et al., 1994; Szeliski and
Coughlan, 1997).
A univalued representation of the disparity map is
also not essential. Multi-valued representations, which
can represent several depth values along each line of
sight, have been extensively studied recently, especially
for large multiview data set. Many of these techniques
use a voxel-based representation to encode the recon-
structed colors and spatial occupancies or opacities
(Szeliski and Golland, 1999; Seitz and Dyer, 1999;
Kutulakos and Seitz, 2000; De Bonet and Viola, 1999;
Culbertson et al., 1999; Broadhurst et al., 2001). An-
other way to represent a scene with more complexity
is to use multiple layers, each of which can be repre-
sented by a plane plus residual parallax (Baker et al.,
1998; Birchfield and Tomasi, 1999; Tao et al., 2001).
Finally, deformable surfaces of various kinds have also
been used to perform 3D shape reconstruction from
multiple images (Terzopoulos and Fleischer, 1988;
Terzopoulos and Metaxas, 1991; Fua and Leclerc,
1995; Faugeras and Keriven, 1998).
3.6. Summary of Methods
Table 1 gives a summary of some representative
stereo matching algorithms and their corresponding
taxonomy, i.e., the matching cost, aggregation, and
optimization techniques used by each. The methods
are grouped to contrast different matching costs (top),
aggregation methods (middle), and optimization tech-
niques (third section), while the last section lists some
papers outside the framework. As can be seen from this
table, quite a large subset of the possible algorithm de-
sign space has been explored over the years, albeit not
very systematically.
4. Implementation
We have developed a stand-alone, portable C++ im-
plementation of several stereo algorithms. The imple-
mentation is closely tied to the taxonomy presented
in Section 3 and currently includes window-based al-
gorithms, diffusion algorithms, as well as global opti-
mization methods using dynamic programming, simu-
lated annealing, and graph cuts. While many published
methods include special features and post-processing
steps to improve the results, we have chosen to imple-
ment the basic versions of such algorithms, in order to
assess their respective merits most directly.
The implementation is modular and can easily be
extended to include other algorithms or their compo-
nents. We plan to add several other algorithms in the
near future, and we hope that other authors will con-
tribute their methods to our framework as well. Once a
new algorithm has been integrated, it can easily be com-
pared with other algorithms using our evaluation mod-
ule, which can measure disparity error and reprojection
error (Section 5.1). The implementation contains a so-
phisticated mechanism for specifying parameter values
that supports recursive script files for exhaustive per-
formance comparisons on multiple data sets.
We provide a high-level description of our code using
the same division into four parts as in our taxonomy.
Within our code, these four sections are (optionally)
executed in sequence, and the performance/quality
evaluator is then invoked. A list of the most important
algorithm parameters is given in Table 2.
4.1. Matching Cost Computation
The simplest possible matching cost is the squared or
absolute difference in color/intensity between corre-
sponding pixels (match
fn). To approximate the effect
of a robust matching score (Black and Rangarajan,
1996; Scharstein and Szeliski, 1998), we truncate
the matching score to a maximal value match
max.
When color images are being compared, we sum the