A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms
Steven M. Seitz Brian Curless
University of Washington
James Diebel
Stanford University
Daniel Scharstein
Middlebury College
Richard Szeliski
Microsoft Research
Abstract
This paper presents a quantitative comparison of several
multi-view stereo reconstruction algorithms. Until now, the
lack of suitable calibrated multi-view image datasets with
known ground truth (3D shape models) has prevented such
direct comparisons. In this paper, we first survey multi-view
stereo algorithms and compare them qualitatively using a
taxonomy that differentiates their key properties. We then
describe our process for acquiring and calibrating multi-
view image datasets with high-accuracy ground truth and
introduce our evaluation methodology. Finally, we present
the results of our quantitative comparison of state-of-the-art
multi-view stereo reconstruction algorithms on six bench-
mark datasets. The datasets, evaluation details, and in-
structions for submitting new models are available online
at http://vision.middlebury.edu/mview.
1. Introduction
The goal of multi-view stereo is to reconstruct a com-
plete 3D object model from a collection of images taken
from known camera viewpoints. Over the last few years,
a number of high-quality algorithms have been developed,
and the state of the art is improving rapidly. Unfortunately,
the lack of benchmark datasets makes it difficult to quantitatively
compare the performance of these algorithms and therefore
to focus research on the areas most in need of development.
The situation in binocular stereo, where the goal is to
produce a dense depth map from a pair of images, was until
recently similar. Here, however, a database of images with
ground-truth results has made the comparison of algorithms
possible and hence stimulated an even faster increase in al-
gorithm performance [1].
In this paper, we aim to rectify this imbalance by pro-
viding, for the first time, a collection of high-quality cal-
ibrated multi-view stereo images registered with ground-
truth 3D models and an evaluation methodology for com-
paring multi-view algorithms.
Our paper’s contributions include a taxonomy of multi-
view stereo reconstruction algorithms inspired by [1] (Sec-
tion 2), the acquisition and dissemination of a set of
calibrated multi-view image datasets with high-accuracy
ground-truth 3D surface models (Section 3), an evalua-
tion methodology that measures reconstruction accuracy
and completeness (Section 4), and a quantitative evaluation
of some of the currently best-performing algorithms (Section 5).
While the current evaluation includes only methods
whose authors were able to provide us with their results by the
CVPR final submission time, our datasets and evaluation
results are publicly available [2] and open to the general
community. We plan to update the results regularly and to
publish a more comprehensive comparative evaluation as a
full-length journal publication.
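To make the two evaluation quantities concrete, the following minimal sketch shows one plausible reading of them on point sets. It assumes that accuracy is the distance within which a chosen fraction of reconstructed points lie from the ground truth, and that completeness is the fraction of ground-truth points lying within a chosen distance of the reconstruction. The precise definitions are those of Section 4; the function names, the brute-force nearest-neighbor search, and the sample points and thresholds below are illustrative assumptions only.

```python
import math

def nearest_distance(point, cloud):
    # Brute-force distance from one 3D point to the closest point in a cloud.
    return min(math.dist(point, other) for other in cloud)

def accuracy(reconstruction, ground_truth, fraction=0.9):
    # Smallest distance d such that `fraction` of the reconstructed points
    # lie within d of the ground truth (assumed definition; lower is better).
    dists = sorted(nearest_distance(p, ground_truth) for p in reconstruction)
    k = max(1, math.ceil(fraction * len(dists)))
    return dists[k - 1]

def completeness(reconstruction, ground_truth, threshold):
    # Fraction of ground-truth points that have a reconstructed point
    # within `threshold` (assumed definition; higher is better).
    covered = sum(
        1 for g in ground_truth
        if nearest_distance(g, reconstruction) <= threshold
    )
    return covered / len(ground_truth)

# Toy data (hypothetical): the reconstruction covers only half of the model.
toy_ground_truth = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
toy_reconstruction = [(0, 0, 0.1), (1, 0, 0.2)]

toy_accuracy = accuracy(toy_reconstruction, toy_ground_truth, fraction=0.9)
toy_completeness = completeness(toy_reconstruction, toy_ground_truth, threshold=0.5)
```

In this toy case the reconstruction is close to the surface wherever it exists but misses half of the model, which is why the two quantities are worth reporting separately.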
We limit the scope of this paper to algorithms that re-
construct dense object models from calibrated views. Our
evaluation therefore does not include traditional binocular,
trinocular, and multi-baseline stereo methods, which seek
to reconstruct a single depth map, or structure-from-motion
and sparse stereo methods that compute a sparse set of fea-
ture points. Furthermore, we restrict the current evaluation
to objects that are nearly Lambertian, which is assumed by
most algorithms. However, we have also captured datasets of
specular scenes, which we plan to release, and we intend to extend
our study to include such scenes in the future.
This paper is not the first to survey multi-view stereo
algorithms; we refer readers to the surveys by Dyer [3]
and Slabaugh et al. [4] of algorithms up to 2001. How-
ever, the state of the art has changed dramatically in the last
five years, warranting a new overview of the field. In addi-
tion, this paper provides the first quantitative evaluation of
a broad range of multi-view stereo algorithms.
2. A multi-view stereo taxonomy
One of the challenges in comparing and evaluating
multi-view stereo algorithms is that existing techniques
vary significantly in their underlying assumptions, operat-
ing ranges, and behavior. Similar in spirit to the binoc-
ular stereo taxonomy [1], we categorize existing meth-
ods according to six fundamental properties that differen-
tiate the major algorithms: the scene representation, photo-
consistency measure, visibility model, shape prior, recon-
struction algorithm, and initialization requirements.
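As an illustration of how the six properties can serve as comparison axes, the sketch below records one method per record and lists the axes on which two methods differ. The class name, field names, and the two example entries are hypothetical placeholders chosen for illustration; they do not describe any particular published algorithm.

```python
from dataclasses import dataclass

# The six taxonomy properties, used as field names below.
TAXONOMY_AXES = (
    "scene_representation",
    "photo_consistency_measure",
    "visibility_model",
    "shape_prior",
    "reconstruction_algorithm",
    "initialization_requirements",
)

@dataclass(frozen=True)
class MethodProfile:
    # One entry in a taxonomy table; values are free-form labels.
    name: str
    scene_representation: str
    photo_consistency_measure: str
    visibility_model: str
    shape_prior: str
    reconstruction_algorithm: str
    initialization_requirements: str

def differing_axes(a, b):
    # Return the taxonomy axes on which two methods take different values.
    return [axis for axis in TAXONOMY_AXES if getattr(a, axis) != getattr(b, axis)]

# Placeholder entries (hypothetical, not real algorithms).
method_a = MethodProfile(
    name="method A",
    scene_representation="voxel grid",
    photo_consistency_measure="color variance",
    visibility_model="geometric",
    shape_prior="minimal surface",
    reconstruction_algorithm="volumetric optimization",
    initialization_requirements="bounding box",
)
method_b = MethodProfile(
    name="method B",
    scene_representation="depth maps",
    photo_consistency_measure="normalized cross-correlation",
    visibility_model="geometric",
    shape_prior="minimal surface",
    reconstruction_algorithm="depth map fusion",
    initialization_requirements="bounding box",
)

axes_that_differ = differing_axes(method_a, method_b)
```

Here the two placeholder methods share a visibility model, shape prior, and initialization requirement, so only the remaining three axes distinguish them.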