PatchMatch Based Joint View Selection and Depthmap Estimation
Enliang Zheng, Enrique Dunn, Vladimir Jojic, and Jan-Michael Frahm
The University of North Carolina at Chapel Hill
{ezheng,dunn,vjojic,jmf}@cs.unc.edu
Abstract
We propose a multi-view depthmap estimation approach
aimed at adaptively ascertaining the pixel level data asso-
ciations between a reference image and all the elements of
a source image set. Namely, we address the question, what
aggregation subset of the source image set should we use to
estimate the depth of a particular pixel in the reference im-
age? We pose the problem within a probabilistic framework
that jointly models pixel-level view selection and depthmap
estimation given the local pairwise image photoconsistency.
The corresponding graphical model is solved by EM-based
view selection probability inference and PatchMatch-like
depth sampling and propagation. Experimental results on
standard multi-view benchmarks convey the state-of-the-art
estimation accuracy afforded by mitigating spurious pixel-
level data associations. Additionally, experiments on large
Internet crowd sourced data demonstrate the robustness of
our approach against unstructured and heterogeneous im-
age capture characteristics. Moreover, the linear computa-
tional and storage requirements of our formulation, as well
as its inherent parallelism, enable an efficient and scalable
GPU-based implementation.
1. Introduction
Multi-view depthmap estimation (MVDE) methods
strive to determine a view dependent depthfield by leveraging
the local photoconsistency of a set of overlapping images
observing a common scene. Applications benefiting from
high quality depthmap estimates include dense 3D model-
ing, classification/recognition [20] and image based render-
ing [6]. However, achieving highly accurate depthmaps is
inherently difficult even for well controlled environments
where factors such as viewing geometry, image-set color
constancy, and optical distortions are rigorously measured
and/or corrected. Conversely, practical challenges for robust
depthmap estimation from uncontrolled input imagery
(e.g. Internet collected data) include mitigating heterogeneous
resolution and scene illumination, unstructured
viewing geometry, scene content variability and image reg-
istration errors (i.e. outliers). Moreover, the increasing
availability of crowd sourced datasets has explicitly brought
efficiency and scalability to the forefront of application re-
quirements, while implicitly increasing the importance of
data association management when processing such large
scale datasets.
The input for MVDE is commonly assumed to consist
of a convergent set of images along with reliable estimates
of their pose and calibration parameters. The extracted
depthmap will correspond to the pixel-wise 3D structure hy-
potheses that best explain the available image observations
in terms of some measure of visual similarity w.r.t. a ref-
erence image. Ironically, the potential robustness afforded
by having multiple available images is compromised by the
inherent variability in pairwise photoconsistency observa-
tions. In practice, correct depth hypotheses may provide
low photoconsistency in a source image subset (e.g. oc-
clusions or illumination aberrations), while incorrect depth
hypotheses may register high image similarity (e.g. repet-
itive structure or homogeneous texture). These technical
challenges turn multi-view depth hypothesis evaluation
into a problem of robust model fitting, where a demarcation
between inlier and outlier photoconsistency observations is
required. We tackle this implicit data association problem
by addressing the question: What aggregation subset of the
source image set should be used to estimate the depth of a
particular pixel in the reference image?
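
The effect of the aggregation subset can be illustrated with a minimal
sketch. It assumes normalized cross-correlation (NCC) as the pairwise
photoconsistency measure and uses a simple best-k heuristic as a stand-in
for view selection; this heuristic is not the probabilistic model
proposed in this paper. The function names and the score values are
hypothetical and chosen only for illustration.

```python
from math import sqrt


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches (flat lists)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # Textureless patch: correlation is undefined, treat as uninformative.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def aggregate(scores, k=None):
    """Mean score over all source views, or over the k highest-scoring views."""
    ranked = sorted(scores, reverse=True)
    used = ranked if k is None else ranked[:k]
    return sum(used) / len(used)


def best_depth(scores_per_depth, k=None):
    """Depth hypothesis with the highest aggregated photoconsistency."""
    return max(scores_per_depth, key=lambda d: aggregate(scores_per_depth[d], k))


# Hypothetical per-view NCC scores for one reference pixel and four source views.
# Depth 2.0 is assumed correct but occluded in two views; depth 3.0 is assumed
# incorrect but moderately similar in every view (e.g. homogeneous texture).
scores = {
    2.0: [0.90, 0.85, -0.20, -0.30],
    3.0: [0.40, 0.40, 0.40, 0.40],
}

depth_all_views = best_depth(scores)       # averages over every source view
depth_best_two = best_depth(scores, k=2)   # averages over the two best views
```

With these assumed scores, averaging over all views favors the incorrect
hypothesis 3.0, because the two occluded views pull the mean for depth 2.0
down, whereas restricting aggregation to the two most consistent views
recovers depth 2.0. A fixed k is itself a hand-tuned choice, which is one
reason to treat the subset as something to be inferred per pixel.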
We propose a probabilistic framework for depthmap es-
timation that jointly models pixel-level view selection and
depthmap estimation given pairwise image photoconsis-
tency. An overview is depicted in Figure 1. The cor-
responding graphical model is solved by EM-based view
selection probability inference and PatchMatch-like depth
sampling and propagation. Our approach iteratively alter-
nates between exploration of the depth search space and
updating our formulated probabilistic model. The insight
leveraged by our method is the spatial smoothness in the
photoconsistency at the correct depth hypothesis of a given
pixel w.r.t. the images in the source image dataset [22, 13].
Our expectation of having a high overlap of photoconsistent
source images among neighboring pixels in the reference