Recovering Consistent Video Depth Maps via Bundle Optimization
Guofeng Zhang¹    Jiaya Jia²    Tien-Tsin Wong²    Hujun Bao¹
¹State Key Lab of CAD&CG, Zhejiang University    ²The Chinese University of Hong Kong
{zhangguofeng, bao}@cad.zju.edu.cn    {leojia, ttwong}@cse.cuhk.edu.hk
[Figure 1: two panels, "input sequence" (left) and "output video depth maps" (right)]
Figure 1. High-quality depth reconstruction from the video sequence "Road", which contains complex occlusions. Left: an input video sequence taken by a moving camera. Right: video depth maps automatically computed by our method. The thin posts of the traffic sign and street lamp, as well as the road with gradual depth change, are accurately reconstructed in the recovered depth maps.
Abstract
This paper presents a novel method for reconstructing high-quality video depth maps. A bundle optimization model is proposed to address the key issues in stereo reconstruction, including image noise and occlusions. Our method not only uses the color constancy constraint, but also explicitly incorporates a geometric coherence constraint that associates multiple frames in a video, and thus naturally maintains the temporal coherence of the recovered video depths without introducing over-smoothing artifacts. To make the inference problem tractable, we introduce an iterative optimization scheme that first initializes the disparity maps using a segmentation prior and then refines the disparities by means of bundle optimization. Unlike previous work that estimates complex visibility parameters, our approach implicitly models the probabilistic visibility in a statistical way. The effectiveness of our automatic method is demonstrated on challenging video examples.
1. Introduction
Stereo reconstruction of dense depths from real images has long been a fundamental problem in computer vision. The reconstructed depths can be used by a wide spectrum of applications, including 3D modeling, robot navigation, image-based rendering, and video editing. Although the stereo problem [14, 8, 15, 23] has been extensively studied over the past decades, obtaining high-quality dense depth data remains challenging due to many inherent difficulties, such as image noise, textureless pixels, and occlusions.
Given an input video sequence taken by a freely moving camera, we propose a novel method to automatically construct high-quality and consistent depth maps for all frames. Our main contribution is a global optimization model defined over multiple frames, which we call bundle optimization, that resolves most of the aforementioned difficulties in disparity estimation.
Our method does not explicitly model binary visibility (occlusion). Instead, visibility is encoded naturally in the energy definition. Our model also does not distinguish among image noise, occlusions, and estimation errors, yielding a unified framework for modeling matching ambiguities. The color constancy constraint and the geometric coherence constraint linking different views are combined in an energy minimization framework, reliably reducing the influence of image noise and occlusions in a statistical way. As a result, our optimization does not produce over-smoothing or blending artifacts.
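To make this combination concrete, such an energy can be sketched schematically as follows (the notation here is ours for illustration; the paper's exact likelihood terms may differ). Writing $D_t$ for the disparity map of frame $t$:

$$E(\hat{D}) = \sum_{t} \sum_{\mathbf{x}} \Big( L_d\big(\mathbf{x}, D_t(\mathbf{x})\big) + \lambda \sum_{\mathbf{y} \in N(\mathbf{x})} \rho\big(D_t(\mathbf{x}), D_t(\mathbf{y})\big) \Big),$$

$$L_d(\mathbf{x}, d) = 1 - \frac{1}{Z} \sum_{t'} p_c\big(I_t(\mathbf{x}), I_{t'}(l_{t \to t'}(\mathbf{x}, d))\big) \, p_g\big(d, D_{t'}(l_{t \to t'}(\mathbf{x}, d))\big),$$

where $l_{t \to t'}(\mathbf{x}, d)$ projects pixel $\mathbf{x}$ with disparity $d$ from frame $t$ into frame $t'$, $p_c$ rewards color constancy between a pixel and its projection, $p_g$ rewards geometric coherence between $d$ and the disparity currently estimated at the projected location, $\rho$ is a spatial smoothness penalty, and $Z$ normalizes over the neighboring frames $t'$. Under such a formulation, occluded or noisy correspondences simply contribute small $p_c \cdot p_g$ products, which is how visibility can be handled statistically rather than with explicit binary variables.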
To handle disparity estimation in textureless regions, while alleviating the problems that segmentation causes on fine object structures, we use the image segmentation prior only for disparity initialization. Our iterative optimization algorithm then refines the segment-based disparities in a pixel-wise manner. Experiments show that this strategy is effective in estimating correct disparities in textureless regions while faithfully preserving the fine structures of object silhouettes.
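Below is a minimal, runnable sketch of this two-stage scheme. It makes strong simplifying assumptions that the paper does not: the camera is assumed to translate horizontally, so a pixel (x, y) with disparity d corresponds to (x - d, y) in a neighboring frame, and the refinement is winner-take-all per pixel rather than an MRF optimization with a smoothness term. All function and parameter names are ours, and the segmentation-based initialization is taken as given.

import numpy as np

def data_cost(frames, disps, t, d_candidates, sigma_c=10.0, sigma_g=1.0):
    # Per-pixel matching cost for frame t over integer disparity candidates,
    # combining color constancy with geometric coherence against a
    # neighboring frame's current disparity estimate.
    h, w = frames[t].shape[:2]
    t2 = t + 1 if t + 1 < len(frames) else t - 1   # one neighboring frame
    xs = np.arange(w)
    costs = np.empty((len(d_candidates), h, w))
    for i, d in enumerate(d_candidates):
        x2 = np.clip(xs - d, 0, w - 1)             # corresponding columns in frame t2
        color_diff = np.linalg.norm(frames[t] - frames[t2][:, x2], axis=-1)
        geo_diff = np.abs(d - disps[t2][:, x2])    # disagreement with projected disparity
        costs[i] = 1.0 - np.exp(-color_diff / sigma_c) * np.exp(-geo_diff / sigma_g)
    return costs

def refine(frames, disps, d_candidates, n_iters=2):
    # Iteratively re-estimate each frame's disparities pixel-wise, reusing
    # the other frames' current estimates; 'disps' starts from the
    # segmentation-based initialization described above.
    d_arr = np.asarray(d_candidates)
    for _ in range(n_iters):
        for t in range(len(frames)):
            disps[t] = d_arr[np.argmin(data_cost(frames, disps, t, d_candidates), axis=0)]
    return disps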
Our method is highly robust against occlusions, matching ambiguities, and noise. We have conducted experiments on a variety of challenging examples; the automatically computed depth maps contain very little noise. Clear object silhou-