Novel View Synthesis using Feature-preserving Depth Map Resampling
Duo Chen, Jie Feng and Bingfeng Zhou
Institute of Computer Science and Technology, Peking University, Beijing, China
{chenduo, feng_jie, cczbf}@pku.edu.cn
Keywords:
Novel View Synthesis, Depth Map, Importance Sampling, Image Projection.
Abstract:
In this paper, we present a new method for synthesizing images of a 3D scene at novel viewpoints, based on a
set of reference images taken in a casual manner. With such an image set as input, our method first reconstructs
a sparse 3D point cloud of the scene, which is then projected onto each reference image to obtain a set of depth
points. Afterwards, an improved error-diffusion sampling method is utilized to generate a sampling point set
in each reference image that includes the depth points and preserves the image features well, so that the
image can be triangulated on the basis of this point set. Then, we propose a distance metric based on
Euclidean distance, color similarity and boundary distribution to propagate depth information from the depth
points to the rest of the sampling points, and hence a dense depth map can be generated by interpolation in the
triangle mesh. Given a desired viewpoint, several of the closest reference viewpoints are selected, and their
colored depth maps are projected to the novel view. Finally, the multiple projected images are merged to fill
the holes caused by occlusion, resulting in a complete novel view. Experimental results demonstrate that our
method can achieve high-quality results for outdoor scenes containing challenging objects.
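To make the depth propagation step concrete, the following is a minimal Python sketch of the kind of combined metric the abstract describes. The weights w_spatial, w_color and w_boundary, and the binary boundary test, are our own illustrative assumptions, not the paper's exact formulation:

    import numpy as np

    def propagation_distance(p, q, color_p, color_q, crosses_boundary,
                             w_spatial=1.0, w_color=0.5, w_boundary=10.0):
        # Euclidean term: image-plane distance between the two sampling points.
        d_spatial = np.linalg.norm(np.asarray(p, float) - np.asarray(q, float))
        # Color term: points with similar colors are more likely to share depth.
        d_color = np.linalg.norm(np.asarray(color_p, float) - np.asarray(color_q, float))
        # Boundary term: heavily penalize pairs separated by an image boundary,
        # so depth does not leak across object silhouettes (hypothetical weights).
        d_boundary = 1.0 if crosses_boundary else 0.0
        return w_spatial * d_spatial + w_color * d_color + w_boundary * d_boundary

    def propagate_depth(samples, colors, depth_pts, depth_colors, depths, crosses):
        # Assign each sampling point the depth of its nearest depth point under
        # the combined metric; crosses(s, d) is a user-supplied boundary test.
        result = []
        for s, c in zip(samples, colors):
            dists = [propagation_distance(s, d, c, dc, crosses(s, d))
                     for d, dc in zip(depth_pts, depth_colors)]
            result.append(depths[int(np.argmin(dists))])
        return result

A dense depth map would then follow by interpolating the propagated depths inside each triangle of the mesh.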
1 INTRODUCTION
Given a set of reference images of a scene, novel view
synthesis (NVS) methods aim to render the scene at
novel viewpoints. NVS is an important task in com-
puter vision and graphics, and is useful in areas such
as stereo display and virtual reality. Its applications
include 3DTV, Google Street View (Anguelov et al.,
2010), scene roaming and teleconferencing.
NVS methods can be divided into two categories:
small-baseline methods and large-baseline methods,
where “baseline” refers to the translation and rotation
between adjacent viewpoints.
In the case of small-baseline problems, some
methods focus on parameterizing the plenoptic func-
tion with a high sampling density. They arrange the
camera positions in carefully designed configurations
and sample the scene uniformly with reference images. Typ-
ical examples include the light field (Levoy et al., 1996)
and the unstructured lumigraph (Buehler et al., 2001).
Other methods (Mahajan et al., 2009; Evers-
Senne and Koch, 2003) produce novel views by
interpolating video frames, where adjacent frames
have close viewpoints. Methods based on optical
flow also belong to the small-baseline category.
On the other hand, large-baseline NVS is a chal-
lenging, under-constrained problem due to the lack of
full 3D knowledge, scale changes and complex oc-
clusions. It is thus necessary to seek additional depth
and geometry information, or constraints such as
photo-consistency and color-consistency.
For example, Google Street View (Anguelov et al.,
2010) directly acquires depth information with laser
scanners to interpolate large-baseline images. Other
methods utilize structure-from-motion (SFM)
and multi-view stereo (MVS) to recover a sparse 3D
point cloud of the scene and synthesize novel views
based on it. For instance, the rendering algo-
rithm of Chaurasia et al. (Chaurasia et al., 2013) syn-
thesizes depth for the poorly reconstructed regions of
MVS and provides plausible image-based naviga-
tion. However, their approach is limited by the ca-
pabilities of the oversegmentation, so very thin
structures may be missing in the novel view.
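As background for the point-cloud-based pipelines above (and for the projection step in our own method), projecting a sparse SFM point cloud into a reference view to obtain depth points is the standard pinhole-camera operation. A minimal numpy sketch, with naming conventions of our own rather than from the cited works:

    import numpy as np

    def project_point_cloud(X, K, R, t):
        # X: (N, 3) world-space points; K: (3, 3) intrinsics;
        # R, t: world-to-camera rotation and translation.
        Xc = X @ R.T + t                      # world -> camera frame
        z = Xc[:, 2]                          # depth along the optical axis
        uv = (Xc @ K.T)[:, :2] / z[:, None]   # intrinsics + perspective division
        return uv, z                          # pixel coordinates and depths

Points with non-positive depth, or whose pixels fall outside the image bounds, are discarded; the surviving (uv, z) pairs form the per-view depth points.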
Recent works also address the problem of large-
baseline NVS by training neural networks in an end-
to-end manner (Flynn et al., 2016). These methods
only require sets of posed images as training data,
and they generalize well, giving good results on
test sets that differ considerably from the train-
ing set. However, these methods are usually slower
than MVS-based methods, and detailed textures in the
images are usually blurred. Moreover, the relationship
between 3D objects and their 2D projections has a
clear formulation, and requiring neural networks to
learn this relationship from scratch seems redundant.
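For reference, that formulation is the standard pinhole projection: in homogeneous coordinates, a world point $\mathbf{X}$ maps to its image point $\mathbf{x}$ as
\[ \mathbf{x} \simeq K \, [R \mid \mathbf{t}] \, \mathbf{X}, \]
where $K$ is the 3x3 intrinsic matrix, $[R \mid \mathbf{t}]$ the camera pose, and $\simeq$ denotes equality up to scale.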