Exemplar-based depth inpainting with arbitrary-shape patches and
cross-modal matching
Sen Xiang a,b, Huiping Deng a,b,*, Lei Zhu a,b, Jin Wu a,b, Li Yu c

a School of Inform. Sci. & Engn., Wuhan Univ. of Sci. & Tech., Wuhan, 430081, China
b Engin. Research Center of Metallurgical Auto. and Measurement Tech., Ministry of Education, Wuhan, 430081, China
c School of Electron. Inform. & Commun., Huazhong Univ. of Sci. & Tech., Wuhan, 430074, China
* Corresponding author at: School of Inform. Sci. & Engn., Wuhan Univ. of Sci. & Tech., Wuhan, 430081, China. E-mail address: denghuiping@wust.edu.cn (H. Deng).
Article info
Keywords: Depth map; Inpainting; Exemplar; Edge-preserving; 3D video

Abstract
Commodity RGB-D cameras can provide texture and depth maps in real time, and have thus facilitated the rapid development of various depth-dependent applications. However, depth maps suffer from the loss of valid values, which leads to holes and impairs both research and applications. In this paper, we propose a novel exemplar-based method to fill depth holes and thus improve depth quality. The method is based on the fact that a depth map contains many similar or even identical parts, so lost depth values can be restored by referring to valid ones. Considering the intrinsic property of depth maps, i.e., the sharpness of object boundaries, we propose to use arbitrary-shape matching patches, instead of fixed squares, to avoid inter-depth-layer distortion and thus improve boundary quality. In addition, since depth values do not have distinct features, cross-modal matching, in which both depth and texture are involved, is utilized. Moreover, we investigate the similarity criteria in cross-modal matching in order to improve the matching accuracy between source and target patches. Experimental results demonstrate that the proposed method accurately recovers lost depth information, especially at boundaries, and outperforms state-of-the-art exemplar-based inpainting methods.
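For intuition only, the following is a minimal sketch of exemplar-based depth filling with a cross-modal matching cost, not the algorithm proposed in this paper: it uses fixed square patches rather than arbitrary shapes, a local search window, and a hypothetical weight alpha balancing the depth and texture terms, all of which are assumptions made for illustration.

# Minimal sketch of exemplar-based depth hole filling with cross-modal
# matching. This is an illustrative simplification, NOT the authors' method:
# patch size, search radius and the depth/texture weight `alpha` are assumed.
import numpy as np

def fill_depth_holes(depth, gray, patch=5, search=30, alpha=0.5):
    """Fill zero-valued (invalid) depth pixels by copying depth from the
    best-matching valid source patch, scored jointly on depth and texture."""
    d = depth.astype(np.float64).copy()
    g = gray.astype(np.float64)
    h, w = d.shape
    r = patch // 2
    for y, x in np.argwhere(d == 0):
        if y < r or x < r or y >= h - r or x >= w - r:
            continue  # skip image borders for brevity
        td = d[y - r:y + r + 1, x - r:x + r + 1]
        tg = g[y - r:y + r + 1, x - r:x + r + 1]
        valid = td > 0  # compare depth only where the target patch is known
        best, best_cost = None, np.inf
        y0, y1 = max(r, y - search), min(h - r - 1, y + search)
        x0, x1 = max(r, x - search), min(w - r - 1, x + search)
        for sy in range(y0, y1 + 1):
            for sx in range(x0, x1 + 1):
                sd = d[sy - r:sy + r + 1, sx - r:sx + r + 1]
                if (sd == 0).any():
                    continue  # source patch must be fully valid
                sg = g[sy - r:sy + r + 1, sx - r:sx + r + 1]
                # cross-modal cost: depth SSD on valid pixels + texture SSD
                cost = (alpha * np.sum((sd[valid] - td[valid]) ** 2)
                        + (1 - alpha) * np.sum((sg - tg) ** 2))
                if cost < best_cost:
                    best_cost, best = cost, sd
        if best is not None:
            d[y, x] = best[r, r]  # copy the centre depth of the best exemplar
    return d

In a full exemplar-based inpainting pipeline the filling order (e.g., boundary pixels with the most valid neighbors first) also matters; the sketch simply processes hole pixels in scan order.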
1. Introduction
Depth information is a fundamental element in various applications, such as free-viewpoint video [1], 3D reconstruction [2] and face recognition [3]. In recent years, commodity RGB-D cameras, based on structured light [4] or time-of-flight [5] techniques, have made depth acquisition easy and convenient. However, owing to the limitations of the depth generation principles and hardware, the captured depth maps contain many holes, which impairs further research and applications. To
solve this problem, many researchers have studied the topic of depth
inpainting. In general, these methods can be categorized into filtering-
based ones and exemplar-based ones.
In filtering-based methods, special filters are designed to diffuse valid
depth values to invalid ones. Min et al. [6] proposed the weighted mode filter; instead of relying on spatial and intensity similarity alone, this filter exploits the statistics of valid depth values, and it yields sharp depth edges. Yang et al. [7] proposed to use an auto-regressive model to estimate the filter coefficients, so that the filtering adapts to the local context. Miao et al. [8] and Xiang et al. [9] considered the homogeneity of depth gradients, and obtained depth values under gradient constraints by solving partial differential equations. Milani et al. [10] proposed to
use a set of local differential equations to interpolate the missing depth
samples. Xue et al. [11] introduced a low-gradient regularization method in which gradual depth changes are allowed by reducing the penalty for small gradients while still penalizing non-zero gradients. Zhao et al. [12] proposed a two-stage filtering scheme for blurred depth maps, in which the distorted depth maps are successively processed with binary segmentation-based depth filtering and MRF-based reconstruction.
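To make the filtering-based family concrete, the following toy sketch (not taken from any of the cited methods) diffuses valid depth into hole pixels by Jacobi iterations of the Laplace equation; the zero-as-invalid convention, the wrap-around borders of np.roll, and the iteration count are simplifying assumptions.

# Toy illustration of diffusion-style depth hole filling: hole pixels are
# repeatedly replaced by the average of their 4-neighbours, i.e., a Jacobi
# iteration for the Laplace equation (smoothness inside the hole).
import numpy as np

def diffuse_depth(depth, iters=200):
    d = depth.astype(np.float64).copy()
    hole = depth == 0
    # initialise holes with the mean of valid depth to speed up convergence
    d[hole] = d[~hole].mean()
    for _ in range(iters):
        up    = np.roll(d,  1, axis=0)
        down  = np.roll(d, -1, axis=0)
        left  = np.roll(d,  1, axis=1)
        right = np.roll(d, -1, axis=1)
        avg = (up + down + left + right) / 4.0
        d[hole] = avg[hole]  # update only the invalid pixels
    return d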
Owing to the consistency between texture and depth, texture-
guided filtering is also quite popular. These filters incorporate texture
similarity, and thus different objects can be distinguished and depth
boundaries can be well preserved. The simplest examples are the joint bilateral filter [13,14] and the trilateral filter [15], in which the weighting kernel includes a texture-similarity term. Kim et al. [16] modified the color weights by considering texture-depth consistency. Bapat et al. [17] proposed an iterative median filter that also takes the RGB components into account; color similarity is measured with the absolute difference between neighboring pixels and their median value. Chang et al. [18] proposed adaptive texture-similarity-based hole filling, in which luminance, instead of RGB, is used as guidance.
Bhattacharya et al. [19] focused on removing depth edge distortions.
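As an illustration of the texture-guided filters discussed above, the following sketch fills each hole pixel with a joint-bilateral weighted average of its valid neighbors, where the weight combines spatial distance with color similarity taken from the registered texture image; the window radius and the two sigmas are illustrative values, not parameters from any cited work.

# Minimal sketch of texture-guided (joint bilateral) depth hole filling.
# Window radius and sigma values are assumptions chosen for illustration.
import numpy as np

def joint_bilateral_fill(depth, gray, radius=7, sigma_s=3.0, sigma_c=10.0):
    d = depth.astype(np.float64)
    g = gray.astype(np.float64)
    out = d.copy()
    h, w = d.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma_s ** 2))
    for y, x in np.argwhere(d == 0):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        nd = d[y0:y1, x0:x1]
        ng = g[y0:y1, x0:x1]
        sp = spatial[y0 - y + radius:y1 - y + radius,
                     x0 - x + radius:x1 - x + radius]
        # colour similarity to the centre pixel of the guidance image
        colour = np.exp(-((ng - g[y, x]) ** 2) / (2 * sigma_c ** 2))
        wgt = sp * colour * (nd > 0)  # only valid depth samples contribute
        if wgt.sum() > 0:
            out[y, x] = (wgt * nd).sum() / wgt.sum()
    return out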