2. Related works
As the focus of early research, single depth map enhancement extends color image super-resolution (SR) and does not use guidance from a corresponding color image. Xie et al. [38] enhance the LR depth map with a joint bilateral filter guided by an HR edge map, which is constructed from the edges of the LR depth map through MRF inference. Following the success of sparse coding in color image SR [36,47], Ferstl et al. [8] design the Markov Random Field (MRF) regularization term with an anisotropic diffusion tensor extracted from an HR edge map predicted by sparse coding. Xie et al. [39] perform single depth map SR and denoising simultaneously: they train robust coupled dictionaries with locality coordinate constraints, and the HR depth map is reconstructed from the sparse vectors over the learned dictionaries together with an adaptively regularized shock filter and an L0 gradient smoothness constraint. Recently, motivated by developments in color image SR based on deep convolutional neural networks (DCNNs) [20], Riegler et al. [34] propose a variational method for single depth map enhancement whose data term and regularization term are learned by a DCNN. Chen et al. [3] adopt a DCNN to predict a high-quality edge map from the LR depth map, and the HR depth map is reconstructed via MRF inference that embeds the predicted edge map. Song et al. [35] represent depth map super-resolution as a series of novel view synthesis sub-tasks. Such methods perform well for small up-sampling factors, e.g., 2× and 4×. However, as the up-sampling factor rises, their drawbacks become increasingly apparent because of the inherent limitation of relying on the depth map alone. To improve the performance, the corresponding HR color image is introduced to provide guidance. According to how they use training data, the existing color-guided methods can be classified into filter-based, optimization-based and learning-based categories. Filter-based and optimization-based methods explicitly exploit the co-occurrence of edges between the color image and the depth map via predefined models, while learning-based counterparts extract the guidance from the HR color image in a data-driven way. The following subsections review these three types of methods in turn.
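Several of the sparse-coding methods above, as well as those reviewed in Section 2.3, share the same reconstruction step: the LR patch is coded over an LR dictionary and the HR patch is synthesized by applying the same sparse code to a jointly trained HR dictionary. The following is a minimal sketch of that step, assuming coupled dictionaries D_lr and D_hr with column-normalized atoms have already been learned; the function names, the simple OMP coder and the sparsity level are illustrative and not taken from any cited method.

```python
import numpy as np

def omp(D, y, k, tol=1e-8):
    """Plain orthogonal matching pursuit: greedily sparse-code the signal y
    over dictionary D (columns assumed to have unit l2 norm)."""
    residual, support = y.astype(float), []
    coef, sol = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < tol:
            break
        support.append(int(np.argmax(np.abs(D.T @ residual))))   # best-matching atom
        sub = D[:, support]
        sol, *_ = np.linalg.lstsq(sub, y, rcond=None)             # re-fit on the support
        residual = y - sub @ sol
    coef[support] = sol
    return coef

def reconstruct_hr_patch(patch_lr, D_lr, D_hr, sparsity=5):
    """Coupled-dictionary synthesis: code the LR patch over D_lr and rebuild
    the (flattened) HR patch with the same sparse code applied to D_hr."""
    alpha = omp(D_lr, patch_lr.ravel(), sparsity)
    return D_hr @ alpha
```

In practice the reconstructed patches overlap and are averaged (or constrained for consistency, as in [16]) to form the full HR depth map.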
2.1. Filter-based methods
Filter-based methods use only local information and compute the depth value of each pixel independently. In the first such work, Kopf et al. [15] propose the Joint Bilateral Up-sampling (JBU) framework, which uses HR color edges to refine the LR depth edges through a bilateral filter. Many variants of JBU further improve the performance. Liu et al. [21] compute the filter weights from geodesic distances to preserve depth edges. Yang et al. [43] construct a cost volume over depth candidates by using JBU [15], and the coarsely up-sampled depth map is iteratively refined within this cost volume. He et al. [9] enhance the LR depth map by assuming a linear relation between corresponding patches of the output and the guidance image. Min et al. [27] use the joint histogram of depth candidates to up-sample the LR depth map. Barron and Poole [1] propose a fast bilateral solver which can be used for color-guided depth map enhancement.
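A minimal sketch of the JBU idea behind [15] and its variants is given below, written in plain Python for clarity. The parameter names, the Gaussian kernels and the nearest-neighbor handling of the guidance image are illustrative assumptions (the guidance is assumed to be a float image normalized to [0, 1]), not the exact formulation of any cited paper.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, factor,
                             sigma_spatial=1.0, sigma_range=0.1, radius=2):
    """Joint bilateral up-sampling: each HR depth value is a weighted average
    of nearby LR depth samples, where the weights combine spatial proximity
    (measured on the LR grid) with color similarity from the HR guidance."""
    h_hr, w_hr = color_hr.shape[:2]
    h_lr, w_lr = depth_lr.shape
    depth_hr = np.zeros((h_hr, w_hr))

    for y in range(h_hr):
        for x in range(w_hr):
            yl, xl = y / factor, x / factor                 # HR pixel in LR coordinates
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    qy, qx = int(round(yl)) + dy, int(round(xl)) + dx
                    if not (0 <= qy < h_lr and 0 <= qx < w_lr):
                        continue
                    # spatial weight on the LR grid
                    ws = np.exp(-((qy - yl) ** 2 + (qx - xl) ** 2)
                                / (2 * sigma_spatial ** 2))
                    # range weight from the HR color guidance
                    gy = min(int(qy * factor), h_hr - 1)
                    gx = min(int(qx * factor), w_hr - 1)
                    diff = color_hr[y, x] - color_hr[gy, gx]
                    wr = np.exp(-float(np.dot(diff, diff)) / (2 * sigma_range ** 2))
                    num += ws * wr * depth_lr[qy, qx]
                    den += ws * wr
            depth_hr[y, x] = num / max(den, 1e-12)
    return depth_hr
```

The variants above keep this weighted-averaging structure and mainly differ in how the weights or depth candidates are formed, e.g., geodesic weights [21], joint histograms [27] or cost volumes [43].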
2.2. Optimization-based methods
Optimization is widely used in many fields, e.g., human health [31], image search [29] and hash code learning [25]. For depth map enhancement, optimization-based methods introduce hand-crafted priors and compute the depth values of all pixels simultaneously. Compared with filter-based methods, they typically perform better at depth map de-noising. Diebel et al. [5] are the first to model depth map enhancement as a Markov Random Field (MRF) inference problem. Following this work, Park et al. [32] integrate edge, gradient and segmentation information from the HR color image to design the anisotropic affinities of the regularization term. Ferstl et al. [7] regularize the HR depth map with a second-order total generalized variation constraint guided by an anisotropic diffusion tensor extracted from the HR color image. Liu et al. [22] implicitly mitigate texture-copying artifacts and maintain depth edges by designing the regularization term with a robust M-estimator. Zuo et al. [49] explicitly evaluate the edge inconsistency between the color image and the depth map, which is further embedded into the MRF inference. By considering the structure of the depth map, Zuo et al. [50] compute the anisotropic affinities in a distance space built from minimum spanning trees, whose edge weights embed the edge inconsistency of [49]. Li et al. [18] propose a hierarchical global optimization framework in which the depth map is iteratively refined by a fast weighted least squares solver [26]. Yu et al. [46] propose intensity-guided depth up-sampling based on edge sparsity and weighted L0 gradient minimization. Beyond the MRF model, Yang et al. [41,42] propose color-guided depth map enhancement via an auto-regressive model.
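For reference, most of the MRF-based formulations above minimize a quadratic energy of the following generic form; the notation is illustrative and is not taken verbatim from any cited paper:

\[
E(D) = \sum_{p}\big(d_p - \tilde{d}_p\big)^2 + \lambda \sum_{p}\sum_{q \in \mathcal{N}(p)} w_{pq}\,\big(d_p - d_q\big)^2,
\qquad
w_{pq} = \exp\!\left(-\frac{\lVert I_p - I_q\rVert^2}{2\sigma_c^2}\right),
\]

where \(\tilde{d}_p\) is the coarsely up-sampled LR depth at pixel \(p\), \(\mathcal{N}(p)\) is its neighborhood, and \(I_p\) is the HR color value. The cited methods differ mainly in how the affinities \(w_{pq}\) and the regularizer are designed, e.g., anisotropic affinities [32], total generalized variation [7], robust M-estimators [22], edge-inconsistency weights [49,50] or weighted least squares solvers [18,26].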
2.3. Learning-based methods
As more and more RGB-D datasets become available, sparse coding, which has shown great success in low-level computer vision, has been introduced into color-guided depth map enhancement. In a pioneering work, Li et al. [19] jointly train three dictionaries for corresponding patches from the LR depth map, the HR depth map and the HR color image. The sparse vector is shared across the dictionaries to reconstruct each HR depth patch independently. Kwon et al. [16] further improve the performance with a multi-scale dictionary training scheme, where a consistency constraint is defined on overlapping patches in the reconstruction phase. In addition to the synthesis model of sparse coding [16,19], based on the analysis