A CNN Cascade for Quality Enhancement
of Compressed Depth Images
Zhi JIN∗, Lei LUO†, Yi TANG∗, Wenbin ZOU∗, Xia LI∗
∗ College of Information Engineering, Shenzhen University, Shenzhen, P.R. China.
E-mail: jinzhi
126@163.com; wzouszu@sina.com
† College of Telecommunication and Information Engineering,
Chongqing University of Posts and Telecommunications, Chongqing, P.R. China.
Abstract—Transmitting depth images along with the corresponding
textures enables a wide range of receiver-side 3D applications.
Since each pixel of a depth image represents geometric information
about the 3D scene, compression artifacts introduced during
transmission lead to severe geometry distortions and perceptual
degradation. To solve this problem, we propose a convolutional
neural network (CNN) cascade for suppressing the compression
artifacts in depth images. In accordance with the characteristics
of depth images, we furthermore adopt a weighted loss function for
network training, which adaptively improves the learning efficiency
and accuracy. Meanwhile, to overcome the problem of limited training
data, we first train our network on textures and then fine-tune it
on the target depth images. To the best of our knowledge, few works
have applied CNNs to depth images for compression artifacts
reduction (CAR). Extensive experiments show that the proposed
solution achieves higher quality for both reconstructed depth images
and synthesized virtual views than the state-of-the-art methods.
Index Terms—Convolutional neural network, Depth images,
JPEG compression, Compression artifacts reduction, Quality
enhancement.
I. INTRODUCTION
Texture images with associated per-view depth images not only
provide depth perception of real scenes, but also support free
navigation to other viewpoints through view synthesis techniques
such as depth-image-based rendering (DIBR) [1]. However, in
comparison with traditional 2D images, this format puts more
pressure on the acquisition, storage and transmission units of
multimedia systems. Image compression schemes are therefore in high
demand for both texture and depth images. On the one hand, lossy
compression (e.g. JPEG [2]) has been widely employed in social media
networks due to its high compression efficiency. On the other hand,
any lossy compression inevitably degrades image quality. This is
especially harmful for depth images, whose values represent 3D scene
geometric information: compression causes severe geometry
distortions and perceptual degradation around discontinuities, such
as blocking artifacts and blurring, which affect the quality of both
the depth image itself and the synthesized views in stereoscopic
image applications.
Recently, many proposals have focused on the denoising and
super-resolution of depth images corrupted by acquisition and
estimation noise. In terms of the approach adopted, they can be
classified into filter-based, model-based and, currently the most
popular, learning-based methods. Among filter-based methods, a
typical representative is joint bilateral upsampling (JBU) [3],
where the bilateral weights are based on guidance from textures.
Starting from it, more complex and sophisticated filters have been
proposed, for example the joint trilateral filter (JTF) [4]. Based
on the structural similarity between textures and depth images,
filter-based methods transfer the salient structure from the
intensity image to the enhanced depth image, while for model-based
methods, modeling the dependency between texture and depth images
plays an important role, as in Markov random field (MRF) [5] and
nonlocal means (NLM) [6] models. Motivated by its success in object
detection and classification, deep learning has also been applied to
low-level vision tasks. Dong et al. [7] proposed a 3-layer CNN for
image super-resolution and, by adding one more layer, successfully
reduced the compression artifacts in textures [8]. Inspired by [7],
Zhang et al. [9] proposed a light 3-layer convolutional network that
uses texture assistance for depth denoising. They also proposed a
weighted loss function to emphasize the influence of edges in depth
images, which inspires our work as well.
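To illustrate the idea of an edge-emphasizing weighted loss mentioned above, the following minimal sketch derives a per-pixel weight map from the depth gradient magnitude and uses it in a weighted mean squared error. It is an illustration under assumptions, not the loss of [9] or of this paper: the function names `edge_weight_map` and `weighted_mse`, the gradient-based weighting, and the parameter `alpha` are all hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import math


def edge_weight_map(depth, alpha=4.0):
    """Return per-pixel weights: 1.0 in flat regions, 1.0 + alpha at the strongest edge.

    Assumption: edges are detected with forward differences of the depth map.
    """
    h, w = len(depth), len(depth[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward differences; the index is clamped at the border,
            # so the difference is 0 in the last row and column.
            gx = depth[y][min(x + 1, w - 1)] - depth[y][x]
            gy = depth[min(y + 1, h - 1)][x] - depth[y][x]
            mag[y][x] = math.hypot(gx, gy)
    peak = max(max(row) for row in mag)
    if peak == 0:
        # Completely flat depth map: every pixel gets the base weight.
        return [[1.0] * w for _ in range(h)]
    return [[1.0 + alpha * (m / peak) for m in row] for row in mag]


def weighted_mse(pred, target, weights):
    """Weighted mean squared error, normalized by the sum of the weights."""
    num = 0.0
    den = 0.0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            num += w * (p - t) ** 2
            den += w
    return num / den


# Toy depth map with a vertical step edge between columns 2 and 3.
target = [[0, 0, 0, 100, 100, 100] for _ in range(4)]
weights = edge_weight_map(target)

# Same error magnitude, once on the edge pixel and once in a flat region.
pred_edge = [row[:] for row in target]
pred_edge[0][2] += 10
pred_flat = [row[:] for row in target]
pred_flat[0][0] += 10

loss_edge = weighted_mse(pred_edge, target, weights)
loss_flat = weighted_mse(pred_flat, target, weights)
```

With these assumed settings, an error of 10 depth levels costs five times more when it falls on the edge pixel than when it falls in the flat region, which is the behavior an edge-emphasizing loss is meant to produce.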
Compared with acquisition and estimation noise, the artifacts caused
by lossy compression are more complex: they include not only noise
but also blocking and blurring effects. Therefore, reducing
compression artifacts is more challenging and in greater demand.
Xu et al. [10] presented a low-complexity adaptive depth truncation
filter in which all edge pixels are replaced by a mean value in each
block to reduce the artifacts in a compressed depth image. However,
such a direct region-based replacement often leads to distortions in
non-flat regions, such as sloped or curved surfaces. Zhao et al.
[11] proposed a fast candidate-values-based boundary filtering
(CVBF) method to reduce the boundary distortions of compressed depth
images.
Motivated by the above methods, we consider that learning-based
methods would be helpful in extracting and mapping the hidden
information in compressed depth images so as to reduce compression
artifacts. Meanwhile, the majority of depth enhancement methods in
the literature require assistance from textures; however, aligned
textures cannot always be guaranteed to be accessible. With this
concern in mind, we introduce
a cascaded fully convolutional network (FCN) which directly
978-1-5386-0462-5/17/$31.00 ©2017 IEEE.
VCIP 2017, Dec. 10 – 13, 2017, St Petersburg, U.S.A.