This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON BROADCASTING
A Virtual View PSNR Estimation Method for 3-D Videos
Hui Yuan, Member, IEEE, Sam Kwong, Fellow, IEEE, Xu Wang, Student Member, IEEE, Yun Zhang, Member, IEEE, and Fengrong Li
Abstract—In three-dimensional videos (3-DVs) with n-view texture videos plus n-view depth maps, virtual views can be synthesized from neighboring texture videos and the associated depth maps. To evaluate the system performance or guide the rate-distortion optimization process of 3-DV coding, the distortion/PSNR of a virtual view should be calculated by measuring the quality difference between the virtual view synthesized from compressed 3-DVs and the one synthesized from uncompressed 3-DVs, which increases the complexity of a 3-DV system. To reduce this complexity, it is preferable to estimate virtual view distortions/PSNRs directly, without rendering the virtual views. In this paper, the virtual view synthesis procedure and the distortion propagation from existing views to virtual views are analyzed in detail, and a virtual view distortion/PSNR estimation method is then derived. Experimental results demonstrate that the proposed method can estimate the PSNRs of virtual views accurately: the squared correlation coefficient and the root mean squared error between the PSNRs estimated by the proposed method and the actual PSNRs are 0.998 and 2.012, respectively, averaged over all the tested sequences. Since the proposed method operates on each row independently, it is also amenable to parallel implementation. The execution time per row is only 0.079 s for pictures with 1024×768 resolution and only 0.155 s for pictures with 1920×1088 resolution.
Index Terms—Distortion estimation, 3DV, video coding.
I. INTRODUCTION

With the improvements in high-speed networking, high-capacity storage, and high-quality auto-stereoscopic display technologies, extensive commercial applications of three-dimensional
Manuscript received June 18, 2015; revised September 30, 2015; accepted October 13, 2015. This work was supported in part by the National Natural Science Foundation of China under Grants 61571274, 61201211, 61471348, and 61501299; in part by the Young Scholars Program of Shandong University (YSPSDU) under Grant 2015WLJH39; in part by the Ph.D. Programs Foundation, Ministry of Education of China under Grant 20120131120032; in part by the Key Laboratory of Wireless Sensor Network and Communication, Chinese Academy of Sciences under Grant 2013002; in part by the Shenzhen Emerging Industries of Strategic Basic Research Project under Grant JCYJ20150525092941043; in part by the City University of Hong Kong Applied Research Grant 9667094; and in part by the City University of Hong Kong Shenzhen Research Institute, Shenzhen, China.
H. Yuan is with the School of Information Science and
Engineering, Shandong University, Jinan 250100, China (e-mail:
yuanhui0325@gmail.com).
S. Kwong is with the Department of Computer Science, City University of Hong Kong, Hong Kong, and also with the City University of Hong Kong Shenzhen Research Institute, Shenzhen 518057, China (e-mail: cssamk@cityu.edu.hk).
X. Wang is with the College of Computer Science and Software
Engineering, Shenzhen University, Shenzhen 518060, China (e-mail:
wangxu@szu.edu.cn).
Y. Zhang is with the Shenzhen Institutes of Advanced Technology,
Chinese Academy of Sciences, Shenzhen 518055, China (e-mail:
yun.zhang@siat.ac.cn).
F. Li is with the Key Laboratory of Wireless Sensor Network
and Communication, Shanghai Institute of Microsystem and Information
Technology, Chinese Academy of Sciences, Shanghai 200050, China (e-mail:
lifengrongsim@mail.sim.ac.cn).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TBC.2015.2492461
video (3DV) are becoming reality, e.g., the well-known 3D tele-
vision (3DTV) [1] and free viewpoint television (FTV) [2]. In
a typical 3DV system, which includes capture, storage, transmission,
and display, a 3D scene should first be represented efficiently by
using a small amount of data [3]. Among 3D scene representation technologies [3], the representation based on n-view texture videos plus n-view depth maps has been used extensively. For this kind of scene representation, virtual views are rendered from the acquired n-view texture videos and their corresponding n-view depth maps by depth image-based rendering (DIBR) [4].
In order to obtain the distortion or quality of the virtual view for high-efficiency 3DV coding (3DVC) and 3DV quality assessment, the distortion or PSNR of the virtual view can be calculated by comparing a virtual view synthesized from compressed 3DVs with the one synthesized from uncompressed 3DVs, which increases the complexity of a 3DV system. A more economical way is to estimate the virtual view's distortion/PSNR directly. In [5], Zhang et al. proposed a region-based virtual view distortion estimation method for depth map coding. In [6], a linear-model-based virtual view distortion estimation method was proposed for depth map coding. In our previous work [7], a planar-model-based virtual view distortion estimation method was proposed for joint bit allocation between texture videos and depth maps. Similar distortion models and applications can also be found in [8]–[10]. The existing methods [5]–[10] can estimate the distortion variation tendency to some extent, but the estimated virtual view distortion may deviate considerably from the actual distortion. To estimate the virtual view distortion accurately, a synthesis distortion estimation method was proposed in [11]. In that method, the effect of depth map distortion on synthesis distortion is decomposed into two parts, a spatially variant region and a spatially invariant region, based on frequency-domain analysis. However, the model cannot be used easily due to its high computational complexity.
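The reference measurement discussed above can be sketched as follows. This is an illustrative Python fragment, not the paper's implementation; the function name, array shapes, and synthetic data are assumptions. It computes the PSNR of a virtual view rendered from compressed 3DV data against the one rendered from uncompressed data:

```python
import numpy as np

def virtual_view_psnr(ref_view, test_view, max_val=255.0):
    """PSNR of the virtual view synthesized from compressed 3DVs
    (test_view) against the one synthesized from uncompressed 3DVs
    (ref_view); both are 2-D luma arrays of identical size."""
    ref = ref_view.astype(np.float64)
    test = test_view.astype(np.float64)
    mse = np.mean((ref - test) ** 2)   # mean squared error, i.e., the distortion
    if mse == 0:
        return float("inf")            # identical views
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example with synthetic data (real use would take two DIBR outputs):
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(768, 1024), dtype=np.uint8)
noisy = np.clip(ref.astype(np.int16) + rng.integers(-2, 3, size=ref.shape),
                0, 255).astype(np.uint8)
print(virtual_view_psnr(ref, noisy))
```

Note that this baseline requires both syntheses to be performed, which is exactly the rendering cost the proposed estimation method avoids.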
From the principles of DIBR, it can be concluded that the distortion of the virtual view depends only on the distortions of the left and right texture views and depth maps when the camera systems are well calibrated [7]. Since DIBR is analytically tractable, the distortion of the virtual view can likewise be derived mathematically from the distortions of the left and right texture views and depth maps. Motivated by this observation, a fast and accurate virtual view distortion/PSNR estimation method is proposed based on a detailed analysis of the virtual view synthesis procedure.
To estimate the distortion/PSNR of a virtual view accurately and with low complexity, the DIBR procedure and the distortion propagation from existing views to virtual views are analyzed in detail in Section II. During this analysis, it should be noted that the DIBR procedure is equivalent to disparity compensation when all the cameras are well calibrated [7]; thus, a depth coding error can only affect the horizontal position of the projected pixels in the virtual view. In addition, for clarity, a summary of frequently used notations is given in Table I. Experimental results and conclusions are given in Sections III and IV, respectively.
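As a concrete illustration of this disparity-compensation view, the following hypothetical Python sketch uses the common 8-bit depth quantization between a near and a far clipping plane; all parameter values are assumptions for illustration, not the paper's setup. It shows that a depth coding error translates into a purely horizontal shift of the projected pixel:

```python
# Illustrative sketch: for a well-calibrated, rectified (parallel) camera
# setup, DIBR reduces to a horizontal disparity shift, so a depth coding
# error only moves a projected pixel along its row.
def disparity(depth_value, f, baseline, z_near, z_far):
    """Disparity in pixels for an 8-bit depth sample (MPEG-style depth
    quantization); f = focal length, baseline = inter-camera distance."""
    z = 1.0 / (depth_value / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
    return f * baseline / z

# A depth coding error e shifts the pixel horizontally by
# disparity(v + e, ...) - disparity(v, ...); its vertical position is unchanged.
f, baseline, z_near, z_far = 1000.0, 5.0, 40.0, 120.0   # assumed values
d_true = disparity(128, f, baseline, z_near, z_far)
d_err = disparity(128 + 4, f, baseline, z_near, z_far)  # 4-level depth error
print(d_err - d_true)  # horizontal position error in pixels
```

Because the quantized inverse depth is linear in the 8-bit depth value, the horizontal shift grows linearly with the depth coding error, which is what makes the per-row, analytical distortion propagation tractable.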
0018-9316 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.