Copyright (c) 2011 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
Stereo Interleaving Video Coding with Content Adaptive Image
Subsampling
Yongbing Zhang, Xiangyang Ji, Haoqian Wang, and Qionghai Dai
Abstract—Stereo interleaving video coding, where both left and
right view frames are subsampled to half size and multiplexed
into a single frame before being encoded by a traditional 2D video
encoder, is an efficient encoding scenario for stereoscopic video.
Many existing stereo interleaving video coding methods
subsample each frame using fixed subsampling filter
coefficients. Such methods are easy to implement; however, they
ignore the varying properties of the frame signal. By jointly
considering the influences of subsampling and compression, a
rate distortion analysis of stereo interleaving video
coding is proposed. The final distortion in stereo interleaving
video coding is the summation of the errors caused by subsampling
(the distortion between the subsampled-then-interpolated image and
the original full-resolution one) and by quantization during
compression. Based on this rate distortion analysis, a
content adaptive image subsampling (CAIS) method is also proposed. In
CAIS, the half size frames are generated by the optimal
subsampling filters, which are calculated based on frame
contents and the targeted interpolation coefficients.
Experimental results demonstrate that the proposed CAIS is able
to greatly improve compression efficiency of stereo interleaving
video coding.
Index Terms—Stereo interleaving video, frame packing
arrangement, content adaptive, rate distortion analysis
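The distortion decomposition stated in the abstract can be written out as a short sketch. The symbols below are assumptions introduced for illustration and are not the paper's notation: $x$ is the original full-resolution frame, $\tilde{x}$ is the frame interpolated from the uncompressed subsampled frame, and $\hat{x}$ is the frame interpolated from the decoded subsampled frame. The additive form holds only if the cross term between the two error components is negligible.

```latex
\begin{align}
x - \hat{x} &= (x - \tilde{x}) + (\tilde{x} - \hat{x}), \\
D = E\!\left[\lVert x - \hat{x} \rVert^2\right]
  &= \underbrace{E\!\left[\lVert x - \tilde{x} \rVert^2\right]}_{D_s \text{ (subsampling)}}
   + \underbrace{E\!\left[\lVert \tilde{x} - \hat{x} \rVert^2\right]}_{D_q \text{ (quantization)}}
   + 2\,E\!\left[(x - \tilde{x})^{\mathsf T}(\tilde{x} - \hat{x})\right], \\
D &\approx D_s + D_q
  \quad \text{when the cross term is negligible.}
\end{align}
```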
I. INTRODUCTION
In recent years, stereoscopic video has drawn significant
attention with more and more products and services
becoming available in the consumer markets. Stereoscopic
video, a type of visual media that provides depth perception of
the observed scenery, creates a perception of 3D using two 2D
images [1]. Each 2D image is selectively targeted at either the left
eye or the right eye in a way designed to recruit the brain’s
natural depth sensing abilities. The 3D depth perception can be
provided by 3D display systems which ensure that the user
observes a different view with each eye [2]. With the
rapid growth of 3D content, stereoscopic video is
also an increasingly attractive technology for home
living room and mobile 3D video services [3].
Manuscript received March 10, 2012; revised June 19, 2012. This work was
partially supported by National Science Foundation of China (61170195), the
Joint Funds of National Science Foundation of China (U0935001), the
Upgrading Project of Shenzhen Key Laboratory (CXB201005260071A), and
the Basic Research Plan in Shenzhen City (JC201005310709A and
JC201105201110A). This paper was recommended by Associate Editor
Levent Onural.
Y. Zhang and H. Wang are with Shenzhen Key Laboratory of Broadband
Network & Multimedia, Graduate School at Shenzhen, Tsinghua University,
Shenzhen 518055, China (e-mail: ybzhang@tsinghua.edu.cn;
wanghaoqian@tsinghua.edu.cn).
X. Ji and Q. Dai are with the Broadband Networks and Digital Media
Laboratory, Automation Department, Tsinghua University, Beijing 100084,
China (e-mail: xyji@tsinghua.edu.cn; qhdai@tsinghua.edu.cn).
Compared to 2D video, stereoscopic video has doubled the
amount of data due to the existence of an extra view.
Consequently, additional bandwidth will be required for
transmission and storage [4], which imposes a high demand on
the efficient compression of stereoscopic video. There are
various ways to encode stereoscopic video. The most direct
one is simulcast video coding, which encodes the left and right
view frames independently. This can be easily realized by
a traditional 2D video coding system. However, the bit rate in
simulcast video coding is doubled, which poses a great
challenge to existing transmission and encoding systems.
Another way is to encode the stereoscopic video by exploiting
the inter-view redundancies, namely inter-view prediction [5]
[6]. Under this encoding scenario, the left view video is
encoded by a traditional 2D video encoder, for example
H.264/AVC [7], while the reconstructed left view frame can
also be referenced by the right view frames. This method is
able to significantly improve the encoding performance. To
efficiently compress stereoscopic video, MPEG added the
Stereo High Profile [8] in July 2009 to specifically address the
case in which the multiple views of Multiview Video Coding
(MVC) are the left and right stereo views. The Stereo High
Profile limits the number of encoded views to two and
includes support for interlaced coding tools, so that the
resulting profile supports the same set of coding tools as the
prior High Profile, but with stereo inter-view prediction
enabled. However, the inter-view prediction encoding scenario
requires upgrading the existing infrastructure and equipment,
since additional bandwidth cannot be avoided. Alternatively, a
stereo interleaving video encoding scheme [9-12] can be
realized using existing 2D video coding methods.
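The interleaving step described above (subsample each view to half size, multiplex both into one frame, then interpolate at the receiver) can be sketched as follows. This is a minimal illustration only: the side-by-side arrangement, the fixed 3-tap filter [1, 2, 1]/4, the linear interpolation, the even frame width, and all function names are assumptions made for the example and are not taken from the paper.

```python
import numpy as np


def subsample_columns(frame, taps=(0.25, 0.5, 0.25)):
    """Low-pass filter each row with fixed taps (assumed), keep even columns."""
    taps = np.asarray(taps, dtype=np.float64)
    pad = len(taps) // 2
    # Replicate edge pixels so the filtered row keeps its original width.
    padded = np.pad(frame.astype(np.float64), ((0, 0), (pad, pad)), mode="edge")
    filtered = np.stack([np.convolve(row, taps, mode="valid") for row in padded])
    return filtered[:, ::2]


def pack_side_by_side(left, right):
    """Multiplex two half-width views into one frame of the original width."""
    return np.hstack([subsample_columns(left), subsample_columns(right)])


def upsample_columns(half_frame):
    """Restore full width: copy samples to even columns, average for odd ones."""
    height, width = half_frame.shape
    full = np.empty((height, 2 * width), dtype=np.float64)
    full[:, ::2] = half_frame
    # The last odd column has no right neighbour, so its left sample is reused.
    right_neighbour = np.concatenate([half_frame[:, 1:], half_frame[:, -1:]], axis=1)
    full[:, 1::2] = 0.5 * (half_frame + right_neighbour)
    return full


def unpack_and_interpolate(packed):
    """Split the packed frame into its two halves and interpolate each view."""
    half = packed.shape[1] // 2
    return upsample_columns(packed[:, :half]), upsample_columns(packed[:, half:])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.integers(0, 256, size=(4, 8)).astype(np.float64)
    right = rng.integers(0, 256, size=(4, 8)).astype(np.float64)
    packed = pack_side_by_side(left, right)  # would be fed to a 2D encoder
    left_rec, right_rec = unpack_and_interpolate(packed)
    # Subsampling distortion only; no compression is applied in this sketch.
    print("left MSE:", np.mean((left - left_rec) ** 2))
    print("right MSE:", np.mean((right - right_rec) ** 2))
```

In a real system the packed frame would pass through a 2D encoder and decoder between packing and unpacking, so the reconstruction error would also include quantization distortion.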
Compared with the former two methods, the stereo interleaving
video encoding scenario facilitates the introduction of
stereoscopic services without upgrading the existing
infrastructure and equipment. In addition, stereo interleaving
video encoding easily supports synchronization between
the two views [3]. As a result, the stereo interleaving video
encoding scheme has received considerable attention from the
broadcast industry. Much pioneering work has been done to
improve the efficiency of stereo interleaving video coding. For
example, an adaptive interpolation method has been proposed [11],
in which each frame is segmented and a
common interpolation mode is applied to each segment.
However, the interpolation modes must
be transmitted to the receiver, which is not compatible with
existing 2D video coding standards. In addition, [12] proposed
an enhanced rate distortion optimization method, which
utilizes the distortion between upsampled reconstructed block