Scalable Coding of 3D Holoscopic Image by Using a Sparse Interlaced View Image Set and Disparity Map 3
However, the coding schemes mentioned above do not fully exploit the high spatial correlation among the rendered view
images. Although some compression schemes decompose the holoscopic content into multi-view sequences, they simply
apply the MVC standard to reduce redundancy. Moreover, such coding schemes cannot provide scalable coding of
holoscopic content. As for scalable coding of 3D holoscopic images, a method that uses the rendered views as
prediction references is proposed in [22]. However, the inter-layer prediction process in [22] rests on the hypothesis
that the disparities of all adjacent EIs are approximately equal, which does not always hold. Additionally, the coding
bit rate of the reference image is not included in the final bit stream, which may influence the decoded holoscopic
image quality.
In order to improve coding efficiency by exploiting the correlation among the rendered view images and to provide
coding scalability, this paper proposes a scalable coding scheme that uses a sparse interlaced view image
set and a disparity map. To describe the spatial correlation among the rendered view
images clearly, we propose to use the interlaced view image to represent the 3D holoscopic content. Some
redundancy between neighboring VIs in the interlaced view image is first removed before encoding by keeping only a
sparse interlaced view image set and the corresponding disparity map. Then, based on the sparse interlaced view image
set and disparity map, a full interlaced view image is reconstructed by shifting with simple interpolation. The
reconstructed interlaced view image is finally used as a reference to predict the original interlaced view
image with a modified HEVC encoder. The proposed scalable coding method has a three-layer structure: spatial
resolution scalability is provided from the first to the second layer, and quality scalability from the second to the
third layer. The main contributions of this paper are: 1) the interlaced view image is used to represent
the 3D holoscopic content so as to exploit the high spatial correlation among the rendered view images; 2) a sparse
interlaced view image set and the corresponding disparity map are used to code the interlaced view image; 3) coding
scalability is enabled in the proposed coding scheme. Note that this work is limited to the compression of the 3D
holoscopic images captured by the Plenoptic Camera 2.0 [26].
This paper is organized as follows. The common view rendering methods are illustrated in Section 2. The
proposed scalable coding method is described in Section 3. Experimental results are presented and analyzed in
Section 4, while the concluding remarks are given in Section 5.
2 View image rendering
The light rays emanating from a 3D scene can be described by the complete seven-dimensional
plenoptic function introduced by Adelson and Bergen [23]:
$$I = P_7(x, y, z, \theta, \phi, \lambda, t) \tag{1}$$
where (x, y, z) is the viewing position, (θ, ϕ) is the light ray direction, λ is the light ray wavelength, and t
is the time. If we assume that the 3D scene is static and the color is represented by RGB channels,
the plenoptic function can be reduced to five dimensions without λ and t. Moreover, if the regions are free
of occluders, the plenoptic function can be further simplified into four dimensions [24][25], which defines a
light ray by the coordinates of its intersections with two parallel planes. This means that the 3D holoscopic
image captures both spatial and angular information of a 3D scene. Therefore, VIs can be rendered from a 3D
holoscopic image, where the VIs represent the orthographic projections of the captured 3D scene in different
directions.
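The successive reductions described above can be written out explicitly. This is only a notational sketch: the names P_5 and P_4 and the plane coordinates (u, v) and (s, w) are assumed here for illustration, and only P_7 appears in Eq. (1). The last step relies on the fact that, in a region free of occluders, the radiance along a ray does not change, so one positional dimension is redundant.

```latex
\begin{aligned}
I &= P_7(x, y, z, \theta, \phi, \lambda, t) && \text{full plenoptic function, Eq. (1)} \\
I &= P_5(x, y, z, \theta, \phi)             && \text{static scene, RGB colour: } t \text{ and } \lambda \text{ dropped} \\
I &= P_4(u, v, s, w)                        && \text{occluder-free region: ray through } (u, v) \text{ and } (s, w) \text{ on two parallel planes}
\end{aligned}
```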
The simplest way to construct a single view image is to extract one pixel with the same relative position
from each EI of a given 3D holoscopic image and then stitch them together. However, extracting only one pixel
from each EI yields a very low resolution, and the rendered view image suffers from severe blocking
artifacts. Another common rendering method is to construct a VI by extracting a patch from each EI [26]. The
rendering process is shown in Fig. 2. Suppose that a $P \times P$ patch is extracted at the same relative position
from each EI of size $n_x \times n_y$. With $N_x \times N_y$ EIs in the 3D holoscopic image, the final rendered view
image is of size $(P \cdot N_x) \times (P \cdot N_y)$. Extracting a patch from each EI improves the resolution of the
rendered VI. However, with a fixed patch size, artifacts are still likely to appear in parts of the rendered view [26].
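Both rendering methods above can be illustrated with one short sketch, since the one-pixel-per-EI method is the special case P = 1 of patch-based rendering. This is a minimal illustration, not the rendering code of [26]. It assumes the holoscopic image is a 2D list of pixel values with the EIs tiled on a regular grid. The function name `render_view` and the parameters `off_x` and `off_y` (the top-left corner of the patch inside every EI) are hypothetical, and any per-patch processing a real renderer may need is left out.

```python
def render_view(holo, n_x, n_y, P, off_x, off_y):
    """Render one view image by tiling a P x P patch taken from every EI.

    holo         : 2D list (rows of pixel values) holding the holoscopic image
    n_x, n_y     : width and height of one elemental image (EI) in pixels
    P            : patch side length; P = 1 is the one-pixel-per-EI method
    off_x, off_y : patch position inside each EI (hypothetical parameters)
    """
    height, width = len(holo), len(holo[0])
    N_x, N_y = width // n_x, height // n_y  # number of EIs per direction
    if not (0 <= off_x <= n_x - P and 0 <= off_y <= n_y - P):
        raise ValueError("patch must lie inside the EI")

    view = []
    for j in range(N_y):              # EI row index
        for py in range(P):           # row inside the patch
            src_row = holo[j * n_y + off_y + py]
            out_row = []
            for i in range(N_x):      # EI column index
                start = i * n_x + off_x
                out_row.extend(src_row[start:start + P])
            view.append(out_row)
    return view                       # P*N_y rows, each of length P*N_x


# Toy example: 2 x 2 EIs, each 3 pixels wide and 2 pixels high.
toy = [[10 * r + c for c in range(6)] for r in range(4)]
single_pixel_view = render_view(toy, n_x=3, n_y=2, P=1, off_x=1, off_y=0)
patch_view = render_view(toy, n_x=3, n_y=2, P=2, off_x=1, off_y=0)
```

Changing `off_x` and `off_y` selects a different view. In this toy example the patch-based view has 4 x 4 pixels, against 2 x 2 for the single-pixel view, which matches the (P · N_x) × (P · N_y) size stated above.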