February 10, 2010 / Vol. 8, No. 2 / CHINESE OPTICS LETTERS 151
Fast macroblock mode selection algorithm for multiview
depth video coding
Zongju Peng (1,2,3), Mei Yu (2), Gangyi Jiang (1,2,*), Feng Shao (2),
Yun Zhang (1,3), and You Yang (1,3)
(1) Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
(2) Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
(3) Graduate University of Chinese Academy of Sciences, Beijing 100049, China
(*) E-mail: jianggangyi@126.com
Received April 17, 2009
The huge computational complexity of multiview video plus depth (MVD) coding is an obstacle to
putting MVD into applications. A fast macroblock mode selection algorithm is proposed to reduce the
computational complexity of multiview depth video coding. The proposed algorithm, implemented on a
joint coding scheme, combines an effective prediction mechanism with an object boundary
discriminating method. The prediction mechanism, which is based on macroblock mode similarities,
reduces the number of macroblock mode candidates in depth video coding. The object boundary
discriminating method uses a macroblock deviation factor to extract the regions that have
discontinuous depth values and are important for virtual view rendering. Experimental results show
that the proposed algorithm increases the coding speed of depth video by 2.00–3.40 times while
maintaining high rate distortion (RD) performance in comparison with the full search algorithm.
OCIS codes: 110.0110, 100.6890, 330.1690.
doi: 10.3788/COL20100802.0151.
With the fast development in the areas of integrated
optics with sensors and network infrastructures,
three-dimensional (3D) video systems will soon be used in a
great number of applications. Integral imaging technology,
one of the most promising methods for representing 3D
scenes, has attracted considerable research interest [1,2].
Multiview video plus depth (MVD) [3] is an alternative
to integral imaging for representing 3D scenes. MVD
signals include multiple texture videos and the associated
depth videos of the same scene. MVD signals are first
captured at different sparse viewpoints and compressed,
and then transmitted to the client. There, the MVD bit
streams are decoded and used to synthesize virtual views
with the depth-image-based rendering (DIBR) technique.
To compress MVD signals efficiently, Park et al. proposed
view-temporal prediction structures that can be adjusted
to various characteristics of general multiview video [4].
In Ref. [5], an effective algorithm was proposed to
eliminate the color inconsistency between multiview videos
for better coding and rendering performance. Yang et al.
proposed an image region partition and regional disparity
estimation algorithm for multiview video coding [6]. To
standardize the encoding of MVD, the joint multiview video
model (JMVM) was developed, based on the video coding
standard H.264/AVC. In JMVM, an elaborate view-temporal
prediction structure based on hierarchical B pictures (HBP)
is used to exploit not only the temporal correlations within
a single view, but also the inter-view correlations among
different views [7].
The JMVM has nine macroblock modes: SKIP, Inter 16×16,
Inter 16×8, Inter 8×16, Inter 8×8, Inter 8×8 Frext,
Intra 16×16, Intra 8×8, and Intra 4×4. The full search
algorithm tries all of these modes to determine the
optimal macroblock mode for the best rate distortion (RD)
performance. The mode with the minimal RD cost is then
selected as the best mode for Inter frame coding.
Unfortunately, the full search algorithm is time consuming.
The computational complexity of MVD coding can be
approximately expressed as O(η×α×β×θ), where η, α, β,
and θ denote the number of videos in each view, the number
of views, the average number of reference frames, and the
number of macroblock modes, respectively. This complexity
is an obstacle to putting MVD into applications. To reduce
the complexity of MVD coding, fast macroblock mode
selection algorithms have been proposed to accelerate the
coding of multiview texture video [8,9]. However, few fast
algorithms have so far been proposed for multiview depth
video.
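The full search described above can be sketched as follows. This is a minimal illustration, not the JMVM implementation: the Lagrangian form J = D + λR of the RD cost is assumed (the text only says "RD cost"), and the names `rd_cost`, `full_search`, `mode_trials`, and the `trial_results` input are hypothetical.

```python
# Macroblock modes as listed in the text for JMVM.
MODES = [
    "SKIP", "Inter 16x16", "Inter 16x8", "Inter 8x16", "Inter 8x8",
    "Inter 8x8 Frext", "Intra 16x16", "Intra 8x8", "Intra 4x4",
]


def rd_cost(distortion, rate, lam):
    """Assumed Lagrangian RD cost: J = D + lambda * R."""
    return distortion + lam * rate


def full_search(trial_results, lam):
    """Evaluate every mode and return (best_mode, best_cost).

    trial_results is a hypothetical input mapping each mode name to the
    (distortion, rate) pair obtained by encoding the macroblock in that mode.
    """
    best_mode, best_cost = None, float("inf")
    for mode in MODES:
        distortion, rate = trial_results[mode]
        cost = rd_cost(distortion, rate, lam)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


def mode_trials(eta, alpha, beta, theta):
    """Quantity the O(eta * alpha * beta * theta) estimate grows with."""
    return eta * alpha * beta * theta
```

Because every macroblock goes through all θ = 9 trial encodings, removing unlikely candidates before the loop is where a fast mode selection algorithm saves time.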
This letter focuses on reducing the computational
complexity of multiview depth video coding. First, a joint
coding scheme is proposed based on the macroblock mode
similarity between the texture videos and the associated
depth videos. Then, a fast depth video coding algorithm
is presented that combines an effective prediction
mechanism and an object boundary discriminating method.
Finally, the fast algorithm is implemented and evaluated.
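One simple way to quantify the macroblock mode similarity between a texture frame and its associated depth frame is the fraction of co-located macroblocks that fall into the same mode class. The sketch below is only an illustration under assumptions: the grouping into SKIP, Inter, and Intra classes and the names `mode_class` and `mode_agreement` are hypothetical and are not taken from the letter.

```python
def mode_class(mode):
    """Collapse a macroblock mode name into SKIP, Inter, or Intra."""
    if mode == "SKIP":
        return "SKIP"
    return "Intra" if mode.startswith("Intra") else "Inter"


def mode_agreement(texture_modes, depth_modes):
    """Fraction of co-located macroblocks whose mode class matches.

    Both arguments are sequences of mode names in the same macroblock
    order (a hypothetical representation of the two mode maps).
    """
    if len(texture_modes) != len(depth_modes):
        raise ValueError("mode maps must cover the same macroblocks")
    if not texture_modes:
        return 0.0
    same = sum(
        mode_class(t) == mode_class(d)
        for t, d in zip(texture_modes, depth_modes)
    )
    return same / len(texture_modes)
```

A high agreement ratio would suggest that the texture mode is a useful predictor for the depth mode, which is the premise of a joint coding scheme.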
Figure 1 shows an MVD-based 3D video system. In a texture
video and its associated depth video, the boundaries of
objects in the scene coincide and the directions of object
movement are also very similar. Therefore, the macroblock
mode distributions of a texture image and its associated
depth image are expected to be similar. Figures 2(a) and (b)
show the mode distributions of the texture image and the
associated depth image of a frame in the Ballet test sequence.
The blocks with red, green, and blue borders denote the
macroblocks encoded with SKIP, Inter, and Intra modes,
respectively. The macroblock modes of the two images are
clearly similar, and this similarity can be utilized
to speed up the coding process. Based on the analyses
above, a joint MVD coding scheme is proposed and
1671-7694/2010/020151-04 © 2010 Chinese Optics Letters