DING et al.: CAPA WITH INTER-VIEW MODE DECISION FOR MVC 1555
Fig. 4. Illustration of an MVC structure. The arrows represent the prediction directions, and the gray regions are the search windows for B.
is no unique coding structure that is appropriate for every
video sequence. The selection of a coding structure depends heavily
on the video content and the corresponding camera setup.
Fig. 4 shows the illustration of one coding structure, where
the prediction directions of ME and DE are represented by
arrows. For convenience of interpretation, the view channel
is regarded as the left channel, and the view channel is
regarded as the right channel. There are two types of compen-
sated blocks. They are the motion-compensated blocks and the
disparity-compensated blocks, which are illustrated as
and in Fig. 4, respectively. According to Lagrangian
mode decision, the best type of compensated blocks is selected.
For each macroblock in the current frame, the costs of ME and
DE are computed by (2)–(3), as shown at the bottom of the pre-
vious page, where
and are the minimum costs
of motion-compensated and disparity-compensated blocks,
respectively.
is the current block in the right channel.
is a block of the reference frame in the right channel. is a
block of the reference frame in the left channel.
and are the search windows for the current block
. After ME and DE, the best matched blocks in the two
search windows can be derived. Then the final prediction mode
can be decided by selecting the one with lower cost.
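The decision between ME and DE described above can be sketched as follows. This is a minimal illustration rather than the encoder's actual procedure: since (2)–(3) are not reproduced on this page, the cost is assumed to take the common Lagrangian form of SAD plus lambda times a vector rate, and `vector_bits`, `best_match`, and `decide_inter_mode` are hypothetical names introduced only for this example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def vector_bits(dy, dx):
    """Hypothetical rate model: a crude stand-in for the bits spent on a vector."""
    return 2 + abs(dy) + abs(dx)


def best_match(cur_block, ref_frame, top, left, size, search_range, lam):
    """Full search in one search window; returns (min_cost, (dy, dx))."""
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur_block, extract(ref_frame, y, x, size)) \
                + lam * vector_bits(dy, dx)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    if best is None:
        raise ValueError("search window lies entirely outside the frame")
    return best


def decide_inter_mode(cur_frame, temporal_ref, interview_ref,
                      top, left, size=4, search_range=2, lam=1.0):
    """Run ME and DE for one block and keep the candidate with the lower cost."""
    cur_block = extract(cur_frame, top, left, size)
    # ME: search the reference frame of the same view channel.
    j_me, mv = best_match(cur_block, temporal_ref, top, left,
                          size, search_range, lam)
    # DE: search the reference frame of the neighboring view channel.
    j_de, dv = best_match(cur_block, interview_ref, top, left,
                          size, search_range, lam)
    if j_me <= j_de:
        return "INTER_ME", mv, j_me
    return "INTER_DE", dv, j_de
```

The block size, search range, and lambda are illustrative defaults; a real encoder would derive lambda from the QP and use the rate of the actual entropy coder.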
Several prediction modes are defined in the H.264/AVC
standard. In our analysis of mode distribution, the prediction
modes are classified into four categories, that is, INTER_ME,
INTER_DE, INTRA, and SKIP modes. As shown in Fig. 5, the
current macroblock can be predicted by ME from the reference
frame in the same view channel, where INTER_ME mode can
remove temporal redundancy. In contrast, INTER_DE
mode removes inter-view redundancy by DE from the reference
frame in the neighboring view channel. If inter prediction
performs poorly, INTRA mode predicts the current macroblock
from the boundary pixels of the neighboring
macroblocks. Moreover, SKIP mode utilizes the motion vector
predictor to predict the current macroblock without performing
inter prediction. It not only reduces the computational com-
plexity but also saves the coding bits for motion vectors. The
mode decision between INTER_ME and INTER_DE is closely
related to video contents [19]. Therefore, the mode classification
can reflect the features of video contents.
Fig. 5. Current macroblock can be predicted by various prediction modes.
These prediction modes are classified into four categories: INTER_ME,
INTER_DE, INTRA, and SKIP modes.
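The four-way classification can be illustrated with a short sketch. The macroblock-type strings and the `classify_mode` function are hypothetical and only loosely follow H.264/AVC naming. The sketch also assumes that each inter-coded macroblock references a single view, which ignores macroblocks whose partitions use different references.

```python
# Hypothetical macroblock-type labels, chosen for illustration only.
SKIP_TYPES = {"P_Skip", "B_Skip"}
INTRA_TYPES = {"I_4x4", "I_16x16", "I_PCM"}


def classify_mode(mb_type, cur_view, ref_view=None):
    """Map one coded macroblock to INTER_ME, INTER_DE, INTRA, or SKIP."""
    if mb_type in SKIP_TYPES:
        return "SKIP"
    if mb_type in INTRA_TYPES:
        return "INTRA"
    if ref_view is None:
        raise ValueError("an inter-coded macroblock needs a reference view")
    # Same view channel: temporal prediction (ME).
    # Neighboring view channel: inter-view prediction (DE).
    return "INTER_ME" if ref_view == cur_view else "INTER_DE"
```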
Based on this classification, Fig. 6 shows the mode
distribution for various quantization parameters (QPs). The
shares of INTER_ME and SKIP modes vary the most across
QPs. SKIP mode is the dominant
mode at lower bit-rates (high QPs), while INTER_ME and
INTRA modes are dominant at higher bit-rates (low QPs). This
distribution is similar to that in mono-view video coding. The
main difference between mono- and multiview video coding is
INTER_DE mode, which is used in 5–10% of the macroblocks in a
frame. The percentage of INTER_DE-mode macroblocks depends
on the video contents. As shown in Fig. 6(c), the moving
objects are usually predicted by INTER_DE because INTER_ME
cannot predict well in these areas.
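A mode distribution of this kind can be tallied from per-macroblock labels, as in the sketch below. The function name `mode_distribution` is hypothetical, and the input is assumed to be a list holding one category label per macroblock of a frame.

```python
from collections import Counter

CATEGORIES = ("INTER_ME", "INTER_DE", "INTRA", "SKIP")


def mode_distribution(labels):
    """Return the percentage of macroblocks in each of the four categories."""
    counts = Counter(labels)
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}
```

Running this once per QP on the labels of the same frame would give the per-QP shares that a plot such as Fig. 6 compares.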
We observe that certain types of macroblocks that would be
encoded in INTRA mode in mono-view video
coding are instead encoded in INTER_DE mode in MVC. Fig. 7
shows the statistics of the ratio that INTRA mode is replaced