3D Video Coding Optimization: Content-Aware Prediction Algorithm and Inter-View Mode Decision

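"This paper proposes a content-aware prediction algorithm (CAPA) combined with inter-view mode decision for multiview video coding (MVC), aiming to address the high data bandwidth demands of 3D video compression. The algorithm uses disparity estimation to find corresponding blocks in different views so that coding information can be shared and reused effectively, which reduces the memory bandwidth and computational complexity required by an MVC system."

In multiview video coding (MVC), 3D video requires extremely high data bandwidth, so efficient compression is needed. Compared with a single-view video coding system, an MVC system places higher demands on memory bandwidth and computational complexity. To address this, the paper proposes the Content-Aware Prediction Algorithm (CAPA) and combines it with inter-view mode decision to code more efficiently.

The core of CAPA is disparity estimation (DE). DE is a key step in 3D video coding: it analyzes pixel correspondences between views to estimate the disparity, that is, the horizontal offset between the same scene object in two views. This disparity information helps determine which blocks in one view can serve as prediction references for another view.

CAPA first performs disparity estimation on blocks of different views to find matching reference blocks. It then shares and reuses the coding information of those matched blocks, such as the rate-distortion cost, the coding mode, and the motion vectors. The rate-distortion cost measures the trade-off between coding quality and bit rate, while the coding mode and motion vectors affect the efficiency and quality of compression.

In this way, CAPA reduces the number of blocks that must be processed independently, which lowers the computational complexity. Because the algorithm refers to information from already-encoded views, prediction accuracy improves, which in turn reduces distortion in the coded video and improves compression efficiency. The content-based strategy also lets the algorithm adapt to changes in the video content.

The paper, published in IEEE Transactions on Multimedia, presents an MVC compression method that uses content awareness and inter-view information sharing to relieve performance bottlenecks in multiview video coding, offering an approach to efficient transmission and storage of 3D video. It is relevant both to academic research and to practical 3D video coding applications.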

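The information-sharing idea summarized above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `BlockInfo` record, the `reuse_from_reference_view` helper, the 16-pixel macroblock grid, and the rule of snapping the disparity-shifted position to the nearest macroblock are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MB_SIZE = 16  # assumed macroblock size in pixels


@dataclass(frozen=True)
class BlockInfo:
    """Coding information kept for one already-encoded macroblock (hypothetical record)."""
    mode: str                        # e.g. "INTER_ME", "INTER_DE", "INTRA", "SKIP"
    motion_vector: Tuple[int, int]   # (dx, dy) in pixels
    rd_cost: float                   # rate-distortion cost of the chosen mode


def reuse_from_reference_view(
    mb_x: int,
    mb_y: int,
    disparity: Tuple[int, int],
    reference_view: Dict[Tuple[int, int], BlockInfo],
) -> Optional[BlockInfo]:
    """Look up the reference-view macroblock that the disparity vector points to.

    mb_x and mb_y are macroblock indices in the current view; disparity is in pixels.
    Returns None when the corresponding block lies outside the coded area.
    """
    # Pixel position of the corresponding block in the reference view.
    px = mb_x * MB_SIZE + disparity[0]
    py = mb_y * MB_SIZE + disparity[1]
    # Snap to the nearest macroblock on the reference grid (an assumption of this sketch).
    ref_key = ((px + MB_SIZE // 2) // MB_SIZE, (py + MB_SIZE // 2) // MB_SIZE)
    return reference_view.get(ref_key)
```

With a reference view holding one coded block at macroblock index (2, 1), a current block at index (3, 1) with disparity (-14, 0) would map onto that block, and an encoder could start from its mode and motion vector instead of searching from scratch.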
DING et al.: CAPA WITH INTER-VIEW MODE DECISION FOR MVC 1555

Fig. 4. Illustration of an MVC structure. The arrows represent the prediction directions, and the gray regions are the search windows for

is no unique coding structure that is appropriate for every video sequence. The selection of a coding structure relies heavily on the video content and the corresponding camera setup. Fig. 4 illustrates one coding structure, where the prediction directions of ME and DE are represented by arrows. For convenience of interpretation, one view channel is regarded as the left channel and the other as the right channel. There are two types of compensated blocks, the motion-compensated blocks and the disparity-compensated blocks, both of which are illustrated in Fig. 4. According to the Lagrangian mode decision, the best type of compensated block is selected. For each macroblock in the current frame, the costs of ME and DE are computed by (2)–(3), as shown at the bottom of the previous page, which give the minimum costs of the motion-compensated and disparity-compensated blocks, respectively. The current block lies in the right channel. ME searches blocks of the reference frame in the right channel, and DE searches blocks of the reference frame in the left channel, each within its own search window for the current block. After ME and DE, the best matched blocks in the two search windows can be derived. The final prediction mode is then decided by selecting the one with the lower cost.
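The choice between a motion-compensated and a disparity-compensated block can be sketched as below. Equations (2)–(3) are not reproduced in this excerpt, so the cost used here is the common Lagrangian form (distortion plus λ times rate) and is an assumption, as are the SAD distortion measure, the `vector_bits` rate proxy, the exhaustive search, and every function name.

```python
from typing import List, Tuple

Frame = List[List[int]]  # luma samples, indexed as frame[y][x]


def sad(cur: Frame, ref: Frame, x: int, y: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
        for j in range(size)
        for i in range(size)
    )


def vector_bits(dx: int, dy: int) -> int:
    """Crude rate proxy (an assumption): longer vectors cost more bits."""
    return 2 + 2 * abs(dx).bit_length() + 2 * abs(dy).bit_length()


def best_match(cur: Frame, ref: Frame, x: int, y: int, size: int,
               search_range: int, lam: float) -> Tuple[float, Tuple[int, int]]:
    """Full search over a square window; returns (minimum cost, best vector)."""
    height, width = len(ref), len(ref[0])
    best = (float("inf"), (0, 0))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= x + dx and x + dx + size <= width
                    and 0 <= y + dy and y + dy + size <= height):
                continue
            cost = sad(cur, ref, x, y, dx, dy, size) + lam * vector_bits(dx, dy)
            if cost < best[0]:
                best = (cost, (dx, dy))
    return best


def choose_inter_mode(cur: Frame, temporal_ref: Frame, interview_ref: Frame,
                      x: int, y: int, size: int = 4, search_range: int = 2,
                      lam: float = 1.0) -> Tuple[str, Tuple[int, int], float]:
    """Run ME on the same-view reference and DE on the neighboring-view reference; keep the cheaper."""
    me_cost, mv = best_match(cur, temporal_ref, x, y, size, search_range, lam)
    de_cost, dv = best_match(cur, interview_ref, x, y, size, search_range, lam)
    if me_cost <= de_cost:
        return ("INTER_ME", mv, me_cost)
    return ("INTER_DE", dv, de_cost)
```

The two searches are independent, which is why a real encoder pays roughly twice the search cost of mono-view coding for every macroblock that has both a temporal and an inter-view reference.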

Fig. 5. Current macroblock can be predicted by various prediction modes. These prediction modes are classified into four categories: INTER_ME, INTER_DE, INTRA, and SKIP modes.

There are several prediction modes defined in the H.264/AVC standard. In our analysis of mode distribution, the prediction modes are classified into four categories: INTER_ME, INTER_DE, INTRA, and SKIP modes. As shown in Fig. 5, the current macroblock can be predicted by ME from the reference frame in the same view channel, so INTER_ME mode can remove temporal redundancy. INTER_DE mode, on the other hand, can remove inter-view redundancy by DE from the reference frame in the neighboring view channel. If inter prediction cannot predict well, INTRA mode can predict the current macroblock by utilizing boundary pixels in the neighboring macroblocks. Moreover, SKIP mode utilizes the motion vector predictor to predict the current macroblock without performing inter prediction. It not only reduces the computational complexity but also saves the coding bits for motion vectors. The mode decision between INTER_ME and INTER_DE is closely related to video contents [19]. Therefore, the mode classification can reflect the features of video contents.
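The motion vector predictor that SKIP mode relies on can be sketched as follows. In H.264/AVC the common case takes the component-wise median of the motion vectors of the left, top, and top-right neighbors. This sketch covers only that common case and ignores the standard's special cases (unavailable neighbors, differing reference indices, the zero-vector conditions of P_Skip). The function names are hypothetical.

```python
from typing import Tuple

MotionVector = Tuple[int, int]  # (dx, dy)


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return sorted((a, b, c))[1]


def median_mv_predictor(left: MotionVector, top: MotionVector,
                        top_right: MotionVector) -> MotionVector:
    """Component-wise median of the three neighboring motion vectors."""
    return (
        median3(left[0], top[0], top_right[0]),
        median3(left[1], top[1], top_right[1]),
    )
```

For neighbors (4, -2), (6, 0), and (5, 7), the predictor is (5, 0). Because the decoder can derive the same vector from already-decoded neighbors, no motion vector bits need to be sent for a skipped macroblock.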

According to this classification, Fig. 6 shows the mode distribution for various quantization parameters (QPs). The shares of INTER_ME and SKIP modes show the larger variation across QPs. SKIP mode is the dominant mode at lower bit rates (high QPs), while INTER_ME and INTRA modes are dominant at higher bit rates (low QPs). The distribution is similar to that in mono-view video coding. The main difference between mono-view and multiview video coding is INTER_DE mode, which is used in 5–10% of the macroblocks in a frame. The percentage of INTER_DE-mode macroblocks depends on the video contents. As shown in Fig. 6(c), moving objects are usually predicted by INTER_DE because INTER_ME cannot predict well in those areas.
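A mode distribution of the kind plotted in Fig. 6 could be tallied per frame, and per QP, along these lines. The sketch assumes the encoder exposes one category label per macroblock; the `mode_distribution` helper is hypothetical and the sample counts mentioned afterwards are invented for illustration, not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable

CATEGORIES = ("INTER_ME", "INTER_DE", "INTRA", "SKIP")


def mode_distribution(modes: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of macroblocks in each of the four mode categories."""
    counts = Counter(modes)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown mode categories: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        # An empty frame has no macroblocks, so every share is zero.
        return {category: 0.0 for category in CATEGORIES}
    return {category: 100.0 * counts[category] / total for category in CATEGORIES}
```

Calling the function on a frame with 12 SKIP, 5 INTER_ME, 2 INTER_DE, and 1 INTRA macroblocks gives shares of 60%, 25%, 10%, and 5%. Repeating the tally for each QP yields one bar group per QP.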

It is observed that certain types of macroblocks that are originally encoded by INTRA mode in mono-view video coding are encoded by INTER_DE mode in MVC. Fig. 7 shows the statistics of the ratio that INTRA mode is replaced
