Research on H.264-Based 3D Video Compression Algorithms

"This master's thesis investigates compression algorithms for stereo image sequences based on H.264, aiming to address the enormous data volume of 3D video systems. Under the supervision of Hou Chunping, the author Fu Rui studies the principles of stereoscopic vision and the key techniques of the H.264 coding standard, and proposes a compression method that combines motion-compensated prediction with disparity-compensated prediction."

3D coding is a technique for processing and compressing three-dimensional images and video, allowing viewers to experience depth perception and stereoscopic effects. With social and economic progress and rising living standards, demand for imaging technology, and for stereoscopic vision in particular, keeps growing. However, compared with conventional 2D images and video, a 3D video system must process and transmit far more data, which poses challenges for storage and transmission.

H.264 is among the most advanced international video compression standards, valued for its high coding efficiency and good network adaptability. It adopts a range of innovative techniques, such as intra prediction, inter prediction, multiple reference frames, and bidirectional prediction, that greatly improve coding efficiency and lower the bit rate while preserving image quality.

In the thesis, Fu Rui first introduces the basic principle of stereoscopic vision: the human visual system judges distance and depth from the small difference in viewing angle between the two eyes. She then details the key techniques of the H.264 standard, including entropy coding, multi-reference-frame prediction, and bidirectional prediction. On this basis, she proposes a compression algorithm tailored to stereo image sequences. The left-view image serves as the reference and is encoded with the H.264 standard; the right-view image is encoded by combining motion-compensated prediction (handling change along the temporal dimension) with disparity-compensated prediction (handling depth information along the spatial dimension), using the left view as reference, which achieves more efficient compression.

Working with the JM reference software, the thesis studies H.264's multi-reference-frame prediction, bidirectional prediction, and entropy coding in depth, and simulation experiments verify that the proposed algorithm compresses stereo image sequences effectively, producing bitstreams with high compression ratios and high-quality reconstructed images. This demonstrates the feasibility and effectiveness of the algorithm and shows how these key techniques contribute to coding efficiency and image quality.

Using H.264 coding tools to process 3D images not only addresses the data-volume problem but also has broad application prospects, offering a possible solution to the storage and transmission difficulties of 3D video. The keywords (stereo video, disparity, H.264 video compression, video coding) reflect the focus and relevance of the research.
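The right-view scheme described above chooses, per block, between a temporal predictor (the previous right-view frame) and an inter-view predictor (the current left-view frame). The following is a minimal sketch of that selection step, not code from the thesis: the exhaustive SAD-based search, block size, search range, and all function names are illustrative assumptions.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def best_block_match(block, ref_frame, y, x, search=8):
    """Exhaustive block matching within +/-search pixels around (y, x).

    Returns (best_cost, (dy, dx)) for the lowest-SAD displacement."""
    h, w = block.shape
    H, W = ref_frame.shape
    best_cost, best_vec = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy and yy + h <= H and 0 <= xx and xx + w <= W:
                cost = sad(block, ref_frame[yy:yy + h, xx:xx + w])
                if cost < best_cost:
                    best_cost, best_vec = cost, (dy, dx)
    return best_cost, best_vec

def predict_right_block(cur_right, prev_right, cur_left, y, x, bs=16):
    """Per-block predictor selection for the right view:
    motion compensation against the previous right frame vs.
    disparity compensation against the current left frame."""
    block = cur_right[y:y + bs, x:x + bs]
    mc_cost, mv = best_block_match(block, prev_right, y, x)  # temporal
    dc_cost, dv = best_block_match(block, cur_left, y, x)    # inter-view
    if dc_cost < mc_cost:
        return "disparity", dv, dc_cost
    return "motion", mv, mc_cost
```

When the two views are well rectified, disparity is mostly horizontal, so a real encoder would bias the inter-view search along one axis; the symmetric search here just keeps the sketch short.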

3.4 Pair Interaction Feature The interaction pattern between two individuals is encoded by a spatial descriptor with view-invariant relative pose encoding. Given the 3D locations of two individual detections zi, zj and two pose features pi, pj, we represent the pairwise relationship using view normalization, pose co-occurrence encoding, semantic compression and a spatial histogram (see Fig. 5 for illustration). The view normalization is performed by rotating the two people in 3D space by θ with respect to their midpoint, making their connecting line perpendicular to the camera view point. In this step, the pose features are also shifted accordingly (e.g. if θ = 45°, shift 1 dimension cyclically). Then, the co-occurrence feature is obtained by building a 2-dimensional matrix in which each element (r, c) corresponds to min(pi(r), pj(c)). Although the feature is view invariant, there are still elements in the matrix that deliver the same semantic concepts (e.g. left-left and right-right). To reduce such unnecessary variance and obtain a compact representation, we perform another transformation by multiplying a semantic compression matrix Sc with the vector form of the co-occurrence feature. The matrix Sc is learned offline by enumerating all possible configurations of view points and grouping the pairs that are equivalent when rotated by 180 degrees. Finally, we obtain the pair interaction descriptor by building a spatial histogram based on the 3D distance between the two (bin centers at 0.2, 0.6, 2.0 and 6.5 m). Here, we use linear interpolation similarly to the contextual feature in Sec. 3.3. Given the interaction descriptor for each pair, we represent the interaction feature φxx(xi, xj) using the confidence value from an SVM classifier trained on a dictionary of interaction labels Y.
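Two of the steps above can be sketched directly: the co-occurrence matrix with elements min(pi(r), pj(c)), and the soft spatial histogram over the stated bin centers. This is an illustrative reading of the text, not the authors' code; the linear-interpolation scheme for the histogram is an assumption based on the description in Sec. 3.3.

```python
import numpy as np

def cooccurrence(pi, pj):
    """Pose co-occurrence matrix: element (r, c) = min(pi[r], pj[c])."""
    return np.minimum.outer(pi, pj)

def spatial_histogram(dist, centers=(0.2, 0.6, 2.0, 6.5)):
    """Soft-assign a 3D inter-person distance (in meters) to histogram bins
    by linear interpolation between the two nearest bin centers."""
    c = np.asarray(centers, dtype=float)
    h = np.zeros(len(c))
    if dist <= c[0]:
        h[0] = 1.0
    elif dist >= c[-1]:
        h[-1] = 1.0
    else:
        k = np.searchsorted(c, dist) - 1   # c[k] <= dist < c[k+1]
        w = (dist - c[k]) / (c[k + 1] - c[k])
        h[k], h[k + 1] = 1.0 - w, w
    return h
```

The min-based co-occurrence acts like a soft AND between pose dimensions, and the soft histogram avoids the hard quantization boundaries a one-hot distance bin would introduce.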
