Efficient HEVC Intra Coding Optimization with CPU and GPU Cooperation


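This paper observes that in High Efficiency Video Coding (HEVC), the recursive splitting structure and up to 35 intra prediction modes significantly improve intra coding performance. The improvement comes with a large increase in computational complexity, especially when everything runs on a single processing unit. To address this, the authors propose a fast intra coding scheme in which the CPU and GPU cooperate.

First, the scheme uses the many-core parallelism of the GPU to perform intra prediction for blocks of variable size. Because the GPU can evaluate many prediction tasks at once, this step shortens encoding time and makes better use of the hardware while preserving picture quality.

Second, to reduce the computational load further, the intra mode with the minimum Sum of Absolute Differences (SAD) cost is selected on the GPU after prediction. SAD is a common measure of prediction error, and choosing the mode with the smallest SAD keeps the residual passed to later stages small, which helps coding efficiency. The best modes are transmitted back to the host CPU, which makes the final decisions and handles the remaining encoding logic, such as the residual transform and entropy coding.

The paper may also discuss how to transfer and synchronize data efficiently between the CPU and GPU, how to balance the division of work in software to avoid bottlenecks, and how to tune the scheme for different CPU and GPU architectures so that it stays efficient and compatible across platforms.

Overall, the paper focuses on improving HEVC intra coding efficiency on CPU plus GPU platforms through parallelization and fast decision strategies, in response to growing video processing demand. Such optimization matters for practical deployment of the video coding standard and for future multimedia applications.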

Parallel Intra Coding for HEVC on CPU plus GPU Platform

Juncheng Ma, Falei Luo, Shanshe Wang, Nan Zhang, Siwei Ma

Email: jcma@pku.edu.cn falei.luo@vipl.ict.ac.cn sswang@jdl.ac.cn zhangnan@ccum.edu.cn swma@pku.edu.cn

Institute of Digital Media & Cooperative Medianet Innovation Center, Peking University, Beijing, China

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

School of Biomedical Engineering, Capital Medical University, Beijing, China

Peking University Shenzhen Graduate School, Beijing, China

Abstract— In High Efficiency Video Coding (HEVC), the intra coding performance is significantly improved due to the recursive splitting structure and up to 35 intra prediction modes. However, the computational complexity of intra coding increases substantially as well. In this paper, a fast intra coding scheme is proposed based on CPU and GPU cooperation. Firstly, the intra prediction of variable-size blocks is performed in parallel on a multi-core GPU. Secondly, the intra prediction mode with the minimum Sum of Absolute Differences (SAD) cost is selected and transmitted to the host CPU. Instead of exhaustively searching all the intra modes in the Rough Mode Decision (RMD) process, the mode returned by the GPU is directly selected. Lastly, the texture gradient of each coding unit (CU) is assessed during parallel intra prediction and then used by the CPU for fast CU size decision. Experimental results show that the proposed parallel intra coding method achieves up to 62% complexity reduction with acceptable coding performance loss.
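
The SAD-based selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GPU kernel: it models only three simplified predictors (DC, horizontal, vertical) instead of the 35 HEVC modes, it takes reference samples as plain lists, and all function names are hypothetical. On a GPU, each block (and each mode) would be evaluated by its own group of threads.

```python
# Hypothetical sketch: choose the intra mode with minimum SAD for one block.
# Only three simplified predictors are modeled; HEVC defines 35 modes.
# Function and mode names are illustrative and not taken from the paper.

def sad(block, pred):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, pred)
               for a, b in zip(row_a, row_b))

def predict(mode, top, left):
    """Build an N x N prediction from the row above and the column to the left."""
    n = len(top)
    if mode == "DC":
        dc = (sum(top) + sum(left) + n) // (2 * n)  # rounded mean of the references
        return [[dc] * n for _ in range(n)]
    if mode == "HOR":
        return [[left[y]] * n for y in range(n)]    # repeat the left sample along each row
    if mode == "VER":
        return [list(top) for _ in range(n)]        # repeat the top row down the block
    raise ValueError(f"unknown mode: {mode}")

def best_mode_by_sad(block, top, left, modes=("DC", "HOR", "VER")):
    """Return (mode, cost) with the smallest SAD; ties keep the earlier mode."""
    costs = [(sad(block, predict(m, top, left)), i, m) for i, m in enumerate(modes)]
    cost, _, mode = min(costs)
    return mode, cost

# A 4x4 block with constant columns, so vertical prediction matches it exactly.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [list(top) for _ in range(4)]
mode, cost = best_mode_by_sad(block, top, left)
```

Here `mode` is `"VER"` with a cost of 0, while the DC and horizontal predictors give SAD costs of 176 and 240 respectively. Only the winning mode per block would need to be copied back to the host.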

Index Terms— HEVC, intra prediction, GPU, gradient, CU splitting

I. INTRODUCTION

High Efficiency Video Coding (HEVC) is the latest video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC), which is formed by two standardization teams: the ISO/IEC MPEG and ITU-T VCEG [1]. With the newly introduced quadtree structure and richer intra/inter prediction modes, HEVC greatly outperforms the classic H.264/AVC standard [2] in terms of coding performance. However, the accompanying increase in coding complexity makes it challenging to employ HEVC in real-time applications.

For intra coding, the coding unit (CU) size varies from 64x64 to 8x8, and the 4x4 prediction unit (PU) is used at the largest CU depth. In the rate distortion optimization (RDO) process, the transform unit (TU) can be further partitioned into smaller ones. Moreover, up to 35 intra prediction modes have to be enumerated exhaustively to find the optimal one. Compared to the fixed 16x16 macroblocks and only 9 intra modes in H.264/AVC, the intra coding of HEVC is much more time-consuming [3].
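
To give a sense of scale, the following short Python calculation counts the candidate blocks inside one 64x64 coding tree unit and the number of mode evaluations an exhaustive search would need. It is a rough upper bound under stated assumptions: a 64x64 CTU, luma only, every block size from 64x64 down to 4x4 tested with all 35 modes, and no TU partitioning or chroma included.

```python
# Rough count of blocks and mode evaluations in one CTU under exhaustive search.
# Assumptions (not from the paper): 64x64 CTU, luma only, 35 modes at every size.

CTU_SIZE = 64
BLOCK_SIZES = (64, 32, 16, 8, 4)   # CU sizes 64..8, plus 4x4 PUs at the largest depth
NUM_INTRA_MODES = 35

def blocks_per_ctu(block_size, ctu_size=CTU_SIZE):
    """Number of non-overlapping block_size x block_size blocks in one CTU."""
    per_side = ctu_size // block_size
    return per_side * per_side

counts = {size: blocks_per_ctu(size) for size in BLOCK_SIZES}
total_blocks = sum(counts.values())
total_evaluations = total_blocks * NUM_INTRA_MODES
print(counts, total_blocks, total_evaluations)
```

Under these assumptions a single CTU contains 341 candidate blocks and needs 11,935 mode evaluations, and a frame contains many CTUs. Because these evaluations follow the same pattern for every block, they are a natural fit for parallel execution.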

To reduce the complexity of HEVC intra coding, many fast intra prediction methods have been proposed. Shen et al. presented a fast CU size decision method for intra coding based on texture homogeneity and spatial correlations, achieving considerable coding time saving [4]. However, the texture homogeneity measurement might cause extra complexity overhead. W. Jiang et al. used gradient assessment of intra coding blocks to support fast mode decision with a fixed threshold, which might not be suitable for all test sequences [5].

In recent years, with the rapid development of the Graphics Processing Unit (GPU), using General Purpose GPU (GPGPU) computing for parallel speedup of video coding has become a major trend. At the same time, the “Compute Unified Device Architecture” (CUDA), published by NVIDIA [6], makes GPU parallelization more programmer-friendly. Therefore, it is possible to implement parallel intra prediction for a large number of coding blocks in a video sequence. However, parallelizing intra prediction on graphics hardware is highly challenging. Specifically, there is a strong reconstruction dependence between an intra PU and its neighbouring blocks, which causes frequent synchronization when current and reference samples are processed at the same time. In this paper, an original picture based intra coding scheme is proposed using graphics hardware, with the following two contributions. Firstly, a parallel and fast intra prediction scheme on the GPU is proposed. Before intra coding of one slice, the best intra prediction mode with minimum SAD cost for every possible block of all coding tree units (CTUs) is determined concurrently on the GPU and transmitted back to the host CPU. For each intra block, instead of the time-consuming RMD process, the intra modes returned by the GPU along with the Most Probable Modes (MPMs) are directly used for RDO decision. Secondly, a fast CU splitting and pruning algorithm is proposed based on parallel texture gradient measurement using the classic Sobel operator. With an asynchronous workflow and effective thread allocation, no extra computational overhead is introduced.
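
As an illustration of the gradient measurement mentioned above, the sketch below computes a mean Sobel gradient magnitude for one block in Python and maps it to a coarse CU decision. The excerpt does not give the authors' decision rule, so the `cu_decision` function, its two thresholds, and the use of |Gx| + |Gy| over interior pixels are assumptions made purely for illustration. In a CUDA implementation each pixel would typically be handled by one thread.

```python
# Hypothetical sketch: Sobel-based texture measure for one block.
# The thresholds and the split/prune rule are illustrative assumptions,
# not the decision rule used in the paper.

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def mean_gradient(block):
    """Average |Gx| + |Gy| over the interior pixels of a 2-D block (list of rows)."""
    h, w = len(block), len(block[0])
    total = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = block[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            total += abs(gx) + abs(gy)
    return total / ((h - 2) * (w - 2))

def cu_decision(block, low=2.0, high=40.0):
    """Map the texture measure to a coarse decision (assumed thresholds)."""
    g = mean_gradient(block)
    if g < low:
        return "no_split"   # smooth block: keep this CU size, prune smaller CUs
    if g > high:
        return "split"      # strong texture: go straight to smaller CUs
    return "full_rdo"       # uncertain: let the regular RDO search decide

flat = [[128] * 8 for _ in range(8)]             # uniform 8x8 block
edge = [[0] * 4 + [255] * 4 for _ in range(8)]   # sharp vertical edge
flat_choice = cu_decision(flat)
edge_choice = cu_decision(edge)
```

The uniform block yields a gradient of 0 and the decision `"no_split"`, while the block with a sharp vertical edge yields a mean gradient of 340.0 and the decision `"split"`. Fixed thresholds like these are exactly what may fail to generalize across sequences, so a practical scheme would likely need to adapt them.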

The rest of this paper is organized as follows. Section II gives an overview of intra coding techniques in HEVC. Section III details the proposed parallelization method for intra prediction and gradient measurement, along with the GPU-based fast mode decision and fast CU size decision. Experimental results and analysis are presented in Section IV. Finally, Section V concludes this paper.
