Parallel Intra Coding for HEVC on CPU plus GPU Platform
Juncheng Ma¹, Falei Luo², Shanshe Wang¹, Nan Zhang³, Siwei Ma¹,⁴
Email: jcma@pku.edu.cn falei.luo@vipl.ict.ac.cn sswang@jdl.ac.cn zhangnan@ccum.edu.cn swma@pku.edu.cn
¹ Institute of Digital Media & Cooperative Medianet Innovation Center, Peking University, Beijing, China
² Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
³ School of Biomedical Engineering, Capital Medical University, Beijing, China
⁴ Peking University Shenzhen Graduate School, Beijing, China
Abstract— In High Efficiency Video Coding (HEVC), the
intra coding performance is significantly improved due to the
recursive splitting structure and up to 35 intra prediction modes.
However, the computational complexity of intra coding increases
substantially as well. In this paper, a fast intra coding scheme
based on CPU and GPU cooperation is proposed. Firstly, the intra
prediction of variable-size blocks is performed in parallel on a
multi-core GPU. Secondly, the intra prediction mode with the minimum
Sum of Absolute Differences (SAD) cost is selected and
transmitted to the host CPU. Instead of exhaustively searching
all the intra modes in the Rough Mode Decision (RMD) process, the
mode returned by the GPU is directly selected. Lastly, the
texture gradient of each coding unit (CU) is assessed during
parallel intra prediction and then used by the CPU for fast CU size
decision. Experimental results show that the proposed parallel
intra coding method achieves up to 62% complexity reduction
with acceptable coding performance loss.
Index Terms— HEVC, intra prediction, GPU, gradient, CU splitting
I. INTRODUCTION
High Efficiency Video Coding (HEVC) is the latest video
coding standard developed by the Joint Collaborative Team on
Video Coding (JCT-VC), which was formed by two
standardization bodies: ISO/IEC MPEG and ITU-T VCEG
[1]. With the newly introduced quadtree structure and richer
intra/inter prediction modes, HEVC greatly outperforms the
classic H.264/AVC standard [2] in terms of coding
performance. However, the accompanying increase in coding
complexity makes it challenging to employ HEVC
in real-time applications.
For intra coding, the coding unit (CU) size varies from
64x64 to 8x8, and a 4x4 prediction unit (PU) is used at the
largest CU depth. In the rate-distortion optimization (RDO)
process, the transform unit (TU) can be further partitioned into
smaller ones. Moreover, up to 35 intra prediction modes have
to be enumerated exhaustively to find the optimal one. Compared
to the fixed 16x16 macroblocks and only 9 intra modes in
H.264/AVC, the intra coding of HEVC is much more
time-consuming [3].
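To give a rough sense of scale, the short Python sketch below counts the candidate blocks inside one 64x64 CTU (CU sizes 64x64 down to 8x8, plus 4x4 PUs) and multiplies by the 35 intra modes. It is only illustrative arithmetic: the names `CTU_SIZE`, `BLOCK_SIZES` and `blocks_per_ctu` are assumptions introduced here, and the count ignores chroma, TU partitioning and any early termination an actual encoder applies.

```python
# Illustrative count of luma intra candidates in one CTU (assumed names).
CTU_SIZE = 64
NUM_INTRA_MODES = 35              # planar, DC and 33 angular modes
BLOCK_SIZES = [64, 32, 16, 8, 4]  # CU sizes 64..8, plus 4x4 PUs


def blocks_per_ctu(ctu_size, block_size):
    """Number of non-overlapping block_size x block_size blocks in a CTU."""
    per_side = ctu_size // block_size
    return per_side * per_side


# Blocks at each size, e.g. one 64x64 block and four 32x32 blocks.
counts = {size: blocks_per_ctu(CTU_SIZE, size) for size in BLOCK_SIZES}
total_blocks = sum(counts.values())

# Upper bound on (block, mode) pairs for an exhaustive search of one CTU.
total_mode_tests = total_blocks * NUM_INTRA_MODES

print(counts)
print(total_blocks, total_mode_tests)
```

Under these assumptions a single CTU already contains 341 candidate blocks and 11935 block and mode combinations, which is the kind of workload that motivates both fast decision methods and parallel evaluation.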
To reduce the complexity of HEVC intra coding,
many fast intra prediction methods have been proposed. Shen et al.
presented a fast CU size decision method for intra coding
based on texture homogeneity and spatial correlations,
achieving considerable coding time savings [4]. However, the
texture homogeneity measurement process might cause extra
complexity overhead. W. Jiang et al. used gradient
assessment of intra coding blocks to support fast mode
decision with a fixed threshold, which might not be suitable for
all test sequences [5].
In recent years, with the rapid development of the
Graphics Processing Unit (GPU), using General Purpose GPU
(GPGPU) computing for parallel speedup of video coding has
become a major trend. At the same time, the “Compute Unified Device
Architecture” (CUDA), published by NVIDIA [6], makes
GPU parallel programming more accessible. It is therefore
possible to implement parallel intra prediction for a large number
of coding blocks in a video sequence. However,
parallelizing intra prediction on graphics hardware
is highly challenging. Specifically, there is a strong reconstruction
dependence between an intra PU and its neighbouring blocks,
which causes frequent synchronization when current and
reference samples are processed at the same time. In this paper, an
intra coding scheme based on the original picture is proposed using
graphics hardware, with the following two contributions.
Firstly, a parallel and fast intra prediction scheme on the GPU
is proposed. Before intra coding of a slice, the best intra
prediction mode with the minimum SAD cost for every possible
block of all CTUs is determined concurrently on the GPU
and transmitted back to the host CPU. For each intra block,
instead of the time-consuming RMD process, the intra modes
returned by the GPU along with the Most Probable Modes
(MPMs) are directly used for RDO decision. Secondly, a fast
CU splitting and pruning algorithm is proposed based on
parallel texture gradient measurement using the classic Sobel
operator. With an asynchronous workflow and effective thread
allocation, no extra computational overhead is introduced.
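The two per-block measurements described above can be sketched in plain Python as follows. This is a minimal sequential illustration under stated assumptions, not the GPU implementation of this paper: three simplified predictors (DC, horizontal, vertical) stand in for the 35 HEVC modes, the reference samples are taken from the original picture, the gradient is accumulated only over the interior samples of a block, and the names `predict`, `best_mode_by_sad` and `sobel_gradient` are hypothetical. On a GPU, each block (or each block and mode pair) could be handled by its own thread because no reconstructed samples are needed.

```python
# Hypothetical sketch: SAD-based mode choice and Sobel gradient for one block,
# using original (not reconstructed) samples as the prediction reference.

def predict(pic, x0, y0, size, mode):
    """Simplified stand-ins for HEVC intra predictors (not the real 35 modes)."""
    top = [pic[y0 - 1][x0 + i] for i in range(size)]   # row above the block
    left = [pic[y0 + j][x0 - 1] for j in range(size)]  # column left of the block
    if mode == "vertical":
        return [list(top) for _ in range(size)]
    if mode == "horizontal":
        return [[left[j]] * size for j in range(size)]
    dc = (sum(top) + sum(left) + size) // (2 * size)   # rounded mean of neighbours
    return [[dc] * size for _ in range(size)]


def best_mode_by_sad(pic, x0, y0, size, modes=("dc", "horizontal", "vertical")):
    """Return (mode, sad) with the minimum SAD against the original block."""
    best = None
    for mode in modes:
        pred = predict(pic, x0, y0, size, mode)
        sad = sum(abs(pic[y0 + j][x0 + i] - pred[j][i])
                  for j in range(size) for i in range(size))
        if best is None or sad < best[1]:
            best = (mode, sad)
    return best


def sobel_gradient(pic, x0, y0, size):
    """Sum of |Gx| + |Gy| (3x3 Sobel) over the interior samples of a block."""
    total = 0
    for y in range(y0 + 1, y0 + size - 1):
        for x in range(x0 + 1, x0 + size - 1):
            gx = (pic[y - 1][x + 1] + 2 * pic[y][x + 1] + pic[y + 1][x + 1]
                  - pic[y - 1][x - 1] - 2 * pic[y][x - 1] - pic[y + 1][x - 1])
            gy = (pic[y + 1][x - 1] + 2 * pic[y + 1][x] + pic[y + 1][x + 1]
                  - pic[y - 1][x - 1] - 2 * pic[y - 1][x] - pic[y - 1][x + 1])
            total += abs(gx) + abs(gy)
    return total


# Toy picture with a purely horizontal ramp: every column has a constant value.
pic = [[10 * x for x in range(9)] for _ in range(9)]
print(best_mode_by_sad(pic, 1, 1, 4))
print(sobel_gradient(pic, 1, 1, 4))
```

In this toy picture every column is constant, so copying the row above the block reproduces it exactly and the vertical predictor wins. A flat block would give a zero gradient, which is the kind of signal a CU size decision could use to stop splitting early.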
The rest of this paper is organized as follows. Section II
gives an overview of intra coding techniques in HEVC.
Section III details the proposed parallel method for intra
prediction and gradient measurement, along with the GPU-based
fast mode decision and CU size decision.
Experimental results and analysis are presented in Section IV.
Finally, Section V concludes this paper.
978-1-4673-7314-2/15/$31.00 ©2015 IEEE IEEE VCIP 2015