Block-Based Parallel Intra Prediction Scheme for HEVC
Jie Jiang
Institute of Intelligence Control and Image Engineering, Xidian University, Xi’an, China
Email: supergirl.jj@gmail.com
Baolong Guo and Wei Mo
Institute of Intelligence Control and Image Engineering, Xidian University, Xi’an, China
Email: {blguo, wmo}@xidian.edu.cn
Kefeng Fan
Guilin University of Electronic Technology, Guilin 541004, China
Email: fankf@cesi.ac.cn
Abstract — Advanced video coding standards are widely deployed
in numerous products and services, such as multimedia services,
broadcasting, mobile television, video conferencing, and
surveillance systems. New compression techniques are gradually
incorporated into video coding standards, so that a 50%
reduction in bit rate is achievable every ten years. However,
dramatically increased computational complexity is one of the
many problems brought by this trend. With recent advances in
very-large-scale integration (VLSI) semiconductor technology
contributing to the emerging digital multimedia world, this
paper investigates an efficient parallel architecture for the
emerging High Efficiency Video Coding (HEVC) standard that
speeds up the intra coding process without discarding any
prediction modes. Parallelism is achieved by limiting the
reference pixels of the 4 × 4 subblocks, allowing the subblocks
to use different direction modes to predict the residuals. The
proposed algorithm is evaluated experimentally on a set of
widely used, freely available video test sequences. The results
show that the proposed algorithm achieves satisfactory intra
parallelism without significant performance loss.
Index Terms — HEVC, intra coding, parallel architecture,
multiple directions.
I. INTRODUCTION
The continuous emergence of video coding standards, together
with advances in the technology used to develop and implement
them, has undoubtedly created a completely new world of
multimedia. So far, contributions to video coding technology
have mainly focused on improving coding efficiency. The
challenge remains not only to find coding algorithms that
deliver high compression performance but also to speed up the
coding process.
The video coding standard currently under development, High
Efficiency Video Coding (HEVC) [1], is attracting increasing
attention because of its high compression efficiency. However,
the computational complexity of HEVC is expected to be 2-10
times higher than that of its predecessor, which is considered
an obstacle to real-time implementation. Many research works
therefore focus on reducing the computational complexity. Their
purpose is to design and evaluate new methods that reduce
encoder complexity while preserving the quality of the
reconstructed video sequences for intra coding. These works
generally fall into two categories.
1. Fast mode decision approaches with early termination
using adaptive thresholds or optimized Lagrangian rate
distortion optimization (RDO) function [2-4].
2. Parallel architectures to speed up the intra prediction
process [5-14].
With recent advances in very-large-scale integration (VLSI)
semiconductor technology contributing to the emerging digital
multimedia world, research on parallel architectures is
attracting more attention. In this paper, we focus on the
second category and present a block-based parallel
architecture to speed up intra prediction for HEVC.
The remainder of this paper is organized as follows.
Section II reviews the state of the art in parallel
architectures. Section III introduces spatial prediction in
HEVC. Section IV presents the proposed scheme, including
2X parallel intra prediction and its extension to 4X
parallelism. Experimental results are presented in
Section V. Finally, Section VI concludes the paper.
II. RELATED WORK
Mainstream image compression and intra-frame video
compression extensively adopt a block-based structure, from
prediction and transform to entropy coding, in which the
coding of one block depends on the availability of its left,
upper-left, and upper-right blocks. Such a highly dependent
structure is not well suited to parallelization, especially
for application-specific integrated circuit (ASIC) solutions.
Even so, now that dual-core and quad-core computers are
available, there have been many efforts to parallelize
encoding and decoding from different aspects, as reviewed in
the rest of this section.
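To illustrate why this dependency structure limits parallelism, the following minimal Python sketch computes the earliest step at which each block in a small grid could be coded. It is an illustration only and not part of the proposed scheme. It assumes a raster grid of equally sized blocks, one time step per block, and exactly the three neighbor dependencies named above (left, upper-left, upper-right). The function names wavefront_schedule and blocks_per_step are hypothetical.

```python
def wavefront_schedule(rows, cols):
    """Earliest step at which each block can be coded.

    Assumption: a block may start only after its left, upper-left and
    upper-right neighbours (where they exist) have finished, and every
    block takes exactly one step.
    """
    step = [[0] * cols for _ in range(rows)]
    for r in range(rows):            # raster order: dependencies are
        for c in range(cols):        # always computed before they are read
            deps = [(r, c - 1), (r - 1, c - 1), (r - 1, c + 1)]
            ready = [step[dr][dc] for dr, dc in deps
                     if 0 <= dr < rows and 0 <= dc < cols]
            step[r][c] = max(ready) + 1 if ready else 0
    return step


def blocks_per_step(step):
    """Number of blocks that can be coded concurrently at each step."""
    counts = {}
    for row in step:
        for s in row:
            counts[s] = counts.get(s, 0) + 1
    return [counts[s] for s in sorted(counts)]


if __name__ == "__main__":
    schedule = wavefront_schedule(4, 4)
    for row in schedule:
        print(row)
    print("blocks per step:", blocks_per_step(schedule))
```

Under these assumptions, a 4 × 4 grid of blocks needs 10 steps for 16 blocks, and no more than two blocks are ever ready at the same time, which is why additional cores bring little benefit unless the dependencies themselves are relaxed.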
1. GOP (Group of Pictures) approaches: Barbosa [5] and
Vander [6] propose partitioning a sequence into several GOPs.
Because the correlation between GOPs is low, this partitioning
not only limits error propagation but also supports parallel
coding. However, the data of all the pictures in a GOP must be
available before parallel processing can begin. When the GOP has too
JOURNAL OF MULTIMEDIA, VOL. 7, NO. 4, AUGUST 2012
doi:10.4304/jmm.7.4.289-294