Effective H.264/AVC to HEVC Transcoder based on
Prediction Homogeneity
Feiyang Zheng, Zhiru Shi, Xiaoyun Zhang, Zhiyong Gao
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
{zhfysheep, zhiru.shi, xiaoyun.zhang, zhiyong.gao}@sjtu.edu.cn
Abstract—The new video coding standard, High Efficiency
Video Coding (HEVC), has been established to succeed the
widely used H.264/AVC standard. However, an enormous
amount of legacy content is encoded with H.264/AVC, so
high-performance AVC-to-HEVC transcoding is in great
demand. This paper presents a fast transcoding algorithm based on
residual and motion information extracted from the H.264 decoder.
By exploiting this side information, the homogeneity
characteristics of regions are analysed. An efficient coding unit (CU) and
prediction unit (PU) mode decision strategy is proposed that combines
the prediction homogeneity of regions with current encoding
information. Experimental results show that the proposed
transcoding scheme saves up to 55% of encoding time with
negligible loss of coding efficiency compared with the
full decoding and full encoding transcoder.
Index Terms—HEVC, transcoding, fast partition decision,
residual homogeneity, motion homogeneity.
I. INTRODUCTION
The H.264/AVC video coding standard has achieved great
success and is widely used for video streaming and storage.
Meanwhile, the HEVC standard was finalized in 2013 to meet
the growing demand for ultra-high-definition video
applications [1]. With a coding gain of more than 50% over
H.264/AVC, HEVC is expected to be its successor in the
near future. The demand for transcoding is rising, since the large
amount of content encoded in H.264 can achieve better
rate-distortion performance when converted to the HEVC format.
H.264-to-HEVC transcoding is heterogeneous
transcoding, which mainly changes the bitstream format and
reduces the bitrate. The trivial solution is to completely
decode the source stream and then completely re-encode
it in the target format. This full decoding and full encoding
(FDFE) method maximizes rate-distortion (RD)
performance but is inefficient in terms of computational
complexity. This is the main problem faced by practical
H.264/AVC to HEVC transcoding services.
The HEVC encoder still uses the block-based hybrid coding
framework, which is similar to that of
H.264/AVC. Many encoding stages share similar
techniques, such as motion estimation, transform, quantization,
and entropy coding. This makes fast H.264/AVC to HEVC
transcoding both meaningful and feasible. In a transcoder,
much information can be extracted from the H.264/AVC decoder
to reduce HEVC encoding complexity and improve efficiency.
Many features of the source bitstream can be
used, such as motion vectors (MVs), mode information,
transform coefficients, the coded block pattern (CBP) index and
pixel residuals. The key problem for fast transcoding is how to
exploit this information effectively to achieve a good trade-off between
RD performance and complexity reduction.
In early work on transcoding, many solutions
focused on making full use of motion vector information.
These solutions attempt to reduce the complexity of the motion
estimation (ME) module in the codec. Motion vector mapping and motion
refinement are applied to improve ME
[2] [3]. However, fast ME and MV refinement are believed to
contribute less to fast AVC-to-HEVC transcoding, because
the main complexity increase of the HEVC encoder
comes from CU, PU and transform unit (TU) mode decisions, which involve
a large number of rate-distortion optimization (RDO) computations.
Consequently, more attention has been paid to CU and PU partition
mode decision.
To further reduce the transcoding complexity, a power
spectrum based rate-distortion optimization (PS-RDO) model
[4] is proposed for transcoding inter pictures. The cost of a
motion vector in the transcoder is calculated from the MV variation
and the power spectrum of the prediction result. The PS-RDO
model is used to determine the CU partition, PU mode
decision and MV estimation. In [5], a metric named MV
variance distance (MVVD) is calculated, defined as
the square root of the variance of all collocated H.264/AVC
MVs in the CU area. Two thresholds, T_low and T_high, are
employed to decide whether the CU will be
split and which PU partitions will be searched. In [6], dynamic
thresholding is used to overcome the limitations of fixed
thresholds. Another algorithm, using content
modelling with linear discriminant functions (LDFs), is also
proposed in [6]. Several features from the H.264/AVC decoder
are gathered for machine learning, such as the number of
H.264/AVC partitions, the MV variance, the number of non-zero
coefficients and the DCT coefficient energy. However, these
algorithms cannot process CUs containing intra
prediction blocks, since no MV information exists for them.
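As a rough illustration of the MVVD idea from [5] summarized above, the
following Python sketch computes the square root of the MV variance over a
CU area and compares it with two thresholds. Several details are assumptions
made only for illustration and are not taken from [5]: the variance of
two-component vectors is taken as the sum of the per-component variances,
the threshold values are arbitrary, and the mapping from threshold regions
to decisions is a guess. The names mvvd and classify_cu are hypothetical.

```python
import math


def mvvd(mvs):
    """Square root of the variance of a list of (mvx, mvy) motion vectors.

    Assumption: the variance of 2-D vectors is the sum of the
    per-component population variances.
    """
    if not mvs:
        # A CU covered only by intra blocks has no collocated MVs.
        raise ValueError("no motion vectors available for this CU")
    n = len(mvs)
    mean_x = sum(x for x, _ in mvs) / n
    mean_y = sum(y for _, y in mvs) / n
    var = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in mvs) / n
    return math.sqrt(var)


def classify_cu(mvs, t_low, t_high):
    """Hypothetical two-threshold decision based on MVVD.

    Assumed mapping (not from [5]):
      MVVD < t_low   -> motion is uniform, keep the CU unsplit
      MVVD > t_high  -> motion is diverse, split the CU
      otherwise      -> fall back to the regular RDO search
    """
    d = mvvd(mvs)
    if d < t_low:
        return "no_split"
    if d > t_high:
        return "split"
    return "full_search"
```

In this sketch an empty MV list, as for a CU covered entirely by intra
blocks, raises an error, which mirrors the limitation of MV-based methods
noted above.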
To overcome this disadvantage and fully use the prediction
information, a hybrid fast transcoding scheme is proposed in
this paper. The concept of prediction homogeneity
is introduced not only to help make the CU split
decision, but also to support fast PU partition decision. By
exploiting MV and residual information from the H.264/AVC
decoder, the prediction homogeneity is determined for the current
inter prediction coded block. Unnecessary CU and PU
partition candidates can then be excluded efficiently. Since
residual homogeneity can be obtained for both inter and intra
blocks, the proposed algorithm has wider applicability.
978-1-4799-6139-9/14/$31.00 ©2014 IEEE