An LSTM Method for Predicting CU Splitting in
H.264 to HEVC Transcoding
Yanan Wei#, Zulin Wang#∗, Mai Xu#, Shuhao Qiao#
# School of Electronic and Information Engineering, Beihang University, Beijing, China
∗ Collaborative Innovation Center of Geospatial Technology, Wuhan, China
Corresponding Author: Mai Xu (maixu@buaa.edu.cn)
Abstract—For H.264 to high efficiency video coding (HEVC)
transcoding, this paper proposes a hierarchical Long Short-
Term Memory (LSTM) method to predict coding unit (CU)
splitting. Specifically, we first analyze the correlation between
CU splitting patterns and H.264 features. Based on this analysis, we
then propose a hierarchical LSTM architecture that predicts the
CU splitting of HEVC from the explored H.264
features. The H.264 features, including the residual, macroblock
(MB) partition and bit allocation, are employed as the input
to our LSTM method. Experimental results demonstrate that
the proposed method outperforms the state-of-the-art H.264
to HEVC transcoding methods, in terms of both complexity
reduction and PSNR performance.
Index Terms—H.264, HEVC, Transcoding, LSTM, CU splitting
I. INTRODUCTION
Transcoding is a technique that converts a video stream
from one encoding into another. As video coding standards
have evolved, compression efficiency has gradually
improved. As a result, several video coding standards
(e.g., MPEG-1, MPEG-2, MPEG-4, H.263, H.264 and
high efficiency video coding (HEVC)) co-exist in a certain
range of applications, which makes transcoding desirable.
Video transcoding bridges the gap in sharing multimedia
content across various types of multimedia devices
(e.g., television, computer, laptop, tablet and
smart phone). Therefore, transcoding has attracted increasing
attention [1].
In the past two decades, many transcoding algorithms have
been proposed with promising performance. However, the latest
video coding standard, HEVC, which achieves outstanding
coding efficiency at the cost of high computational complexity,
still challenges the existing transcoding algorithms.
As the state-of-the-art video coding standard, HEVC offers
excellent rate-distortion performance and supports
higher-resolution video coding. As a result, a large number of videos
have been encoded by HEVC over the past few years. Meanwhile,
more and more terminals tend to adopt this new standard. On
the other hand, the extensive video streams encoded by the previous
H.264 standard need to be transcoded into the HEVC domain. To
this end, efficient transcoding from H.264 to HEVC has received
a great deal of research effort.
[Figure 1 panels: frames 109, 110, 113, 117, 121 and frames 169, 170, 173, 177, 181; label: "Same CU splitting"]
Fig. 1. Two examples of the temporal similarity of CU partition.
In fact, H.264 to HEVC transcoding can be accomplished
by a full H.264 decoding process followed by a full HEVC
encoding process. However, such a procedure is inefficient,
as HEVC encoding is rather time-consuming. In
particular, the coding tree unit (CTU) partition of HEVC consumes
a large share of the computational time [2], as all possible splitting
patterns of the coding unit (CU) need to be traversed for rate-distortion
optimization. Thus, it is important to predict the CU partition
of HEVC from H.264 bitstreams when designing an
efficient transcoding method. The methods for H.264 to HEVC
transcoding can be divided into two categories: heuristic
and data-driven. Heuristic methods normally extract
specific information from the compressed bitstream and combine
it with human knowledge to accomplish the transcoding from
H.264 to HEVC. For example, in [3], the variance of the motion
vectors (MVs) of four H.264 macroblocks (MBs) is used
to explore the possibility of merging them to form a larger CU in
HEVC. Mora et al. [4] applied the motion similarity of H.264
MBs to build a fusion map, which is used to limit the depth
of CUs in HEVC coded frames. Compared with heuristic
methods, data-driven methods make full use of training data
to accomplish CU splitting in H.264 to HEVC transcoding,
and achieve better performance than heuristic methods.
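To put the cost of the exhaustive CU search mentioned above in concrete terms, the following minimal sketch counts the quadtree splitting patterns of a single CTU. It assumes a 64 × 64 CTU and an 8 × 8 minimum CU size (a common HEVC configuration, not stated in this paper), and it ignores prediction and transform unit choices. The function names are illustrative only.

```python
def count_partitions(cu_size: int, min_cu_size: int = 8) -> int:
    """Count the distinct quadtree splitting patterns of a square CU.

    Assumption: a CU is either left unsplit or split into four
    sub-CUs of half the side length, each partitioned independently.
    """
    if cu_size <= min_cu_size:
        return 1  # the smallest CU cannot be split further
    sub = count_partitions(cu_size // 2, min_cu_size)
    return 1 + sub ** 4  # "no split" plus every combination of 4 sub-CUs


def count_candidate_cus(ctu_size: int = 64, min_cu_size: int = 8) -> int:
    """Count the CUs over all depths that a full search has to evaluate."""
    total = 0
    cus_per_side = 1
    size = ctu_size
    while size >= min_cu_size:
        total += cus_per_side * cus_per_side  # CUs at this depth
        cus_per_side *= 2
        size //= 2
    return total


if __name__ == "__main__":
    print(count_partitions(64))     # splitting patterns of one 64x64 CTU
    print(count_candidate_cus(64))  # CUs checked across depths 0 to 3
```

Under these assumptions a single CTU admits 83,522 splitting patterns, built from 85 candidate CUs (1 + 4 + 16 + 64). A predictor that decides early whether a CU is split lets the encoder skip most of these candidates.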
In [2], [5], [6], [7], a linear discriminant is applied to map
the MBs in H.264 to 64 × 64 or 32 × 32 CUs in HEVC.
A decision tree is utilized in [8] for fast CU splitting decisions
during H.264 to HEVC transcoding, in light of a mining
978-1-5386-0462-5/17/$31.00 ©2017 IEEE.
VCIP 2017, Dec. 10 – 13, 2017, St Petersburg, U.S.A.