Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
Video intra prediction using convolutional encoder decoder network ☆
Zhipeng Jin a,∗, Ping An b,∗, Liquan Shen b
a School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
b Jiaxing Vocational and Technical College, Jiaxing 314036, China
Article info
Article history:
Received 11 April 2018
Revised 8 December 2018
Accepted 1 February 2019
Available online xxx
Keywords:
Video coding
Intra prediction
Image inpainting
Convolutional encoder-decoder network (CED)
High Efficiency Video Coding (HEVC)
Abstract
Intra prediction is an effective method for removing the spatial redundancy of video content.
Classical intra prediction methods usually create a prediction block by extrapolating the encoded pixels
surrounding the target block. However, existing methods cannot guarantee prediction efficiency for
rich textural structures, especially when only weak spatial correlation exists between the target block and the
reference pixels. To remedy this issue, this paper proposes a novel intra prediction method based on a convolutional
encoder-decoder network, which we term IPCED. IPCED can learn and extract the internal representation
of the reference blocks, and progressively generate a prediction block from this representation. IPCED is
a data-driven method, which represents an improvement over hand-crafted methods, and is capable of
improving the accuracy of intra prediction. Extensive experimental results demonstrate that IPCED can
generate higher-quality intra prediction results, achieving 3.41%, 3.07% and 3.44% bitrate savings for the
Y/Cb/Cr channels, respectively, compared with the HEVC baseline, which significantly exceeds existing methods.
©2019 Elsevier B.V. All rights reserved.
1. Introduction
Intra prediction methods play an important role in current
state-of-the-art video coding standards [1], as they provide an
efficient way to reduce signal energy by predicting from
spatially neighboring encoded pixels. In order to capture the finer edge
directions present in natural images, High Efficiency Video
Coding (HEVC) employs 35 intra prediction modes: the
planar mode, the DC mode, and 33 angular prediction modes [2].
Furthermore, in the Joint Exploration Model (JEM) [3], which is still under development,
the number of angular prediction modes has been extended to
65. Such fine-grained modes provide more accurate
prediction than the intra prediction in H.264/AVC,
which has only 9 modes [4].
☆ This work was supported in part by the National Natural Science Foundation of
China under Grants 61571285 and 61801006, by the Shanghai Science and Technology
Commission under Grants 17DZ2292400 and 18XD1423900, and by the Zhejiang Provincial
Natural Science Foundation of China under Grant No. LGF20F020003.
∗ Corresponding authors.
E-mail addresses: 364043283@qq.com (Z. Jin), anping@shu.edu.cn (P. An),
jsslq@shu.edu.cn (L. Shen).
Video intra prediction is a well-studied and challenging task,
and its classical method is to create a prediction block by extrapolating
the reference pixels surrounding the target block, as shown
in Fig. 1. For angular prediction, each pixel in the current block
is projected onto the nearest reference line along the angular
direction, and the projected pixel is used as the prediction. A linear
interpolation filter with 1/32 pixel accuracy is used to generate the
reference line. The filter coefficients are inversely proportional to
the distances between the projected fractional position and its
two adjacent integer positions. In essence, angular prediction
in HEVC is a copying-based process that assumes the image
content follows a pure direction of propagation. For the DC
mode, the prediction is the average of all the reference pixels. For
the planar mode, bilinear interpolation is used to create the prediction
block. However, all these modes together are still too simple
to fully characterize the complex non-linear relationship between
the reference pixels and the target block.
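The three classical predictors described above can be illustrated with a minimal sketch. It is not taken from the paper or from the HEVC reference software. The function names and the reference layout are assumptions made for illustration: `top[i]` is the reconstructed pixel above column i, `left[j]` is the pixel to the left of row j, and each list holds 2n samples. Only vertical-class angular modes with a non-negative displacement are covered, and reference smoothing and boundary filters are omitted.

```python
# Illustrative sketch of HEVC-style DC, planar and angular intra prediction.
# Assumed layout (hypothetical): top[i] lies above column i, left[j] lies to
# the left of row j, each with 2*n samples; n is a power of two.

def dc_predict(top, left, n):
    """DC mode: fill the block with the rounded mean of the n top and n left pixels."""
    total = sum(top[:n]) + sum(left[:n])
    dc = (total + n) // (2 * n)  # +n rounds to nearest
    return [[dc] * n for _ in range(n)]

def planar_predict(top, left, n):
    """Planar mode: average of a horizontal and a vertical linear interpolation."""
    shift = n.bit_length()  # equals log2(n) + 1 when n is a power of two
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Horizontal: between left[y] and the top-right pixel top[n].
            hor = (n - 1 - x) * left[y] + (x + 1) * top[n]
            # Vertical: between top[x] and the bottom-left pixel left[n].
            ver = (n - 1 - y) * top[x] + (y + 1) * left[n]
            pred[y][x] = (hor + ver + n) >> shift
    return pred

def angular_predict_vertical(top, n, angle):
    """Vertical-class angular mode; angle is the per-row horizontal
    displacement in 1/32-pixel units, assumed to satisfy 0 <= angle <= 32."""
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        pos = (y + 1) * angle             # projected offset on the top row
        idx, frac = pos >> 5, pos & 31    # integer part and 1/32 fraction
        for x in range(n):
            a = top[x + idx]
            if frac == 0:
                pred[y][x] = a            # lands exactly on a reference pixel
            else:
                b = top[x + idx + 1]
                # The nearer reference pixel receives the larger weight.
                pred[y][x] = ((32 - frac) * a + frac * b + 16) >> 5
    return pred

if __name__ == "__main__":
    top = [10, 20, 30, 40, 50, 60, 70, 80]
    left = [10] * 8
    for row in angular_predict_vertical(top, 4, 16):
        print(row)
```

With `angle = 0` the angular predictor simply copies the top row downward, and with `angle = 32` it copies along the 45-degree diagonal, which makes the copying-based nature of these modes easy to see.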
Many works have sought to further improve intra prediction
efficiency. Kamisli et al. [5,6] model the correlation between adjacent
pixels as a first-order 2D Markov process, where each pixel
is predicted by linearly weighting several adjacent pixels. Lai et al.
[7] propose an error-diffused intra prediction algorithm for HEVC.
In addition, Chen et al. [8] incorporate an ordered dither technique
into intra prediction instead of error diffusion to reduce
computational complexity. Chen et al. [9] propose an improved
copying-based intra prediction method. Lucas et al. [10] propose an intra
prediction framework based on adaptive linear filters with sparsity
constraints. Dias et al. [11] propose an improved combined intra
prediction (CIP) method, which uses both the reference pixels and
the prediction pixels generated by the intra prediction modes. Li
et al. [12] propose a piece-wise linear projection method based on
canonical correlation analysis (CCA) to better exploit local
spatial correlations. However, these aforementioned works are single
https://doi.org/10.1016/j.neucom.2019.02.064
0925-2312/© 2019 Elsevier B.V. All rights reserved.