1051-8215 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2019.2929317, IEEE
Transactions on Circuits and Systems for Video Technology
CU without analyzing the actual CU content, the proposed
DeepSCC jointly analyzes the optimal mode maps of the
collocated CTU and the content of the current CTU to avoid
error propagation. 4) The proposed DeepSCC contains many
trainable parameters and learns extensive features, so that it
directly performs the mode decision for Intra, IBC, and PLT
rather than the simple CU type classification in [20], [22]–[24].
As a result, the decision for IBC and PLT modes can be
different, and many SCBs only check one mode from IBC and
PLT to further reduce the computational complexity.
The rest of this paper is organized as follows. Section II
presents the review and analysis of intra prediction in SCC.
Section III presents the proposed fast network DeepSCC. The
experimental results are presented in Section IV to verify the
performance of the proposed DeepSCC. Finally, Section V
concludes the paper.
II. REVIEW AND ANALYSIS OF INTRA PREDICTION IN SCC
A. Review on Intra Prediction in SCC
A CTU is a basic processing unit in SCC. To find the optimal
CTU coding structure, a CTU is recursively partitioned into
CUs in four different depth levels, i.e., depth level d∈{0,1,2,3}.
As shown in Fig. 1, a CTU of 64×64 pixels is partitioned into
four CUs of 32×32 pixels, and then each CU of 32×32 pixels
is further partitioned into four smaller CUs, until CUs of 8×8
pixels are reached. Therefore, a CTU contains 85 CU partitions
(1 + 4 + 16 + 64). In each CU, an exhaustive mode search is
performed to find its sub-optimal mode, as shown in Fig. 2.
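The recursive partitioning described above can be sketched in a few lines of Python. This is only an illustrative enumeration of the full quadtree, not code from SCM; the function name `enumerate_cus` and its parameters are hypothetical.

```python
from collections import Counter


def enumerate_cus(x=0, y=0, size=64, depth=0, min_size=8):
    """Yield (depth, x, y, size) for every CU in the full quadtree of one CTU.

    Hypothetical helper for illustration: a 64x64 CTU is split recursively
    into four equal quadrants until 8x8 CUs are reached (depth levels 0..3).
    """
    yield depth, x, y, size
    if size > min_size:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                yield from enumerate_cus(x + dx, y + dy, half, depth + 1, min_size)


cus = list(enumerate_cus())
per_depth = Counter(depth for depth, *_ in cus)

# One CU at depth 0, then 4, 16, and 64 CUs at depths 1, 2, and 3.
print(dict(per_depth), len(cus))
```

Summing the per-depth counts reproduces the 1 + 4 + 16 + 64 = 85 CU partitions stated in the text.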
Besides the Intra mode in HEVC that is used to encode the
traditional NIBs, SCC additionally adopts two new modes, IBC
and PLT, to improve the coding efficiency of SCBs. IBC mode
is developed based on the observation that there are many
repeated patterns for SCBs in the same frame. When encoding
the current CU, IBC searches in the reconstructed region of the
current frame to find the best-matched block for it, and the
location of the best-matched block is denoted by a block vector.
PLT mode is developed based on the observation that an SCB
usually contains a limited number of distinct colors. PLT
predicts a palette table based on the previously coded CUs,
which contains several representative sample values. Then, an
index map is sent to the decoder to denote the position of each
representative sample value in a CU. In the exhaustive mode
search, a Lagrange RD cost $J_x$ is calculated for a mode $x$:
$$J_x = D_x + \lambda \times R_x \qquad (1)$$
where $x \in \{\text{Intra}, \text{IBC}, \text{PLT}\}$, $\lambda$ is a Lagrange multiplier, and $D_x$ and
$R_x$ are the distortion and bit cost of the CU coded with mode
$x$. The sub-optimal mode for a CU is selected as the one with
the smallest value of $J_x$. After calculating the RD cost $J_x$, the
optimal CTU coding structure is selected as the one with the
smallest value of the total RD cost. Then the corresponding sub-
optimal modes of those CUs become their optimal modes, and
they are involved in the final encoding bitstream.
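As a minimal sketch of the per-CU decision in (1), the following Python snippet computes the RD cost of each candidate mode and keeps the cheapest one. The function names and the distortion, bit-cost, and multiplier values are hypothetical and chosen only for illustration; they are not measurements from SCM.

```python
from typing import Dict, Tuple


def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Lagrange RD cost J = D + lambda * R for one candidate mode."""
    return distortion + lam * bits


def select_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> Tuple[str, float]:
    """Return the mode with the smallest RD cost, together with that cost.

    `candidates` maps a mode name to its (distortion, bits) pair.
    """
    costs = {mode: rd_cost(d, r, lam) for mode, (d, r) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Made-up (distortion, bits) pairs for a single CU, for illustration only.
example = {"Intra": (1200.0, 96.0), "IBC": (900.0, 40.0), "PLT": (850.0, 64.0)}
best_mode, best_cost = select_mode(example, lam=10.0)
print(best_mode, best_cost)
```

With these made-up numbers, PLT has the lowest distortion but IBC wins once the bit cost is weighted by the multiplier, which is the trade-off that (1) expresses.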
As shown in Fig. 1, a CTU contains 85 CU partitions, and
each CU checks three mode candidates, except that the CU at
depth level 0 checks only the IBC and Intra modes.
Therefore, the RD cost $J_x$ is calculated for 254 mode candidates
in a CTU (1×2 + 84×3). Although the hierarchical CTU
partitioning structure and the exhaustive mode search achieve
the best coding performance, they impose a significant computational
burden on an SCC encoder. Since only 1 to 64 of those modes are
involved in the final encoding bitstream, precise prediction of
the optimal modes in a CTU leads to
a great reduction in encoding time.
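The candidate count can be checked with a short calculation. The sketch below assumes the per-depth mode lists stated in the text; the variable names are hypothetical.

```python
# Modes checked at each depth level, as stated in the text:
# the depth-0 CU checks only Intra and IBC, all other CUs check three modes.
MODES_BY_DEPTH = {
    0: ("Intra", "IBC"),
    1: ("Intra", "IBC", "PLT"),
    2: ("Intra", "IBC", "PLT"),
    3: ("Intra", "IBC", "PLT"),
}

# A full quadtree has 4**d CUs at depth level d.
cus_at_depth = {d: 4 ** d for d in MODES_BY_DEPTH}

total_candidates = sum(cus_at_depth[d] * len(MODES_BY_DEPTH[d]) for d in MODES_BY_DEPTH)

# The final partition holds between 1 CU (unsplit CTU) and 64 CUs (all 8x8),
# and each of those CUs carries exactly one selected mode.
min_selected, max_selected = 1, cus_at_depth[3]
max_useful_fraction = max_selected / total_candidates

print(total_candidates, min_selected, max_selected, round(max_useful_fraction, 3))
```

Even in the most finely partitioned case, only about a quarter of the evaluated candidates end up in the bitstream, which is why predicting the optimal modes directly can save so much encoding time.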
B. Analysis of Intra Prediction in SCC and Motivation of
DeepSCC
To analyze the intra prediction in SCC, experiments were
performed for sequences in YUV 4:4:4 format based on the
HEVC-SCC reference software, Screen Content Model version
8.3 (SCM-8.3) [25]. The testing sequences were selected by the
experts in the JCT-VC group, and they were encoded with
quantization parameters (QPs) of 22, 27, 32, and 37 using SCM-
8.3 under All Intra (AI) configuration defined in the common
test conditions (CTC) [26]. Those sequences are classified into
four categories according to their content: text and graphics with
motion (TGM), mixed content (M), animation (A), and camera-
captured content (CC). Fig. 3 shows examples of testing
sequences in the four categories. Since sequences in TGM and M
show mixed content of NIBs and SCBs, while sequences in A
and CC only contain NIBs, we will show the average results for
sequences in TGM+M and A+CC in the following sections.
Table I shows the mode distribution of each sequence, which
is calculated as the percentages of Intra, IBC, and PLT coded
areas in a sequence. Since sequences in A+CC only contain
NIBs, 97.46% of their area is encoded by
Intra mode on average. Therefore, the CU type classification in
[20], [22]–[24] is efficient for NIBs by skipping both IBC and
PLT modes. However, the mode distributions
of sequences in TGM+M are much more complicated, with
all three modes taking up large percentages. Even though
“ChineseEditing”, “Console”, “Desktop” and “FlyingGraphics”
only contain SCBs, Intra mode still takes up 10.06%-14.56% in
those sequences. Besides, IBC and PLT modes are not evenly
distributed. For example, IBC mode takes up 70.93% while
PLT mode only takes up 16.72% in “FlyingGraphics”.
Comparatively, SCBs in “Map” are more likely to select PLT
Fig. 2. Exhaustive mode search in a CU.
Fig. 3. Examples of testing sequences in four categories: MissionControlClip3 (M), Desktop (TGM), Robot (A), and Kimono1 (CC).