1318 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 16, NO. 5, AUGUST 2014
Variable Size Transforms and Transform Skipping: 4x4
and 8x8 DCT transforms are adopted in H.264/AVC to remove
the redundancy of the residual signal [22]. Since high-resolution
video content shows stronger correlation than lower-resolution
content, HEVC extends the largest DCT transform size to
32x32 to fully exploit the correlation in the video signal.
However, blocks with screen content contain complex structures
and sharp edges, which cannot be compactly represented
in the DCT domain. Thus the coding efficiency of HEVC on screen
content is compromised. To code screen content more efficiently,
the transform skipping modes proposed in [16] and [17]
are incorporated into HEVC, which skip the transform process
to minimize the R-D cost [23].
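The effect described above can be illustrated with a minimal sketch. It applies an orthonormal 1-D DCT-II to a flat residual and to a residual containing one isolated sharp feature, then counts the coefficients above a threshold. The sample values, the threshold, and the `choose_mode` helper are illustrative assumptions; counting significant values is only a crude stand-in for the R-D cost of [23], not the encoder's actual decision rule.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def count_significant(values, threshold=1.0):
    """Number of values whose magnitude exceeds the threshold."""
    return sum(1 for v in values if abs(v) > threshold)

def choose_mode(residual):
    """Hypothetical proxy decision: pick whichever domain needs fewer
    significant values. A real encoder would compare R-D costs instead."""
    if count_significant(residual) < count_significant(dct_1d(residual)):
        return "transform_skip"
    return "dct"

smooth_residual = [10, 10, 10, 10, 10, 10, 10, 10]  # flat, natural-video-like
edge_residual = [0, 0, 0, 0, 0, 90, 0, 0]           # isolated sharp feature

print(choose_mode(smooth_residual))  # dct
print(choose_mode(edge_residual))    # transform_skip
```

The flat residual collapses into a single DC coefficient, while the isolated feature spreads over all eight DCT coefficients, which is why skipping the transform can be the cheaper choice for such blocks.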
Loop Filters: Due to the hybrid block coding structure, blocking
artifacts and inter-block inconsistency can be observed at block
boundaries. A loop filter is adopted to smooth neighboring coding
blocks and improve the visual quality of reconstructed blocks.
In HEVC, both a strong filter and a weak filter are adopted to remove
the blocking artifacts. However, when applied to screen
content, the loop filter may degrade the visual quality when
neighboring blocks are inconsistent with
each other at boundaries. So a clipping operation is applied
to the strong filter to avoid averaging pixels that differ
greatly [24].
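A minimal sketch of such a clipped smoothing step follows. The tap weights and the bound of 2*tc follow the form of the HEVC luma strong filter for the sample nearest the boundary; the function names, the value of `tc`, and the sample values are illustrative assumptions, and the filter on/off decisions of a real codec are omitted.

```python
def clip3(lo, hi, v):
    """Clamp v into the closed range [lo, hi]."""
    return max(lo, min(hi, v))

def strong_filter_p0(p2, p1, p0, q0, q1, tc):
    """Smooth the sample p0 next to a block edge, then limit the change.

    p2, p1, p0 lie on one side of the edge and q0, q1 on the other.
    tc is the clipping parameter (assumed given here).
    """
    smoothed = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
    # Without the clip, a sharp text edge would be averaged away.
    return clip3(p0 - 2 * tc, p0 + 2 * tc, smoothed)

# Sharp edge (white next to black): the change is limited to 2 * tc.
print(strong_filter_p0(255, 255, 255, 0, 0, tc=4))      # 247

# Mild step: the smoothed value lies inside the bound and passes through.
print(strong_filter_p0(100, 100, 100, 104, 104, tc=4))  # 102
```

For the sharp edge the unclipped average would be 159, so the clip is what preserves the edge contrast.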
III. PROPOSED CODING SCHEME BASED ON HEVC
HEVC introduces several new techniques to improve the
coding efficiency of natural video; however, its coding efficiency
on screen content is compromised. On one hand, blocks
with complex structures in screen content cannot be efficiently
compressed. Although the transform skipping (TS) modes can
be used for screen content, they change the distribution of
the residual energy rather than reducing it. On the other hand,
non-translational motions of screen content video cannot be
handled efficiently in HEVC. To solve the above problems,
the multi-stage directional mode (MDM) and the multi-stage
temporal mode (MTM) are proposed and incorporated into
HEVC for screen content coding. The proposed modes are
based on the base color representation. The main difference
between them is how the index map is coded.
A. Base Color Representation for Screen Contents
Screen content can be regarded as a combination of two parts:
colors and structure. From the histogram of screen content, we
can find that the colors can be compactly represented using sev-
eral base colors. The structure can be represented using an index
map, which specifies the color of each pixel. As shown in Fig. 3,
the proposed representation first decomposes the input image
block into base colors and an index map using color quantization.
Then the base colors and index map are entropy coded using
CABAC [25] to achieve better coding performance. At the decoder
side, the base colors and index map are decoded and
then combined to generate the reconstructed block.
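The decomposition and reconstruction can be sketched as follows. The quantizer used here, which takes the most frequent pixel values as base colors and maps every pixel to the nearest one, is an assumption chosen for brevity and not necessarily the color quantization used in the proposed scheme. The function names and the sample block are hypothetical, and entropy coding is omitted.

```python
from collections import Counter

def decompose(block, num_base_colors):
    """Split a block (list of rows of pixel values) into base colors
    and an index map that stores, per pixel, the nearest base color."""
    histogram = Counter(p for row in block for p in row)
    # Assumed quantizer: the most frequent values become the base colors.
    base_colors = sorted(c for c, _ in histogram.most_common(num_base_colors))
    index_map = [
        [min(range(len(base_colors)), key=lambda i: abs(base_colors[i] - p))
         for p in row]
        for row in block
    ]
    return base_colors, index_map

def reconstruct(base_colors, index_map):
    """Decoder side: look up each index in the base color table."""
    return [[base_colors[i] for i in row] for row in index_map]

block = [
    [20, 20, 200, 200],
    [20, 21, 200, 200],
    [20, 20, 120, 200],
    [20, 20, 120, 120],
]
base_colors, index_map = decompose(block, num_base_colors=3)
print(base_colors)  # [20, 120, 200]
print(index_map)    # [[0, 0, 2, 2], [0, 0, 2, 2], [0, 0, 1, 2], [0, 0, 1, 1]]
print(reconstruct(base_colors, index_map)[1])  # [20, 20, 200, 200]
```

The stray value 21 is absorbed into base color 20, so the representation is lossy in this sketch; the structure of the block survives entirely in the index map.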
There are several advantages to the base color representation.
First, the color component and structure component are separated,
which makes it possible to exploit the color redundancy
and structure redundancy using different schemes. Second, due
to the high flexibility of the index map, complex
structures can be compactly represented to achieve better coding
efficiency. Last, the transform is completely skipped, so the
energy is not scattered into many coefficients.
Fig. 3. Base color representation.
Fig. 4. Multi-stage index prediction scheme.
B. Multi-Stage Directional Mode (MDM)
The indexes in the index map are highly correlated with their
spatial neighbors. In our previous work [21], we proposed a
multi-stage index coding scheme to exploit the correlations
among indexes. Fig. 4 shows the prediction process of the
multi-stage index coding scheme. The current index is first
compared with the first prediction and the comparison result
is stored in the first matching table. If the current index is not
matched by the first prediction, it is further compared with
the second prediction. If the current index is not matched by any
prediction, it is stored in the unmatched index map. In our
previous work [21], the two indexes with minimal differences
along the four texture directions (vertical, horizontal, diagonal
and negative-diagonal) are selected as predictions. An index
matched by the first prediction costs one bit. Two bits are
needed for an index matched by the second prediction. Two
bits and an additional log(K-2) bits are needed to encode each
unmatched index, where K
denotes the number of base colors.
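The two-stage matching and its bit cost can be sketched as follows. The sketch takes the two predictions for each index as given inputs, since the directional selection of [21] is not reproduced here. The function name and the sample data are hypothetical, the logarithm is read as base 2 (an assumption), and the number of base colors must exceed 2 for the escape cost to be defined.

```python
import math

def multi_stage_code(samples, num_base_colors):
    """Classify each index by the stage at which it is matched.

    samples is a list of (current, first_pred, second_pred) tuples.
    Returns the first matching table, the second matching table,
    the unmatched indexes, and the total bit cost.
    """
    # Extra bits for an unmatched index; assumes log base 2 and K > 2.
    escape_bits = math.log2(num_base_colors - 2)
    first_table, second_table, unmatched = [], [], []
    total_bits = 0.0
    for current, first_pred, second_pred in samples:
        hit_first = current == first_pred
        first_table.append(hit_first)
        if hit_first:
            total_bits += 1            # one bit: matched at stage one
            continue
        hit_second = current == second_pred
        second_table.append(hit_second)
        if hit_second:
            total_bits += 2            # two bits: matched at stage two
        else:
            unmatched.append(current)  # goes to the unmatched index map
            total_bits += 2 + escape_bits
    return first_table, second_table, unmatched, total_bits

samples = [(0, 0, 1), (1, 0, 1), (3, 0, 1), (2, 2, 0)]
tables = multi_stage_code(samples, num_base_colors=4)
print(tables)
# ([True, False, False, True], [True, False], [3], 7.0)
```

With four base colors, an unmatched index costs 2 + 1 = 3 bits, so the four samples cost 1 + 2 + 3 + 1 = 7 bits in total.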
In this paper, we propose a multi-stage directional mode
based on the two-stage index decomposition, which employs
a simplified directional index prediction scheme to reduce