1722 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 22, NO. 12, DECEMBER 2012
Fig. 2. Partitions of a CB allowed for inter-picture prediction
in HEVC.
better capturing the distinct types of motion in the scene.
Ideally, the boundary of each region should coincide with
the motion discontinuities in the given video signal. Using
quadtree-based block partitioning combined with merging, the
picture can be subdivided into progressively smaller blocks
along the motion boundary, thereby approximating the boundary
and minimizing the size of the partitions that contain it.
Each resulting quadtree leaf block can then be merged into one
of the regions on either side of the boundary.
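To make the partition-then-merge idea concrete, the following Python sketch works on a toy motion field given as a grid of region labels (0 and 1) separated by a diagonal motion boundary. The label grid, the function names, and the majority-label merge rule are hypothetical simplifications for illustration only; they are not the encoder decision used in HEVC, which instead compares the rate-distortion cost of candidate motion parameters. In this example, halving the minimum block size halves the area of the leaves that still straddle the boundary, and merging then collects the leaves into one region per side.

```python
from collections import Counter
from typing import Dict, List, Tuple

Leaf = Tuple[int, int, int]  # (x, y, size) of a square quadtree leaf

def block_labels(labels: List[List[int]], x: int, y: int, size: int) -> List[int]:
    """Motion labels of all samples in the square block at (x, y)."""
    return [labels[r][c] for r in range(y, y + size) for c in range(x, x + size)]

def split(labels, x, y, size, min_size, leaves: List[Leaf]) -> None:
    """Quadtree-split until a block has a single motion label or reaches min_size."""
    if size == min_size or len(set(block_labels(labels, x, y, size))) == 1:
        leaves.append((x, y, size))
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):  # z-scan order of the four sub-blocks
            split(labels, x + dx, y + dy, half, min_size, leaves)

def boundary_area(labels, leaves: List[Leaf]) -> int:
    """Total area of the leaves that still straddle the motion boundary."""
    return sum(s * s for x, y, s in leaves
               if len(set(block_labels(labels, x, y, s))) > 1)

def merge(labels, leaves: List[Leaf]) -> Dict[int, List[Leaf]]:
    """Hypothetical stand-in for the merge decision: each leaf joins the
    region of its majority motion label."""
    regions: Dict[int, List[Leaf]] = {}
    for x, y, s in leaves:
        label = Counter(block_labels(labels, x, y, s)).most_common(1)[0][0]
        regions.setdefault(label, []).append((x, y, s))
    return regions

# 8x8 toy motion field: label 1 above the diagonal, label 0 on and below it.
labels = [[1 if c > r else 0 for c in range(8)] for r in range(8)]
for min_size in (4, 2):
    leaves: List[Leaf] = []
    split(labels, 0, 0, 8, min_size, leaves)
    regions = merge(labels, leaves)
    print(min_size, len(leaves), boundary_area(labels, leaves),
          {k: len(v) for k, v in regions.items()})
# prints:
# 4 4 32 {0: 3, 1: 1}
# 2 10 16 {0: 7, 1: 3}
```

Finer partitioning alone would only multiply the number of blocks, each with its own motion parameters; the merge step is what lets the leaves on each side share one set of parameters again.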
A scheme that merges spatially neighboring blocks is
conceptually similar to spatial prediction modes such as the
spatial direct mode in H.264/AVC [16]. This mode also aims to
reduce the coding cost by exploiting redundancies among the
motion parameters of neighboring blocks. However, the
improvements over H.264/AVC reported in previous work [4],
[6] suggest that the merging concept exploits these
redundancies more effectively. This is also confirmed by the
experiments presented in this paper, in which the proposed
block merging algorithm is compared with a direct mode similar
to that of H.264/AVC, which we reintegrated into HEVC solely
for the purpose of this analysis.
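One intuition for the difference can be sketched in a few lines of Python. Assuming a heavily simplified spatial direct mode that infers an MV as the component-wise median of the left, above, and above-right neighbors (reference-index selection and the zero-MV conditions of H.264/AVC are omitted), the inferred vector may match none of the neighbors. Merging instead copies the motion of one neighbor chosen by a transmitted index. The function names and the absolute-difference cost used to pick the index are hypothetical; this is not the experiment reported in this paper.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector

def median3(a: int, b: int, c: int) -> int:
    return sorted((a, b, c))[1]

def direct_mode_mv(left: MV, above: MV, above_right: MV) -> MV:
    """Simplified direct-style inference: component-wise median of three
    neighboring MVs; no motion information is transmitted."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))

def choose_merge_idx(candidates: List[MV], actual: MV) -> int:
    """Encoder side (hypothetical cost): candidate closest to the actual motion."""
    return min(range(len(candidates)),
               key=lambda i: abs(candidates[i][0] - actual[0])
               + abs(candidates[i][1] - actual[1]))

def merge_mv(candidates: List[MV], merge_idx: int) -> MV:
    """Decoder side: copy the motion of the candidate selected by the index."""
    return candidates[merge_idx]

left, above, above_right = (4, 0), (0, 4), (8, 8)
actual = (4, 0)  # the block moves like its left neighbor
print(direct_mode_mv(left, above, above_right))        # (4, 4)
idx = choose_merge_idx([left, above, above_right], actual)
print(idx, merge_mv([left, above, above_right], idx))  # 0 (4, 0)
```

The median costs no side information but yields (4, 4), a vector none of the neighbors has, whereas merging reproduces the left neighbor's motion exactly at the price of signaling an index.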
III. Quadtree-Based Partitioning in HEVC
The first part of this section gives an overview of the
quadtree-based partitioning in HEVC and introduces terminology
used throughout the rest of this paper. As we are particularly
concerned with the motion model and its parameters, we
also introduce the prediction employed for differential coding
of motion vectors (MVs). The following description of HEVC
is based on the Main profile, which is the only profile defined
in the DIS.
A. Quadtree Structure
For HEVC, a quadtree-based coding approach was intro-
duced such that each picture is divided into square coding tree
blocks (CTBs). Each CTB is the root of a coding tree, which
is used to further divide the CTB into coding blocks (CBs).
Their size can be adaptively chosen by using a quadtree-based
partitioning with the leaves of the quadtree representing the
CBs [7]. Each CB is the root of both a prediction tree and a
transform tree. The prediction tree has only one level and
describes how a CB can be further split into so-called
prediction blocks (PBs), for each of which prediction
parameters are specified. Fig. 2 depicts all ways allowed by
the current Main profile to split a CB into PBs for
inter-picture prediction. For transform coding of the
prediction residual signal, each CB can also be split into
smaller transform blocks (TBs) using another quadtree called
the residual quadtree (RQT) [7], [17]. Fig. 3 illustrates this
nested quadtree structure, i.e., the coding quadtree with the
CTB as root (solid bold line) and the CBs as leaves (solid
lines), each of which is the root of the nested RQT with the
TBs as leaves (dashed lines).
Fig. 3. HEVC quadtree structures. (a) CTB (solid bold)
partitioned into CBs (solid) and transform blocks (dashed) of
variable size. (b) Corresponding nested quadtree structure.
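The nesting of the three trees can be sketched in Python as follows. A quadtree is represented here, as an illustrative assumption, by either None (a leaf) or a list of four subtrees in z-scan order; the same helper yields the CBs of a CTB and the TBs of a CB's RQT, while the single-level prediction tree is a lookup of the inter partition modes shown in Fig. 2. The function names and tree encoding are hypothetical, and profile restrictions such as minimum block sizes or when NxN and the asymmetric modes are permitted are not checked.

```python
from typing import List, Optional, Tuple

Square = Tuple[int, int, int]     # (x, y, size)
Rect = Tuple[int, int, int, int]  # (x, y, width, height)
# A quadtree is None (leaf) or a list of four subtrees in z-scan order.
QuadTree = Optional[List["QuadTree"]]

def quadtree_leaves(x: int, y: int, size: int, tree: QuadTree) -> List[Square]:
    """Leaves of a quadtree rooted at the square block (x, y, size)."""
    if tree is None:
        return [(x, y, size)]
    half = size // 2
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]
    leaves: List[Square] = []
    for (dx, dy), sub in zip(offsets, tree):
        leaves += quadtree_leaves(x + dx, y + dy, half, sub)
    return leaves

def prediction_blocks(x: int, y: int, s: int, mode: str) -> List[Rect]:
    """Single-level prediction tree: PBs of a CB of side s = 2N for the
    inter partition modes of Fig. 2 (size restrictions not checked)."""
    h, q = s // 2, s // 4
    parts = {
        "2Nx2N": [(0, 0, s, s)],
        "2NxN":  [(0, 0, s, h), (0, h, s, h)],
        "Nx2N":  [(0, 0, h, s), (h, 0, h, s)],
        "NxN":   [(0, 0, h, h), (h, 0, h, h), (0, h, h, h), (h, h, h, h)],
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],
    }
    return [(x + dx, y + dy, w, ht) for dx, dy, w, ht in parts[mode]]

# 64x64 CTB: split once, then split the top-right 32x32 quadrant again.
coding_tree = [None, [None, None, None, None], None, None]
cbs = quadtree_leaves(0, 0, 64, coding_tree)
print(len(cbs), cbs[:3])  # 7 [(0, 0, 32), (32, 0, 16), (48, 0, 16)]

# First CB: asymmetric 2NxnU prediction split and its own RQT.
cb_x, cb_y, cb_size = cbs[0]
print(prediction_blocks(cb_x, cb_y, cb_size, "2NxnU"))  # [(0, 0, 32, 8), (0, 8, 32, 24)]
rqt = [None, None, None, [None, None, None, None]]
print(quadtree_leaves(cb_x, cb_y, cb_size, rqt))  # three 16x16 TBs, four 8x8 TBs
```

Using one recursive helper for both the coding quadtree and the RQT mirrors the nested structure of Fig. 3: every leaf of the outer tree is simply the root passed to the inner one.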
All blocks in the different trees (coding, prediction, or
transform tree) correspond to specific sample arrays of
different sizes. Depending on the tree they belong to, these
blocks are associated with a specific syntax structure and
together form the so-called units. The TB luma and chroma
sample arrays and the associated syntax elements, e.g., coded
block flags or transform coefficient levels, are grouped
together in a transform unit (TU). A prediction unit (PU)
encapsulates everything related to prediction, i.e., the PB
sample arrays and the associated syntax elements, e.g., MVs
or intra-picture prediction modes. The CB sample arrays, the
associated syntax elements, such as the mode information
indicating whether intra- or inter-picture prediction is used,
and the associated PUs and TUs are grouped together in a
coding unit (CU). Analogously, the CTB sample arrays, the
associated coding tree syntax, and the associated CUs form a
coding tree unit (CTU). The CTU thus generalizes the concept
of the macroblock as the basic processing unit in standardized
video coding.
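The grouping of blocks and syntax into units can be pictured as nested containers. The Python dataclasses below are a minimal sketch under stated assumptions: all class and field names (for example `cbf`, `coeff_levels`, `is_intra`) are hypothetical and do not reproduce the HEVC syntax element names, only luma block geometry is stored, and the consistency checks compare areas only, which is a necessary but not sufficient condition for the PBs and TBs to tile the CB.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TransformUnit:
    """TB (luma geometry only) plus residual syntax such as coded block flags."""
    x: int
    y: int
    size: int
    cbf: Tuple[bool, bool, bool] = (False, False, False)  # Y, Cb, Cr
    coeff_levels: List[int] = field(default_factory=list)

@dataclass
class PredictionUnit:
    """PB geometry plus prediction syntax (MV data or an intra mode)."""
    x: int
    y: int
    width: int
    height: int
    mv: Optional[Tuple[int, int]] = None  # inter-picture prediction
    ref_idx: Optional[int] = None
    intra_mode: Optional[int] = None      # intra-picture prediction

@dataclass
class CodingUnit:
    """CB geometry, the intra/inter mode information, and its PUs and TUs."""
    x: int
    y: int
    size: int
    is_intra: bool
    pus: List[PredictionUnit]
    tus: List[TransformUnit]

    def is_consistent(self) -> bool:
        """Area check only: PBs and TBs each cover the CB, PU syntax matches mode."""
        area = self.size ** 2
        pb_ok = sum(p.width * p.height for p in self.pus) == area
        tb_ok = sum(t.size ** 2 for t in self.tus) == area
        syntax_ok = all((p.intra_mode is not None) if self.is_intra
                        else (p.mv is not None) for p in self.pus)
        return pb_ok and tb_ok and syntax_ok

@dataclass
class CodingTreeUnit:
    """CTB geometry plus the CUs produced by its coding tree."""
    x: int
    y: int
    size: int
    cus: List[CodingUnit]

    def covers_ctb(self) -> bool:
        return sum(cu.size ** 2 for cu in self.cus) == self.size ** 2

# One unsplit 64x64 inter CU with a single PU and four 32x32 TUs.
cu = CodingUnit(0, 0, 64, is_intra=False,
                pus=[PredictionUnit(0, 0, 64, 64, mv=(3, -1), ref_idx=0)],
                tus=[TransformUnit(x, y, 32) for y in (0, 32) for x in (0, 32)])
ctu = CodingTreeUnit(0, 0, 64, [cu])
print(cu.is_consistent(), ctu.covers_ctb())  # True True
```

Keeping the mode information at the CU level while motion data lives in each PU reflects the split described above: the CU decides intra versus inter once, and each PB then carries its own prediction parameters.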
B. Prediction of MVs
The H.264/AVC standard uses only a single MV predictor
to differentially code the MVs, computed as the median of