IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 7, NO. 6, DECEMBER 2013 969
An Overview of Tiles in HEVC
Kiran Misra, Member, IEEE, Andrew Segall, Member, IEEE, Michael Horowitz, Shilin Xu, Arild Fuldseth, and
Minhua Zhou
Abstract—Tiles is a new feature in the High Efficiency Video
Coding (HEVC) standard that divides a picture into independent,
rectangular regions. This division provides a number of advantages.
Specifically, it increases the "parallel friendliness" of the
new standard by enabling improved coding efficiency for parallel
architectures, as compared to previous slice-based methods.
Additionally, tiles facilitate improved maximum transmission unit
(MTU) size matching, reduced line buffer memory, and additional
region-of-interest functionality. In this paper, we introduce the
tiles feature and survey the performance of the tool. Coding
efficiency is reported for different parallelization factors and MTU
size requirements. Additionally, a tile-based region-of-interest
coding method is developed.
Index Terms—Video coding, multicore processing, high efficiency
video coding, tiles.
I. INTRODUCTION
The ISO/IEC's Moving Picture Experts Group (MPEG)
and the International Telecommunication Union's
(ITU-T) Video Coding Experts Group (VCEG) have recently
concluded work on the first edition of the High Efficiency
Video Coding (HEVC) standard [3]–[5]. This standard was
developed collaboratively by the Joint Collaborative Team on
Video Coding (JCT-VC). For consumer applications, HEVC
has been reported to achieve a 50% improvement in coding
efficiency when compared to previous coding standards such
as MPEG-4 AVC/ITU-T H.264 [1], [5]. These coding gains
are achieved through a number of improvements that result in
an increase in computational complexity for both encoder and
decoder.
Here, computational complexity refers to a combination of
algorithmic operations and memory transfers. Algorithmic
operations correspond to the calculations required in a decoder to
convert bit-stream information to reconstructed pixel values or
in an encoder to convert the original pixel values to a bit-stream.
For hardware, this corresponds to logic gates; for software, this
corresponds to calculations on a CPU, GPU, or other processing
units. Memory transfers represent the amount of data that must
Manuscript received February 01, 2013; revised May 10, 2013; accepted June
12, 2013. Date of publication June 27, 2013; date of current version November
18, 2013. The guest editor coordinating the review of this manuscript and
approving it for publication was Prof. Oscar C. Au.
K. Misra and A. Segall are with Sharp Laboratories of America, Inc., Camas,
WA 98607 USA (e-mail: misrak@sharplabs.com; asegall@sharplabs.com).
M. Horowitz and S. Xu are with eBrisk Video, Inc., Vancouver, BC V6E 2E9,
Canada (e-mail: michael@ebriskvideo.com; shilin@ebriskvideo.com).
A. Fuldseth is with Cisco Systems, Oslo 1367, Norway (e-mail:
arild.fuldseth@cisco.com).
M. Zhou is with Texas Instruments, Inc., Dallas, TX 75243 USA (e-mail:
zhou@ti.com).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSTSP.2013.2271451
be stored and accessed to perform the required calculations.
Typical architectures contain multiple memory types, ranging
from high-speed memory that is on-chip (including caches near
a CPU core) to lower-speed memory that is off-chip or farther
from the core. In general, on-chip memory is more expensive
and therefore relatively small. Additionally, for many
architectures, the critical bottleneck is the bandwidth necessary to
transfer data from off-chip to on-chip memory in time to
complete the required calculations.
The increase in computational complexity in HEVC compared
with earlier standards directly impacts the implementation
and design. For systems with a single-core processor, the
increased complexity requires higher clock speeds. This has the
additional cost of increased power consumption and heat
dissipation. For many applications of interest today, the increased
clock rate is not desirable.
An alternative solution for addressing the increased computational
complexity is parallelism. Parallelism in a video system is
not a new concept. For example, today's software-based video
conferencing systems that operate at resolutions up to 1080p
(1920 × 1080 pixels) and frame rates of 60 frames per second
(fps) rely on high-level parallelism (i.e., encoders and decoders
that can process different portions of a video picture in a
relatively independent fashion) despite using the less computationally
complex H.264/AVC and its scalable extension SVC. With
previous standards, high-level parallelism within a picture may
be realized by partitioning the source frames using slices and
assigning each slice to one of several processing cores. Slices
were originally designed to map a bit-stream into smaller
independently decodable chunks for transmission. The size of a
coded slice was typically determined by the network
characteristics; for example, the size is often selected to be less than the
maximum transmission unit (MTU) size of the network being
considered.
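The MTU-driven slice sizing described above can be sketched as a greedy packing of consecutive coded blocks into slices. This is only an illustration under stated assumptions, not the procedure of any particular encoder: the function name `pack_into_slices`, the per-slice `header_bytes` parameter, and the idea that coded block sizes are known in advance are all hypothetical simplifications.

```python
from typing import List


def pack_into_slices(block_bytes: List[int],
                     mtu_payload: int,
                     header_bytes: int = 0) -> List[List[int]]:
    """Greedily group consecutive coded-block sizes into slices.

    Each slice (hypothetical header plus its blocks) must fit within
    mtu_payload bytes. Blocks stay in coding order.
    """
    slices: List[List[int]] = []
    current: List[int] = []
    used = header_bytes  # bytes consumed by the slice being built

    for size in block_bytes:
        if header_bytes + size > mtu_payload:
            raise ValueError("a single block exceeds the MTU payload")
        # Close the current slice if this block would overflow it.
        if current and used + size > mtu_payload:
            slices.append(current)
            current, used = [], header_bytes
        current.append(size)
        used += size

    if current:
        slices.append(current)
    return slices


if __name__ == "__main__":
    # Hypothetical coded sizes in bytes and a 1500-byte payload limit.
    print(pack_into_slices([600, 500, 400, 900, 100], 1500, header_bytes=10))
```

Because the slice boundaries here follow only the byte budget, they fall wherever the budget runs out rather than along any structure in the picture content.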
In practice, using slices for parallelization results in a
number of disadvantages. For example, the pixel segmentation
achieved by slices using only network constraints often results
in partitioning where the correlation existing in the pixel data is
reduced. This lowers the achievable coding efficiency. Moreover,
slices contain header information to facilitate independent
processing of pixel data. With the higher coding efficiency of
HEVC, this becomes problematic: it is possible to transmit
high resolution video at low bit rates such that the overhead
introduced by a slice header is not negligible. Finally, for
applications that require both parallelization and packetization,
it is difficult to use slices to achieve an optimal partitioning for
both goals.
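The slice header overhead argument can be made concrete with a small calculation. The sketch below is illustrative only: the function name `slice_header_overhead` and every number in the usage example (bit rate, frame rate, slice count, header size) are hypothetical values chosen for the arithmetic, not measurements from this paper or from the standard.

```python
def slice_header_overhead(bitrate_bps: float,
                          fps: float,
                          slices_per_picture: int,
                          header_bits: int) -> float:
    """Return the fraction of a picture's average bit budget spent on slice headers."""
    bits_per_picture = bitrate_bps / fps  # average bits available per picture
    return slices_per_picture * header_bits / bits_per_picture


if __name__ == "__main__":
    # Hypothetical operating point: 1 Mbit/s at 60 fps, 16 slices per
    # picture, and an assumed 64-bit header per slice.
    fraction = slice_header_overhead(1_000_000, 60, 16, 64)
    print(f"{fraction:.2%} of each picture's bits are slice headers")
```

Under these assumed values the headers take roughly 6% of each picture's bits. The fraction grows as the bit rate falls or as more slices are used to feed more cores.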
Tiles provide an alternative partitioning that divides a picture
into rectangular sections that are processed in a relatively
independent fashion. Fig. 1 illustrates an example where a picture
1932-4553 © 2013 IEEE