Rate-Distortion Optimization for Video Compression

The rate-distortion efficiency of today's video compression schemes is based on a sophisticated interaction between various motion representation possibilities, waveform coding of differences, and waveform coding of various refreshed regions. Hence, a key problem in high-compression video coding is the operational control of the encoder. This problem is compounded by the widely varying content and motion found in typical video sequences, necessitating the selection between different representation possibilities with varying rate-distortion efficiency. This article addresses the problem of video encoder optimization and discusses its consequences on the compression architecture of the overall coding system. Based on the well-known hybrid video coding structure, Lagrangian optimization techniques are presented that try to answer the question: "What part of the video signal should be coded using what method and parameter settings?"

Video Compression Basics

Motion video data consists essentially of a time-ordered sequence of pictures, and cameras typically generate approximately 24, 25, or 30 pictures (or frames) per second. This results in a large amount of data that demands the use of compression. For example, assume that each picture has a relatively low "QCIF" (quarter-common-intermediate-format) resolution (i.e., 176 × 144 samples) for which each sample is digitally represented with 8 bits, and assume that we skip two out of every three pictures in order to cut down the bit rate. For color pictures, three color component samples are necessary to represent a sufficient color space for each pixel. In order to transmit even this relatively low-fidelity sequence of pictures, the raw source data rate is still more than 6 Mbit/s. However, today's low-cost transmission channels often operate at much lower data rates so that the data rate of the video signal needs to
be further com-
IEEE SIGNAL PROCESSING MAGAZINE, NOVEMBER 1998
1053-5888/98/$10.00 © 1998 IEEE

A History of Existing Visual Coding Standards
H.120: The first international digital video coding standard [3]. It may have even been the first international digital compression standard for natural continuous-tone visual content of any kind (whether video or still picture). H.120 was developed by the ITU-T organization (the International Telecommunications Union-Telecommunications Standardization Sector, then called the CCITT), and received final approval in 1984. It originally was a conditional replenishment (CR) coder with differential pulse-code modulation (DPCM), scalar quantization, and variable-length coding, and it had an ability to switch to quincunx sub-sampling for bit-rate control. In 1988, a second version of H.120 added motion compensation and background prediction. (None of the later completed standards have yet included background prediction again, although a form of it is in the draft of the future MPEG-4 standard.) Its operational bit rates were 1544 and 2048 Kbit/s. H.120 is essentially no longer in use today, although a few H.120 systems are rumored to still be in operational condition.
H.261: The first widespread practical success: a video codec capable of operation at affordable telecom bit rates (with 80-320 Kbit/s devoted to video) [4, 5]. It was the first standard to use the basic typical structure we find still predominant today (16 × 16 macroblock motion compensation, 8 × 8 block DCT, scalar quantization, and two-dimensional run-level variable-length entropy coding). H.261 was approved by the ITU-T in early 1991 (with technical content completed in late 1990). It was later revised in 1993 to include a backward-compatible high-resolution graphics transfer mode. Its target bit-rate range was 64-2048 Kbit/s.
JPEG: A highly successful continuous-tone, still-picture coding standard named after the Joint Photographic Experts Group that developed it [1, 2]. Anyone who has browsed the world-wide web has experienced JPEG. JPEG (IS 10918-1/ITU-T T.81) was originally approved in 1992 and was developed as an official joint project of both the ISO/IEC JTC1 and ITU-T organizations. In its typical use, it is essentially H.261 INTRA coding with prediction of average values and an ability to customize the quantizer reconstruction scaling and the entropy coding to the specific picture content. However, there is much more in the JPEG standard than what is typically described or used. In particular, this includes progressive coding, lossless coding, and arithmetic coding.
MPEG-1: A widely successful video codec capable of approximately VHS videotape quality or better at about 1.5 Mbit/s and covering a bit rate range of about 1-2 Mbit/s [6, 7]. MPEG-1 gets its acronym from the Moving Pictures Experts Group that developed it [6, 7]. MPEG-1 video (IS 11172-2) was a project of the ISO/IEC JTC1 organization and was approved in 1993. In terms of technical features, it added bi-directionally predicted frames (known as B-frames) and half-pixel motion. (Half-pixel motion had been proposed during the development of H.261, but was apparently thought to be too complex at the time.) It provided quality superior to that of H.261 when operated at higher bit rates. (At bit rates below, perhaps, 1 Mbit/s, H.261 performs better, as MPEG-1 was not designed to be capable of operation in this range.)
MPEG-2: A step higher in bit rate, picture quality, and popularity. MPEG-2 forms the heart of broadcast-quality digital television for both standard-definition and high-definition television (SDTV and HDTV) [7-9]. MPEG-2 video (IS 13818-2/ITU-T H.262) was designed to encompass MPEG-1 and to also provide high quality with interlaced video sources at much higher bit rates. Although usually thought of as an ISO standard, MPEG-2 video was developed as an official joint project of both the ISO/IEC JTC1 and ITU-T organizations, and was completed in late 1994. Its primary new technical features were efficient handling of interlaced-scan pictures and hierarchical bit-usage scalability. Its target bit-rate range was approximately 4-30 Mbit/s.
H.263: The first codec designed specifically to handle very low-bit-rate video, and its performance in that arena is still state-of-the-art [10, 11]. H.263 is the current best standard for practical video telecommunication. Its original target bit-rate range was about 10-30 Kbit/s, but this was broadened during development to perhaps at least 10-2048 Kbit/s as it became apparent that it could be superior to H.261 at any bit rate. H.263 (version 1) was a project of the ITU-T and was approved in early 1996 (with technical content completed in 1995). The key new technical features of H.263 were variable block-size motion compensation, overlapped-block motion compensation (OBMC), picture-extrapolating motion vectors, three-dimensional run-level-last variable-length coding, median MV prediction, and more efficient header information signaling (and, relative to H.261, arithmetic coding, half-pixel motion, and bi-directional prediction, although the first of these three features was also found in JPEG and some form of the other two were in MPEG-1). At very low bit rates (e.g., below 30 Kbit/s), H.263 can code with the same quality as H.261 using half or less than half the bit rate [12]. At greater bit rates (e.g., above 80 Kbit/s) it can provide a more moderate degree of performance superiority over H.261. (See also H.263+ below.)
H.263+: Technically a second version of H.263 [10, 13]. The H.263+ project added a number of new optional features to H.263. One notable technical advance over prior standards is that H.263 version 2 was the first video coding standard to offer a high degree of error resilience for wireless or packet-based transport networks. H.263+ also added a number of improvements in compression efficiency, custom and flexible video formats, scalability, and backward-compatible supplemental enhancement information. It was approved in January of 1998 by the ITU-T (with technical content completed in September 1997). It extends the effective bit-rate range of H.263 to essentially any bit rate and any progressive-scan (noninterlace) picture formats and frame rates, and H.263+ is capable of superior performance relative to any existing standard over this entire range. The first author was the editor of H.263 during the H.263+ project and is the Rapporteur (chairman) of the ITU-T Advanced Video Coding Experts Group (SG16/Q15), which developed it.

pressed. For instance, using V.34 modems that transmit
at most 33.4 Kbit/s over dial-up analog phone lines, we still need to compress the video bit rate further by a factor of about 200 (more if audio is consuming 6 Kbit/s of that same channel or if the phone line is too noisy for achieving the full bit rate of V.34).
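
The arithmetic behind the "more than 6 Mbit/s" and "factor of about 200" figures can be checked with a few lines of Python. This is only a sketch: the text gives 24, 25, or 30 pictures per second, and the 30 picture/s camera rate is an assumption made here because it is the value that yields a rate above 6 Mbit/s. The function name `raw_bit_rate` is illustrative.

```python
def raw_bit_rate(width, height, components, bits_per_sample, pictures_per_second):
    """Uncompressed bit rate in bit/s for a sequence of pictures."""
    return width * height * components * bits_per_sample * pictures_per_second

# QCIF picture, 8 bits per sample, three color component samples per pixel.
# Assumption: a 30 picture/s camera, of which only every third picture is kept.
camera_rate = 30
coded_rate = camera_rate // 3  # skip two out of every three pictures

source_rate = raw_bit_rate(176, 144, 3, 8, coded_rate)
channel_rate = 33_400  # bit/s, the V.34 rate quoted in the text

print(f"raw source rate: {source_rate / 1e6:.2f} Mbit/s")
print(f"required compression factor: {source_rate / channel_rate:.0f}")
```

Under these assumptions the factor comes out somewhat below 200, and it rises above 200 once 6 Kbit/s of the channel is reserved for audio.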
One way of compressing video content is simply to compress each picture, using an image-coding syntax such as JPEG [1, 2]. The most common "baseline" JPEG scheme consists of breaking up the image into equal-size blocks. These blocks are transformed by a discrete cosine transform (DCT), and the DCT coefficients are then quantized and transmitted using variable-length codes. We will refer to this kind of coding scheme as INTRA-frame coding, since the picture is coded without referring to other pictures in the video sequence. In fact, such INTRA coding alone (often called "motion JPEG") is in common use as a video coding method today in production-quality editing systems that demand rapid access to any frame of video content.
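
The transform-and-quantize step of INTRA coding can be sketched as follows. This is a minimal illustration, not the JPEG algorithm itself: it uses a single uniform quantizer step (the value 16 is hypothetical), a made-up smooth test block, and it omits the variable-length coding stage. The 8 × 8 block size matches the block DCT size cited for H.261 in the history sidebar.

```python
import math

N = 8  # block size (8 x 8)

def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

C = dct_matrix()
CT = transpose(C)

def forward_dct(block):
    """Separable 2-D DCT: transform rows and columns."""
    return matmul(matmul(C, block), CT)

def inverse_dct(coeffs):
    return matmul(matmul(CT, coeffs), C)

def quantize(coeffs, step):
    """Uniform scalar quantization to integer levels."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]

# Hypothetical smooth test block (a gentle brightness ramp).
block = [[100 + 2 * x + y for x in range(N)] for y in range(N)]
step = 16  # hypothetical quantizer step size

levels = quantize(forward_dct(block), step)
recon = inverse_dct(dequantize(levels, step))

nonzero = sum(1 for row in levels for level in row if level != 0)
max_err = max(abs(block[y][x] - recon[y][x]) for y in range(N) for x in range(N))
print(f"nonzero levels: {nonzero} of {N * N}, max reconstruction error: {max_err:.2f}")
```

For smooth content most quantized levels are zero, which is what makes the subsequent variable-length coding effective; a larger step lowers the rate further at the price of a larger reconstruction error.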
However, improved compression performance can be attained by taking advantage of the large amount of temporal redundancy in video content. We will refer to such techniques as INTER-frame coding. Usually, much of the depicted scene is essentially just repeated in picture after picture without any significant change. It should be obvious then that the video can be represented more efficiently by coding only the changes in the video content, rather than coding each entire picture repeatedly. This ability to use the temporal-domain redundancy to improve coding efficiency is what fundamentally distinguishes video compression from still-image compression.
A simple method of improving compression by coding only the changes in a video scene is called conditional replenishment (CR), and it was the only temporal redundancy reduction method used in the first digital video coding standard, ITU-T Rec. H.120 [3]. CR coding consists of sending signals to indicate which areas of a picture can just be repeated, and sending new coded information to replace the changed areas. CR thus allows a choice between one of two modes of representation for each area, which are called the SKIP mode and the INTRA mode. However, CR coding has a significant shortcoming, which is its inability to refine an approximation. Often the content of an area of a prior picture can be a good approximation of the new picture, needing only a minor alteration to become a better representation. But CR coding allows only exact repetition or complete replacement of each picture area. Adding a third type of "prediction mode," in which a refining frame difference approximation can be sent, results in a further improvement of compression performance.

[Figure 1. Typical motion-compensated DCT video coder. Recoverable block labels: Input Frame; Motion Estimation; Motion Compensated Prediction; DCT, Quantization, Entropy Code; Encoded Residual (To Channel); Motion Vector and Prediction Mode Data (To Channel); Entropy Decode, Inverse Quantize, Inverse DCT; Approximated Input Frame (To Display); Frame Buffer (Delay); Prior Coded Approximated Frame. The dotted box shows the decoder.]

In practice, a number of interactions between coding decisions must be neglected in video coding optimization.

The concept of frame difference refinement can also be taken a step further, by adding motion-compensated prediction (MCP). Most changes in video content are typically due to the motion of objects in the depicted scene relative to the imaging plane, and a small amount of motion can result in a large difference in the values of the pixels in a picture area (especially near the edges of an object). Often, displacing an area of the prior picture by a few pixels in spatial location can result in a significant reduction in the amount of information that needs to be sent as a frame difference approximation. This use of spatial displacement to form an approximation is known as motion compensation and the encoder's search for the best spatial displacement approximation to use is known as motion estimation. The coding of the resulting difference signal for the refinement of the MCP signal is known as displaced frame difference (DFD) coding.
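
Motion estimation and the resulting DFD can be illustrated with a small full-search block matcher. This is a sketch under stated assumptions, not the search used by any particular standard: the matching criterion (sum of absolute differences), the integer-pixel search, the search range, and the synthetic 16 × 16 pictures are all choices made here for illustration. A displacement (dx, dy) means that the block at (bx, by) in the current picture is predicted from position (bx + dx, by + dy) in the prior picture.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + j + dy][bx + i + dx])
               for j in range(size) for i in range(size))

def motion_search(cur, ref, bx, by, size, search_range):
    """Full search over all integer displacements within +/- search_range."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that point outside the reference picture.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best  # (cost, dx, dy)

def displaced_frame_difference(cur, ref, bx, by, dx, dy, size):
    """Residual left after motion-compensated prediction of one block."""
    return [[cur[by + j][bx + i] - ref[by + j + dy][bx + i + dx]
             for i in range(size)] for j in range(size)]

# Synthetic prior picture: a scrambled pattern in which every pixel value is distinct.
ref = [[(37 * (16 * y + x) + 11) % 256 for x in range(16)] for y in range(16)]
# Current picture: the same content moved two pixels to the right (left edge repeated).
cur = [[row[max(x - 2, 0)] for x in range(16)] for row in ref]

zero_motion_cost = sad(cur, ref, 4, 4, 0, 0, 8)   # plain frame difference
best = motion_search(cur, ref, 4, 4, 8, 3)
dfd = displaced_frame_difference(cur, ref, 4, 4, best[1], best[2], 8)
print(f"SAD without motion compensation: {zero_motion_cost}")
print(f"best (cost, dx, dy): {best}")
```

In this toy case the plain frame difference is large while the motion-compensated residual vanishes entirely; in real video the DFD is rarely zero, but it is typically much cheaper to code than the unaligned difference.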
Hence, the most successful class of video compression designs are called hybrid codecs. The naming of this coder is due to its construction as a hybrid of motion-handling and picture-coding techniques, and the term codec is used to refer to both the coder and decoder of a video compression system. Figure 1 shows such a hybrid coder. Its design and operation involve the optimization of a number of decisions, including
1. How to segment each picture into areas,
2. Whether or not to replace each area of the picture with completely new INTRA-picture content,
3. If not replacing an area with new INTRA content,
(a) How to do motion estimation; i.e., how to select the spatial shifting displacement to use for INTER-picture predictive coding (with a zero-valued displacement being an important special case),
(b) How to do DFD coding; i.e., how to select the approximation to use as a refinement of the INTER prediction (with a zero-valued approximation being an important special case), and
4. If replacing an area with new INTRA content, what approximation to send as the replacement content.

At this point, we have introduced a problem for the engineer who designs such a video coding system, which is: What part of the image should be coded using what method? If the possible modes of operation are restricted to INTRA coding and SKIP, the choice is relatively simple. However, hybrid video codecs achieve their compression performance by employing several modes of operation that are adaptively assigned to parts of the encoded picture, and there is a dependency between the effects of the motion estimation and DFD coding stages of INTER coding. The modes of operation are generally associated with signal-dependent rate-distortion characteristics, and rate-distortion trade-offs are inherent in the design of each of these aspects. The second and third items above in particular are unique to motion video coding. The optimization of these decisions in the design and operation of a video coder is the primary topic of this article. Some further techniques that go somewhat beyond this model will also be discussed.
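
The question of which mode to use for an area is the kind of decision that the Lagrangian techniques mentioned in the introduction address. A minimal sketch, assuming the usual Lagrangian cost J = D + λR (distortion plus rate weighted by a multiplier λ): for each area, evaluate every candidate mode and keep the one with the smallest cost. The mode names come from the text; the distortion and rate numbers below are hypothetical, as is the function name `choose_mode`.

```python
def choose_mode(candidates, lam):
    """Return the (mode, distortion, rate) tuple with the smallest J = D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Hypothetical measurements for one picture area:
# (mode, distortion as sum of squared errors, rate in bits)
candidates = [
    ("SKIP", 900.0, 1),     # repeat the prior picture: almost free, poor fidelity
    ("INTER", 120.0, 60),   # motion vector plus DFD refinement
    ("INTRA", 40.0, 300),   # completely new content: best fidelity, most bits
]

for lam in (0.1, 1.0, 20.0):
    mode, distortion, rate = choose_mode(candidates, lam)
    print(f"lambda={lam}: {mode} (D={distortion}, R={rate})")
```

A small λ favors fidelity and selects INTRA here, a large λ favors low rate and selects SKIP, and intermediate values select INTER. Sweeping λ therefore traces out different operating points on the rate-distortion trade-off.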
Motion-Compensated Video Coding Analysis

Consider the nth coded picture of size $W \times H$ in a video sequence, consisting of an array $I_n(\mathbf{s})$ of color component values (e.g., $Y_n(\mathbf{s})$, $Cb_n(\mathbf{s})$, and $Cr_n(\mathbf{s})$) for each pixel location $\mathbf{s} = (x, y)$, in which $x$ and $y$ are integers such that $0 \le x < W$ and $0 \le y < H$. The decoded approximation of this picture will be denoted as $\tilde{I}_n(\mathbf{s})$.

The typical video decoder (see Fig. 1) receives a representation of the picture that is segmented into some number $K$ of distinct regional areas $\{\mathcal{A}_{k,n}\}_{k=1}^{K}$. For each area, a prediction-mode signal $p_{k,n} \in \{0, 1\}$ is received indicating whether or not the area is predicted from the prior picture. For the areas that are predicted from the prior picture, a motion vector (MV), denoted $\mathbf{v}_{k,n}$, is received. The MV specifies a spatial displacement for motion compensation of that region. Using the prediction mode and
An Overview of Future Visual Coding Standardization Projects
MPEG-4: A future visual coding standard for both still and moving visual content. The ISO/IEC SC29 WG11 organization is currently developing two drafts, called version 1 and version 2 of MPEG-4 visual. Final approval of version 1 is planned in January 1999 (with technical content completed in October 1998), and approval of version 2 is currently planned for approximately one year later. MPEG-4 visual (which will become IS 14496-2) will include most technical features of the prior video and still-picture coding standards, and will also include a number of new features such as zero-tree wavelet coding of still pictures, segmented shape coding of objects, and coding of hybrids of synthetic and natural video content. It will cover essentially all bit rates, picture formats, and frame rates, including both interlaced and progressive-scan video pictures. Its efficiency for predictive coding of normal camera-view video content will be similar to that of H.263 for noninterlaced video sources and similar to that of MPEG-2 for interlaced sources. For some special purpose and artificially generated scenes, it will provide significantly superior compression performance and new object-oriented capabilities. It will also contain a still-picture coder that has improved compression quality relative to JPEG at low bit rates.
H.263++: Future enhancements of H.263. The H.263++ project is considering adding more optional enhancements to H.263 and is currently scheduled for completion late in the year 2000. It is a project of the ITU-T Advanced Video Coding Experts Group (SG16/Q15).
JPEG-2000: A future new still-picture coding standard. JPEG-2000 is a joint project of the ITU-T SG8 and ISO/IEC JTC1 SC29 WG1 organizations. It is scheduled for completion late in the year 2000.
H.26L: A future new generation of video coding standard with improved efficiency, error resilience, and streaming support. H.26L is currently scheduled for approval in 2002. It is a project of the ITU-T Advanced Video Coding Experts Group (SG16/Q15).