TECHNICAL OVERVIEW OF VP8, AN OPEN SOURCE VIDEO CODEC FOR THE WEB
Jim Bankoski, Paul Wilkins, Yaowu Xu
Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA, USA
{jimbankoski, paulwilkins, yaowu}@google.com
ABSTRACT
VP8 is an open source video compression format supported by a
consortium of technology companies. This paper provides a
technical overview of the format, with an emphasis on its unique
features. The paper also discusses how these features benefit VP8
in achieving high compression efficiency and low decoding
complexity at the same time.
Index Terms—VP8, WebM, Video Codec, Web Video
1. INTRODUCTION
In May 2010, Google announced the start of a new open media
project “WebM”, which is dedicated to developing a high-quality,
open media format for the web that is freely available to everyone.
At the core of the project is a new open source video compression
format, VP8. The VP8 format was originally developed by a small
research team at On2 Technologies, Inc. as a successor to its VPx
family of video codecs. Compared with other video coding formats,
VP8 has many distinctive technical features that help it achieve
both high compression efficiency and low decoding complexity.
Since the WebM announcement, not only has VP8 gained strong
support from a long list of major industry players, but it has also
started to attract broad interest in the video coding research
community from both industry and academia.
This paper aims to provide a technical overview of the VP8
compression format, with an emphasis on VP8’s unique features.
Section 2 briefly reviews VP8’s design assumptions and overall
architecture; Sections 3 to 7 describe VP8’s key technical
features: its transform and quantization scheme, reference frame
types, prediction techniques, adaptive loop filtering, entropy
coding, and parallel-processing-friendly data partitioning; Section 8
provides a short summary with experimental results and some
thoughts on future work.
2. DESIGN ASSUMPTIONS AND FEATURE HIGHLIGHTS
From the very beginning of VP8’s development, the developers
were focused on Internet/web-based video applications. This focus
has led to a number of basic assumptions in VP8’s overall design:
Low bandwidth requirement: One of the basic design
assumptions is that for the foreseeable future, available network
bandwidth will be limited. With this assumption, VP8 was
specifically designed to operate mainly in a quality range from
“watchable video” (~30dB in the PSNR metric) to “visually
lossless” (~45dB).
Heterogeneous client hardware: There is a broad spectrum of
client hardware connected to the web, ranging from low power
mobile and embedded devices to the most advanced desktop
computers with many processor cores. It must, therefore, be
possible to create efficient implementations for a wide range of
client devices.
Web video format: VP8 was designed to handle the image
format used by the vast majority of web videos: 4:2:0 color
sampling, a color depth of 8 bits per channel, progressive scan (not
interlaced), and image dimensions up to a maximum of
16383x16383 pixels.
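The 4:2:0, 8-bit format listed under “Web video format” fixes the size of a raw frame buffer: each chroma plane has half the luma width and half the luma height. The C sketch below computes plane sizes under that assumption, rounding odd dimensions up for chroma (a common convention, assumed here). The limit 16383 equals 2^14 - 1, which matches the 14-bit width and height fields of VP8's key frame header. Type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stated maximum dimension: 16383 = 2^14 - 1. */
#define MAX_DIM 16383u

typedef struct {
    unsigned luma_w, luma_h;     /* Y plane dimensions */
    unsigned chroma_w, chroma_h; /* U and V plane dimensions (each) */
    size_t total_bytes;          /* Y + U + V at 8 bits per sample */
} planes_420;

/* Plane sizes for an 8-bit 4:2:0 frame: chroma is subsampled by 2
 * horizontally and vertically, with odd dimensions rounded up. */
planes_420 planes_for_420(unsigned width, unsigned height) {
    assert(width >= 1 && width <= MAX_DIM);
    assert(height >= 1 && height <= MAX_DIM);
    planes_420 p;
    p.luma_w = width;
    p.luma_h = height;
    p.chroma_w = (width + 1) / 2;
    p.chroma_h = (height + 1) / 2;
    p.total_bytes = (size_t)width * height
                  + 2 * (size_t)p.chroma_w * p.chroma_h;
    return p;
}
```

For a 640x480 frame this gives 307,200 luma bytes plus two 76,800-byte chroma planes, 460,800 bytes in total; at the 16383x16383 maximum a single frame needs about 403 MB.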
The push for compression efficiency and decoder simplicity
under these design assumptions led to a number of distinctive
technical features in VP8 [1], relative to other known video
compression formats, such as MPEG-2 [2], H.263 [3] and
H.264/AVC [4]. The following list highlights the technical
innovations in VP8:
Hybrid transform with adaptive quantization: VP8 uses a 4x4
block-based discrete cosine transform (DCT) for all luma and
chroma residual signals. Depending on the prediction mode, the DC
coefficients from a 16x16 macroblock may then undergo a 4x4
Walsh-Hadamard transform.
Flexible reference frames: VP8 uses three reference frames
for inter prediction, but the scheme is somewhat different from the
multiple reference motion compensation scheme seen in other
formats. VP8’s design limits the buffer size requirement to three
reference frame buffers and still achieves effective de-correlation
in motion compensation.
Efficient intra prediction and inter prediction: VP8 makes
extensive use of intra and inter prediction. VP8’s intra prediction
features a new “TM_PRED” mode as one of the many simple and
effective intra prediction methods. For inter prediction, VP8
features a flexible “SPLITMV” mode capable of coding arbitrary
block patterns within a macroblock.
High performance sub-pixel interpolation: VP8’s motion
compensation uses quarter-pixel accurate motion vectors for luma
pixels and up to one-eighth pixel accurate motion vectors for
chroma pixels. The sub-pixel interpolation of VP8 features a
single-stage interpolation process and a set of high-performance
six-tap interpolation filters.
Adaptive in-loop deblocking filtering: VP8 has a highly
adaptive in-loop deblocking filter. The type and strength of the
filtering can be adjusted for different prediction modes and
reference frame types.
Frame level adaptive entropy coding: VP8 uses binary
arithmetic coding extensively for almost all data values except a
few header bits. Entropy contexts are adaptive at the frame level,
striking a balance between compression efficiency and
computational complexity.
Parallel processing friendly data partitioning: VP8 can pack
entropy coded transform coefficients into multiple partitions, to
facilitate parallel processing in decoders. This design improves
decoder performance on multi-core processors, with close to zero
impact on compression efficiency and no impact on decoding
performance on single-core processors.
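The second-stage transform in “Hybrid transform with adaptive quantization” gathers the 16 DC coefficients of a macroblock's 4x4 luma blocks into one 4x4 block and transforms them again. The C sketch below is a generic, unnormalized 4x4 Walsh-Hadamard transform (Sylvester ordering) with an exact inverse. It is not VP8's normative transform, whose coefficient ordering, scaling and rounding are defined in the VP8 specification; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1-D unnormalized 4-point Walsh-Hadamard transform (Sylvester order),
 * applied in place to four values spaced `stride` apart. */
static void wht4_1d(int32_t *v, int stride) {
    int32_t x0 = v[0], x1 = v[stride], x2 = v[2 * stride], x3 = v[3 * stride];
    int32_t a = x0 + x1, b = x0 - x1;
    int32_t c = x2 + x3, d = x2 - x3;
    v[0] = a + c;           /* x0 + x1 + x2 + x3 */
    v[stride] = b + d;      /* x0 - x1 + x2 - x3 */
    v[2 * stride] = a - c;  /* x0 + x1 - x2 - x3 */
    v[3 * stride] = b - d;  /* x0 - x1 - x2 + x3 */
}

/* Forward 2-D WHT of a 4x4 block of DC coefficients (row-major). */
void wht4x4_forward(int32_t dc[16]) {
    assert(dc != NULL);
    for (int r = 0; r < 4; r++) wht4_1d(dc + 4 * r, 1); /* rows */
    for (int c = 0; c < 4; c++) wht4_1d(dc + c, 4);     /* columns */
}

/* Inverse: the 4x4 Hadamard matrix satisfies H*H = 4I, so applying the
 * 2-D forward transform twice scales by 16; dividing recovers the input. */
void wht4x4_inverse(int32_t coeffs[16]) {
    wht4x4_forward(coeffs);
    for (int i = 0; i < 16; i++) coeffs[i] /= 16;
}
```

A flat macroblock, whose 16 DC values are all equal, collapses to a single nonzero coefficient, which is the redundancy this second stage is meant to remove.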
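To illustrate how “Flexible reference frames” keeps memory bounded, the sketch below models the three slots VP8 maintains (the last frame, the golden frame and the alternate reference frame) as frame indices refreshed by per-frame flags. The flag structure and function names are hypothetical simplifications; the real frame header also allows copying one reference into another and signals each choice in its own way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three reference slots. */
typedef enum { REF_LAST = 0, REF_GOLDEN = 1, REF_ALTREF = 2, NUM_REFS = 3 } ref_slot;

/* Hypothetical per-frame refresh flags (not the actual bitstream syntax). */
typedef struct {
    bool refresh_last;
    bool refresh_golden;
    bool refresh_altref;
} ref_update;

typedef struct {
    int frame_id[NUM_REFS]; /* decoded frame held by each slot; -1 = empty */
} ref_buffers;

void refs_init(ref_buffers *rb) {
    assert(rb != NULL);
    for (int i = 0; i < NUM_REFS; i++) rb->frame_id[i] = -1;
}

/* After decoding frame `id`, store it in whichever slots the flags select.
 * Storage stays at three frames however old a slot's contents become. */
void refs_update(ref_buffers *rb, int id, ref_update u) {
    assert(rb != NULL && id >= 0);
    if (u.refresh_last) rb->frame_id[REF_LAST] = id;
    if (u.refresh_golden) rb->frame_id[REF_GOLDEN] = id;
    if (u.refresh_altref) rb->frame_id[REF_ALTREF] = id;
}
```

If a key frame refreshes all three slots and subsequent inter frames refresh only the last-frame slot, the golden slot keeps referencing the key frame indefinitely, with no buffer beyond the three.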
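TM_PRED, mentioned under “Efficient intra prediction and inter prediction”, predicts each pixel from the row above the block, the column to its left and the above-left corner pixel: pred[r][c] = clamp(left[r] + above[c] - above_left), which extends the local gradient across the block. A minimal C sketch for an n x n block follows; the function and parameter names are hypothetical, and edge handling at frame borders is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range. */
static uint8_t clamp255(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* TM_PRED for an n x n block written row-major into `pred` with `stride`:
 * each pixel is left[r] + above[c] - above_left, clamped to [0, 255]. */
void tm_pred(uint8_t *pred, int stride, int n,
             const uint8_t *above, const uint8_t *left, uint8_t above_left) {
    assert(pred != NULL && above != NULL && left != NULL && n > 0);
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++)
            pred[r * stride + c] = clamp255(left[r] + above[c] - above_left);
}
```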
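The sketch below shows one six-tap step from “High performance sub-pixel interpolation”: a half-pixel sample computed directly from six neighboring full-pixel samples in a row, with rounding and clamping. The taps {3, -16, 77, 77, -16, 3} are assumed here to be VP8's half-pixel filter (they sum to 128); they, and the filters for other fractional positions, should be checked against the VP8 specification. The function name is hypothetical, and a 2-D position would apply a horizontal and a vertical pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed half-pixel taps; they sum to 128. */
static const int kHalfPelTaps[6] = {3, -16, 77, 77, -16, 3};

/* Interpolate the half-pixel sample between src[i] and src[i + 1] along
 * one row. Requires src[i - 2] .. src[i + 3] to be readable. */
uint8_t half_pel(const uint8_t *src, ptrdiff_t i) {
    assert(src != NULL);
    int sum = 64; /* rounding offset for the divide by 128 */
    for (int k = 0; k < 6; k++)
        sum += kHalfPelTaps[k] * src[i - 2 + k];
    if (sum < 0) return 0; /* clamp before shifting a negative value */
    sum >>= 7;             /* divide by 128, the sum of the taps */
    return (uint8_t)(sum > 255 ? 255 : sum);
}
```

The negative outer taps sharpen the result compared with bilinear averaging, at the cost of possible overshoot near edges, which the clamp absorbs.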
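“Adaptive in-loop deblocking filtering” adjusts filter strength per prediction mode and reference frame. One way to picture this is a per-macroblock level computed as a frame-level base plus a delta for the reference frame and a delta for the mode, clamped to VP8's 0 to 63 filter-level range. The mode classes, delta tables and names below are simplified and hypothetical, and the sketch covers only strength, not the choice of filter type.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOOP_FILTER_LEVEL 63

/* Hypothetical reference and mode classes for this sketch. */
enum { LF_REF_INTRA, LF_REF_LAST, LF_REF_GOLDEN, LF_REF_ALTREF, LF_NUM_REFS };
enum { LF_MODE_INTRA_4X4, LF_MODE_ZERO_MV, LF_MODE_OTHER_MV, LF_MODE_SPLIT_MV, LF_NUM_MODES };

typedef struct {
    int base_level;               /* frame-level filter strength */
    int ref_delta[LF_NUM_REFS];   /* adjustment per reference frame */
    int mode_delta[LF_NUM_MODES]; /* adjustment per prediction mode class */
} lf_params;

/* Filter level for one macroblock: base + deltas, clamped to [0, 63]. */
int mb_filter_level(const lf_params *p, int ref, int mode_class) {
    assert(p != NULL);
    assert(ref >= 0 && ref < LF_NUM_REFS);
    assert(mode_class >= 0 && mode_class < LF_NUM_MODES);
    int level = p->base_level + p->ref_delta[ref] + p->mode_delta[mode_class];
    if (level < 0) level = 0;
    if (level > MAX_LOOP_FILTER_LEVEL) level = MAX_LOOP_FILTER_LEVEL;
    return level;
}
```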
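For “Parallel processing friendly data partitioning”, VP8 assigns macroblock rows to coefficient partitions round-robin: row r goes to partition r mod N, where N is 1, 2, 4 or 8. The C sketch below encodes that mapping; the function names are hypothetical.

```c
#include <assert.h>

/* Partition holding the coefficients of macroblock row `mb_row`.
 * The number of partitions must be 1, 2, 4 or 8. */
int partition_for_mb_row(int mb_row, int num_partitions) {
    assert(mb_row >= 0);
    assert(num_partitions == 1 || num_partitions == 2 ||
           num_partitions == 4 || num_partitions == 8);
    return mb_row % num_partitions;
}

/* Number of macroblock rows that land in partition `part`
 * for a frame that is `mb_rows` macroblocks tall. */
int rows_in_partition(int mb_rows, int num_partitions, int part) {
    assert(part >= 0 && part < num_partitions);
    int count = 0;
    for (int r = 0; r < mb_rows; r++)
        if (partition_for_mb_row(r, num_partitions) == part) count++;
    return count;
}
```

With N threads, thread k can decode rows k, k + N, k + 2N and so on from its own partition. Because each row still depends on reconstructed pixels in the row above, the threads proceed in a staggered, wavefront-like order rather than fully independently.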