USB Device Class Definition for Audio Devices
Release 2.0 May 31, 2006
2 Management Overview
The USB is very well suited for transport of audio ranging from low-fidelity voice connections to
high-quality, multi-channel audio streams. The USB has become a ubiquitous connector on modern PCs and
is well understood by most consumers today. As such, it has become the connector of choice for many
peripherals and is indeed the simplest and most pervasive digital audio connector available today. With the
advent of the High Speed USB, consumers can count on this medium to meet all of their audio needs today
and into the future. Many applications, from communications to entertainment to music recording and
playback, can take advantage of the audio features of the USB.
In principle, a versatile bus specification like the USB provides many ways to propagate and/or control
digital audio. For the industry, however, it is very important that audio transport mechanisms be well
defined and standardized on the USB. Only in this way can interoperability be guaranteed among the many
possible audio devices on the USB. Standardized audio transport mechanisms also help to keep software
drivers as generic as possible. The Audio Device Class described in this document satisfies those
requirements. It is written and revised by experts in the audio field. Other device classes that address audio
in some way should refer to this document for their audio interface specification.
An essential issue in audio is synchronization of the data streams, because even the smallest artifacts
are easily detected by the human ear. Therefore, a robust synchronization scheme on isochronous transfers has been
developed and incorporated in the USB Specification. The Audio Device Class definition adheres to this
synchronization scheme to transport audio data reliably over the bus.
This document contains all necessary information for a designer to build a USB-compliant device that
incorporates audio functionality. It specifies the standard and class-specific descriptors that must be present
in each USB audio function. It further explains the use of class-specific requests that allow for full audio
function control. A number of predefined data formats are listed and fully documented. Each format
defines a standard way of transporting audio over the USB. Provisions have been made so that vendor-
specific audio formats and compression schemes can be handled.
Many of the changes introduced in Version 2.0 of the USB Specification for Audio Devices take advantage
of the new features provided in the USB 2.0 Specification. With the additional bandwidth made available,
high-speed USB operation allows the transport of multiple channels of high bit rate audio. This expands the
range of solutions provided by USB audio devices but also challenges the way in which they operate. In
addition to supporting the additional bandwidth, the specification supports new codec types for consumer
audio applications, provides numerous clarifications of the original specification, and adds extensions to
support various changes in the core specification. The changes are generally not backward compatible with
Version 1.0, because maintaining compatibility would too severely limit this new class of devices.
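The bandwidth argument above can be illustrated with a short calculation. The following is a minimal sketch, not part of the specification: the function names and the example stream parameters (8 channels, 192 kHz, 3 bytes per sample) are assumptions chosen for illustration. The bus constants are the raw signaling rates and maximum isochronous payloads of the USB 2.0 core specification. Real packet sizes vary around the average computed here, and protocol overhead is ignored.

```python
# Raw signaling rates of the USB 2.0 core specification, in bits per second.
FULL_SPEED_BPS = 12_000_000
HIGH_SPEED_BPS = 480_000_000

# Maximum isochronous payload per service interval, in bytes.
FULL_SPEED_ISO_MAX = 1023       # one transaction per 1 ms frame
HIGH_SPEED_ISO_MAX = 3 * 1024   # up to three transactions per 125 us microframe


def pcm_bit_rate(channels, sample_rate_hz, bytes_per_sample):
    """Payload bit rate of an uncompressed PCM stream (bus overhead ignored)."""
    return channels * sample_rate_hz * bytes_per_sample * 8


def average_bytes_per_interval(channels, sample_rate_hz, bytes_per_sample,
                               interval_us):
    """Average number of payload bytes that must be moved per service interval."""
    bytes_per_second = channels * sample_rate_hz * bytes_per_sample
    return bytes_per_second * interval_us / 1_000_000


# Hypothetical example stream: 8 channels, 192 kHz, 24-bit (3-byte) samples.
stream_bps = pcm_bit_rate(8, 192_000, 3)
per_frame = average_bytes_per_interval(8, 192_000, 3, 1000)
per_microframe = average_bytes_per_interval(8, 192_000, 3, 125)

print(stream_bps > FULL_SPEED_BPS)           # payload alone exceeds full speed
print(per_frame > FULL_SPEED_ISO_MAX)        # too large for a 1 ms frame
print(per_microframe <= HIGH_SPEED_ISO_MAX)  # fits in a 125 us microframe
```

Under these assumptions, the example stream needs 36.864 Mbit/s of payload, which exceeds the 12 Mbit/s full-speed signaling rate but is a small fraction of the 480 Mbit/s high-speed rate.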
2.1 Overview of Key Differences between ADC v1.0 and v2.0
The following list highlights the key changes that have been introduced; it is not exhaustive. For complete
information, refer to the full specification, paying special attention to Sections 1 through 6.
• Complete support for high-speed operation: audio class devices are no longer limited to full-speed
operation.
• The notion of physical and logical Audio channel clusters.
• The number of predefined spatial locations has increased. In addition, a virtual spatial location
called Raw Data was introduced.
• Use of the Interface Association Descriptor: the standard Interface Association mechanism is used
to describe an Audio Interface Collection. The former class-specific mechanism has been deprecated.
• Descriptor updates: fixed offsets associated with many descriptors and enlarged three-byte fields
to four bytes.
• Extensive support for interrupts to inform the host about dynamic changes that occur on the
different addressable Entities (Clock Entities, Terminals, Units, interfaces and endpoints) inside
the audio function.
• Additional clarifying text on the audio function.
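As an illustration of the Interface Association item in the list above, the sketch below packs a standard 8-byte Interface Association Descriptor for an audio function. It is a hedged example rather than normative text: the helper name build_audio_iad and the interface numbers in the usage line are hypothetical, and the class, subclass, and protocol codes shown (0x01, 0x00, 0x20) should be verified against the code tables in the full specification before use.

```python
import struct

# Descriptor type and function codes. These are assumptions to be checked
# against the core specification and the audio class code tables.
INTERFACE_ASSOCIATION = 0x0B        # standard descriptor type for an IAD
AUDIO = 0x01                        # bFunctionClass
FUNCTION_SUBCLASS_UNDEFINED = 0x00  # bFunctionSubClass
AF_VERSION_02_00 = 0x20             # bFunctionProtocol

IAD_LENGTH = 8                      # an IAD is always 8 bytes long


def build_audio_iad(first_interface, interface_count, i_function=0):
    """Pack an Interface Association Descriptor as 8 single-byte fields.

    first_interface: number of the first interface in the collection
    interface_count: number of contiguous interfaces in the collection
    i_function:      string descriptor index (0 means no string)
    """
    return struct.pack(
        "<8B",
        IAD_LENGTH,                   # bLength
        INTERFACE_ASSOCIATION,        # bDescriptorType
        first_interface,              # bFirstInterface
        interface_count,              # bInterfaceCount
        AUDIO,                        # bFunctionClass
        FUNCTION_SUBCLASS_UNDEFINED,  # bFunctionSubClass
        AF_VERSION_02_00,             # bFunctionProtocol
        i_function,                   # iFunction
    )


# Hypothetical collection: one control interface plus two streaming
# interfaces, numbered 0 through 2.
iad = build_audio_iad(first_interface=0, interface_count=3)
print(iad.hex())
```

Because every field is a single byte, the descriptor has no byte-order concerns; the "<" prefix in the format string only disables alignment padding.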