CHROMA TOOLBOX: MATLAB IMPLEMENTATIONS FOR EXTRACTING
VARIANTS OF CHROMA-BASED AUDIO FEATURES
Meinard Müller
Saarland University
and MPI Informatik
meinard@mpi-inf.mpg.de
Sebastian Ewert
Computer Science III
University of Bonn
ewerts@iai.uni-bonn.de
ABSTRACT
Chroma-based audio features, which closely correlate to the
aspect of harmony, are a well-established tool in processing
and analyzing music data. There are many ways of comput-
ing and enhancing chroma features, which results in a large
number of chroma variants with different properties. In this
paper, we present a chroma toolbox [13], which contains
MATLAB implementations for extracting various types of
recently proposed pitch-based and chroma-based audio fea-
tures. Providing the MATLAB implementations on a well-
documented website under a GNU-GPL license, our aim is
to foster research in music information retrieval. As an-
other goal, we want to raise awareness that there is no sin-
gle chroma variant that works best in all applications. To
this end, we discuss two example applications showing that
the final music analysis result may crucially depend on the
initial feature design step.
1. INTRODUCTION
It is a well-known phenomenon that human perception of
pitch is periodic in the sense that two pitches are perceived
as similar in “color” if they differ by an octave. Based on
this observation, a pitch can be separated into two components,
which are referred to as tone height and chroma, see [19].
Assuming the equal-tempered scale, the chromas correspond to
the set {C, C♯, D, ..., B} that consists of the twelve pitch
spelling attributes¹ as used in Western music notation. Thus,
a chroma feature is represented by a 12-dimensional vector
$x = (x(1), x(2), \ldots, x(12))^T$, where $x(1)$ corresponds
to chroma C, $x(2)$ to chroma C♯, and so on. In the feature
extraction step, a given audio signal is converted into a
sequence of chroma features, each expressing how the short-time
energy of the signal is spread over the twelve chroma bands.

¹ Note that in the equal-tempered scale, different pitch spellings such as
C♯ and D♭ refer to the same chroma.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page.
© 2011 International Society for Music Information Retrieval.

[Figure 1. Overview of the feature extraction pipeline. Labels in the
figure: audio representation, pitch representation, chroma representation;
tuning estimation, multirate pitch filterbank, smoothing, logarithmic
compression, quantization, reduction, normalization; CP, CLP, CENS, CRP.]
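The separation of pitch into tone height and chroma described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not part of the toolbox; the function names, the MIDI pitch numbering (A4 = 69 at 440 Hz), and the per-pitch energy values of the example frame are assumptions introduced here. Note that the sketch uses zero-based indices, so index 0 corresponds to x(1), i.e., chroma C.

```python
import math

# Sharp spellings only; enharmonic spellings (e.g. D-flat) share an index.
CHROMA_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_midi_pitch(freq_hz, ref_hz=440.0):
    """Map a frequency to the nearest equal-tempered MIDI pitch (A4 = 69)."""
    return int(round(69 + 12 * math.log2(freq_hz / ref_hz)))


def chroma_index(midi_pitch):
    """Discard tone height: pitches an octave apart share one index (0 = C)."""
    return midi_pitch % 12


def fold_to_chroma(pitch_energy):
    """Sum per-pitch energies {midi_pitch: energy} into a 12-dimensional vector."""
    x = [0.0] * 12
    for midi_pitch, energy in pitch_energy.items():
        x[chroma_index(midi_pitch)] += energy
    return x


# One hypothetical analysis frame with energy at C4 (60), C5 (72), and E4 (64).
frame = {60: 0.5, 72: 0.25, 64: 0.25}
x = fold_to_chroma(frame)
```

Here the energies of C4 and C5 accumulate in the same chroma band, which is the octave identification that the text describes. How the per-pitch energies are obtained from the audio signal (for example, by a filter bank or a Fourier transform) is outside the scope of this sketch.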
Identifying pitches that differ by an octave, chroma fea-
tures show a high degree of robustness to variations in
timbre and closely correlate to the musical aspect of har-
mony. This is the reason why chroma-based audio fea-
tures, sometimes also referred to as pitch class profiles, are
a well-established tool for processing and analyzing music
data [1, 5, 12]. For example, virtually every chord recognition
procedure relies on some kind of chroma representation
[2, 4, 11]. Also, chroma features have become the
de facto standard for tasks such as music synchronization
and alignment [7, 8, 12], as well as audio structure analy-
sis [16]. Finally, chroma features have turned out to be a
powerful mid-level feature representation in content-based
audio retrieval such as cover song identification [3, 18] or
audio matching [10, 15].
There are many ways of computing chroma-based audio
features. For example, the conversion of an audio record-
ing into a chroma representation (or chromagram) may be
performed either by using short-time Fourier transforms in
combination with binning strategies [1] or by employing
suitable multirate filter banks [12]. Furthermore, the prop-
erties of chroma features can be significantly changed by