Energy Classification-assisted
Fingerprint System For
Content-based Audio Copy Detection
Yongchao Zhang*, Mingxing Xu and Emlyn Pratt
Tsinghua University, Department of Computer Science and Technology, Beijing, China
*Corresponding author (E-mail: zhangyc1984@gmail.com)
Abstract: In recent years, fingerprint systems have been
applied more and more widely in content-based audio copy
detection. However, in the face of harsh conditions, the
performance of traditional systems is insufficient. In this paper,
we present an improved fingerprint system oriented towards the
Content Based Copy Detection (CBCD) task in TRECVID. We
extract the Sign of Energy Band Differences feature for each
audio clip, apply binary energy classification to each frame
of the clip, and then use the resulting class labels together with
this feature in the matching algorithm. We also refine the index
mechanism and add post-processing to increase speed and eliminate
false alarms. The
experimental results show that in the worst case scenario, the
recall rate reaches 94.40%, with a precision rate of 100%. The
results also indicate that the system is robust against several
distortions, and can process audio in real time with its index
mechanism.
Keywords: Sign of Energy Band Differences; audio fingerprint;
content-based audio copy detection.
I. INTRODUCTION
Due to the explosive growth of the Internet and the rapid
proliferation of digital devices, a large amount of audio data is
generated daily, including music, songs, advertisements,
lectures, and so on. While audio media brings convenience and
joy to people, it also gives rise to rampant piracy, which makes
audio copy detection increasingly important.
Content-based audio copy detection refers to the following
[4]: Given an excerpt of an audio recording (the query clip), the
task is to automatically retrieve all excerpts from a given audio
database which contain either the query itself or audio
sufficiently similar to the query. Detection is made difficult
when the copy occurs only in part of the audio, or when it
contains distortion. In TRECVID 2009, the content-based audio
copy detection task was defined as follows [3]: Given a test
collection of audio recordings and a set of queries, determine for each
query the place, if any, that some part of the query occurs, with
possible transformations, in the test collection. The set of
possible transformations is based to the extent possible on
actually occurring transformations, such as MP3
compression. In audio copy detection, speed and accuracy are
two significant metrics. In particular, TRECVID demands a 0%
false alarm rate.
The Sign of Energy Band Differences (SEBD) feature is
widely employed in audio copy detection for generating
fingerprints due to its simplicity and efficiency. There are three
different ways to extract this feature [2] [5] [6], but no
theoretical analysis exists on which best fits copy detection, so
we first analyze and evaluate each of the three approaches.
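For orientation, the basic idea of the feature can be sketched as follows. This is a minimal illustration of one widely cited formulation (the Haitsma and Kalker fingerprinting scheme), in which each bit is the sign of the energy difference between adjacent frequency bands, differenced again between consecutive frames. It is not necessarily any of the three variants compared in this paper. The function name `sebd_bits` and the input layout (one list of per-band energies per frame) are assumptions made for the example.

```python
def sebd_bits(band_energies):
    """Return one list of fingerprint bits per frame, starting at frame 1.

    band_energies: list of frames; each frame is a list of per-band
    energies (hypothetical layout chosen for this sketch).
    """
    fingerprints = []
    for prev, cur in zip(band_energies, band_energies[1:]):
        bits = []
        for m in range(len(cur) - 1):
            # Difference between adjacent bands in the current frame,
            # minus the same difference in the previous frame.
            d = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits.append(1 if d > 0 else 0)
        fingerprints.append(bits)
    return fingerprints


# Two frames with three bands each give one frame of two bits.
example = sebd_bits([[1.0, 2.0, 3.0], [4.0, 2.0, 5.0]])
```

With B bands per frame, this formulation produces B - 1 bits per frame, which can be packed into a single integer to serve as an index key.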
We use the energy classification information to assist the
fingerprint feature when matching two audio clips. For the sake of
speed, our experiments use only binary classification, with an
unsupervised strategy in which every frame is assigned to one of
two categories according to its energy value. This yields a
category sequence for each audio clip. In addition, analysis of the
offline index table suggested, and tests confirmed, that skipping
the search for certain kinds of fingerprints yields a speed boost.
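The binary energy classification described above can be illustrated with a short sketch. The text states only that frames are split into two categories by energy without supervision; the use of the mean frame energy as the threshold, as well as the names `frame_energies` and `classify_by_energy`, are assumptions made for this example and may differ from the rule used in our system.

```python
def frame_energies(samples, frame_len, hop):
    """Sum of squared samples for each full frame of the signal."""
    last_start = len(samples) - frame_len
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, last_start + 1, hop)
    ]


def classify_by_energy(energies):
    """Label each frame 1 (high energy) or 0 (low energy).

    Assumption: the threshold is the mean energy of the clip, so no
    labeled training data is needed.
    """
    if not energies:
        return []
    threshold = sum(energies) / len(energies)
    return [1 if e >= threshold else 0 for e in energies]


energies = frame_energies([1, 1, 2, 2, 0, 0, 3, 3], frame_len=2, hop=2)
categories = classify_by_energy(energies)
```

The resulting list is the category sequence of the clip: one label per frame, aligned with the per-frame fingerprints so that both can be compared during matching.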
The paper is organized as follows. Section II describes the
system composition. Section III presents the fingerprint
extraction method, and Section IV the retrieval algorithm. The
last two sections present our experimental results and
conclusions.
II. SYSTEM COMPOSITION
As shown in Fig. 1, the traditional fingerprint system is
divided into two stages: feature extraction and index building are
performed off-line (top); in identification mode, unlabeled
audio is presented to the system to look for a match (bottom) [5].
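The two stages can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the system evaluated in this paper: fingerprints are taken to be integers, lookup is by exact match only (a practical system must also tolerate bit errors), and the names `build_index` and `identify` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(reference):
    """Off-line stage: map each fingerprint to the places it occurs.

    reference: dict of clip_id -> list of integer fingerprints.
    """
    index = defaultdict(list)
    for clip_id, fingerprints in reference.items():
        for pos, fp in enumerate(fingerprints):
            index[fp].append((clip_id, pos))
    return index


def identify(index, query_fingerprints):
    """Identification stage: vote for (clip, time offset) candidates."""
    votes = Counter()
    for qpos, fp in enumerate(query_fingerprints):
        for clip_id, pos in index.get(fp, ()):
            # Matching frames of a true copy share one constant offset.
            votes[(clip_id, pos - qpos)] += 1
    if not votes:
        return None
    (clip_id, offset), count = votes.most_common(1)[0]
    return clip_id, offset, count


index = build_index({"a": [10, 11, 12, 13, 14], "b": [20, 21, 22, 23]})
match = identify(index, [12, 13, 14])
```

Here the query matches clip "a" starting at frame 2 with three agreeing frames. Voting on the offset rather than on the clip alone is what allows a copy to be located when it occupies only part of a reference recording.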