Hindawi Publishing Corporation
EURASIP Journal on Image and Video Processing
Volume 2008, Article ID 246309, 10 pages
doi:10.1155/2008/246309
Research Article
Evaluating Multiple Object Tracking Performance:
The CLEAR MOT Metrics
Keni Bernardin and Rainer Stiefelhagen
Interactive Systems Lab, Institut für Theoretische Informatik, Universität Karlsruhe, 76131 Karlsruhe, Germany
Correspondence should be addressed to Keni Bernardin, keni@ira.uka.de
Received 2 November 2007; Accepted 23 April 2008
Recommended by Carlo Regazzoni
Simultaneous tracking of multiple persons in real-world environments is an active research field and several approaches have
been proposed, based on a variety of features and algorithms. Recently, there has been a growing interest in organizing systematic
evaluations to compare the various techniques. Unfortunately, the lack of common metrics for measuring the performance of
multiple object trackers still makes it hard to compare their results. In this work, we introduce two intuitive and general metrics to
allow for objective comparison of tracker characteristics, focusing on their precision in estimating object locations, their accuracy
in recognizing object configurations and their ability to consistently label objects over time. These metrics have been extensively
used in two large-scale international evaluations, the 2006 and 2007 CLEAR evaluations, to measure and compare the performance
of multiple object trackers for a wide variety of tracking tasks. Selected performance results are presented and the advantages and
drawbacks of the presented metrics are discussed based on the experience gained during the evaluations.
Copyright © 2008 K. Bernardin and R. Stiefelhagen. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
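The two metrics summarized in the abstract are defined formally later in the paper; as a minimal orientation, the standard CLEAR MOT computation can be sketched as follows. This sketch assumes the per-frame matching between ground-truth objects and tracker hypotheses has already been established, and the field names are illustrative, not taken from the paper.

```python
def clear_mot(frames):
    """Compute (MOTP, MOTA) from per-frame tallies.

    Each frame is a dict with illustrative keys:
      dists      - distances of the matched object-hypothesis pairs
      misses     - ground-truth objects left unmatched (false negatives)
      false_pos  - hypotheses matching no ground-truth object
      mismatches - identity switches occurring in this frame
      gt         - number of ground-truth objects present
    """
    total_dist = sum(sum(f["dists"]) for f in frames)
    matches = sum(len(f["dists"]) for f in frames)
    errors = sum(f["misses"] + f["false_pos"] + f["mismatches"] for f in frames)
    gt_total = sum(f["gt"] for f in frames)

    # MOTP: average localization error over all matched pairs.
    motp = total_dist / matches if matches else 0.0
    # MOTA: 1 minus the ratio of all error events to ground-truth objects.
    mota = 1.0 - errors / gt_total if gt_total else 0.0
    return motp, mota
```

MOTP isolates pure localization precision, while MOTA folds misses, false positives, and identity mismatches into a single accuracy score, matching the three tracker properties named in the abstract.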
1. INTRODUCTION
The audio-visual tracking of multiple persons is a very active
research field with applications in many domains. These
range from video surveillance and automatic indexing to
intelligent interactive environments. Especially in the last
case, a robust person tracking module can serve as a powerful
building block to support other techniques, such as gesture
recognizers, face or speaker identifiers, head pose estimators
[1], and scene analysis tools. In the last few years, more and
more approaches have been presented to tackle the problems
posed by unconstrained, natural environments and bring
person trackers out of the laboratory environment and into
real-world scenarios.
In recent years, there has also been a growing interest
in performing systematic evaluations of such tracking
approaches with common databases and metrics. Examples
are the CHIL [2] and AMI [3] projects, funded by the
EU, the U.S. VACE project [4], the French ETISEO [5]
project, the U.K. Home Office iLIDS project [6], the CAVIAR
[7] and CREDS [8] projects, and a growing number of
workshops (e.g., PETS [9], EEMCV [10], and more recently
CLEAR [11]). However, although benchmarking is rather
straightforward for single object trackers, there is still no
general agreement on a principled evaluation procedure
using a common set of objective and intuitive metrics for
measuring the performance of multiple object trackers.
Li et al. in [12] investigate the problem of evaluating
systems for the tracking of football players from multiple
camera images. Annotated ground truth for a set of visible
players is compared to the tracker output, and three measures
are introduced to evaluate the spatial and temporal accuracy
of the result. Two of the measures, however, are rather
specific to the football tracking problem, and the more
general measure, the “identity tracking performance,” does
not consider some of the basic types of errors made by
multiple target trackers, such as false positive tracks or
localization errors in terms of distance or overlap. This limits
the application of the presented metric to specific types of
trackers or scenarios.
Nghiem et al. in [13] present a more general framework
for evaluation, which covers the requirements of a broad
range of visual tracking tasks. The presented metrics aim
at allowing systematic performance analysis using large