W. Luo, J. Xing, A. Milan et al. Artificial Intelligence 293 (2021) 103448
Table 3
Comparison between DBT and DFT. Adapted from [51].
Item | DBT | DFT
Initialization | automatic, imperfect | manual, perfect
# of objects | varying | fixed
Applications | specific type of objects (in most cases) | any type of objects
Advantages | ability to handle varying number of objects | free of object detector
Drawbacks | performance depends on object detection | manual initialization
Fig. 2. An illustration of online (left) and offline (right) tracking. (For interpretation of the references to color in this figure, the reader is referred to the
web version of this article.)
Table 4
Comparison between online and offline tracking.
Item | Online tracking | Offline tracking
Input | Up-to-time observations | All observations
Methodology | Gradually extend existing trajectories with current observations | Link observations into trajectories
Advantages | Suitable for online tasks | Obtain global optimal solution theoretically
Drawbacks | Suffer from shortage of observation | Delay in outputting final results
object detector is trained in advance, the majority of DBT focuses on specific kinds of targets, such as pedestrians, vehicles
or faces. Second, the performance of DBT highly depends on the performance of the employed object detector.
Detection-free tracking. As shown in Fig. 1 (bottom), DFT [54–57] requires manual initialization of a fixed number of
objects in the first frame, then localizes these objects in subsequent frames.
DBT is more popular because new objects are discovered and disappearing objects are terminated automatically. DFT
cannot handle newly appearing objects. However, it does not need a pre-trained object detector. Table 3 lists the major
differences between DBT and DFT.
2.2.2. Processing mode
MOT can also be categorized into online tracking and offline tracking. The difference is whether observations from future
frames are utilized when handling the current frame. Online, also called causal, tracking methods only rely on the past
information available up to the current frame, while offline, or batch tracking approaches employ observations both in the
past and in the future.
Online tracking. In online tracking [54,58,55,56,59,60,165,160], the image sequence is handled in a step-wise manner,
so online tracking is also called sequential tracking. An illustration is shown in Fig. 2 (top), with three objects (different
circles) a, b, and c. The green arrows represent observations in the past. The results are represented by the object’s location
and its ID. Based on the up-to-time observations, trajectories are produced on the fly.
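To make the step-wise processing concrete, the following is a minimal sketch of an online tracker, not a method from the surveyed literature. It assumes detections are 2-D points and uses greedy nearest-neighbour association; the function name online_track and the threshold max_dist are hypothetical choices made for illustration. At each frame it sees only current and past observations and emits object locations with IDs on the fly.

```python
import math


def online_track(frames, max_dist=2.0):
    """Greedy nearest-neighbour online tracker (illustrative assumption).

    frames: iterable of lists of (x, y) detections, one list per frame.
    Yields, per frame, a dict {track_id: (x, y)} computed from the
    observations available up to that frame only.
    """
    tracks = {}   # track_id -> last known position
    next_id = 0
    for detections in frames:
        unmatched = list(detections)
        updated = {}
        # Extend each existing trajectory with its nearest current detection.
        for tid, last in tracks.items():
            if not unmatched:
                break
            nearest = min(unmatched, key=lambda d: math.dist(d, last))
            if math.dist(nearest, last) <= max_dist:
                updated[tid] = nearest
                unmatched.remove(nearest)
        # Detections left over start new trajectories.
        for det in unmatched:
            updated[next_id] = det
            next_id += 1
        # Trajectories without a match are dropped (treated as terminated).
        tracks = updated
        yield dict(tracks)


frames = [
    [(0, 0), (10, 10)],
    [(1, 0), (10, 11)],
    [(2, 0), (10, 12), (50, 50)],
]
results = list(online_track(frames))
```

In this toy sequence the two initial objects keep their IDs across frames, and the detection that first appears in the last frame receives a new ID. Because no future frame is consulted, an association error made at one frame cannot be revised later, which is the shortage of observation noted for online methods.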
Offline tracking. Offline tracking [53,61,49,62,48,1,63–66] utilizes a batch of frames to process the data. As shown in Fig. 2
(bottom), observations from all the frames are required to be obtained in advance and are analyzed jointly to estimate the
final output. Note that, due to computational and memory limitations, it is not always possible to handle all the frames at
once. An alternative solution is to split the data into shorter video clips, and infer the results hierarchically or sequentially
for each batch. Table 4 lists the differences between the two processing modes.
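The clip-splitting workaround can be sketched as follows. This is only an illustration: the helper name split_into_clips and the overlap parameter are assumptions, not part of any cited method. Overlapping frames are one plausible way to let results from adjacent clips be stitched together afterwards.

```python
def split_into_clips(num_frames, clip_len, overlap=0):
    """Return (start, end) frame ranges, end exclusive, covering the video.

    overlap is the number of frames shared by consecutive clips
    (a hypothetical parameter introduced for this sketch).
    """
    if clip_len <= 0 or not 0 <= overlap < clip_len:
        raise ValueError("need clip_len > 0 and 0 <= overlap < clip_len")
    step = clip_len - overlap
    clips = []
    start = 0
    while start < num_frames:
        end = min(start + clip_len, num_frames)
        clips.append((start, end))
        if end == num_frames:
            break  # the last clip reaches the final frame
        start += step
    return clips


clips = split_into_clips(10, 4, overlap=1)
# clips == [(0, 4), (3, 7), (6, 10)]
```

Each clip would then be processed as its own batch, trading some global optimality for bounded memory and a shorter delay before results are available.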
2.2.3. Type of output
This criterion classifies MOT methods into deterministic ones and probabilistic (stochastic) ones, depending on the randomness of
the output. The difference between these two types of methods primarily results from the optimization methods adopted as
mentioned in Section 2.1.
Stochastic tracking. The output of stochastic tracking can vary from one run to another. For example, in the case of
detection-free tracking, the bounding box results differ between runs if a particle filter is used for inference. The variation results
from the randomness in generating particles during processing. Even in the case of detection-based tracking, some