Target Tracking with Kalman Filtering, KNN and LSTMs
Dan Iter
daniter@stanford.edu
Jonathan Kuck
kuck@stanford.edu
Philip Zhuang
pzhuang@stanford.edu
December 17, 2016
Abstract
Tracking an unknown number of targets
given noisy measurements from multiple sen-
sors is critical to autonomous driving. Rao-
Blackwellized particle filtering is well suited
to this problem. Monte Carlo sampling is
used to determine whether measurements are
valid, and if so, which targets they originate
from. This breaks the problem into single
target tracking sub-problems that are solved
in closed form (e.g. with Kalman filtering).
We compare the performance of a traditional
Kalman filter with that of a recurrent neu-
ral network for single target tracking. We
show that LSTMs outperform Kalman filter-
ing for single target prediction by 2x. We
also present a unique model for training two
dependent LSTMs to output a Gaussian dis-
tribution for a single target prediction to be
used as input to multi-target tracking. We
evaluate the end-to-end performance of an
LSTM and a Kalman filter for simultaneous
multiple target tracking. In the end-to-end
pipeline, LSTMs do not provide a significant
improvement.
1. Introduction
We address the problem of tracking an unknown num-
ber of targets given measurements from multiple noisy
sensors. Target tracking is a critical problem for au-
tonomous driving. Combining information from dif-
ferent types of sensors (e.g. radar and cameras) is im-
portant for reliable and accurate tracking performance
in real world settings.
It has been shown that a Rao-Blackwellized particle
filter can be used in the multi-target tracking setting
[19]. In this framework, Monte Carlo sampling is used
to determine whether measurements are valid, and if
so, which targets they originate from. This breaks the
problem into single target tracking subproblems that
are solved in closed form by Kalman filtering.
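As an illustration of the closed-form single target sub-problem, the following is a minimal sketch of a Kalman filter, not the implementation used in this work. It assumes a one-dimensional constant-velocity motion model with a position-only measurement; the function names, the noise parameters q and r, and the initial covariance are all hypothetical choices made for the example.

```python
def kf_predict(x, P, dt=1.0, q=0.01):
    """Propagate state x = (position, velocity) and its 2x2 covariance P one step.

    Assumes F = [[1, dt], [0, 1]] and process noise Q = q * I (illustrative values).
    """
    p, v = x
    (p00, p01), (p10, p11) = P
    x_pred = (p + dt * v, v)
    # P_pred = F P F^T + Q, written out for the 2x2 case
    P_pred = (
        (p00 + dt * (p10 + p01) + dt * dt * p11 + q, p01 + dt * p11),
        (p10 + dt * p11, p11 + q),
    )
    return x_pred, P_pred


def kf_update(x, P, z, r=1.0):
    """Correct the prediction with a position measurement z of variance r (H = [1, 0])."""
    p, v = x
    (p00, p01), (p10, p11) = P
    y = z - p                    # innovation
    s = p00 + r                  # innovation variance
    k0, k1 = p00 / s, p10 / s    # Kalman gain
    x_new = (p + k0 * y, v + k1 * y)
    # P_new = (I - K H) P
    P_new = (
        ((1 - k0) * p00, (1 - k0) * p01),
        (p10 - k1 * p00, p11 - k1 * p01),
    )
    return x_new, P_new


def track(measurements, x0=(0.0, 0.0), P0=((10.0, 0.0), (0.0, 10.0))):
    """Run predict/update over a sequence of position measurements."""
    x, P = x0, P0
    for z in measurements:
        x, P = kf_predict(x, P)
        x, P = kf_update(x, P, z)
    return x, P
```

Calling track on positions sampled from a target moving at constant speed returns a state whose velocity component approaches that speed, together with a covariance that can be read as the Gaussian uncertainty of the estimate.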
In the real world, cars do not follow the linear motion
assumptions of the traditional Kalman filter. It is possible
to learn non-linear motion models from data using
a recurrent neural network [1, 17, 15, 11]. We imple-
ment a unique way to train two LSTMs to both pre-
dict the future position of a target based on motion
and to output a distribution of the prediction’s likeli-
hood. The distribution is input into the framework of
a Rao-Blackwellized particle filter.
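To make concrete how a predicted distribution can be consumed by a sampling-based filter, here is a small sketch of scoring a measurement under a Gaussian prediction. It assumes, purely for illustration, a diagonal covariance given as per-dimension variances; the function name and arguments are hypothetical and are not taken from our implementation.

```python
import math


def gaussian_log_likelihood(measurement, mean, variance):
    """Log-density of a measurement under an axis-aligned Gaussian prediction.

    measurement, mean, variance are equal-length sequences (one entry per dimension).
    The diagonal-covariance form is an assumption made for this sketch.
    """
    log_lik = 0.0
    for z, mu, var in zip(measurement, mean, variance):
        log_lik += -0.5 * (math.log(2.0 * math.pi * var) + (z - mu) ** 2 / var)
    return log_lik
```

A measurement close to the predicted mean receives a higher log-likelihood than a distant one, which is the quantity a particle filter could use to weigh candidate measurement-target associations.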
To show that these methods are effective in a real-world,
end-to-end pipeline, we also incorporate several
off-the-shelf methods for object detection (MSCNN
and Regionlets). Predicting target motion from noisy
measurements output by the object detectors is a crit-
ical challenge in this tracking task.
We test our algorithm on the KITTI object track-
ing benchmark [9]. This dataset is composed of video
taken from a car-mounted camera while driving around
Karlsruhe, Germany. We compare our RNN’s location
predictions with the naive Kalman filter predictions.
Additionally, we incorporate the RNN predictions into
a Rao-Blackwellized particle filter to evaluate
end-to-end tracking performance.
2. Related Work
A variety of solutions to the multi-target tracking
problem have been presented, including joint prob-
abilistic data association (JPDA) [2], multiple hy-
pothesis tracking (MHT) [2], and finite set statistics
(FISST) [14]. The key technical difficulty when track-
ing multiple targets is determining which target (or
clutter) each measurement originated from, referred
to as the measurement-target association problem.
Under the general framework of multiple hypothesis
tracking, probabilities are calculated for every possible
combination of measurement-target associations. This
quickly becomes intractable as the number of targets and
measurements grows, which leads to sequential
Monte Carlo approaches. We build on the work of
[18, 19], who applied Rao-Blackwellized particle filters
to the multiple target tracking problem.
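The growth in the number of association hypotheses can be illustrated with a short counting sketch. It assumes a common simplification that is not stated above: within one frame, each measurement is assigned to at most one target (otherwise it is clutter) and each target produces at most one measurement. The function name is hypothetical.

```python
from math import comb, factorial


def count_associations(num_measurements, num_targets):
    """Count single-frame measurement-target association hypotheses.

    Assumes each measurement matches at most one target (else clutter) and each
    target yields at most one measurement. For k matched pairs, choose which k
    measurements and which k targets take part, then pair them in k! ways.
    """
    total = 0
    for k in range(min(num_measurements, num_targets) + 1):
        total += comb(num_measurements, k) * comb(num_targets, k) * factorial(k)
    return total
```

Under these assumptions, two measurements and two targets already give 7 hypotheses in a single frame, and the count of joint hypotheses over a sequence is the product of the per-frame counts, which is why exhaustive enumeration is replaced by sampling.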
Tracking-by-detection is a common approach when
tracking objects in video [5, 3, 12, 16, 6]. Object detec-
tion is performed on a frame by frame basis and detec-