Distributed Kalman filter-based speaker tracking in microphone array
networks
Ye Tian, Zhe Chen, Fuliang Yin
School of Information and Communication Engineering, Dalian University of Technology, Dalian 116023, China
article info
Article history:
Received 8 February 2014
Received in revised form 15 August 2014
Accepted 3 September 2014
Available online 28 September 2014
Keywords:
Distributed Kalman filter
Microphone array network
Time delay of arrival
abstract
A speaker tracking method based on the distributed Kalman filter (DKF) is proposed for microphone
array networks in noisy and reverberant environments. First, the time delay of arrival (TDOA) at each
microphone pair is estimated by the generalized cross-correlation (GCC) method. Next, the Langevin
model is used as the state equation to model the speaker’s movement, and the measurement
equations with the true TDOA are derived by linearizing the TDOA model. Finally, the moving speaker’s
positions are estimated by distributed Kalman filtering in the microphone array network. The proposed
method is scalable and yields a smooth trajectory of the speaker’s movement with excellent tracking
accuracy. Simulation results verify the effectiveness of the proposed method.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction
Speaker localization and tracking with microphone arrays is
useful in many applications, including audio/video conferencing
systems [1], smart video monitoring systems [2], robotics, human–machine
interfaces, and far-field speech capture and recognition.
Speaker localization [3–5] and speaker tracking
[6–11] have been studied for many years. However, traditional
methods usually require dedicated devices and prior knowledge of the
positions and geometry of the microphone arrays.
In practice, the geometry of the microphone arrays may be
irregular and their positions may be distributed randomly.
The geometry and the positions
of the microphone arrays can be obtained by self-calibration methods
[12,13]. To determine the speaker’s position with spatially irregular
microphone arrays, distributed speaker localization methods
[14,15] have recently been proposed. In [16], the global coherence field
(GCF) method was proposed; the GCF is defined over the space of
possible sound source locations and represents the plausibility that a
sound source is active at a given point. In [17,18], the GCF was
extended to the oriented GCF (OGCF), which allows estimation of
both the position and the head orientation of a single active
speaker. In [19], multiple speaker localization with the GCF based
on acoustic map de-emphasis was proposed. In [14,20], the steered
response power–phase transform (SRP–PHAT) method and its
modification were proposed, which steer the microphone array
to all potential source positions to search for the candidate source
position. In [21,22], the localization performance of the SRP–PHAT
method was significantly improved by selecting suitable
microphone pairs in a microphone array network. In [15], Canclini
et al. proposed a distributed speaker localization algorithm that
minimizes a cost function, namely a fourth-order polynomial
obtained by combining hyperbolic constraints from multiple
sensors. However, these distributed speaker localization methods
depend only on signals in the current frame. They are not robust
against strong room reverberation and can even fail under impulsive noise,
such as a door slamming. Further, in noisy and reverberant
environments these localization methods may generate spurious
sources, sometimes stronger than the true speech sources.
To deal with these problems, speaker tracking methods are
used to estimate the speaker’s position; they depend not only on
the current measurement but also on a series of past measurements.
In this way, a smoothed trajectory of the speaker’s movement
can be obtained robustly.
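The localization methods surveyed above all build on the cross-correlation of microphone pairs, which is also the first step of the proposed method (TDOA estimation by GCC). A minimal Python sketch of GCC-based TDOA estimation for one microphone pair follows. The PHAT weighting is an assumption borrowed from the SRP–PHAT literature cited above; the function name `gcc_phat_tdoa`, the parameter `max_tau`, and the small regularization constant are hypothetical choices made for illustration only.

```python
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 (in seconds) with GCC-PHAT.

    A positive result means x2 lags x1.
    """
    n = len(x1) + len(x2)                 # zero-pad: circular correlation acts as linear
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting (assumed): keep phase only
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        # restrict the search to physically possible delays
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs
```

In use, `max_tau` would typically be set to the microphone spacing divided by the speed of sound, so that peaks outside the physically possible delay range are ignored.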
Distributed state estimation algorithms such as the distributed
Kalman filter (DKF) [23,24] have received great attention recently.
In the DKF, each node in a sensor network estimates
the state of a linear dynamic system by sharing data only with
its neighboring nodes at each time step. Unlike
centralized state estimation algorithms, the DKF does not require a fusion
center and is hence not vulnerable to the failure of such a center.
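The neighbor-only data exchange described above can be illustrated with a consensus averaging step, the building block on which consensus-based distributed filters rely. The sketch below is not the full DKF and not necessarily the exact scheme of [23,24]; the function `consensus_step`, the step size `eps`, the 4-node ring topology, and the local estimates are all hypothetical values chosen for illustration.

```python
import numpy as np

def consensus_step(estimates, neighbors, eps):
    """One neighbor-only update: x_i <- x_i + eps * sum_j (x_j - x_i)."""
    updated = {}
    for i, x_i in estimates.items():
        # each node uses only the estimates of its own neighbors
        disagreement = sum((estimates[j] - x_i for j in neighbors[i]),
                           np.zeros_like(x_i))
        updated[i] = x_i + eps * disagreement
    return updated

# Hypothetical network: 4 nodes on a ring, each holding a 2-D position estimate.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
estimates = {
    0: np.array([1.0, 2.0]),
    1: np.array([1.4, 1.8]),
    2: np.array([0.8, 2.3]),
    3: np.array([1.2, 1.9]),
}
network_mean = np.mean(list(estimates.values()), axis=0)

for _ in range(50):
    estimates = consensus_step(estimates, neighbors, eps=0.3)
# After the iterations, every node holds (approximately) network_mean,
# although no node ever communicated with a fusion center.
```

For an undirected graph the update preserves the network average, and it converges when `eps` is smaller than the reciprocal of the maximum node degree (here 1/2).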
In this paper, the DKF theory is introduced into a distributed
microphone array network, and a DKF-based speaker tracking
method is proposed for noisy and reverberant environments.
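As stated in the abstract, the speaker’s movement is modeled by the Langevin model. The following Python sketch shows the discrete-time Langevin model in the form commonly used in the speaker tracking literature, together with a Kalman prediction step. The state ordering (x, y, velocity in x, velocity in y), the parameter values `beta = 10` s⁻¹ and `v_bar = 1` m/s, and the function names are assumptions for illustration, not values taken from this paper.

```python
import numpy as np

def langevin_matrices(dt, beta=10.0, v_bar=1.0):
    """Transition matrix A and noise gain G for state [x, y, vx, vy].

    Per coordinate: v_k = a * v_{k-1} + b * F_k,  x_k = x_{k-1} + dt * v_k,
    with F_k a zero-mean, unit-variance Gaussian excitation.
    """
    a = np.exp(-beta * dt)                # velocity decay over one frame
    b = v_bar * np.sqrt(1.0 - a ** 2)     # keeps steady-state speed std at v_bar
    A = np.array([[1.0, 0.0, a * dt, 0.0],
                  [0.0, 1.0, 0.0, a * dt],
                  [0.0, 0.0, a, 0.0],
                  [0.0, 0.0, 0.0, a]])
    G = np.array([[b * dt, 0.0],
                  [0.0, b * dt],
                  [b, 0.0],
                  [0.0, b]])
    return A, G

def kalman_predict(x, P, A, G):
    """Kalman prediction step under the Langevin state equation."""
    x_pred = A @ x
    P_pred = A @ P @ A.T + G @ G.T        # process noise covariance Q = G G^T
    return x_pred, P_pred
```

The measurement update is omitted here because it depends on the linearized TDOA measurement equations, which are derived later in the paper.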
http://dx.doi.org/10.1016/j.apacoust.2014.09.004
0003-682X/© 2014 Elsevier Ltd. All rights reserved.
Corresponding author.
E-mail addresses: y.tian@mail.dlut.edu.cn (Y. Tian), zhechen@dlut.edu.cn
(Z. Chen), flyin@dlut.edu.cn (F. Yin).
Applied Acoustics 89 (2015) 71–77