A steered response power approach with trade-off prewhitening
for acoustic source localization
Hongsen He,a) Xueyuan Wang, Yingyue Zhou, and Tao Yang
School of Information Engineering and Robot Technology Used for Special Environment Key Laboratory
of Sichuan Province, Southwest University of Science and Technology, Mianyang, 621010, China
(Received 12 September 2017; revised 31 December 2017; accepted 25 January 2018; published
online 16 February 2018)
This paper proposes a steered response power (SRP) approach with trade-off prewhitening for
acoustic source localization. To obtain effective compromise prefiltering of microphone signals, the
sparsity of the speech amplitude spectrum is used to establish a convex-constraint linear prediction
model, which is solved by a split Bregman method. The presented approach unifies the traditional
SRP and steered response power via phase transform prefiltering methods and achieves a good
compromise between them from the perspective of localization performance. The superiority of the
proposed method is demonstrated in noisy and reverberant environments.
© 2018 Acoustical Society of America. https://doi.org/10.1121/1.5024652
[KGS] Pages: 1003–1007
I. INTRODUCTION
Acoustic source localization, which estimates the position
coordinates or directions of arrival (DOAs) of sound sources,
is critical in many acoustic applications such as sonar
detection, hands-free voice communication, human-computer
interfaces, and industrial damage detection systems.
Microphone arrays serve as the spatial aperture needed to
process the auditory scene and yield source location
estimates. Among localization techniques based on microphone
arrays, maximizing the steered response power (SRP) of a
beamformer¹ is an important approach. Experiments have shown
that the SRP technique is robust to noise but sensitive to
reverberation.
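As an illustration of the SRP idea only, the following minimal sketch scans candidate DOAs with a frequency-domain delay-and-sum beamformer and picks the angle of maximum output power. It is not the implementation used in this paper: the function name `srp_scan`, the uniform linear array geometry, the sound speed of 343 m/s, and the noise-free circular-delay simulation are all assumptions made for the example.

```python
import numpy as np

def srp_scan(signals, mic_positions, fs, angles_deg, c=343.0):
    """Delay-and-sum output power for each candidate angle (far field).

    signals: (M, N) array, one row per microphone.
    mic_positions: (M,) positions along the array axis in metres (assumed linear array).
    """
    num_samples = signals.shape[1]
    spectra = np.fft.rfft(signals, axis=1)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    power = np.empty(len(angles_deg))
    for i, theta in enumerate(np.deg2rad(angles_deg)):
        # Arrival delay of a plane wave at each microphone, relative to the origin.
        delays = -mic_positions * np.cos(theta) / c
        # Compensate the delays by phase rotation, then sum across microphones.
        steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        beam = np.sum(spectra * steering, axis=0)
        power[i] = np.sum(np.abs(beam) ** 2)
    return power

# Toy check: white-noise source at 60 degrees, 4 microphones spaced 5 cm apart.
rng = np.random.default_rng(0)
fs, num_samples, true_angle = 16000, 2048, 60
mic_positions = 0.05 * np.arange(4)
source = rng.standard_normal(num_samples)
freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
true_delays = -mic_positions * np.cos(np.deg2rad(true_angle)) / 343.0
# Apply (circular) fractional delays in the frequency domain.
shifted = np.fft.rfft(source)[None, :] * np.exp(
    -2j * np.pi * freqs[None, :] * true_delays[:, None])
signals = np.fft.irfft(shifted, n=num_samples, axis=1)

angles = np.arange(0, 181, 5)
power = srp_scan(signals, mic_positions, fs, angles)
estimate = int(angles[np.argmax(power)])
print("estimated DOA (degrees):", estimate)
```

The scan is exhaustive over a coarse 5-degree grid to keep the sketch short; a practical system would add noise, reverberation, and a finer or adaptive grid.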
To improve the robustness of SRP in room acoustic
environments, phase transform (PHAT) prefiltering² is
applied before the cross-correlations are computed. The
resulting algorithm, termed steered response power-phase
transform (SRP-PHAT),¹,³ gains robustness to reverberation
because the PHAT weighting whitens the microphone signals so
that all frequencies are emphasized equally.¹
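The effect of the PHAT weighting on a single microphone pair can be sketched as follows. This is a hedged illustration, not code from the paper: the function name `gcc`, the small constant `eps` that guards against division by zero, and the integer-delay white-noise test signal are assumptions. With `phat=True` the cross-spectrum is normalized to unit magnitude, so only phase information contributes; with `phat=False` the function returns the ordinary cross-correlation that underlies conventional SRP.

```python
import numpy as np

def gcc(x1, x2, phat=True, eps=1e-12):
    """Cross-correlation of x2 against x1, optionally PHAT-weighted.

    Returns (lags, cc); a peak at a positive lag means x2 lags x1.
    """
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = np.conj(X1) * X2
    if phat:
        # PHAT: keep the phase, discard the magnitude (whitening).
        cross = cross / (np.abs(cross) + eps)
    cc = np.fft.fftshift(np.fft.irfft(cross, n=n))
    lags = np.arange(-(n // 2), n - n // 2)
    return lags, cc

# Toy check: x2 is x1 delayed by 7 samples.
rng = np.random.default_rng(1)
x1 = rng.standard_normal(1024)
true_delay = 7
x2 = np.concatenate([np.zeros(true_delay), x1[:-true_delay]])

lags, cc_phat = gcc(x1, x2, phat=True)
_, cc_plain = gcc(x1, x2, phat=False)
delay_phat = int(lags[np.argmax(cc_phat)])
delay_plain = int(lags[np.argmax(cc_plain)])
print("PHAT:", delay_phat, "unweighted:", delay_plain)
```

In this clean, anechoic toy case both weightings recover the same delay; the differences discussed in the text only appear once noise and reverberation are added.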
To enable real-time operation of SRP-PHAT, an inverse
mapping method,³ which transforms three-dimensional candidate
locations into one-dimensional relative delays, and a
modified SRP-PHAT method with scalable spatial sampling⁴
have been presented to reduce the computational cost. To
further enhance the spatial resolution of SRP-PHAT, an
extended strategy based on an iterative grid decomposition
procedure⁵ and a geometrically sampled grid method⁶ have
also been proposed from a grid search perspective. The
localization performance of SRP-PHAT, however, degrades
under noisy conditions.
In a recent work, the sparsity of the coefficient vector
of a linear predictor was used to construct an
$\ell_2/\ell_1$-norm optimization model to prewhiten
microphone signals for time delay estimation (TDE).⁷ The
sparsity penalty gives rise to an effective compromise in
TDE performance between noise and reverberation. In this
work, we propose an alternative sparse linear prediction
model to prewhiten microphone signals for acoustic source
localization rather than TDE. We incorporate the sparsity of
the speech spectrum into the least-squares criterion to form
a mixed-norm optimization model, which is solved by a split
Bregman method. The prediction error signals are then used
to establish a trade-off prewhitening based steered response
power (TOP-SRP) estimator to measure the DOA of a sound
source. This new approach unifies the SRP and SRP-PHAT
methods from a DOA estimation performance perspective.
The effectiveness of the developed algorithm is validated
in noisy and reverberant environments.
II. ACOUSTIC SOURCE LOCALIZATION VIA TOP-SRP
A. Optimization model
Assume that there is a broadband sound source in the far
field which radiates a plane wave. A microphone array with
M elements is exploited to capture the sound signals. We
employ a linear predictor to prefilter microphone signals for
acoustic source localization. To this end, we use the past
samples of channel $m$ ($m = 1, 2, \ldots, M$) to predict its current
sample $x(n)$ as follows:
$$x(n) = \sum_{k=1}^{K} a_k\, x(n-k) + e(n), \qquad (1)$$
where $a_k$, $k = 1, 2, \ldots, K$, are the prediction coefficients, $K$ is the
length of the predictor, and $e(n)$ is the prediction error. Note
that we have dropped the subscript $m$ for notational simplicity.
In vector/matrix form, the signal in Eq. (1) can be
written as
$$\mathbf{x}(n) = \mathbf{X}(n)\,\mathbf{a} + \mathbf{e}(n), \qquad (2)$$
a) Also at: State Key Laboratory of Acoustics, Institute of Acoustics, Chinese
Academy of Sciences, Beijing 100190, China. Electronic mail:
hongsenhe@gmail.com
J. Acoust. Soc. Am. 143 (2), February 2018
0001-4966/2018/143(2)/1003/5/$30.00
© 2018 Acoustical Society of America