834 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 5, SEPTEMBER 2005
assumption A3 guarantees that is invertible for all
. Although there is no physical justification for the
second part of A3, we use it to resolve the inherent scaling am-
biguities that exist in our algorithm to identify
.
Let
represent the true cross-spectral density matrix
of the observed signal at frequency
and time epoch . Based
on the above assumptions, we have
(3)
For
, is given as the smallest eigenvalue of the matrix
. Therefore a noise-free cross-spectral density matrix
can be obtained as follows:
(4)
In practice, we use the discrete frequency variable
instead of the continuous variable to calculate an estimate
of . The estimation of is dis-
cussed in the Appendix.
III. J
OINT DIAGONALIZATION
PROBLEM
The first stage of the proposed algorithm employs joint diag-
onalization of the set of estimated cross power spectral density
matrices
, at each frequency ,
over
epochs, to estimate the mixing system up to a permuta-
tion and diagonal scaling ambiguity at each frequency bin.
The joint diagonalization problem was first introduced by
Flurry [17] and later on was used as a tool for solving the BSS
problem by [4], [18]–[22]. The problem is expressed as finding
a single matrix
that jointly (approximately) diagonalizes the
set of matrices
. The most common criterion used
for joint diagonalization is the one given as
(5)
where
for an arbitrary matrix is defined as the sum
squares of the off-diagonal values of the matrix
. Another
common criterion is the following least-squares cost function
(6)
where
is diagonal for all , and denotes the Frobe-
nius norm.
By using a joint diagonalization procedure, the mixing
system
can be estimated up to a frequency dependent
permutation
and frequency dependent scale ambi-
guity
. That is, at each frequency we substitute the
set of matrices
in (6) with the set ,
,defined by (4). (Note that this sequence of
matrices is a consequence of the nonstationarity of the sources).
Then, if we find a matrix
and diagonal matrices
such that
with the scale constraint , where is the
column of , then the following relation holds:
(7)
where the diagonal entries of
are of the form , where
is a phase [15]. A procedure to resolve the frequency depen-
dent permutation ambiguity is given in Section V. A partial so-
lution to the frequency-dependent phase ambiguity problem is
discussed in Section IV-A.
We now discuss the procedure for determining the unmixing
system
given the , . This proce-
dure has been previously used, e.g., in [23]. For this presenta-
tion, we assume the permutations and scale ambiguities are cor-
rectly resolved. At each frequency, we calculate
from
(8)
where
is the pseudo inverse of the matrix .For
, [24], where is the
identity matrix. Since the composite multi-dimensional mixing-
unmixing system is an identity at each frequency, the sources
are recovered at the outputs at each frequency. The unmixing
system
is then formed as the inverse DFT of ,
. In general, the is neither causal nor of finite
length.
4
may be made causal by imposing a suitable delay.
The effect of time-domain aliasing error in the inverse DFT,
induced by the infinite length of
, can be made negligible
by choosing
, which is the number of frequency bins and also
the length of
, to be much greater than , the length of
.
IV. A
LGORITHM
Based on the joint diagonalization principle, we propose
the following least-squares joint diagonalization criterion by
analogy to (6), for the case when a sample estimate
of each is available:
(9)
where
is a diagonal matrix representing the unknown
cross-spectral density matrix of the sources at epoch
.
In [4], a similar criterion has been proposed that directly es-
timates the separating matrix
. Using the criterion in
(9) allows us to implement the ALS algorithm as will be de-
scribed later in this section. In [4], an additional FIR constraint
on the length of the un-mixing matrix is required to prevent ar-
bitrary frequency dependent permutations. However, as shown
in [25] and [26], such a constraint is not effective for long re-
verberant environments and the performance of the algorithm
may degrade as the length of the separating filter increases. In
the proposed method we do not require an FIR length constraint
on the mixing model, mainly because we use a different dyadic
approach for resolving the permutation problem, that exploits
the inherent nonstationarity of the sources.
For the first stage of the algorithm we optimize the criterion
given by (9) using an alternating least-squares (ALS) approach.
The basic idea behind the ALS algorithm is that in the opti-
mization process we divide the parameter space into multiple
4
However, since the energy in
W
(
t
)
must be bounded, this signal must decay
toward zero as time increases.