810 W.-T. Zhang et al. / Digital Signal Processing 22 (2012) 808–819
where $I_{n_r}$ is the $n_r \times n_r$ identity matrix. The semi-unitary constraint $U^H U = I_{n_r}$ imposed in (5) stems from the derivation of the JBD_NU algorithm, in which the partitions of the block diagonalizer are obtained from an eigenvalue decomposition (EVD), so they are semi-unitary. Now let us consider the matrix B given by
$$
B = [U_1, U_2, \ldots, U_{r-1}, U_{r-1}], \qquad (6)
$$
where $U_i \in \mathcal{U}_W$, $\forall i \in I_{r-1}$. It is readily shown that
$$
\begin{aligned}
C(B) &= \sum_{k=1}^{K} \left\| \mathrm{OffB}\!\left( B^{H} R_k B \right) \right\|_F^2
= \sum_{k=1}^{K} \sum_{i=1}^{r} \sum_{\substack{j=1 \\ j \neq i}}^{r} \left\| B_i^{H} R_k B_j \right\|_F^2 \\
&= \sum_{k=1}^{K} \sum_{i=1}^{r-1} \sum_{\substack{j=1 \\ j \neq i}}^{r-1} \left\| U_i^{H} R_k U_j \right\|_F^2
+ \sum_{k=1}^{K} \sum_{i=1}^{r-1} \left\| U_{r-1}^{H} R_k^{H} U_i \right\|_F^2
+ \sum_{k=1}^{K} \sum_{j=1}^{r-1} \left\| U_{r-1}^{H} R_k U_j \right\|_F^2 .
\end{aligned} \qquad (7)
$$
Since $\forall i \in I_{r-1}$, $\forall m \in I_{n_r}$, we have $u_i^{mH} W = 0$ according to (5), where $u_i^{m}$ denotes the $m$-th column of $U_i$, the first term in the last equation on the right-hand side of (7) reads
$$
\sum_{k=1}^{K} \sum_{i=1}^{r-1} \sum_{\substack{j=1 \\ j \neq i}}^{r-1} \left\| U_i^{H} R_k U_j \right\|_F^2
= \sum_{i=1}^{r-1} \sum_{\substack{j=1 \\ j \neq i}}^{r-1} \sum_{m=1}^{n_r} \sum_{n=1}^{n_r} \sum_{k=1}^{K} \left| u_i^{mH} W D_k W^{H} u_j^{n} \right|^2 = 0. \qquad (8)
$$
Similarly, it can be shown that the remaining terms in (7) vanish as well, so $C(B) = 0$, which implies that the matrix given by (6) is a globally optimal solution of the JBD_NU algorithm. Obviously, however, this is a singular solution, because the last two partitions of the block diagonalizer given by (6) are identical.
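To make this degenerate solution concrete, the following numerical sketch (NumPy assumed; the sizes, seed, and helper names such as `off_block` are illustrative, not from the paper) builds targets $R_k = W D_k W^H$ from a rank-deficient $W$, forms $B$ as in (6) from semi-unitary blocks orthogonal to $W$, and checks that the cost vanishes although $B$ is singular:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, r, K = 6, 2, 3, 4          # M = r*n; K target matrices

# Rank-deficient "mixing" matrix W (M x (M-n)), the degenerate scenario
W = rng.standard_normal((M, M - n)) + 1j * rng.standard_normal((M, M - n))

# Orthonormal basis N of the null space of W^H, so that N^H W = 0
N = np.linalg.svd(W.conj().T)[2].conj().T[:, -n:]

# Targets R_k = W D_k W^H with block-diagonal D_k ((M-n) x (M-n), n x n blocks)
def block_diag_rand(size, blk):
    D = np.zeros((size, size), dtype=complex)
    for o in range(0, size, blk):
        D[o:o+blk, o:o+blk] = (rng.standard_normal((blk, blk))
                               + 1j * rng.standard_normal((blk, blk)))
    return D

Rs = [W @ block_diag_rand(M - n, n) @ W.conj().T for _ in range(K)]

# Semi-unitary U_i with U_i^H W = 0: rotate N by a random unitary Q_i
def semi_unitary():
    Q = np.linalg.qr(rng.standard_normal((n, n))
                     + 1j * rng.standard_normal((n, n)))[0]
    return N @ Q

U1, U2 = semi_unitary(), semi_unitary()
B = np.hstack([U1, U2, U2])       # last two partitions identical, as in (6)

def off_block(X, blk):            # zero the diagonal blocks, keep the rest
    Y = X.copy()
    for o in range(0, X.shape[0], blk):
        Y[o:o+blk, o:o+blk] = 0
    return Y

C = sum(np.linalg.norm(off_block(B.conj().T @ R @ B, n))**2 for R in Rs)
print(C)                                    # ~ 0 (machine precision)
print(np.linalg.matrix_rank(B), M)          # rank(B) < M: B is singular
```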
Actually, it should be pointed out that even if the matrix W is nonsingular, the JBD_NU algorithm may converge to an ill-conditioned solution. This is because the minimization of the cost function (3) cannot guarantee independence between the partitions of the block diagonalizer.
In addition, it should be stressed that the inherent ambiguities that exist in JD algorithms also appear in JBD algorithms. Let $\Pi$ and $\Lambda$ designate, respectively, a block permutation matrix and a nonsingular block diagonal matrix, where a block permutation matrix is a permutation matrix with the following structure: in each block row and block column there is only one nonzero square block, and the nonzero block is an arbitrary permutation matrix, e.g.
$$
\begin{bmatrix}
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.
$$
Thus the relation
$$
R_k = A D_k A^{H} = (A \Pi \Lambda)\left( \Lambda^{-1} \Pi^{H} D_k \Pi \Lambda^{-H} \right) \Lambda^{H} \Pi^{H} A^{H} \qquad (9)
$$
shows that A (or the corresponding B) and the block diagonal matrix set $\mathcal{D}$ are not uniquely defined. Therefore, for the JBD problem, what we can do is to estimate a joint block diagonalizer, say $\hat{B}$, such that
$$
\hat{B} = B \Pi \Lambda. \qquad (10)
$$
In the sequel, we will derive our new algorithm to eliminate the
ill-conditioned solutions.
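The ambiguity expressed by (9) is easy to verify numerically. The sketch below (NumPy; sizes, seed, and variable names are illustrative) takes the $6 \times 6$ block permutation example above, draws a random nonsingular block diagonal matrix, and checks that the original and transformed factorizations generate the same target matrix, with the transformed middle factor still block diagonal:

```python
import numpy as np

rng = np.random.default_rng(1)
M, n = 6, 2                       # three 2 x 2 blocks

# The block permutation matrix from the example above
P = np.array([[0, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1]], dtype=float)

def block_diag_rand():            # random complex block diagonal matrix
    L = np.zeros((M, M), dtype=complex)
    for o in range(0, M, n):
        L[o:o+n, o:o+n] = (rng.standard_normal((n, n))
                           + 1j * rng.standard_normal((n, n)))
    return L

Lam = block_diag_rand()           # nonsingular block diagonal (almost surely)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
D = block_diag_rand()             # one block-diagonal factor D_k

R = A @ D @ A.conj().T            # original factorization

A2 = A @ P @ Lam                  # transformed mixing matrix
Li = np.linalg.inv(Lam)
D2 = Li @ P.T @ D @ P @ Li.conj().T   # transformed middle factor, as in (9)
R2 = A2 @ D2 @ A2.conj().T

print(np.allclose(R, R2))         # same R_k: True

# D2 is still block diagonal: its off-diagonal blocks vanish
off = D2.copy()
for o in range(0, M, n):
    off[o:o+n, o:o+n] = 0
print(np.linalg.norm(off))        # 0
```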
2.3. Minimization algorithm
As stated previously, the ill-conditioned solutions obtained by minimizing criterion (3) stem from the lack of any constraint on the joint block diagonalizer B. Therefore, the resulting algorithm cannot guarantee that the partitions of B are uncorrelated with each other. This motivates the idea that a penalty term should be used to eliminate the ill-conditioned solutions. To this end, we propose to consider the following cost function:
$$
J(B) = J_1(B) - J_2(B), \qquad (11)
$$
where
$$
J_1(B) = \sum_{k=1}^{K} \left\| \mathrm{OffB}\!\left( B^{H} R_k B \right) \right\|_F^2 \qquad (12)
$$
represents the off-diagonal-block-error term and
$$
J_2(B) = \log\left|\det(B)\right| \qquad (13)
$$
represents the penalty term, which is expected to steer B toward some well-conditioned solution.
Note that the penalty term (13) tends to negative infinity when B is close to singular, which correspondingly causes the cost function (11) to tend to positive infinity. Minimizing (11) therefore at least guarantees that the joint block diagonalizer B is updated in a direction away from singularity, which is expected to result in a well-conditioned solution.
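To illustrate the behavior of the penalty, the short sketch below (NumPy; the blending construction is purely illustrative) drives the last partition of a random B toward a copy of the first, so that B approaches singularity, and evaluates $J_2(B) = \log|\det(B)|$:

```python
import numpy as np

rng = np.random.default_rng(2)
M, n = 4, 2

B0 = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))

# Blend the second partition toward a copy of the first: B(t) is singular at t = 1
def B_of(t):
    B = B0.copy()
    B[:, n:] = (1 - t) * B0[:, n:] + t * B0[:, :n]
    return B

# J2 = log|det B(t)| decreases toward -inf, so the -J2 term in (11) blows up
pen = [np.log(abs(np.linalg.det(B_of(t)))) for t in (0.0, 0.9, 0.99, 0.9999)]
print(pen)
```

By multilinearity of the determinant, $\det B(t) = (1-t)^2 \det B(0)$ here, so the penalty decreases exactly like $2\log(1-t)$ as $t \to 1$.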
In order to minimize (11) with respect to B, we adopt a cyclic minimizer technique [29] to complete the optimization task. The basic idea behind this strategy is to divide the parameter space into multiple sets. At each iteration of the algorithm, we minimize the criterion with respect to one set, conditioned on the previously estimated sets of parameters. The newly estimated set is then used to update the remaining sets. This process continues until convergence is achieved. The advantage of the cyclic minimizer technique is that the resulting algorithms usually converge fast and there are no parameters to adjust.
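The cyclic minimizer strategy is generic and is perhaps easiest to see on a toy problem. The sketch below (a two-parameter strictly convex quadratic of our own choosing, not the JBD cost) performs exact coordinate-wise minimizations in turn and illustrates the monotone decrease of the criterion:

```python
# Cyclic (alternating) minimization of f(x, y) = (x-1)^2 + (y-2)^2 + x*y/2,
# a strictly convex toy objective: each step solves one coordinate exactly.
def f(x, y):
    return (x - 1) ** 2 + (y - 2) ** 2 + x * y / 2

x = y = 0.0
prev = f(x, y)
for _ in range(50):
    x = 1 - y / 4          # argmin over x: set df/dx = 2(x-1) + y/2 = 0
    y = 2 - x / 4          # argmin over y: set df/dy = 2(y-2) + x/2 = 0
    cur = f(x, y)
    assert cur <= prev + 1e-12   # an exact coordinate step cannot increase f
    prev = cur

print(x, y)    # converges to the joint minimizer (8/15, 28/15)
```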
Let us first divide B into r partitions as $B = [B_1, B_2, \ldots, B_r]$, where $\forall i \in I_r$, $B_i = [b_i^1, b_i^2, \ldots, b_i^{n_i}]$ is a block matrix of dimension $M \times n_i$. Now we rewrite (12) as
$$
\begin{aligned}
J_1(B) &= \sum_{k=1}^{K} \sum_{i=1}^{r} \sum_{\substack{j=1 \\ j \neq i}}^{r} \left\| B_i^{H} R_k B_j \right\|_F^2
= \sum_{k=1}^{K} \sum_{i=1}^{r} \sum_{\substack{j=1 \\ j \neq i}}^{r} \operatorname{tr}\!\left( B_i^{H} R_k B_j B_j^{H} R_k^{H} B_i \right) \\
&= \sum_{i=1}^{r} \operatorname{tr}\!\left( B_i^{H} \left[ \sum_{k=1}^{K} R_k \left( \sum_{\substack{j=1 \\ j \neq i}}^{r} B_j B_j^{H} \right) R_k^{H} \right] B_i \right).
\end{aligned} \qquad (14)
$$
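The rearrangement in (14) relies on $\|X\|_F^2 = \operatorname{tr}(XX^H)$ and the linearity of the trace; it can be sanity-checked on random data (NumPy; sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
M, n, r, K = 6, 2, 3, 4
Bs = [rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))
      for _ in range(r)]
Rs = [rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
      for _ in range(K)]

# First form of (14): sum of squared Frobenius norms of the off-diagonal blocks
lhs = sum(np.linalg.norm(Bs[i].conj().T @ R @ Bs[j]) ** 2
          for R in Rs for i in range(r) for j in range(r) if j != i)

# Last form of (14): traces with the k- and j-sums pulled inside
rhs = sum(np.trace(Bs[i].conj().T
                   @ sum(R @ sum(Bs[j] @ Bs[j].conj().T
                                 for j in range(r) if j != i) @ R.conj().T
                         for R in Rs)
                   @ Bs[i]).real
          for i in range(r))

print(abs(lhs - rhs))   # ~ 0
```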
Let $B^{(i)} = [B_1, \ldots, B_{i-1}, B_{i+1}, \ldots, B_r]$ be the $M \times (M - n_i)$ matrix obtained by deleting the i-th block from B. We have
$$
\sum_{\substack{j=1 \\ j \neq i}}^{r} B_j B_j^{H} = B B^{H} - B_i B_i^{H} = B^{(i)} B^{(i)H}. \qquad (15)
$$
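Identity (15) amounts to removing the i-th block's contribution from the Gram matrix $BB^H$; a quick numerical check (NumPy; illustrative sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(4)
M, n, r, i = 6, 2, 3, 1           # check the identity for the i-th block
Bs = [rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))
      for _ in range(r)]
B = np.hstack(Bs)
Bi = np.hstack([Bs[j] for j in range(r) if j != i])   # B with block i deleted

lhs = sum(Bs[j] @ Bs[j].conj().T for j in range(r) if j != i)
mid = B @ B.conj().T - Bs[i] @ Bs[i].conj().T
rhs = Bi @ Bi.conj().T

print(np.allclose(lhs, mid), np.allclose(lhs, rhs))   # True True
```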
Substituting (15) into (14), we can rewrite (14) as