STUDER, FATEH, AND SEETHALER 3
Fig. 2. PER versus SNR performance of a coded MIMO system using various
iterative (with I ∈ {2, 4, 8} iterations) and non-iterative (I = 1)
MIMO detection algorithms.
makes it possible to achieve a PER of 10% at 12 dB SNR for both SISO
STS-SD and SISO MMSE-PIC, compared to the more than
18 dB SNR that is required by hard-output SD and the k-Best
algorithm. Hence, four iterations improve the performance by
more than 6 dB SNR at IEEE 802.11n-relevant PERs. Note
that the SNR-performance gain from four to eight iterations
is only 0.25 dB, which indicates that performing more than
four iterations does not pay off in practical systems. We
emphasize that the SNR-performance advantage of iterative
MIMO decoding ultimately leads to an increased system
throughput (as higher data-rates can be used reliably at the
same SNR), better coverage, and improved range (since the
lowest data-rate can be decoded reliably at lower SNR).
We finally note that SISO STS-SD outperforms SISO
MMSE-PIC for a small number of iterations. However, i) the
SISO STS-SD algorithm requires roughly 8× higher computational
complexity than SISO MMSE-PIC (cf. Section V-C3)
and ii) SD-based algorithms exhibit, in contrast to the other
considered algorithms, a non-constant throughput that strongly
depends on the SNR and the channel realization. Since
practical MIMO receivers need to cope with varying channel
conditions and transmission rates, the non-constant throughput
renders implementations of SD extremely difficult (see [7] for
a corresponding discussion). Both drawbacks associated with
SD finally led to our decision to favor the SISO MMSE-PIC
algorithm for implementation.
C. SISO MMSE-PIC Algorithm
Even for a small number of spatial streams (say $M_T > 2$),
exact computation of the LLRs in (1) entails prohibitively
high computational complexity. Therefore, a variety of sub-
optimum algorithms has been proposed in the literature,
e.g., [13], [15]. In this paper, we focus on the SISO MMSE-
PIC algorithm initially proposed by Wang and Poor in
1999 [15] in the context of multi-user detection. Since then,
various algorithm optimizations have been proposed [21]–[24].
The following five paragraphs summarize the SISO MMSE-
PIC algorithm as described in [23].
1) Computation of Soft-Symbols: The algorithm starts by
computing estimates $\hat{s}_i$ for $i = 1, \ldots, M_T$ (referred to as “soft-symbols”)
for the transmitted symbols $s_i$ according to [21]

$$\hat{s}_i = \mathbb{E}[s_i] = \sum_{a \in \mathcal{O}} \mathrm{P}[s_i = a]\, a \qquad (2)$$

where $\mathrm{P}[s_i = a] = \prod_{b=1}^{Q} \mathrm{P}[x_{i,b} = k]$ denotes the a-priori
probability of the symbol $a \in \mathcal{O}$, with $k = [a]_b$ referring to the
$b$th bit associated with the symbol $a$. The reliability of each
soft-symbol $\hat{s}_i$ is characterized by its variance

$$E_i = \mathrm{Var}[s_i] = \mathbb{E}\big[|e_i|^2\big] \qquad (3)$$

with $e_i = s_i - \hat{s}_i$. The a-priori probabilities involved in the
computation of the soft-symbols (2) and their variances (3)
are calculated on the basis of the a-priori LLRs $L^A_{i,b}$ delivered
by the channel decoder.$^{3}$
According to [25], we have

$$\mathrm{P}[x_{i,b} = k] = \frac{1}{2}\left(1 + (2k - 1)\tanh\left(\frac{1}{2} L^A_{i,b}\right)\right) \qquad (4)$$

which can be approximated efficiently in hardware through
table look-ups.
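
The chain from a-priori LLRs to soft-symbols can be sketched in software as follows. This is a minimal floating-point illustration of (4), (2), and (3), not the table-look-up hardware approximation mentioned above. The unit-energy Gray-mapped QPSK alphabet `QPSK` and the function names are assumptions introduced for this example only.

```python
import math

# Hypothetical Gray-mapped QPSK alphabet with unit average energy (an
# assumption for illustration): bit pattern (b1, b2) -> constellation point.
QPSK = {
    (0, 0): complex(-1, -1) / math.sqrt(2),
    (0, 1): complex(-1, +1) / math.sqrt(2),
    (1, 0): complex(+1, -1) / math.sqrt(2),
    (1, 1): complex(+1, +1) / math.sqrt(2),
}

def bit_probability(llr, k):
    """Eq. (4): P[x = k] for k in {0, 1}, given the a-priori LLR."""
    return 0.5 * (1.0 + (2 * k - 1) * math.tanh(0.5 * llr))

def soft_symbol(llrs, alphabet=QPSK):
    """Return (s_hat, E) per eqs. (2) and (3) for one stream.

    llrs holds the Q a-priori LLRs of the stream, one per bit.
    """
    s_hat = 0j
    second_moment = 0.0
    for bits, a in alphabet.items():
        # P[s = a] is the product of the Q bit probabilities.
        p = math.prod(bit_probability(l, k) for l, k in zip(llrs, bits))
        s_hat += p * a
        second_moment += p * abs(a) ** 2
    # Var[s] = E[|s|^2] - |E[s]|^2, which equals E[|s - s_hat|^2].
    variance = second_moment - abs(s_hat) ** 2
    return s_hat, variance
```

With all LLRs equal to zero, as in the first iteration, every symbol is equally likely, so the soft-symbol is zero and its variance equals the symbol energy. Strongly reliable LLRs drive the soft-symbol toward a constellation point and the variance toward zero.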
As observed in [23], using intrinsic a-priori LLRs in the
computation of (4) instead of the extrinsic ones leads, in general,
to significantly better error-rate performance of the SISO
MMSE-PIC algorithm. We therefore exclusively use intrinsic
a-priori LLRs for the computation of (4) throughout the paper.
We finally note that for most Gray mappings (including
that used in IEEE 802.11n) the soft-symbols in (2) and their
corresponding variances in (3) can be computed efficiently in
hardware using the method proposed in [24].
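
To give some intuition for why Gray mappings simplify (2) and (3), the sketch below treats the simplest case, unit-energy Gray-mapped QPSK. It is not the method of [24], whose details are not reproduced here. It assumes that each bit controls one dimension and that bit value 1 maps to the positive half-axis; the function name is hypothetical. Under these assumptions the sum over the alphabet in (2) collapses to one tanh evaluation per dimension.

```python
import math

def qpsk_soft_symbol(llr_re, llr_im):
    """Soft-symbol and variance for unit-energy Gray-mapped QPSK.

    Assumes (for illustration) that bit value 1 maps to the positive
    half-axis of the corresponding dimension.
    """
    # E[2x - 1] = tanh(L / 2) for a bit x with a-priori LLR L, see eq. (4).
    s_hat = complex(math.tanh(0.5 * llr_re), math.tanh(0.5 * llr_im)) / math.sqrt(2)
    # Every QPSK point has |a|^2 = 1, so Var[s] = 1 - |s_hat|^2.
    variance = 1.0 - abs(s_hat) ** 2
    return s_hat, variance
```

For higher-order Gray-mapped alphabets the real and imaginary parts still decouple, but each dimension then depends on more than one bit.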
2) Parallel Interference Cancellation (PIC): With the aid
of the previously computed soft-symbols (2), the algorithm
considers each stream $i$ separately and cancels the
interference in $\mathbf{y}$ induced by all other streams $j \neq i$ as follows:

$$\hat{\mathbf{y}}_i = \mathbf{y} - \sum_{j,\, j \neq i} \mathbf{h}_j \hat{s}_j = \mathbf{h}_i s_i + \tilde{\mathbf{n}}_i \qquad (5)$$

where $\tilde{\mathbf{n}}_i = \sum_{j,\, j \neq i} \mathbf{h}_j e_j + \mathbf{n}$ corresponds to the remaining
noise-plus-interference (NPI).
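
The cancellation step (5) can be sketched in a few lines. The function name `pic` and the list-of-rows representation of the channel matrix are assumptions made for this example. The code uses an algebraically equivalent rearrangement of (5): it subtracts the contribution of all streams once and then adds stream $i$ back, which is a choice made here for brevity and not necessarily how a hardware implementation would schedule the operations.

```python
def pic(H, y, s_hat):
    """Eq. (5): return the list of interference-cancelled vectors y_hat_i.

    H is an M_R x M_T channel matrix given as a list of rows, y the
    received vector (length M_R), s_hat the M_T soft-symbols.
    """
    m_r, m_t = len(H), len(H[0])
    # Subtract the soft contribution of all streams once ...
    residual = [y[r] - sum(H[r][j] * s_hat[j] for j in range(m_t))
                for r in range(m_r)]
    # ... then add stream i back, so that only streams j != i are cancelled.
    return [[residual[r] + H[r][i] * s_hat[i] for r in range(m_r)]
            for i in range(m_t)]
```

If the soft-symbols equal the transmitted symbols and there is no noise, each output reduces to $\mathbf{h}_i s_i$. With all soft-symbols equal to zero (first iteration), nothing is cancelled and every output equals $\mathbf{y}$.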
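3) MMSE Filter-Vector Computation: In order to reduce
the NPI in each $\hat{\mathbf{y}}_i$ of (5), a linear MMSE filter is used. These
$M_T$ MMSE filter vectors are computed according to [21]

$$\tilde{\mathbf{w}}_i^H = E_s \mathbf{h}_i^H \tilde{\mathbf{A}}_i^{-1} \qquad (6)$$

where

$$\tilde{\mathbf{A}}_i = \mathbf{H} \tilde{\boldsymbol{\Lambda}}_i \mathbf{H}^H + N_0 \mathbf{I}_{M_R} \qquad (7)$$

and $\tilde{\boldsymbol{\Lambda}}_i$ is an $M_T \times M_T$ diagonal matrix having entries

$$\tilde{\Lambda}_{j,j} = \begin{cases} E_j, & j \neq i \\ E_s, & j = i. \end{cases}$$
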
It is important to realize that (6) requires the inversion of an
$M_R \times M_R$ matrix for each of the $M_T$ streams,
for each received vector, and for each iteration, which inhibits
an efficient implementation in hardware. In order to
substantially reduce this computational burden, a novel
low-complexity method is proposed in Section III.
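
To make the cost of (6) and (7) concrete, the following sketch computes the filter vectors directly, with one linear solve per stream. This is the straightforward approach whose complexity is criticized above, not the low-complexity method of Section III. The function names, the list-of-rows matrix layout, and the plain Gaussian-elimination solver are assumptions chosen to keep the example self-contained.

```python
def solve(A, b):
    """Solve A x = b for a small complex matrix via Gaussian elimination."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]           # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0j] * n
    for r in reversed(range(n)):                          # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def mmse_filters(H, E, E_s, N0):
    """Return the M_T filter vectors w_i^H of eq. (6), one direct solve each.

    H: M_R x M_T channel matrix (list of rows); E: soft-symbol variances
    from eq. (3); E_s: symbol energy; N0: noise variance.
    """
    m_r, m_t = len(H), len(H[0])
    filters = []
    for i in range(m_t):
        # Diagonal of Lambda_i: E_s for the stream of interest, E_j otherwise.
        lam = [E_s if j == i else E[j] for j in range(m_t)]
        # Eq. (7): A_i = H Lambda_i H^H + N0 I
        A = [[sum(H[r][j] * lam[j] * H[c][j].conjugate() for j in range(m_t))
              + (N0 if r == c else 0.0) for c in range(m_r)]
             for r in range(m_r)]
        h_i = [H[r][i] for r in range(m_r)]
        # A_i is Hermitian, so w_i = E_s A_i^{-1} h_i and w_i^H is its conjugate.
        x = solve(A, h_i)
        filters.append([(E_s * v).conjugate() for v in x])
    return filters
```

The outer loop makes the problem visible: every stream needs its own $M_R \times M_R$ system, and the whole computation is repeated for each received vector and each iteration.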
$^{3}$ The LLRs are initialized as $L^A_{i,b} = 0$, $\forall i, b$, in the first iteration.