WA NG et al.: FAST DECODING AND HARDWARE DESIGN FOR BINARY-INPUT COMPRESSIVE SENSING 593
Inspired by the simi larity between CS sampling and en-
coding of channel codes, some research has started to relate
CS to channel coding. Sarvotham et al. [36] design Sudocodes,
which can be used as erasure codes for real-valued data. By
limiting
to sparse binary matrixes, Sudo codes dramatically
reduce both e n coding (sampling ) and decoding (recovery)
complexity. In particular, the worst-case decoding complexity
is only
. By using bipartite expander graphs,
Xu et al. further reduce decoding com plex ity to
[37], [38]. In addition to the binary ,binary is also con-
sidered in the literature. Wu et al. [39] characterize the CS
decoding process by a set of differential equations and derive
its closed-form form ulat ion. Liu et al. [40] improve the formu-
lation by lev eraging th e asymmetrical property on decoding
of bits “1” and “0.” However, these works all assume erasure
channels (i.e., a measurement i s either correctly received or
completely lost) and decoding in noisy channels (i.e., measure-
ments are contaminated with noise) is not discussed.
Although the problem of CS decoding from noisy m easure-
ments has been heavily discussed [41]–[44], CS -BP [45], [46]
is the first work that reduces decoding com plexity by adding
certain restrictions. In particular, CS-BP adopts a sparse
whose nonzero entries are draw n from Rademacher distribu-
tion, and adopts m essage passing algorith ms for decoding.
As such, CS-BP decoding algorithm uses
mea-
surements and
computation. Please note that
although the complexity is h igher than that of Sudocodes, it is
achieved in noisy settings. Cui et al. [7] apply CS to practical
wireless communications, and design a sparse integer
from
the channel capacity perspective. The decoding algorithm is
a variation of CS-BP and therefore has similar com pl exity as
CS-BP. In this application of CS, the input signal
is a block
of bits, so we name it binary input CS or BiCS.
Although the message passing algori thm can tolerate additi ve
noise in measurements, the messages at weighted sum nodes
have to be calcu lated by convolution of probabilities. It greatly
increases the decoding complexity com pared with the message
passing algorithms for binary channel codes. Furthermore, the
fast decoding algorith ms for channel codes h ave n ot been ap-
plied to CS decoding yet.
C. Fast Decoding of Channel Codes
Due to the sim ilarity between BiCS and channel codes, we re-
view the fast decoding algorithms for two typical channel codes,
namely Turbo codes and LDPC codes. The two key techniques
to realize fast decoding are to use LLR as me ssages and to use
various approximations for probability computation.
The optimal decoding algorithm for Turbo codes is maximum
a poster ior i probability (MAP) estimation. Its per for mance is
close to Shannon limit in terms of bit er ror rate [12 ]. However,
the MAP algorithm needs to calculate a l arge number of mul-
tiplications of probabilities, which not only request inten sive
computation but also result in numerical representation prob-
lems. The key idea to solv e these problems is to calculate them
in the log domain . This variant is well known as the Log-MAP
algorithm [ 47] , where the multip lications of probabilities are
converted to the additions of their log values.
Furthermore, the
function in the Log-MAP algorithm
can be approximated by a maxim um term and a correction term.
The Max-Log-MAP algorithm [48], [49] is proposed to only
keep the maximum term and discard the correction term. Thus
it is suboptimal and has performance degradation about 0.5 dB
[47]. Subsequently, there are more approach es reported to im-
prove the approximation. For example, Vogt et al. introduce the
scaling op eration [50]. Cheng et al. [51] and Wang et al. [52]
propose to approxim ate the c orrection term as a linear function
and a piece-wise function respectively.
The MAP algorithm can also be applied to the decoding of
binary LDPC codes and be calculated in log domain [53]. Later
on, the Log-MAP algorithm is further extended f or LDPC codes
over
with (known as nonbinary LDPC codes)
[54]. Similar to Turbo codes, different approximations used in
the Log-MAP alg orith ms for LDPC codes provide different
trade-offs among implementation feasibility, comp utat ion al
complexity and num erical stability.
BiCS differs from Turbo codes and LDPC codes in that t he
operations to generate symbols are arithm etic not logical. Since
the m essages at w eig hted sum nodes are calculated by convolu-
tion, directly using LLR as m essages cannot bring any benefit
to the decoding of CS. This paper is the exact one that solves
this problem. To our best knowledge, there are few papers that
study the fast massage passing algorithm for CS.
D. Hardware Design for LDPC Codes
Since both LDPC codes and CS can be represented by a bi-
partite graph, we follow the hardware design for LDPC codes.
The architectures of LDPC hardware decoders can be catego-
rized into two classes: full-parallel decoding and partial-parallel
decoding.
In full-parallel decoding [55], each row and each column of
the parity check ma tr ix is directly mapped to a different pro-
cessing unit and all these pro cessing units operate in parallel.
In partial-parallel decoding [56], the pa ri ty check matrix is par-
titioned into som e nonoverlap regions such that a set of check
nodes and variable nodes are updated per cycle. In general, most
of the partial-parallel decoders have lower decoding throughput
and higher energy dissipation than the full-parallel decoders.
However, the full-parallel decoders have much larger silicon
area than the partial-parallel decoders [57].
Unfortunately, high parallelism inevitably introdu ces the col-
lision problem in memory access. This is a well-known problem
that has been addressed in the parallel implementation [58].
Two main approaches are proposed to deal with the collisio n
problem. The first one is to desig n collision fr ee codes. The
second one is to design a decoder architecture able to avoid or
at l east m i tig ate collision effects. The second approach is well
suited to flexible and general architectures [59].
Current researches on decoder architecture and hardware im-
plementation for L DPC codes have m ade them suitable to high-
speed communication in modern wireless systems. In this paper,
we will propose the first hardware d esign for fast BiCS de-
coding. It is a small but solid step to make it applicable to prac-
tical systems.