SENSING
So far it was assumed that y is available, and that one can simply apply the transform into the domain of {ψ_k}_{k=1}^n to determine which x_k are relevant (non-zero). Although this case does exist and is important for some forms of data compression, the real application of compressive sensing is the acquisition of the signal from m, possibly noisy, measurements z_l = φ_l^H y + v_l for l = 1, …, m, where it is assumed that v_l is zero-mean complex Gaussian distributed with variance N_0; the noiseless case is included for N_0 → 0. The signal acquisition process can now be written using the m × n matrix A = Φ^H Ψ as

z = Ax + v,

where Φ = [φ_1, φ_2, …, φ_m] is an n × m matrix, z = [z_1, z_2, …, z_m]^T is the stacked measurement vector, and v = [v_1, …, v_m]^T collects the noise samples. Since this is a simple linear Gaussian model, it is well posed as long as A is at least of rank n. By well posed we simply mean that there exists some estimator x̂ (or ŷ for that matter) whose estimation error is proportional to the noise variance; therefore, as the noise variance approaches zero, the estimation error does as well. This generally requires at least m ≥ n measurements if y is unconstrained in ℂ^n.
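As a concrete illustration of this acquisition model (a minimal sketch, not taken from the article), the NumPy snippet below builds an s-sparse signal in an assumed DFT basis Ψ, collects m noisy projections z_l = φ_l^H y + v_l, and forms A = Φ^H Ψ; all dimensions, the noise level, and the choice of basis are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s, N0 = 256, 64, 8, 1e-3          # assumed dimensions, sparsity, noise variance

# Sparsifying basis Psi (here a unitary DFT, an arbitrary choice) and an s-sparse x
Psi = np.fft.fft(np.eye(n)) / np.sqrt(n)
x = np.zeros(n, dtype=complex)
support = rng.choice(n, size=s, replace=False)
x[support] = rng.standard_normal(s) + 1j * rng.standard_normal(s)
y = Psi @ x                              # signal, sparse in the Psi-domain

# Measurement vectors phi_l as columns of the n x m matrix Phi, here i.i.d. Gaussian,
# scaled so that the columns of A = Phi^H Psi are approximately unit-norm
Phi = rng.standard_normal((n, m)) / np.sqrt(m)
v = np.sqrt(N0 / 2) * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
z = Phi.conj().T @ y + v                 # z_l = phi_l^H y + v_l, l = 1, ..., m

A = Phi.conj().T @ Psi                   # m x n sensing matrix A = Phi^H Psi
```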
SIGNAL RECOVERY AND RIP
The novelty in compressive sensing is that for signals y that are s-sparse in some {ψ_k}_{k=1}^n, fewer measurements are sufficient to make this a well-posed problem. The requirement on A to have at least rank n is replaced by the restricted isometry property (RIP), first defined in [11], which we explain in the following.
For any matrix A with unit-norm columns one can define the restricted isometry constant δ_s as the smallest number such that

(1 − δ_s) ||x||_2² ≤ ||Ax||_2² ≤ (1 + δ_s) ||x||_2²

for any x that is s-sparse. This can be seen as (approximately) conserving the length of s-sparse vectors in the measurement domain, and it effectively puts bounds on the eigenvalues of any s × s submatrix of A^H A.
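Computing δ_s exactly is combinatorial, but an empirical lower bound can be obtained by sampling random column subsets and checking the eigenvalues of the corresponding s × s submatrices of A^H A. The sketch below (illustrative only, assuming a matrix A with roughly unit-norm columns such as the one built above) does just that.

```python
import numpy as np

def empirical_delta_s(A, s, trials=2000, rng=None):
    """Empirical lower bound on the restricted isometry constant delta_s of A
    (columns assumed approximately unit-norm), by random support sampling."""
    rng = rng or np.random.default_rng(1)
    n = A.shape[1]
    delta = 0.0
    for _ in range(trials):
        cols = rng.choice(n, size=s, replace=False)
        G = A[:, cols].conj().T @ A[:, cols]     # s x s submatrix of A^H A (Hermitian)
        eig = np.linalg.eigvalsh(G)              # real eigenvalues, ascending order
        # all eigenvalues must lie in [1 - delta_s, 1 + delta_s]
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta
```

Because only a subset of the supports is examined, the returned value can only underestimate the true δ_s.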
Now, assuming that among all s-sparse vectors one chooses the estimate x̂ that has the smallest distance to the observations, ||z − Ax̂||_2, it is easily shown that the estimation error is bounded by E{||x − x̂||_2²} ≤ 2mN_0/(1 − δ_{2s}). This uses the fact that the estimation error x̃ := x − x̂ is 2s-sparse. So we see that the signal recovery problem is well posed as long as δ_{2s} < 1; but since the δ_s are monotonic in s, δ_s ≤ δ_{s+1}, and usually increase gradually, it is commonly said that A obeys the RIP if δ_s is not too close to one.
In the case of approximately sparse signals, the error caused by noisy observations adds to the error caused by the approximation as s-sparse. A good choice of s therefore has to take the noise level N_0 into account: a smaller s increases the approximation error but, due to the monotonic nature of the δ_s, decreases the error caused by the noise, and vice versa.
SENSING MATRICES
While evaluating the RIP for a particular matrix at hand is (at worst) an NP-hard problem, there are large classes of matrices that obey the RIP with high probability, that is, δ_s << 1 for any s << m. Specifically, for random matrices such as those with i.i.d. Gaussian or Bernoulli entries, or randomly selected rows of an orthogonal n × n matrix (e.g., the DFT), it can be shown that for m ≥ C s log(n/s) measurements the probability that δ_s ≥ δ decreases exponentially with m and δ. In other words, as long as one takes enough measurements, i.e., increases m, the probability of any such matrix violating the RIP for a given threshold δ can be made arbitrarily small. Although the constant C is only loosely specified for the various types of matrices, the fact that the probability decreases exponentially is encouraging as to the number of required measurements. Furthermore, it is important to keep in mind that these bounds are on worst cases, so that on average far fewer measurements m will be sufficient.
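A brief sketch of the matrix families named above, together with the m ≥ C s log(n/s) rule of thumb, is given below; the value C = 2 is purely an assumed placeholder, since the constant is only loosely specified.

```python
import numpy as np

def num_measurements(n, s, C=2.0):
    """Rule of thumb m >= C * s * log(n / s); C = 2 is an assumed placeholder."""
    return int(np.ceil(C * s * np.log(n / s)))

def sensing_matrix(n, m, kind="gaussian", rng=None):
    """Random m x n sensing matrices of the types mentioned in the text,
    scaled to (approximately) unit-norm columns."""
    rng = rng or np.random.default_rng(2)
    if kind == "gaussian":               # i.i.d. Gaussian entries
        return rng.standard_normal((m, n)) / np.sqrt(m)
    if kind == "bernoulli":              # i.i.d. +/-1 entries
        return rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
    if kind == "partial_dft":            # m randomly selected rows of the n x n DFT
        F = np.fft.fft(np.eye(n)) / np.sqrt(n)
        rows = rng.choice(n, size=m, replace=False)
        return F[rows, :] * np.sqrt(n / m)
    raise ValueError(kind)
```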
ALGORITHMS
Previously we considered the estimator that, among all s-sparse vectors in ℂ^n, chooses the solution with minimum distance from the observations, to show that the average estimation error is bounded. This is in essence a combinatorial problem, which has exponential complexity. If s is not known, or for an approximately sparse signal, a joint cost function has to be used that weighs the sparsity of the solution against how well it fits the observations. This can be achieved using a Lagrangian formulation that adds a penalty proportional to s, usually expressed via the zero-norm ||x||_0, which counts the non-zero elements of x. This further increases the size of the combinatorial problem, as all s-sparse vectors for various values of s now have to be considered.
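To make the combinatorial nature explicit, the sketch below (illustrative only, and feasible just for very small n) enumerates every size-s support and keeps the least-squares fit that is closest to the observations; its cost grows as the number of supports, n choose s, i.e., exponentially.

```python
import numpy as np
from itertools import combinations

def exhaustive_sparse_ls(A, z, s):
    """Best s-sparse least-squares fit by enumerating all supports
    (exponential complexity; practical only for tiny problems)."""
    m, n = A.shape
    best_x, best_res = None, np.inf
    for support in combinations(range(n), s):
        cols = list(support)
        # least-squares coefficients restricted to this candidate support
        coeffs, *_ = np.linalg.lstsq(A[:, cols], z, rcond=None)
        res = np.linalg.norm(z - A[:, cols] @ coeffs)
        if res < best_res:
            best_res = res
            best_x = np.zeros(n, dtype=complex)
            best_x[cols] = coeffs
    return best_x
```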
Other algorithms that reconstruct a signal
taking advantage of its sparse structure have
been used well before the term compressive
sensing was coined. The surprising discovery is
that it can be shown that several of these algo-
rithms will — under certain conditions — render
the same solution as the combinatorial approach.
These conditions largely amount to tighter con-
straints on the sparsity of x beyond identifiabili-
ty. We briefly introduce the two main types of
algorithms.
CONVEX/L1-BASED
Since the exact formulation using the zero-norm ||x||_0 is not amenable to efficient optimization, an immediate choice is its convex relaxation, leading to the following Lagrangian formulation,

x̂ = arg min_x ||z − Ax||_2² + λ ||x||_{ℓ1},

where λ > 0 weights sparsity against data fit and the ℓ1-norm is defined as ||x||_{ℓ1} = Σ_{k=1}^n |x_k|.
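One simple (illustrative, not necessarily the article's choice) way to solve this convex program is iterative soft thresholding (ISTA); the sketch below minimizes ||z − Ax||_2² + λ||x||_{ℓ1}, with λ, the iteration count, and the step-size rule chosen as assumed example values.

```python
import numpy as np

def ista(A, z, lam=0.1, iters=500):
    """Iterative soft thresholding for min_x ||z - A x||_2^2 + lam * ||x||_1.
    lam and the iteration count are assumed, illustrative values."""
    # step size from the Lipschitz constant of the gradient 2 A^H (A x - z)
    t = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(iters):
        grad = 2 * A.conj().T @ (A @ x - z)
        u = x - t * grad
        # complex soft threshold: shrink magnitudes by t * lam, keep the phase
        x = u * np.maximum(1 - t * lam / np.maximum(np.abs(u), 1e-12), 0)
    return x
```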
While the ℓ1-norm has been used in various applications to promote sparse solutions in the past [4, references therein], it is now largely popular under the name Basis Pursuit (BP), as intro-