as an EM algorithm, in the context of image deconvolution
problems [45], [25]. IST can also be derived in a majorization-minimization (MM) framework [16], [26] (see also [23] for a related algorithm derived from a different perspective); MM algorithms are also known as bound optimization algorithms (BOA), and a general introduction to MM/BOA can be found in [33].
Convergence of IST algorithms was shown in [13], [16].
IST algorithms are based on bounding the matrix $A^T A$ (the Hessian of $\|y - Ax\|_2^2$) by a diagonal $D$ (i.e., $D - A^T A$ is positive semi-definite), thus attacking (1) by solving a
sequence of simpler denoising problems. While this bound
may be reasonably tight in the case of deconvolution (where
R is usually a square matrix), it may be loose in the CS case,
where matrix R usually has many fewer rows than columns.
For this reason, IST may not be as effective for solving (1) in
CS applications as it is in deconvolution problems.
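To make the role of the diagonal bound concrete, the following sketch (ours, not from the cited references) shows a single IST iteration under the simplest choice $D = dI$, with $d$ no smaller than the largest eigenvalue of $A^T A$; the matrix A and the constant d are assumed to be given as dense numpy objects.

```python
import numpy as np

def soft_threshold(w, thresh):
    # Component-wise soft-thresholding (shrinkage) operator.
    return np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)

def ist_step(x, A, y, d, tau):
    # One IST iteration for (1/2)||y - A x||_2^2 + tau ||x||_1,
    # assuming the diagonal bound D = d*I with d >= lambda_max(A^T A).
    grad = A.T @ (A @ x - y)          # gradient of the quadratic term
    return soft_threshold(x - grad / d, tau / d)
```

Roughly speaking, when A has many fewer rows than columns, d must match the largest curvature of $A^T A$, so the effective step length 1/d is conservative; this is the looseness alluded to above.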
Finally, we mention matching pursuit (MP) and orthogonal
MP (OMP) [5], [17], [20], [56], which are greedy schemes
to find a sparse representation of a signal on a dictionary of
functions. (Matrix A is seen as an n-element dictionary of
k-dimensional signals). MP works by iteratively choosing the
dictionary element that has the highest inner product with the
current residual, which is the choice that most reduces the representation error.
OMP includes an extra orthogonalization step, and is known
to perform better than standard MP. Low computational cost
is one of the main arguments in favor of greedy schemes like
OMP, but such methods are not designed to solve any of the
optimization problems above. However, if y = Ax, with x
sparse and the columns of A sufficiently incoherent, then OMP
finds the sparsest representation [56]. It has also been shown
that, under similar incoherence and sparsity conditions, OMP
is robust to small levels of noise [20].
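For concreteness, a minimal OMP loop in the spirit of the description above might look as follows; this is an illustrative sketch (assuming a dense numpy matrix A with unit-norm columns and a prescribed number of iterations k), not the implementation evaluated in [5], [17], [20], [56].

```python
import numpy as np

def omp(A, y, k):
    # Greedy orthogonal matching pursuit: at each step, pick the column
    # most correlated with the residual, then re-fit by least squares
    # over all columns selected so far (the orthogonalization step).
    support = []
    residual = y.copy()
    x = np.zeros(A.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # best-matching atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x
```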
C. Proposed Approach
The approach described in this paper also requires only
matrix-vector products involving A and $A^T$, rather than explicit access to A. It is essentially a gradient projection (GP)
algorithm applied to a quadratic programming formulation of
(1), in which the search path from each iterate is obtained by
projecting the negative-gradient direction onto the feasible set.
(See [3], for example, for background on gradient projection
algorithms.) We refer to our approach as GPSR (gradient
projection for sparse reconstruction). Various enhancements to
this basic approach, together with careful choice of stopping
criteria and a final debiasing phase (which finds the least
squares fit over the support set of the solution to (1)), are
also important in making the method practical and efficient.
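As a rough illustration of the debiasing phase, one can fix the support of the sparse solution and re-solve the least-squares problem restricted to the corresponding columns of A; the sketch below uses a dense lstsq call as a stand-in (the function name and tolerance are ours), whereas a large-scale implementation would use an iterative solver driven by matrix-vector products.

```python
import numpy as np

def debias(A, y, x_sparse, tol=1e-10):
    # Least-squares re-fit over the support of the sparse solution:
    # keeps the zero pattern found by the solver, but removes the
    # shrinkage bias introduced by the tau*||x||_1 term.
    support = np.flatnonzero(np.abs(x_sparse) > tol)
    x_debiased = np.zeros_like(x_sparse, dtype=float)
    if support.size > 0:
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x_debiased[support] = coef
    return x_debiased
```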
Unlike the MM approach, GPSR does not involve bounds on the matrix $A^T A$. In contrast with the IP approaches discussed above, GPSR involves only one level of iteration. (The approaches in [11] and [36] have two iteration levels: an outer IP loop and an inner CG, PCG, or LSQR loop. The $\ell_1$-magic algorithm for (3) has three nested loops: an outer log-barrier loop, an intermediate Newton iteration, and an inner CG loop.)
GPSR is able to solve a sequence of problems (1) efficiently
for a sequence of values of τ. Once a solution has been obtained for a particular τ, it can be used as a “warm-start”
for a nearby value. Solutions can therefore be computed for a
range of τ values for a small multiple of the cost of solving
for a single τ value from a “cold start.” This feature of GPSR
is somewhat related to that of LARS and other homotopy
schemes, which compute solutions for a range of parameter
values in succession. In particular, “warm-starting” allows
using GPSR within a continuation scheme (as suggested in
[31]). IP methods such as those in [11], [36], and $\ell_1$-magic
have been less successful in making effective use of warm-
start information, though this issue has been investigated in
various contexts (see, e.g., [30], [35], [61]). To benefit from
a warm start, IP methods require the initial point to be not
only close to the solution but also sufficiently interior to the
feasible set and close to a “central path,” which is difficult to
satisfy in practice.
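Schematically, the warm-start usage just described amounts to sweeping a decreasing sequence of τ values and passing each solution forward as the next starting point. In the sketch below, gpsr_solve is a placeholder for any solver of (1) that accepts an initial iterate; it is not an actual interface of the codes discussed here.

```python
import numpy as np

def solve_along_tau_path(A, y, taus, gpsr_solve):
    # Solve (1) for a decreasing sequence of tau values, feeding each
    # solution in as the starting point ("warm start") for the next,
    # so the whole path costs only a small multiple of one cold solve.
    x = np.zeros(A.shape[1])              # cold start for the largest tau
    solutions = []
    for tau in sorted(taus, reverse=True):
        x = gpsr_solve(A, y, tau, x0=x)   # hypothetical solver signature
        solutions.append((tau, x.copy()))
    return solutions
```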
II. PROPOSED FORMULATION
A. Formulation as a Quadratic Program
The first key step of our GPSR approach is to express (1)
as a quadratic program; as in [28], this is done by splitting
the variable x into its positive and negative parts. Formally,
we introduce vectors u and v and make the substitution
x = u − v, u ≥ 0, v ≥ 0. (6)
These relationships are satisfied by $u_i = (x_i)_+$ and $v_i = (-x_i)_+$ for all $i = 1, 2, \ldots, n$, where $(\cdot)_+$ denotes the positive-part operator defined as $(x)_+ = \max\{0, x\}$. We thus have $\|x\|_1 = 1_n^T u + 1_n^T v$, where $1_n = [1, 1, \ldots, 1]^T$ is the vector consisting of $n$ ones, so (1) can be rewritten as the following bound-constrained quadratic program (BCQP):
$$\min_{u,v} \ \frac{1}{2}\,\|y - A(u - v)\|_2^2 + \tau\, 1_n^T u + \tau\, 1_n^T v, \quad \text{s.t.} \quad u \ge 0, \ v \ge 0. \tag{7}$$
Note that the $\ell_2$-norm term is unaffected if we set $u \leftarrow u + s$ and $v \leftarrow v + s$, where $s \ge 0$ is a shift vector. However, such a shift increases the other terms by $2\,\tau\, 1_n^T s \ge 0$. It follows that, at the solution of problem (7), $u_i = 0$ or $v_i = 0$ for $i = 1, 2, \ldots, n$, so that in fact $u_i = (x_i)_+$ and $v_i = (-x_i)_+$ for all $i = 1, 2, \ldots, n$, as desired.
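As a quick numerical sanity check of the splitting (6) (ours, purely illustrative), the identities $x = u - v$ and $\|x\|_1 = 1_n^T u + 1_n^T v$ can be verified directly:

```python
import numpy as np

x = np.array([1.5, -0.3, 0.0, 2.0])
u = np.maximum(x, 0.0)     # u_i = (x_i)_+
v = np.maximum(-x, 0.0)    # v_i = (-x_i)_+

assert np.allclose(x, u - v)                            # x = u - v
assert np.isclose(np.abs(x).sum(), u.sum() + v.sum())   # ||x||_1 = 1^T u + 1^T v
```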
Problem (7) can be written in more standard BCQP form,
$$\min_{z} \ c^T z + \frac{1}{2}\, z^T B z \equiv F(z), \quad \text{s.t.} \quad z \ge 0, \tag{8}$$
where
$$z = \begin{bmatrix} u \\ v \end{bmatrix}, \qquad b = A^T y, \qquad c = \tau\, 1_{2n} + \begin{bmatrix} -b \\ b \end{bmatrix},$$
and
$$B = \begin{bmatrix} A^T A & -A^T A \\ -A^T A & A^T A \end{bmatrix}. \tag{9}$$
. (9)