
mention work in quantum information theory, constructing error-correcting codes using a collection of orthogonal bases with minimal coherence, obtaining similar bounds on the mutual coherence for amalgams of orthogonal bases [11].
The mutual coherence, which is relatively easy to compute, allows us to lower-bound the spark, which is often hard to compute.
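Concretely, computing $\mu(A)$ amounts to normalizing the columns and taking the largest off-diagonal magnitude of the Gram matrix. A minimal sketch, assuming NumPy and with the function name mutual_coherence of our own choosing:

    import numpy as np

    def mutual_coherence(A):
        # Normalize each column of A to unit l2 norm.
        A_tilde = A / np.linalg.norm(A, axis=0)
        # Gram matrix of the normalized columns.
        G = A_tilde.T @ A_tilde
        # The coherence is the largest off-diagonal magnitude.
        np.fill_diagonal(G, 0.0)
        return np.abs(G).max()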
LEMMA 4 (see [46]). For any matrix $A \in \mathbb{R}^{n \times m}$, the following relationship holds:

(7) $\mathrm{spark}(A) \geq 1 + \frac{1}{\mu(A)}.$
Proof. First, modify the matrix $A$ by normalizing its columns to unit $\ell_2$ norm, obtaining $\tilde{A}$. This operation preserves both the spark and the mutual coherence. The entries of the resulting Gram matrix $G = \tilde{A}^T \tilde{A}$ satisfy the following properties:

$\{ G_{k,k} = 1 : 1 \leq k \leq m \}$ and $\{ |G_{k,j}| \leq \mu : 1 \leq k, j \leq m, \; k \neq j \}$.
Consider an arbitrary minor from $G$ of size $p \times p$, built by choosing a subgroup of $p$ columns from $\tilde{A}$ and computing their sub-Gram matrix. From the Gershgorin disk theorem [91], if this minor is diagonally dominant, i.e., if $\sum_{j \neq i} |G_{i,j}| < |G_{i,i}|$ for every $i$, then this submatrix of $G$ is positive definite, and so those $p$ columns from $\tilde{A}$ are linearly independent. Since $G_{i,i} = 1$ and each off-diagonal entry is at most $\mu$ in magnitude, the condition $p < 1 + 1/\mu$ implies positive definiteness of every $p \times p$ minor. Taking $p$ to be the largest such integer, $\mathrm{spark}(A) \geq p + 1 \geq 1 + 1/\mu$. $\Box$
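To see the lemma in action on a small example, one can compute the spark by brute force and compare it with $1 + 1/\mu(A)$. The sketch below is our own illustration, reusing the mutual_coherence sketch above; the subset search is exponential in $m$ and suitable only for toy sizes:

    from itertools import combinations
    import numpy as np

    def spark(A, tol=1e-10):
        m = A.shape[1]
        for p in range(1, m + 1):
            for cols in combinations(range(m), p):
                # A rank-deficient subset of p columns means spark(A) = p.
                if np.linalg.matrix_rank(A[:, cols], tol=tol) < p:
                    return p
        return np.inf  # every column subset is linearly independent

    A = np.random.randn(4, 6)  # random 4 x 6 dictionary
    assert spark(A) >= 1 + 1.0 / mutual_coherence(A)  # Lemma 4

For such a Gaussian matrix the spark is almost surely $n + 1 = 5$, while the coherence-based lower bound is typically much smaller, foreshadowing the comparison after Theorem 5.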
We have the following analogue of Theorem 2.
THEOREM 5 (uniqueness: mutual coherence [46]). If a system of linear equations $Ax = b$ has a solution $x$ obeying $\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$, this solution is necessarily the sparsest possible.
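In practice, Theorem 5 yields a simple a posteriori certificate: given any solution of $Ax = b$, counting its nonzeros against $\frac{1}{2}(1 + 1/\mu(A))$ proves global optimality for (P0). A hedged sketch, with names and tolerances of our own choosing and reusing mutual_coherence from above:

    def is_provably_sparsest(A, x, b, tol=1e-8):
        # Certificate from Theorem 5: a feasible x with
        # ||x||_0 < (1 + 1/mu(A)) / 2 is the unique sparsest solution.
        if not np.allclose(A @ x, b, atol=tol):
            return False
        sparsity = np.count_nonzero(np.abs(x) > tol)
        return sparsity < 0.5 * (1.0 + 1.0 / mutual_coherence(A))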
Compare Theorems 2 and 5. They are parallel in form, but with different assumptions. In general, Theorem 2, which uses the spark, is sharp and far more powerful than Theorem 5, which uses the coherence and hence only a lower bound on the spark. The coherence can never be smaller than $1/\sqrt{n}$, and, therefore, the cardinality bound of Theorem 5 is never larger than $\sqrt{n}/2$. However, the spark can easily be as large as $n$, and Theorem 2 then gives a bound as large as $n/2$.
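The $1/\sqrt{n}$ coherence floor is attained, for instance, by the two-ortho dictionary concatenating the identity with a normalized Hadamard matrix. A quick numerical check, assuming SciPy is available and reusing mutual_coherence from above:

    import numpy as np
    from scipy.linalg import hadamard

    n = 16
    # Two-ortho dictionary [I, H/sqrt(n)]: identity plus normalized Hadamard.
    A = np.hstack([np.eye(n), hadamard(n) / np.sqrt(n)])
    print(mutual_coherence(A), 1 / np.sqrt(n))  # both print 0.25

With $\mu = 1/4$, the cardinality bound of Theorem 5 is $\frac{1}{2}(1 + 4) = 2.5$, i.e., uniqueness is certified for solutions with at most two nonzeros, on the $\sqrt{n}/2$ scale discussed above.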
We have now given partial answers to the questions Q1 and Q2 posed at the start of this section. We have seen that any sufficiently sparse solution is guaranteed to be unique among sufficiently sparse solutions. Consequently, any sufficiently sparse solution is necessarily the global optimizer of (P0). These results show that searching for a sparse solution can lead to a well-posed question with interesting properties. We now turn to discuss Q3: practical methods for obtaining solutions.
2.2. Pursuit Algorithms: Practice. A straightforward approach to solving (P0) seems hopeless; we now discuss methods which, it seems, have no hope of working, but which, under specific conditions, will work.
2.2.1. Greedy Algorithms. Suppose that the matrix $A$ has $\mathrm{spark}(A) > 2$ and the optimization problem (P0) has value $\mathrm{val}(P_0) = 1$, so $b$ is a scalar multiple of some column of the matrix $A$. We can identify this column by applying $m$ tests, one per column of $A$. This procedure requires $O(mn)$ flops, which may be considered reasonable. Now suppose that $A$ has $\mathrm{spark}(A) > 2k_0$ and the optimization problem is known to have value $\mathrm{val}(P_0) = k_0$. Then $b$ is a linear combination of at most $k_0$ columns of $A$. Generalizing the previous solution, one might try to enumerate all $\binom{m}{k_0} = O(m^{k_0})$ subsets of $k_0$ columns from $A$ and then to test each one. Enumeration takes $O(m^{k_0} n k_0^2)$ flops, which seems prohibitively slow in many settings.
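For concreteness, the enumeration just described can be written down directly: loop over all supports of size $k_0$ and test each by least squares. The sketch below, with a function name and tolerance of our own choosing, is a faithful but hopelessly slow baseline:

    from itertools import combinations
    import numpy as np

    def solve_p0_by_enumeration(A, b, k0, tol=1e-8):
        m = A.shape[1]
        for cols in combinations(range(m), k0):  # O(m^{k0}) subsets
            sub = A[:, cols]
            # Least-squares fit on this support: O(n k0^2) flops per subset.
            coef, *_ = np.linalg.lstsq(sub, b, rcond=None)
            if np.linalg.norm(sub @ coef - b) < tol:
                x = np.zeros(m)
                x[list(cols)] = coef
                return x
        return None  # no k0-sparse representation of b was found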