At each step, we only need to compute an SVD and perform elementary matrix
operations. With the help of a standard numerical linear algebra package, the whole
algorithm can be coded in just a few lines. As we will see later, the iteration (2.7) is
the linearized Bregman iteration, which is a special instance of Uzawa’s algorithm.
Before addressing further computational issues, we would like to make explicit the
relationship between this iteration and the original problem (1.1). In Section 4, we
will show that the sequence $\{X^k\}$ converges to the unique solution of an optimization problem closely related to (1.1), namely,
\begin{equation}
\begin{array}{ll}
\text{minimize} & \tau \|X\|_* + \frac{1}{2} \|X\|_F^2\\
\text{subject to} & \mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(M).
\end{array}
\tag{2.8}
\end{equation}
Furthermore, it is intuitive that the solution to this modified problem converges to that of (1.4) as $\tau \to \infty$, as shown in Section 3. Thus, by selecting a large value of the parameter $\tau$, the sequence of iterates converges to a matrix which nearly minimizes (1.1).
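For concreteness, here is a minimal Python/NumPy sketch of this scheme, written under the assumption that iteration (2.7) takes the shrinkage/update form $X^k = \mathcal{D}_\tau(Y^{k-1})$, $Y^k = Y^{k-1} + \delta\, \mathcal{P}_\Omega(M - X^k)$ with $Y^0 = 0$; the function names, the fixed step size, and the stopping rule below are illustrative choices rather than part of the algorithm's specification.
\begin{verbatim}
import numpy as np

def shrink(Y, tau):
    # Singular value thresholding operator D_tau: soft-threshold the
    # singular values of Y at level tau.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def svt(M, mask, tau, delta, n_iter=500, tol=1e-4):
    # mask is a 0/1 array encoding the sampled set Omega, so that
    # mask * A computes P_Omega(A).  Start from Y^0 = 0.
    Y = np.zeros_like(M, dtype=float)
    X = np.zeros_like(M, dtype=float)
    for _ in range(n_iter):
        X = shrink(Y, tau)            # X^k = D_tau(Y^{k-1})
        R = mask * (M - X)            # P_Omega(M - X^k)
        Y += delta * R                # Y^k = Y^{k-1} + delta * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(mask * M):
            break
    return X
\end{verbatim}
As the text says, the whole method is indeed a few lines: one SVD and elementary matrix operations per step.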
As mentioned earlier, there are two crucial properties which make this algorithm
ideally suited for matrix completion.
• Low-rank property. A remarkable empirical fact is that the matrices in the
sequence $\{X^k\}$ have low rank (provided, of course, that the solution to (2.8)
has low rank). We use the word “empirical” because all of our numerical ex-
periments have produced low-rank sequences but we cannot rigorously prove
that this is true in general. The reason for this phenomenon is, however,
simple: because we are interested in large values of $\tau$ (so as to better approximate the solution to (1.1)), the thresholding step happens to ‘kill’ most of the
small singular values and produces a low-rank output. In fact, our numerical
results show that the rank of $X^k$ is nondecreasing with $k$, and the maximum rank is reached in the last steps of the algorithm, see Section 5.
Thus, when the rank of the solution is substantially smaller than either di-
mension of the matrix, the storage requirement is low since we could store
each $X^k$ in its SVD form (note that we only need to keep the current iterate and may discard earlier values); the first sketch following this list illustrates this factored storage.
• Sparsity. Another important property of the SVT algorithm is that the it-
eration matrix $Y^k$ is sparse. Since $Y^0 = 0$, we have by induction that $Y^k$ vanishes outside of $\Omega$. The fewer entries available, the sparser $Y^k$. Because the sparsity pattern $\Omega$ is fixed throughout, one can then apply sparse matrix
techniques to save storage. Also, if $|\Omega| = m$, the computational cost of updating $Y^k$ is of order $m$ (see the first sketch below). Moreover, we can call subroutines supporting sparse
matrix computations, which can further reduce computational costs.
One such subroutine is the SVD. However, note that we do not need to compute the entire SVD of $Y^k$ to apply the singular value thresholding operator: only the part corresponding to singular values greater than $\tau$ is needed. Hence, a good strategy is to apply the iterative Lanczos algorithm to compute the first few singular values and singular vectors. Because $Y^k$ is sparse, $Y^k$ can be applied to arbitrary vectors rapidly, and this procedure offers a considerable speedup over naive methods; the second sketch below gives an illustration.
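To make the two properties above concrete, the first sketch shows a single step that stores $Y^k$ only on $\Omega$ and keeps $X^k$ in factored SVD form; the names \texttt{svt\_step}, \texttt{Yvals}, and so on are hypothetical, and the dense SVD call is a placeholder for whatever SVD routine is actually used.
\begin{verbatim}
import numpy as np
import scipy.sparse as sp

def svt_step(Yvals, ii, jj, shape, M_omega, tau, delta):
    # Yvals holds the m = |Omega| entries of Y^{k-1}; (ii, jj) index
    # the sampled set Omega, and M_omega holds P_Omega(M) entry-wise.
    Y = sp.csr_matrix((Yvals, (ii, jj)), shape=shape)
    # Dense SVD as a placeholder; a partial SVD (next sketch) could
    # be substituted here.
    U, s, Vt = np.linalg.svd(Y.toarray(), full_matrices=False)
    keep = s > tau
    U, s, Vt = U[:, keep], s[keep] - tau, Vt[keep, :]
    # X^k is kept in factored form (U, s, Vt); evaluate it on Omega only.
    X_omega = np.einsum('tr,r,rt->t', U[ii], s, Vt[:, jj])
    Yvals = Yvals + delta * (M_omega - X_omega)   # O(m) update
    return (U, s, Vt), Yvals
\end{verbatim}
Storing $X^k$ as the triple $(U, s, V)$ requires $O((n_1 + n_2)r)$ numbers rather than $n_1 n_2$, which is the storage saving described in the first bullet, and the update of \texttt{Yvals} touches only the $m$ sampled entries.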
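In the same hedged spirit, the second sketch applies $\mathcal{D}_\tau$ to a sparse $Y^k$ by computing only the leading singular triplets with SciPy's \texttt{svds}, a Lanczos-type solver; the doubling heuristic used to guess how many triplets are needed is an illustrative choice, not prescribed by the algorithm.
\begin{verbatim}
import scipy.sparse.linalg as spla

def shrink_partial(Y, tau, r0=10):
    # Apply D_tau to a sparse Y by computing only the singular triplets
    # with singular value > tau, using a Lanczos-type solver (svds).
    n = min(Y.shape)
    r = min(r0, n - 1)
    while True:
        U, s, Vt = spla.svds(Y, k=r)   # r largest singular triplets
        if s.min() <= tau or r == n - 1:
            break                      # everything above tau is captured
        r = min(2 * r, n - 1)          # otherwise compute more triplets
    keep = s > tau
    return (U[:, keep] * (s[keep] - tau)) @ Vt[keep, :]
\end{verbatim}
Since \texttt{svds} only needs matrix-vector products with $Y^k$ and its transpose, the sparsity of $Y^k$ makes each Lanczos iteration cheap.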
2.3. Relation with other works. Our algorithm is inspired by recent work in
the area of $\ell_1$ minimization, and especially by the work on linearized Bregman iterations for compressed sensing, see [11–13, 27, 56, 67] for linearized Bregman iterations and [17, 19–21, 30] for some information about the field of compressed sensing. In this