AN ANALYSIS DICTIONARY LEARNING ALGORITHM BASED ON RECURSIVE LEAST
SQUARES
Ye Zhang¹, Haolong Wang¹, Wenwu Wang²
¹Department of Electronic and Information Engineering, Nanchang University, Nanchang, China
²Department of Electronic Engineering, University of Surrey, Guildford, United Kingdom
Emails: zhangye@ncu.edu.cn, hlwangncu@163.com, w.wang@surrey.ac.uk
ABSTRACT
We consider the dictionary learning problem in sparse representations based on an analysis model with noisy observations. A typical limitation of several existing analysis dictionary learning (ADL) algorithms, such as Analysis K-SVD, is their slow convergence, caused by the procedure used to pre-estimate the source signal from the noisy measurements when updating the dictionary atoms in each iteration. In this paper, we propose a new ADL algorithm in which the recursive least squares (RLS) algorithm is used to estimate the dictionary directly from the noisy measurements. To improve the convergence properties of the proposed algorithm, the initial dictionary is estimated from a small training set using the K-plane clustering algorithm. As shown by experiments, the proposed algorithm offers advantages over Analysis K-SVD in both runtime and atom recovery rate.
Index Terms— Recursive least squares; analysis dictionary learning; sparse representation
1. INTRODUCTION
In the past decade, sparse representations based on the synthesis model have been studied extensively. Consider a signal x ∈ R^M; the sparse representation of x over a dictionary D can be described as x = Da, where D ∈ R^{M×N} is a possibly overcomplete dictionary (N ≥ M), and a ∈ R^N, containing the coding coefficients, is assumed to be sparse, i.e. ∥a∥_0 = n ≪ N, where the ℓ_0 quasi-norm ∥·∥_0 counts the number of nonzero components in its argument. In other words, the signals are represented as a linear combination of a small number of signal components (i.e. atoms) chosen from the dictionary. A great deal of effort has been dedicated to the problem of learning the dictionary D from signal examples [1–9] based on the synthesis model.
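As a concrete illustration of the synthesis model, the sketch below (our own toy example; the dimensions, the Gaussian random dictionary, and the NumPy usage are illustrative assumptions, not part of the paper) generates a signal from n randomly chosen atoms:

```python
import numpy as np

# Toy synthesis model: x = D a with a sparse coefficient vector a.
# M, N, n and the Gaussian dictionary are illustrative assumptions.
M, N, n = 20, 40, 3                       # signal dim, number of atoms, sparsity
rng = np.random.default_rng(0)

D = rng.standard_normal((M, N))           # overcomplete dictionary (N >= M)
D /= np.linalg.norm(D, axis=0)            # normalize atoms to unit norm

a = np.zeros(N)
support = rng.choice(N, size=n, replace=False)
a[support] = rng.standard_normal(n)       # only n << N nonzero coefficients

x = D @ a                                 # signal as a combination of n atoms
assert np.count_nonzero(a) == n           # ||a||_0 = n
```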
There is a dual viewpoint to sparse representations, namely the analysis model, where an analysis dictionary or operator Ω ∈ R^{P×M} (P ≥ M) is sought to transform x ∈ R^M to a high-dimensional space, i.e. Ωx = z. The coefficient vector z ∈ R^P is called the analysis representation of x and is assumed to be sparse. In this model, the rows of Ω that are associated with zero entries in z define a subspace that the signal x is orthogonal to, as opposed to the few non-zero entries of a in the synthesis model. In general, Ω can be learned from the observed signals Y = [y_1 y_2 ... y_K] ∈ R^{M×K} measured in the presence of additive noise, i.e. Y = X + V, where X = [x_1 x_2 ... x_K] ∈ R^{M×K} contains the original signals, V = [v_1 v_2 ... v_K] ∈ R^{M×K} is noise, and K is the number of signals. Compared to the synthesis counterpart, only a few algorithms have been proposed recently for analysis dictionary learning (ADL) [10–13].
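To see the role of the zero entries of z in practice, the following sketch (a hypothetical toy example; the sizes and the random operator are our assumptions) builds a signal lying in the subspace orthogonal to a chosen set of rows of Ω, so the corresponding entries of z = Ωx are numerically zero:

```python
import numpy as np

# Toy analysis model: z = Omega @ x is sparse because x is orthogonal
# to the rows of Omega indexed by co_support.
# P, M and the random operator are illustrative assumptions.
P, M = 30, 20
rng = np.random.default_rng(1)

Omega = rng.standard_normal((P, M))
Omega /= np.linalg.norm(Omega, axis=1, keepdims=True)   # unit-norm rows

co_support = rng.choice(P, size=M - 4, replace=False)   # rows x must be orthogonal to
sub = Omega[co_support]                                  # (M-4) x M sub-matrix

# Basis of the null space of `sub`: rows of V^T beyond the rank.
_, _, Vt = np.linalg.svd(sub)
null_basis = Vt[np.linalg.matrix_rank(sub):]             # shape (4, M) generically

x = null_basis.T @ rng.standard_normal(null_basis.shape[0])
z = Omega @ x
print(np.abs(z[co_support]).max())                       # ~1e-15: co-support entries vanish
```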
In [10], the Analysis K-SVD algorithm was proposed for dictionary learning. Keeping Ω fixed, the optimal backward greedy (OBG) algorithm is employed to estimate the sub-matrix of Ω whose rows are orthogonal to X̂, the estimate of X from Y. A data set, i.e. a sub-matrix of Y, can then be obtained, with its columns corresponding to those of X̂, and the singular vectors associated with the smallest singular values of this sub-matrix are used to update Ω. The algorithm is effective; however, the original signal X has to be estimated by the greedy algorithm, which is computationally heavy. In [11], a subset pursuit algorithm was proposed for ADL, based on a two-phase approach. In the first phase, the algorithm exploits the analysis representation of Y to obtain the subset Y_j, rather than using X̂_j to determine Y_j, where Y_j is the subset of Y that corresponds to the j-th row of Ω, which is orthogonal to Y_j. In the second phase, the j-th row of the analysis dictionary is updated with the eigenvector associated with the smallest eigenvalue of the Gram matrix of Y_j. In [12], a sequential minimal eigenvalue algorithm for ADL was proposed by considering the orthogonality between the rows of Ω and a subset of the training signals X. Once the subset is found, the corresponding row of the dictionary is updated with the eigenvector associated with the smallest eigenvalue of the autocorrelation matrix of these signals. However, as the number of rows in Ω increases, so does the computational cost of the method.
In [13], a projected subgradient algorithm was proposed for analysis operator learning, where an ℓ_1-norm penalty function is applied to Ωx, and a uniformly normalized tight frame is employed as the constraint on Ω.
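A single iteration of such a scheme might look like the following sketch (not the exact algorithm of [13]; the step size, the plain row renormalization used in place of the full uniformly-normalized-tight-frame projection, and all names are simplifying assumptions of ours):

```python
import numpy as np

def subgradient_step(Omega, X, step=1e-3):
    """One projected subgradient step on the l1 penalty ||Omega X||_1.
    Simplified sketch: the projection below only renormalizes rows,
    standing in for the tight-frame projection used in [13]."""
    G = np.sign(Omega @ X) @ X.T           # subgradient of ||Omega X||_1 w.r.t. Omega
    Omega = Omega - step * G               # descent step
    Omega /= np.linalg.norm(Omega, axis=1, keepdims=True)  # project: unit-norm rows
    return Omega

# Usage sketch on random data (illustrative only).
rng = np.random.default_rng(3)
Omega = rng.standard_normal((30, 20))
X = rng.standard_normal((20, 500))
for _ in range(100):
    Omega = subgradient_step(Omega, X)
```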