reconstructed based on the learnt quality-aware dictionary
and its corresponding sparse coefficients w.r.t. the learnt
feature-aware dictionary.
(3) In addition to the commonly used Gabor filter response
based features, which have proven useful for quality
assessment, we also incorporate the Histogram of Oriented
Gradients (HOG) descriptor for local feature representation.
Experimental results demonstrate the effectiveness of the
HOG feature.
The remainder of this paper is organized as follows. Section 2
introduces the related works and then describes the general idea
of this work. Section 3 illustrates the detailed design of the pro-
posed model. Experimental results and analyses are presented in
Section 4, and finally conclusions are drawn in Section 5.
2. Related works and general idea
2.1. Dictionary learning using sparse coding
The goal of sparse coding is to simulate the sparsity of simple-
cell RF properties in V1. Previous studies have demonstrated that
sparsity is an important prior, based on the observation that
natural images generally contain sparse structures and can be
described by only a small number of structural primitives like lines
and edges [35]. We aim to learn an overcomplete dictionary in
which each basis function is tailored to one specific structural
primitive or one particular feature, so that any complex structure
in an image can be described as a linear combination of a set of
basis functions.Particularly, given n patches, each patch is
described by a d-dimensional feature vector y
i
2 R
d
, such that
the raw patches can be represented by a matrix
Y ¼½y
1
; ...; y
i
; ...; y
n
2R
dn
. From these raw patches, a dictionary
D ¼½d
1
; ...; d
j
; ...; d
m
2R
dm
(allowing m > d to make the dic-
tionary overcomplete) and the corresponding sparse coefficients
C ¼½c
1
; ...; c
i
; ...; c
n
2R
mn
can be learned simultaneously by
using existing dictionary learning algorithms [36–38], where
d
j
2 R
d
, and c
i
2 R
m
. Mathematically, this process can be accom-
plished by optimizing the following objective function:
min
fD;Cg
ky
i
Dc
i
k
2
F
no
; subjec to
8i; kc
i
k
0
6 T
0
ð1Þ
where $\|\cdot\|_F$ denotes the Frobenius norm, $\|\cdot\|_0$ denotes the $\ell_0$-norm,
and $T_0$ is a predefined sparsity constraint factor that represents the
maximum number of non-zero elements in each sparse coefficient
vector $c_i$. Although the $\ell_0$-norm gives a straightforward measurement of
sparsity, introducing the $\ell_0$-norm sparsity constraint makes this
problem NP-hard. Fortunately, recent developments in the field of
optimization theory reveal that the $\ell_0$-norm minimization problem
has the same solution as the $\ell_1$-norm minimization
problem if the restricted isometry property (RIP) condition is satisfied
[39]. Based on this important theory, the above formulation can
be rewritten as follows:
$$\{D, C\} = \arg\min_{\{D, C\}} \sum_{i=1}^{n} \frac{1}{2} \|y_i - D c_i\|_F^2 + \lambda \|c_i\|_1 \qquad (2)$$
where $\lambda$ is a positive constant controlling the relative importance of
the reconstruction error term and the sparsity constraint term. Typically,
both D and C are unknown at this stage. To solve this problem,
several approaches have been proposed to seek an optimal
sparse solution, such as K-SVD [37] and online dictionary learning
(ODL) [38]. Once the dictionary D is learned, given a testing sample
similarly described by a d-dimensional feature vector $y_t$, we can
automatically convert it to its sparse coefficients $c_t$ by solving the
following $\ell_1$-norm minimization problem:
$$c_t = \arg\min_{c_t \in \mathbb{R}^m} \frac{1}{2} \|y_t - D c_t\|_F^2 + \lambda \|c_t\|_1 \qquad (3)$$
Generally, we term the sparse coefficient vector $c_t$ the sparse
feature of $y_t$ over dictionary D because the majority of elements in $c_t$
are zeros. Since the sparse feature has been proven to be highly
consistent with visual perception, it has been widely used in many
computer vision and image processing applications, such as object
classification [40], visual saliency detection [41], face recognition
[42], and image quality assessment [33,34].
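As a concrete illustration, the learning step of Eq. (2) and the coding step of Eq. (3) can be sketched with scikit-learn, whose dictionary learner follows the online approach of ODL [38]. The patch count, dimensions, and regularization values below are illustrative assumptions, not the settings of any particular method discussed here:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
# n = 500 patches, each described by a d = 64-dimensional feature vector.
Y = rng.standard_normal((500, 64))

# Learn an overcomplete dictionary (m = 128 > d = 64 atoms), cf. Eq. (2).
# scikit-learn stores samples and atoms as rows, i.e. components_ is D^T.
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, random_state=0)
C = dico.fit_transform(Y)    # sparse coefficients of the training patches
D = dico.components_         # shape (128, 64): one dictionary atom per row

# Sparse-code a new test vector y_t over the learned dictionary, cf. Eq. (3).
y_t = rng.standard_normal((1, 64))
c_t = sparse_encode(y_t, D, algorithm='lasso_lars', alpha=0.1)
print(c_t.shape)             # (1, 128): most entries are zero
```

Swapping `algorithm='lasso_lars'` for `'omp'` in `sparse_encode` recovers the $\ell_0$-constrained form of Eq. (1) via orthogonal matching pursuit.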
2.2. Sparse coding solution for IQA
As stated above, the sparse feature obtained with the learnt dic-
tionary is a promising solution to predict the perceived quality. In
this subsection, we give a short overview of some representative
sparse coding-based IQA methods.
Chang et al. proposed a visual cortex-like FR-IQA metric by
modeling the neural processing mechanism of RFs of simple cells
in V1 [33]. Specifically, independent component analysis (ICA) is
adopted to train a feature detector from a collection of natural
image samples for sparse coding. Then, the image quality of a dis-
torted image is quantified by measuring the sparse feature fidelity
(SFF). Mathematically, the proposed SFF metric is defined as
$$\mathrm{SFF}(I_{ref}, I_{dis}) = \frac{1}{K \cdot M} \sum_{i=1}^{K} \sum_{j=1}^{M} \frac{2 A_{ij} B_{ij} + c}{(A_{ij})^2 + (B_{ij})^2 + c} \qquad (4)$$
where K denotes the number of sparse feature vectors in an
image, M is the dimension of each sparse feature vector, and $A_{ij}$ and $B_{ij}$
represent the values of the j-th element in the i-th sparse feature
vector of the reference image $I_{ref}$ and the distorted image $I_{dis}$,
respectively.
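Eq. (4) amounts to an element-wise similarity averaged over all K·M coefficients, as the following sketch shows (array shapes and the value of c are illustrative assumptions):

```python
import numpy as np

def sff(A, B, c=1e-4):
    """Sparse feature fidelity in the form of Eq. (4).

    A, B: (K, M) arrays holding the K sparse feature vectors (dimension M)
    of the reference and distorted images; c is a small stabilizing
    constant. Shapes and the value of c here are assumptions.
    """
    num = 2.0 * A * B + c
    den = A ** 2 + B ** 2 + c
    return float(np.mean(num / den))

# Identical sparse features give the maximum score of 1.
A = np.array([[0.5, 0.0, -1.2],
              [0.0, 2.0, 0.0]])
print(sff(A, A))  # 1.0
```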
Guha et al. devised a new FR-IQA metric, named sparse
representation-based quality index (SPARQ) [34]. Different from
Chang’s work, K-SVD algorithm is used for dictionary learning in
this work. The fidelity of the sparse coefficients is computed to
measure the image quality by
$$\mathrm{SPARQ}(I_{ref}, I_{dis}) = \frac{1}{K} \sum_{i=1}^{K} \left\{ \frac{|x_{r,i}^{T} x_{d,i}| + c}{\|x_{r,i}\|_2 \|x_{d,i}\|_2 + c} \left( 1 - \frac{\|x_{r,i} - x_{d,i}\|_2 + c}{\|x_{r,i}\|_2 + \|x_{d,i}\|_2 + c} \right) \right\} \qquad (5)$$
where K denotes the number of sparse feature vectors in an
image, and $x_{r,i}$ and $x_{d,i}$ represent the sparse feature vectors of the i-th
image patch in $I_{ref}$ and $I_{dis}$, respectively.
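A per-patch sketch of Eq. (5) follows: a normalized-correlation factor between the two sparse feature vectors, attenuated by their normalized distance. This is one plausible reading of the formula; array shapes and the value of c are illustrative assumptions:

```python
import numpy as np

def sparq(Xr, Xd, c=1e-4):
    """Sketch of the SPARQ index of Eq. (5) (one plausible reading).

    Xr, Xd: (K, m) arrays of the sparse feature vectors of the K patches
    in the reference and distorted images; c is a small constant.
    """
    dot = np.abs(np.sum(Xr * Xd, axis=1))       # |x_r^T x_d| per patch
    nr = np.linalg.norm(Xr, axis=1)
    nd = np.linalg.norm(Xd, axis=1)
    corr = (dot + c) / (nr * nd + c)            # normalized correlation
    diff = (np.linalg.norm(Xr - Xd, axis=1) + c) / (nr + nd + c)
    return float(np.mean(corr * (1.0 - diff)))

Xr = np.array([[1.0, 0.0],
               [0.0, 2.0]])
print(round(sparq(Xr, Xr), 4))  # close to 1.0 for identical inputs
```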
In addition to the solution for FR-IQA, sparse coding has been
adopted for OA-BIQA as well. He et al. proposed a simple yet effec-
tive BIQA metric based on sparse representation of natural scene
statistics (SRNSS) [43]. In this work, a set of NSS feature vectors
and the corresponding human opinion scores are collected from
the training images to construct a dictionary. In the testing stage,
by extracting the NSS feature vector from a testing image, the
sparse coefficients over the constructed dictionary can be obtained
by using the sparse coding strategy in Eq. (3). The final quality
score is computed by weighting the human opinion scores of all
the training images using the estimated sparse coefficients.
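The final weighting step of SRNSS can be sketched as follows; the function name, the use of absolute coefficient values as weights, and the normalization are assumptions made for illustration:

```python
import numpy as np

def srnss_score(c_t, mos, eps=1e-8):
    """Weight the training images' human opinion scores by the estimated
    sparse coefficients (sketch of the SRNSS idea; names and the
    normalization are assumptions).

    c_t: (n,) sparse coefficients of the test image over the dictionary,
         one entry per training image.
    mos: (n,) human opinion scores of the training images.
    """
    w = np.abs(c_t)
    return float(np.dot(w, mos) / (np.sum(w) + eps))

# A test image coded mostly by the third training image inherits
# a score close to that image's opinion score.
c_t = np.array([0.0, 0.1, 0.9, 0.0])
mos = np.array([20.0, 40.0, 80.0, 60.0])
print(srnss_score(c_t, mos))  # close to 76.0
```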
Although the performances are promising, the existing
sparse coding-based IQA methods still suffer from the following
limitations:
(a) In those sparse coding-based FR-IQA methods (e.g., SFF and
SPARQ), the dictionary is learned in an unsupervised way
and acts as an unsupervised cortex-like feature detector.
Q. Jiang et al. / J. Vis. Commun. Image R. 33 (2015) 123–133