in this way is sufficient not only to quantify naturalness, but
also to quantify quality in the presence of distortion.
In this article, we detail the statistical model of locally
normalized luminance coefficients in the spatial domain, as
well as the model for pairwise products of these coefficients.
We describe the statistical features that are used from the
model and demonstrate that these features correlate well with
human judgements of quality. We then describe how we learn
a mapping from features to quality space to produce an
automatic blind measure of perceptual quality. We thoroughly
evaluate the performance of BRISQUE, and statistically com-
pare BRISQUE performance to state-of-the-art FR and NR
IQA approaches. We demonstrate that BRISQUE is highly
competitive with these NR IQA approaches, and is also statistically
better than the popular full-reference peak signal-to-noise
ratio (PSNR) and structural similarity index (SSIM). We
show that BRISQUE performs well on independent databases,
analyze its complexity and compare it with other NR IQA
approaches. Finally, to further illustrate the practical relevance
of BRISQUE, we describe how a non-blind image denoising
algorithm can be augmented with BRISQUE in order to
improve blind image denoising. Results show that BRISQUE
augmentation leads to significant performance improvements
over the state-of-the-art. Before we describe BRISQUE in
detail, we first briefly review relevant prior work in the area
of blind IQA.
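As a rough illustration of the quantities referred to above, the following minimal sketch computes locally normalized luminance coefficients and their pairwise neighbor products. The Gaussian window scale and the stabilizing constant used here are illustrative assumptions, not necessarily the parameters detailed later in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(image, window_sigma=7.0 / 6.0, c=1.0):
    """Locally normalized (mean-subtracted, contrast-normalized) luminance.

    `window_sigma` and `c` are illustrative choices, not necessarily the
    parameters adopted in the paper.
    """
    image = image.astype(np.float64)
    mu = gaussian_filter(image, window_sigma)                     # local weighted mean
    var = gaussian_filter(image * image, window_sigma) - mu * mu  # local weighted variance
    sigma = np.sqrt(np.maximum(var, 0.0))                         # local std, clipped for numerical safety
    return (image - mu) / (sigma + c)                             # divisive normalization

def neighbor_products(mscn):
    """Pairwise products of adjacent normalized coefficients along four orientations."""
    return {
        "horizontal": mscn[:, :-1] * mscn[:, 1:],
        "vertical": mscn[:-1, :] * mscn[1:, :],
        "main_diagonal": mscn[:-1, :-1] * mscn[1:, 1:],
        "secondary_diagonal": mscn[1:, :-1] * mscn[:-1, 1:],
    }
```

In practice, the empirical distributions of these normalized coefficients and products would be summarized by parametric fits to yield the quality-aware features described in Section III.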
II. PREVIOUS WORK
Most blind IQA models proposed in the past assume that the
image whose quality is being assessed is afflicted by a particular
kind of distortion [5]–[11], [17]. These approaches extract
distortion-specific features that relate to loss of visual quality,
such as edge strength at block boundaries. However, a few
general-purpose approaches for NR IQA have been proposed
recently.
Li devised a set of heuristic measures to characterize visual
quality in terms of edge sharpness, random noise and structural
noise [18], while Gabarda and Cristobal modeled anisotropies
in images using Renyi entropy [19]. The authors in [20]
use Gabor-filter-based local appearance descriptors to form
a visual codebook, and learn a DMOS score vector that
associates each codeword with a quality score. However, in the
process of visual codebook formation, each feature vector
associated with an image patch is labeled with the DMOS
assigned to the entire image. This is questionable, since each
image patch can exhibit a different level of quality depending
on the distortion process afflicting the image. In particular,
local distortions such as packet loss might afflict only a few
image patches. Moreover, the approach is computationally
expensive, which limits its applicability in real-time applications.
Tang et al. [21] proposed an approach that learns an
ensemble of regressors trained on three different groups of
features: natural image statistics, distortion texture statistics
and blur/noise statistics. Another approach [22] is based on a
hybrid of curvelet, wavelet and cosine transforms. Although
these approaches work on a variety of distortions, each set
of features (in the first approach) and transforms (in the
second) caters only to certain kinds of distortion processes.
This limits the applicability of these frameworks to new
distortions.
We have also developed NR QA models in the past,
following our philosophy, first fully developed in [23],
that NSS models provide powerful tools for probing human
judgements of visual distortions. Our work on NSS based
FR QA algorithms [9], [23], [24], more recent RR models
[3] and very recent work on NSS based NR QA [12],
[13], [25] have led us to the conclusion that visual features
derived from NSS lead to particularly potent and simple QA
models [26].
Our recently proposed NSS based NR IQA model, dubbed
the Distortion Identification-based Image INtegrity and Ver-
ity Evaluation (DIIVINE) index, deploys summary statistics
derived from an NSS wavelet coefficient model, using a
two-stage framework for QA: distortion identification followed by
distortion-specific QA [12]. The DIIVINE index performs
quite well on the LIVE IQA database [27], achieving statistical
parity with the full-reference structural similarity (SSIM)
index [28].
A complementary approach developed at the same time,
named the BLind Image Integrity Notator using DCT Statistics
(BLIINDS-II) index, is a pragmatic approach to NR IQA that operates
in the DCT domain, where a small number of features are
computed from an NSS model of block DCT coefficients [13].
Efficient NSS features are calculated and fed to a regression
function that delivers accurate QA predictions. BLIINDS-II is
a single-stage algorithm that also delivers highly competitive
QA prediction power. Although the BLIINDS-II index is
multiscale, the small number of feature types (four) allows for
efficient computation of visual quality, and hence the index is
attractive for practical applications.
While both DIIVINE and BLIINDS-II deliver top NR IQA
performance (to date), each of them has certain limitations.
The large number of features that DIIVINE computes implies
that it may be difficult to compute in real time. Although
BLIINDS-II is more efficient than DIIVINE, it requires
non-linear sorting of block-based NSS features, which slows it
considerably.
In our continued search for fast and efficient high perfor-
mance NSS based NR QA indices, we have recently stud-
ied the possibility of developing transform-free models that
operate directly on the spatial pixel data. Our belief that such
models can succeed is inspired by the pioneering work of
Ruderman [15] on spatial natural scene modeling, and by the
success of the spatial multi-scale SSIM index [29], which
competes well with transform domain IQA models.
III. BLIND SPATIAL IMAGE QUALITY ASSESSMENT
Much recent work has focused on modeling the statistics of
the responses of natural images to multiscale transforms
(e.g., Gabor filters, wavelets, etc.) [16]. Given that neurons in
area V1 of visual cortex perform scale-space-orientation
decompositions of visual data, transform domain
models seem like natural approaches, particularly in view of
the energy compaction (sparsity) and decorrelating properties