Stereoscopic Images Quality Assessment Based On Deep Learning
Kai Wang (1,2), Jun Zhou (1,2), Member, IEEE, Ning Liu (1,2), Xiao Gu (1,2)
(1) Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
(2) Shanghai Key Laboratory of Digital Media Processing and Transmissions
Shanghai, 200240, China
Email: {aa576aaa, zhoujun, ningliu, gugu97}@sjtu.edu.cn
Abstract—With the popularity of stereoscopic 3D (S3D) images and videos, many advanced objective quality assessment methods have been proposed to evaluate viewers' Quality of Experience (QoE). Most of these algorithms take advantage of disparity maps to extract useful features. Meanwhile, deep learning has been one of the most active research topics in recent years, but little effort has been devoted to applying it to the objective quality evaluation of S3D images. In this paper, we propose an S3D image quality assessment (S3D IQA) method based on deep learning. In this method, Convolutional Restricted Boltzmann Machines (CRBM) combined with a Factored Third-Order RBM (FTO-RBM) serve as the learning model that automatically extracts feature maps from the pre-processed left and right images. An improved traversal algorithm based on two pooling strategies is then put forward to optimize the extracted feature maps, which significantly improves the final quality assessment performance. Experimental results show that our S3D IQA method achieves good performance on the tested 3D databases.
Index Terms—Stereoscopic Image Quality Assessment, Convolutional Restricted Boltzmann Machines (CRBM), Factored Third-Order RBM (FTO-RBM), Deep Learning, Optimized Feature Maps
I. INTRODUCTION
In recent years, stereoscopic 3D (S3D) movies have become increasingly popular among viewers. However, the viewing experience may be unsatisfactory when S3D films are watched for a long time, which has motivated the many S3D image and video quality assessment (QA) methods proposed so far. QA methods can be classified as subjective or objective. Compared with objective QA, subjective assessment is expensive and time-consuming, so more attention has been focused on building objective S3D IQA models.
Most objective S3D IQA models are built on stereo-vision-related features designed by their authors. Deep learning can extract features automatically and has been used in fields such as speech processing and image classification. Recently, some researchers have begun combining deep learning with 2D IQA. Mocanu et al. [1] used the reconstruction error of a Gaussian-Bernoulli RBM to define RBMSim for IQA. Hou et al. [2] employed a deep belief network (DBN) to obtain features and classified them into five classes representing five IQA scores. In [3], a deep neural network (DNN) was used to extract features; compared with shallow architectures, this approach better approximates how the human visual system (HVS) perceives image quality. However, few researchers have combined deep learning with S3D IQA.
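To make the reconstruction-error idea behind [1] concrete, the following minimal NumPy sketch computes the one-step mean-field reconstruction error of a Gaussian-Bernoulli RBM, i.e. an RBM with real-valued visible units and binary hidden units. It is not RBMSim itself: it assumes unit-variance visible units (inputs normalized beforehand), the function name `gb_rbm_reconstruction_error` and its parameters are hypothetical, and how [1] turns such errors into a quality score is not reproduced here.

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gb_rbm_reconstruction_error(v, W, b_vis, c_hid):
    """One-step mean-field reconstruction error of a Gaussian-Bernoulli RBM.

    Assumes unit-variance Gaussian visible units (inputs normalized beforehand).
    v     : (n_vis,) real-valued visible vector, e.g. a vectorized image patch
    W     : (n_vis, n_hid) weight matrix
    b_vis : (n_vis,) visible biases
    c_hid : (n_hid,) hidden biases
    """
    # Up pass: p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)
    p_h = sigmoid(c_hid + v @ W)
    # Down pass (mean-field): E[v_i | h] = b_i + sum_j W_ij p(h_j = 1 | v)
    v_rec = b_vis + W @ p_h
    # Mean squared error between the input and its reconstruction
    return float(np.mean((v - v_rec) ** 2))
```

For example, with all-zero weights the hidden layer carries no information about the input, so the reconstruction is simply the visible bias vector.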
Fig. 1. CRBM+FTO-RBM model. The inputs (V1) of the CRBM are real-valued, while the units of the other layers (H1 to H3) are all binary.
Pooling is a common and effective method for optimizing feature maps. The goal of spatial feature pooling is to convert a joint feature representation into a more usable one that preserves significant information while discarding irrelevant details. Boureau et al. [4] gave a detailed theoretical analysis of pooling. Wang et al. [5] investigated three spatial feature pooling methods in the context of perceptual IQA.
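As a generic illustration of spatial feature pooling, the sketch below reduces a 2-D feature map with non-overlapping max or average pooling. These are two common pooling operators chosen only for illustration; they are not claimed to be the two strategies or the traversal algorithm of Section III, and `block_pool` is a hypothetical helper.

```python
import numpy as np


def block_pool(fmap, size, mode="max"):
    """Non-overlapping spatial pooling of a 2-D feature map.

    fmap : (H, W) array; trailing rows/columns that do not fill a whole
           size x size block are dropped.
    mode : "max" keeps the strongest response in each block,
           "avg" keeps the mean response.
    """
    H, W = fmap.shape
    h, w = H // size, W // size
    # Reshape into (h, size, w, size) so that each block spans axes 1 and 3.
    blocks = fmap[: h * size, : w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")
```

For `np.arange(16).reshape(4, 4)` and `size=2`, max pooling returns `[[5, 7], [13, 15]]` and average pooling returns `[[2.5, 4.5], [10.5, 12.5]]`. Max pooling keeps only the strongest local response, whereas average pooling keeps the mean response and so is more sensitive to weak but widespread activations.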
In this paper, we combine CRBM with FTO-RBM as the model for learning S3D image feature maps, which are then optimized by an improved traversal algorithm based on pooling methods. The rest of the paper is organized as follows. Section II first introduces our learning model and then describes the complete S3D IQA method. Section III describes two pooling methods in detail and then presents the improved traversal algorithm. Section IV gives experimental results on two benchmark databases, and Section V draws conclusions.
II. PROPOSED METHOD
A. Learning Model
The complete learning model used in this paper is shown in Fig. 1. In this model, two CRBMs form the underlying architecture, and an FTO-RBM is used as the top-level model.
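The defining feature of a factored third-order RBM is that each hidden unit is gated by the product of two inputs through a three-way weight tensor, which is factorized into three matrices so it never has to be stored. The minimal NumPy sketch below computes the hidden-unit probabilities of such a model. Treating the two inputs as left-view and right-view features is an assumption made for illustration, since this section does not specify what the FTO-RBM receives from the two CRBMs; all names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def fto_rbm_hidden_probs(x, y, Wx, Wy, Wh, b_h):
    """p(h_k = 1 | x, y) for a factored third-order (three-way) RBM.

    x   : (n_x,) first input, e.g. left-view features (assumption)
    y   : (n_y,) second input, e.g. right-view features (assumption)
    Wx  : (n_x, F), Wy : (n_y, F), Wh : (n_h, F) factor matrices
    b_h : (n_h,) hidden biases
    The implied three-way tensor W_ijk = sum_f Wx[i,f] * Wy[j,f] * Wh[k,f]
    is never formed explicitly.
    """
    fx = x @ Wx  # (F,) projection of x onto each factor
    fy = y @ Wy  # (F,) projection of y onto each factor
    # Each factor multiplies the two projections; Wh routes the products to hidden units.
    return sigmoid(Wh @ (fx * fy) + b_h)
```

Because $W_{ijk} = \sum_f W^x_{if} W^y_{jf} W^h_{kf}$, the parameter count drops from $n_x n_y n_h$ to $F(n_x + n_y + n_h)$, which is what makes a third-order model practical on top of the CRBM feature maps.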
The basic CRBM consists of two layers: a visible layer V1 and a hidden layer H1. In addition, probabilistic max-pooling (layer P1) was introduced into the CRBM to obtain more stable performance [7]. A CRBM can handle larger images and supplies K groups of elementary features $\{f^{k}_{\mathrm{ele}}\}$, $k \in [1, K]$, in the pooling layer (layer P1). The training process of a CRBM is similar to that of an RBM, which is convenient and efficient.
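The probabilistic max-pooling step can be sketched as follows. For each group k, the bottom-up input to a detection unit is its filter response plus the group bias, and within each C x C block at most one detection unit may be on; the pooling unit is on exactly when one of them is. This minimal NumPy sketch follows the standard formulation from convolutional deep belief networks under the assumption of real-valued, unit-variance inputs; the exact variant used in [7] and in this paper may differ, and `prob_max_pool` and `correlate_valid` are hypothetical names.

```python
import numpy as np


def correlate_valid(v, w):
    """2-D 'valid' cross-correlation of image v with filter w (plain loops)."""
    H, W = v.shape
    r, c = w.shape
    out = np.empty((H - r + 1, W - c + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(v[i:i + r, j:j + c] * w)
    return out


def prob_max_pool(v, filters, biases, C):
    """Probabilistic max-pooling inference for one CRBM layer.

    v       : (H, W) real-valued input image
    filters : (K, r, c) filters, one per feature group (learned filters, so
              cross-correlation is used instead of flipped convolution)
    biases  : (K,) hidden bias per group
    C       : pooling block size; the detection map size must be divisible by C
    Returns (p_hidden, p_pool) with shapes (K, H-r+1, W-c+1) and
    (K, (H-r+1)//C, (W-c+1)//C).
    """
    hid, pool = [], []
    for w, b in zip(filters, biases):
        act = correlate_valid(v, w) + b  # bottom-up input I(h_ij) for this group
        if act.shape[0] % C or act.shape[1] % C:
            raise ValueError("detection map size must be divisible by C")
        h, wd = act.shape[0] // C, act.shape[1] // C
        blocks = act.reshape(h, C, wd, C)
        # Shift by the block max (at least 0, covering the "all off" state) for stability.
        m = np.maximum(blocks.max(axis=(1, 3), keepdims=True), 0.0)
        e = np.exp(blocks - m)
        off = np.exp(-m)  # unnormalized weight of "all units in the block off"
        denom = off + e.sum(axis=(1, 3), keepdims=True)
        # p(h_ij = 1 | v) = exp(I_ij) / (1 + sum over the block of exp(I))
        hid.append((e / denom).reshape(act.shape))
        # p(pool = 1 | v) = 1 - p(all units in the block off | v)
        pool.append(1.0 - (off / denom)[:, 0, :, 0])
    return np.array(hid), np.array(pool)
```

With an all-zero input, zero filters and zero biases, every 2 x 2 block has five equally weighted states (one of four units on, or all off), so each detection unit gets probability 0.2 and each pooling unit 0.8. In general, the detection probabilities within a block sum to the pooling unit's probability.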
978-1-5090-5316-2/16/$31.00 ©2016 IEEE
VCIP 2016, Nov. 27-30, 2016, Chengdu, China