J. Ma et al. Signal Processing: Image Communication 65 (2018) 33–45
Fig. 1. Proposed FR quality assessment framework for stereoscopic images.
2.1. Contrast sensitivity function filtering
Owing to inherent limits on the visibility of stimuli, the BVS is not equally sensitive to all stimuli. According to [28], binocular visual sensitivity varies with the spatial frequency of the stimulus, and this variation can be modeled by an empirical CSF. A widely used CSF is the one introduced by Mannos and Sakrison [29] with adjustments specified by Daly [30]. This CSF, $H(f, \theta)$, is defined as
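$$
H(f, \theta) =
\begin{cases}
2.6\,(0.0192 + \lambda f_{\theta}) \exp\left[-(\lambda f_{\theta})^{1.1}\right], & \text{if } f \ge f_{peak}\ \text{c/deg} \\
0.981, & \text{otherwise}
\end{cases}
\qquad (1)
$$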
where $f$ denotes the radial spatial frequency in cycles per degree of visual angle (c/deg), $\theta \in [-\pi, \pi]$ denotes the orientation, and $f_{\theta} = f/[0.15\cos(4\theta) + 0.85]$ accounts for the oblique effect. Fig. 2 shows the resulting curve, known as Mannos and Sakrison's CSF. From Fig. 2, we can see that the BVS is sensitive to only a limited range of frequencies. Therefore, in this paper, we account for variations in sensitivity to spatial frequency by applying the CSF filtering independently to each image of the reference and distorted stereopairs. Suppose we apply the CSF filtering to the luminance image $L_l$. This CSF filtering is performed in the frequency domain via
$$
I = F^{-1}\left[\mathrm{H}(u, v) \times F[L_l]\right] \qquad (2)
$$
where $F[\cdot]$ and $F^{-1}[\cdot]$ denote the DFT and inverse DFT, respectively. The quantity $\mathrm{H}(u, v)$ denotes a DFT-based version of $H(f, \theta)$, where $u, v$ are the DFT indices. Here, the CSF is further adjusted as described in [31] to have a lowpass profile by explicitly setting the response at frequencies below $f_{peak}$ to 0.981, which is the maximum value of $H(f, \theta)$ as determined by $\lambda$. Following [31–33], we set $\lambda = 0.114$, resulting in a peak at a frequency of $f_{peak} \approx 8$ c/deg (measured before forcing the lowpass profile), which lies within the range of 1 to 8 c/deg typically reported for the CSF.
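The filtering of Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation. The function names are hypothetical. The `pixels_per_degree` parameter (needed to convert DFT indices to c/deg) is an assumed viewing condition that the text does not specify. The oblique-effect term is read here as $0.15\cos(4\theta) + 0.85$. The peak frequency is located by a simple grid search along $\theta = 0$ rather than hard-coded.

```python
import numpy as np

LAMBDA = 0.114  # value of lambda quoted in the text


def csf_bandpass(f, theta, lam=LAMBDA):
    """Upper branch of Eq. (1), evaluated at every frequency (no clamping)."""
    f = np.asarray(f, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Assumed reading of the oblique-effect term: 0.15*cos(4*theta) + 0.85.
    f_theta = f / (0.15 * np.cos(4.0 * theta) + 0.85)
    return 2.6 * (0.0192 + lam * f_theta) * np.exp(-((lam * f_theta) ** 1.1))


def peak_frequency(lam=LAMBDA, f_max=60.0, step=1e-3):
    """Locate the CSF peak along theta = 0 by a dense grid search (c/deg)."""
    grid = np.arange(0.0, f_max, step)
    return float(grid[np.argmax(csf_bandpass(grid, 0.0, lam))])


def csf_lowpass(f, theta, lam=LAMBDA):
    """Eq. (1): response clamped to 0.981 below the peak frequency."""
    f = np.asarray(f, dtype=float)
    return np.where(f >= peak_frequency(lam), csf_bandpass(f, theta, lam), 0.981)


def csf_filter(luminance, pixels_per_degree=32.0, lam=LAMBDA):
    """Eq. (2): multiply the image spectrum by the sampled CSF and invert.

    pixels_per_degree is a hypothetical viewing-condition parameter; it
    converts DFT frequencies from cycles/pixel to cycles/degree.
    """
    rows, cols = luminance.shape
    fy = np.fft.fftfreq(rows) * pixels_per_degree  # vertical frequency, c/deg
    fx = np.fft.fftfreq(cols) * pixels_per_degree  # horizontal frequency, c/deg
    u, v = np.meshgrid(fx, fy)
    f = np.hypot(u, v)          # radial frequency
    theta = np.arctan2(v, u)    # orientation in [-pi, pi]
    h = csf_lowpass(f, theta, lam)
    return np.real(np.fft.ifft2(h * np.fft.fft2(luminance)))
```

Under these assumptions, a constant image is simply scaled by 0.981, because only the zero-frequency term is nonzero and it falls in the clamped branch. The same `csf_filter` call would be applied separately to each view of the reference and distorted stereopairs.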
2.2. The weights of binocular energy perception
From a biological point of view, stereoscopy can be defined as the association of the two eyes in the visual analysis of the same region of the scene. If the information received by the two eyes is compatible, the brain combines their inputs in a way that yields a stable, unitary percept.
Fig. 2. Spatial frequency response curve of CSF.
This process of combination is known as “binocular fusion”. However, to merge the slightly different images from the two eyes, which arise from binocular disparity, into a single stereoscopic percept, the BVS needs to decide which points in the left and right images correspond to the same physical location. In [34], Banks et al. pointed out that the BVS might solve the correspondence problem by using an approach similar to cross-correlation. Also, several electrophysiological experiments have provided detailed descriptions of the response properties of binocular neurons in the primary visual cortex [35,36]. Interestingly, these responses of the receptive field are well described by the binocular energy model [37,38]. Since the binocular energy model provides a good description of the first stages of cortical binocular processing, many previous studies have adopted it for diverse 3D visual signal processing tasks [39,40]. For example, Bensalma et al. [39] proposed a stereoscopic color image coding approach based on the binocular energy model. Furthermore, in stereo vision, the binocular energy response depends not only on the amplitude and phase but also on the disparity information. Because the left and right images do not have the same position, the left view response $R_l(x, y)$ is equal to a shifted