Multidim Syst Sign Process
Our results show that the proposed algorithm can improve the visual quality of the ROI by
about 1.01 dB while preserving coding efficiency.
Keywords Video coding · Visual communications · Perceptual adaptation · Optical flow ·
Human detection
1 Introduction
With increasing demand for multimedia applications over limited bandwidth, continued and
substantial efforts have been made to improve the compression performance of the H.264/MPEG-4
AVC video coding standard (H.264 2010). The commonly used tools in H.264/AVC and
the latest HEVC (ITU H.265) focus mostly on exploiting statistical correlation of signals.
These tools mainly include transforms, motion estimation and compensation, intra and inter
prediction, bit-rate control, entropy coding, quantization, and more. However, conventional
methods seldom consider that the human visual system (HVS) usually focuses on a
small region rather than on the entire frame. Recently, coding of video using computational
psychology models of visual saliency (Hadizadeh and Bajic 2014) or visual attention model
(Itti et al. 1998) has been considered an effective way to compress video and achieve
higher coding efficiency (Itti 2004; Chen et al. 2010; Chen and Guillemot 2010; Li et al.
2011; Liu et al. 2008; Chen et al. 2009). The common idea in these methods is to encode
regions around the computed attention areas with higher fidelity than other, less visually
important regions.
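This idea can be sketched as a per-macroblock quantization parameter (QP) assignment driven by a saliency map. The function below is an illustrative sketch only, not any of the cited algorithms; the linear saliency-to-QP mapping, the base QP, and the offset range are assumptions made for the example:

```python
import numpy as np

def roi_qp_map(saliency, base_qp=32, max_offset=6, mb_size=16):
    """Assign a per-macroblock QP from a per-pixel saliency map in [0, 1].

    Salient (ROI) macroblocks receive a lower QP (finer quantization, higher
    fidelity); non-salient macroblocks receive a higher QP (coarser
    quantization). The linear mapping is a hypothetical choice for
    illustration, not a scheme from the literature cited above.
    """
    h, w = saliency.shape
    rows, cols = h // mb_size, w // mb_size
    qp = np.empty((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            block = saliency[r * mb_size:(r + 1) * mb_size,
                             c * mb_size:(c + 1) * mb_size]
            s = float(block.mean())  # mean saliency of this macroblock
            # s = 1 -> base_qp - max_offset; s = 0 -> base_qp + max_offset
            qp[r, c] = int(round(base_qp + max_offset * (1.0 - 2.0 * s)))
    return np.clip(qp, 0, 51)  # H.264/AVC QP range is 0..51
```

With a 32x32 saliency map whose top-left quadrant is fully salient, the top-left macroblock gets QP 26 while the bottom-right gets QP 38, i.e. the ROI is quantized more finely.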
Many existing perceptual coding methods perform HVS-guided preprocessing
(Yuan et al. 2009; Cavallaro et al. 2004), which detects conspicuous information in
the original video so that it can be coded with higher fidelity. However, it is hard to set the
parameters of such a detection algorithm within typical rate-distortion optimization (RDO).
More commonly, HVS characteristics are exploited at the quantization stage of the encoder
(Tang et al. 2006; Yang et al. 2005), where perceptually unimportant regions are coarsely
quantized. In this way, fewer bits are allocated to regions that can withstand greater
distortion, so the coding bit-rate is reduced. Chen
and Guillemot (2010) describe a foveation model as well as a foveated just noticeable dif-
ference (FJND) model in which spatial and temporal JND models are enhanced to account
for the relationship between visibility and eccentricity. Their model is used for macroblock
(MB) quantization adjustment in H.264/AVC. For each MB, the quantization parameters are
optimized based on its FJND information. The Lagrange multiplier in the RDO is adapted so
that the MB noticeable distortion is minimized. Nevertheless, a simple weighting of the mean
square error (MSE) may not lead to the optimal trade-off between visual quality and rate.
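The cost that such RDO schemes adjust has the form J = D + λR. As a rough illustration of how a weighted MSE interacts with the Lagrange multiplier, the sketch below combines the standard H.264 mode-decision heuristic λ = 0.85 · 2^((QP−12)/3) with a hypothetical per-macroblock perceptual weight; the weighting itself is an assumption for the example, not the FJND model described above:

```python
import math

def lambda_from_qp(qp):
    """H.264 mode-decision Lagrange multiplier heuristic:
    lambda = 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)

def perceptual_rd_cost(mse, rate_bits, qp, weight):
    """RD cost with perceptually weighted distortion: J = w * MSE + lambda * R.

    'weight' > 1 marks a visually important macroblock (its distortion counts
    more), 'weight' < 1 a less important one. This simple weighting is exactly
    the kind that, as noted above, may not yield the optimal quality-rate
    trade-off.
    """
    return weight * mse + lambda_from_qp(qp) * rate_bits
```

For example, at QP 12 the multiplier is 0.85, so a macroblock with MSE 100 and rate 10 bits and weight 2.0 has cost 2.0 * 100 + 0.85 * 10 = 208.5.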
Luo et al. (2013) propose an alternative perceptual video coding method to improve upon
the current H.264/AVC framework based on an independent JND-directed suppression tool.
They also analytically derive a JND mapping formula between the integer DCT domain and
the classic DCT domain which permits the reuse of the JND models in a more natural way.
However, when the quantization distortion is large, the maximum coefficient adjustment
amplitudes constrained by the original JND thresholds may be too conservative for
coefficient suppression in their method. The method of Hadizadeh and Bajic (2014) aims at
reducing salient coding artifacts in non-ROI parts of the frame in order to keep users’ attention
on ROIs. Further, the method allows saliency to increase in high-quality parts of the frame
and to decrease in non-ROI parts. In their distortion models, they had to compute
two Lagrange multipliers, which is extremely time-consuming, associated with the distortions