Multidim Syst Sign Process
Our results show that the proposed algorithm can improve the visual quality of the ROI by
about 1.01 dB while preserving coding efficiency.
Keywords Video coding · Visual communications · Perceptual adaptation · Optical flow ·
Human detection
1 Introduction
With increasing demand for multimedia applications over limited bandwidth, continued and
substantial efforts have been made to improve the compression performance of the H.264/MPEG-4
AVC video coding standard (H.264 2010). The commonly used tools in H.264/AVC and
the latest HEVC (ITU H.265) focus mostly on exploiting statistical correlation of signals.
These tools mainly include transforms, motion estimation and compensation, intra and inter
prediction, bit-rate control, entropy coding, quantization, and more. However, conventional
methods seldom consider that the human visual system (HVS) usually focuses on a
small region rather than on the entire frame. Recently, coding of video using computational
psychology models of visual saliency (Hadizadeh and Bajic 2014) or visual attention model
(Itti et al. 1998) has been considered an effective way to compress video and achieve
higher coding efficiency (Itti 2004; Chen et al. 2010; Chen and Guillemot 2010; Li et al.
2011; Liu et al. 2008; Chen et al. 2009). The common idea in these methods is to encode
regions around the computed attention areas with higher fidelity than other, less visually
important regions.
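This idea can be sketched as a per-macroblock quantization parameter (QP) assignment driven by a saliency map. The function below is an illustrative sketch only, not any of the cited algorithms; the linear saliency-to-QP mapping, the base QP, and the offset range are assumptions made for the example:

```python
import numpy as np

def roi_qp_map(saliency, base_qp=32, max_offset=6, mb_size=16):
    """Assign a per-macroblock QP from a per-pixel saliency map in [0, 1].

    Salient (ROI) macroblocks receive a lower QP (finer quantization, higher
    fidelity); non-salient macroblocks receive a higher QP (coarser
    quantization). The linear mapping is a hypothetical choice for
    illustration, not a scheme from the literature cited above.
    """
    h, w = saliency.shape
    rows, cols = h // mb_size, w // mb_size
    qp = np.empty((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            block = saliency[r * mb_size:(r + 1) * mb_size,
                             c * mb_size:(c + 1) * mb_size]
            s = float(block.mean())  # mean saliency of this macroblock
            # s = 1 -> base_qp - max_offset; s = 0 -> base_qp + max_offset
            qp[r, c] = int(round(base_qp + max_offset * (1.0 - 2.0 * s)))
    return np.clip(qp, 0, 51)  # H.264/AVC QP range is 0..51
```

With a 32x32 saliency map whose top-left quadrant is fully salient, the top-left macroblock gets QP 26 while the bottom-right gets QP 38, i.e. the ROI is quantized more finely.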
Many existing perceptual coding methods perform HVS-guided preprocessing
(Yuan et al. 2009; Cavallaro et al. 2004), which detects conspicuous information in
the original video so that it can be coded with higher fidelity. However, it is hard to set the
parameters of such a detection algorithm within typical rate-distortion optimization (RDO).
More commonly, HVS characteristics are exploited at the quantization stage of the encoder
(Tang et al. 2006; Yang et al. 2005), where perceptually unimportant regions are coarsely
quantized. In this way, fewer bits are allocated to regions that can withstand greater
distortion, so the coding bit-rate is reduced. Chen
and Guillemot (2010) describe a foveation model as well as a foveated just noticeable dif-
ference (FJND) model in which spatial and temporal JND models are enhanced to account
for the relationship between visibility and eccentricity. Their model is used for macroblock
(MB) quantization adjustment in H.264/AVC. For each MB, the quantization parameters are
optimized based on its FJND information. The Lagrange multiplier in the RDO is adapted so
that the MB noticeable distortion is minimized. Nevertheless, a simple weighting of the mean
square error (MSE) may not lead to the optimal trade-off between visual quality and rate.
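The cost that such RDO schemes adjust has the form J = D + λR. As a rough illustration of how a weighted MSE interacts with the Lagrange multiplier, the sketch below combines the standard H.264 mode-decision heuristic λ = 0.85 · 2^((QP−12)/3) with a hypothetical per-macroblock perceptual weight; the weighting itself is an assumption for the example, not the FJND model described above:

```python
import math

def lambda_from_qp(qp):
    """H.264 mode-decision Lagrange multiplier heuristic:
    lambda = 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)

def perceptual_rd_cost(mse, rate_bits, qp, weight):
    """RD cost with perceptually weighted distortion: J = w * MSE + lambda * R.

    'weight' > 1 marks a visually important macroblock (its distortion counts
    more), 'weight' < 1 a less important one. This simple weighting is exactly
    the kind that, as noted above, may not yield the optimal quality-rate
    trade-off.
    """
    return weight * mse + lambda_from_qp(qp) * rate_bits
```

For example, at QP 12 the multiplier is 0.85, so a macroblock with MSE 100 and rate 10 bits and weight 2.0 has cost 2.0 * 100 + 0.85 * 10 = 208.5.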
Luo et al. (2013) propose an alternative perceptual video coding method to improve upon
the current H.264/AVC framework based on an independent JND-directed suppression tool.
They also analytically derive a JND mapping formula between the integer DCT domain and
the classic DCT domain which permits the reuse of the JND models in a more natural way.
However, when the quantization distortion is large, the maximum coefficient adjustment
amplitudes constrained by the original JND thresholds may be too conservative for
coefficient suppression in their method. The method of Hadizadeh and Bajic (2014) aims at
reducing salient coding artifacts in non-ROI parts of the frame in order to keep users’ attention
on ROIs. Further, the method allows saliency to increase in high-quality parts of the frame
and to decrease in non-ROI parts. In their distortion models, they had to compute
two Lagrange multipliers, which is extremely time-consuming, associated with the distortions