A Study of the Just Noticeable Difference for Deep Machine Vision

Video compression

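"This paper examines the Just Noticeable Difference (JND) in Deep Machine Vision (DMV). The JND is a key perceptual characteristic of the human visual system that is widely used in image and video processing. The study finds that deep machine vision exhibits a similar phenomenon, which the authors call the DMV-JND, and proposes a JND model for the image classification task, DMV-JND-Net. The model generates the JND through unsupervised learning, allows images to be recognized correctly at an average PSNR of only 9.56 dB, and uses a semantic-guided redundancy assessment strategy to control the degree of distortion."

In deep learning, machine vision has become central to many applications, such as image recognition, object detection, and autonomous driving. In traditional image and video processing, the Just Noticeable Difference (JND) is an important concept: it is the smallest difference that the human visual system can perceive. The JND is widely used in visual signal compression because it helps determine how far a signal can be compressed without affecting perceived visual quality. Although deep machine vision has advanced considerably in recent years, comparatively little research has asked whether a JND-like phenomenon exists for the DMV.

The paper addresses this gap by showing experimentally that deep learning models also exhibit a JND when processing images, namely the DMV-JND. In other words, a deep learning model can still perform its task accurately even when the image is distorted to some degree.

To understand and exploit the DMV-JND, the authors propose a JND model for the image classification task, named DMV-JND-Net. The model generates the JND through unsupervised learning, meaning it can be trained without labeled data, which greatly reduces the complexity and cost of training. In experiments, DMV-JND-Net maintains good classification performance at an average peak signal-to-noise ratio (PSNR) of only 9.56 dB, a very low value that indicates a high tolerance to degraded image quality.

The paper also proposes a semantic-guided redundancy assessment strategy. The strategy takes the semantic information of the image into account to control more precisely which regions can tolerate distortion and which cannot. This reduces damage to key visual elements while preserving enough information for the deep learning model to classify the image.

The paper offers a new perspective on perception in deep machine vision: models can tolerate a certain degree of distortion, and that tolerance can be exploited to optimize computational efficiency. The work is relevant to future image and video compression techniques and to the optimization of deep learning models, particularly in resource-constrained settings such as mobile devices and edge computing.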


arXiv:2102.08168v2 [cs.CV] 7 Jan 2022

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

Just Noticeable Difference for Deep Machine Vision

Jian Jin, Member, IEEE, Xingxing Zhang, Xin Fu, Huan Zhang, Weisi Lin, Fellow, IEEE, Jian Lou, Yao Zhao, Senior Member, IEEE

Abstract—As an important perceptual characteristic of the Human Visual System (HVS), the Just Noticeable Difference (JND) has been studied for decades with image and video processing (e.g., perceptual visual signal compression). However, there is little exploration on the existence of JND for the Deep Machine Vision (DMV), although the DMV has made great strides in many machine vision tasks. In this paper, we take an initial attempt, and demonstrate that the DMV has the JND, termed as the DMV-JND. We then propose a JND model for the image classification task in the DMV. It has been discovered that the DMV can tolerate distorted images with average PSNR of only 9.56 dB (the lower the better), by generating JND via unsupervised learning with the proposed DMV-JND-NET. In particular, a semantic-guided redundancy assessment strategy is designed to restrain the magnitude and spatial distribution of the DMV-JND. Experimental results on image classification demonstrate that we successfully find the JND for deep machine vision. Our DMV-JND facilitates a possible direction for DMV-oriented image and video compression, watermarking, quality assessment, deep neural network security, and so on.
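To put the 9.56 dB figure in perspective, the following sketch converts a PSNR value into the root-mean-square pixel error it implies. It assumes 8-bit images with a peak value of 255 and the standard PSNR definition, neither of which is stated in this excerpt; the function names are illustrative only.

```python
import math


def psnr(original, distorted, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(original) != len(distorted):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def rmse_for_psnr(psnr_db, peak=255.0):
    """Root-mean-square pixel error that corresponds to a given PSNR."""
    return peak / (10.0 ** (psnr_db / 20.0))


# RMSE implied by the 9.56 dB average quoted in the abstract,
# ASSUMING an 8-bit (0-255) pixel range.
rmse_at_reported_psnr = rmse_for_psnr(9.56)
```

Under these assumptions, 9.56 dB corresponds to an RMSE of roughly 85 gray levels out of 255, which gives a sense of how heavily an image can be distorted at that PSNR.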

Index Terms—Just noticeable difference (JND), human visual system (HVS), deep machine vision (DMV), image classification, class activation mapping (CAM)

I. INTRODUCTION

THE unique psychological and physiological mechanisms of the Human Visual System (HVS) make humans unable to perceive certain changes in images and videos. This is due to its underlying spatial-temporal sensitivities and masking properties [1]. That is, images and videos have visual redundancy for the HVS. The HVS oriented Just Noticeable Difference (JND), termed as the HVS-JND, refers to finding the maximum visual threshold of each pixel. Any changes under the threshold can be tolerated by the HVS. Commonly, this kind of property of JND is regarded as the homogeneous property, which exists in human perception, such as vision, hearing, smell, touch, taste, and so on. All changes below JND form a homogeneous range that leads to the same perception. The

This work was supported by Alibaba Group through Alibaba Innovative Research (AIR) Program and Alibaba-NTU Singapore Joint Research Institute (JRI), Nanyang Technological University, Singapore. (Corresponding author: Weisi Lin.)

J. Jin, H. Zhang, and W. Lin are with the School of Computer Science and Engineering, Nanyang Technological University, 639798, Singapore. J. Jin and W. Lin are also with Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, 639798, Singapore. E-mail: jian.jin@ntu.edu.sg, huan.zhang@siat.ac.cn, wslin@ntu.edu.sg.

X. Zhang is with the Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China. E-mail: xxzhang2020@mail.tsinghua.edu.cn.

X. Fu and Y. Zhao are with the Institute of Information Science, Beijing Jiao Tong University, Beijing 100044, China, and also with the Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China. E-mail: {xinfu and yzhao}@bjtu.edu.cn.

J. Lou is with the Alibaba cloud business group, department of video cloud, Alibaba, Hangzhou 310052, China. Email: jianedwardlou@gmail.com.

[Figure 1 panels: Original, DMV-JND, WGN, Classifiers (RCA)]

Fig. 1. The Relative Classification Accuracy (RCA) comparison between DMV-JND distorted image and White Gaussian Noise (WGN) distorted image. After adding DMV-JND (generated via our proposed DMV-JND model) and WGN (with same amount of noise) to the original image, we get 100% and 15.55% RCA on the CIFAR-10 dataset, respectively.

homogeneous property reflects the characteristics in sensitivity of the human perception, which makes the HVS-JND widely used in image and video processing, such as perceptual visual signal compression [1], quality-of-experience (QoE) in video streaming service [2], watermarking [3], error resilience [4], super resolution [5], graphic rendering [6], and so on.
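The comparison in Fig. 1 adds WGN "with same amount of noise" as the DMV-JND and reports the RCA for each. The sketch below shows one way such a comparison could be set up. It rests on two assumptions that this excerpt does not confirm: "same amount of noise" is read as equal mean squared error, and RCA is read as the fraction of images whose predicted label is unchanged by the distortion. `toy_classifier` is a hypothetical stand-in for a real DMV model.

```python
import random


def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def matched_wgn(image, perturbed, seed=0):
    """Add white Gaussian noise whose MSE equals that of `perturbed - image`.

    No clipping to the valid pixel range is applied, since clipping
    would change the noise energy.
    """
    rng = random.Random(seed)
    target = mse(image, perturbed)
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    power = sum(n * n for n in noise) / len(noise)
    scale = (target / power) ** 0.5 if power > 0 else 0.0
    return [p + scale * n for p, n in zip(image, noise)]


def relative_classification_accuracy(classify, originals, distorted):
    """Fraction of images whose label is unchanged (ASSUMED RCA definition)."""
    same = sum(classify(o) == classify(d) for o, d in zip(originals, distorted))
    return same / len(originals)


def toy_classifier(img):
    """Hypothetical stand-in for a DMV model: bright (1) versus dark (0)."""
    return int(sum(img) / len(img) > 128)
```

Given a list of original images and their DMV-JND distorted versions, `matched_wgn` would produce the equal-energy WGN counterparts, and `relative_classification_accuracy` could then be evaluated on both sets with the same classifier.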

With massive data and high-performance GPU hardware, Deep Machine Vision (DMV) has made breakthroughs in many machine vision tasks, such as image classification [7], object detection [8], person re-identification [9], and so on. It also makes the ultimate receiver and appreciator of an increasingly larger number of images and videos change from the HVS to the DMV. Many image and video processing applications are developed for the DMV now, and we naturally wonder: does the DMV have the JND? Unlike the HVS-JND aiming to find the visual redundancy for the HVS, the JND for the DMV is to find the redundancy of images and videos for deep machine vision by considering the effects of such redundancy during the DMV tasks. If the DMV has JND, the JND for the DMV will greatly benefit the DMV-oriented visual computing applications. For instance, it would help to design novel codecs for DMV-oriented image and video compression [10] via a DMV-JND inspired bit allocation strategy. For example, fewer bits are assigned to pixels with higher redundancy for the DMV, while more bits are assigned to pixels with lower redundancy, so as to achieve overall bit saving. Besides, it may provide us a novel perspective for a wider scope, e.g., DMV-oriented quality evaluation for natural
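The bit allocation idea mentioned in this paragraph (fewer bits where redundancy for the DMV is higher) can be illustrated with a minimal sketch. This is not the authors' codec design: the per-block redundancy scores, the inverse weighting, and the function name `allocate_bits` are all assumptions made for illustration, and the text itself speaks of pixels rather than blocks.

```python
def allocate_bits(redundancy, total_bits, min_bits=1):
    """Split `total_bits` across blocks, giving fewer bits to more redundant blocks.

    `redundancy` holds one non-negative score per block (hypothetical,
    e.g. an average DMV-JND magnitude for that block).
    """
    n = len(redundancy)
    if total_bits < min_bits * n:
        raise ValueError("budget too small for the per-block minimum")
    # Importance is the inverse of redundancy; the +1 avoids division by zero.
    importance = [1.0 / (1.0 + r) for r in redundancy]
    total_importance = sum(importance)
    spare = total_bits - min_bits * n
    bits = [min_bits + int(spare * w / total_importance) for w in importance]
    # Hand out bits lost to rounding down, most important blocks first.
    leftover = max(total_bits - sum(bits), 0)
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    for i in order[:leftover]:
        bits[i] += 1
    return bits


# Three blocks with increasing redundancy share a 24-bit budget.
example_bits = allocate_bits([0.0, 1.0, 3.0], total_bits=24)
```

In this example, the block with the lowest redundancy receives the largest share of the budget. A real codec would more likely adjust quantization parameters than assign raw bit counts.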
