Dataset                  # Input Imgs/  Input Type    # Distort.       # Distort.  # Distort.    # Judg- Judgment
                         Patches                      Types            Levels      Imgs/Patches  ments   Type
LIVE [51]                29             images        5 traditional    continuous  .8k           25k     MOS
CSIQ [29]                30             images        6 traditional    5           .8k           25k     MOS
TID2008 [46]             25             images        17 traditional   4           2.7k          250k    MOS
TID2013 [45]             25             images        24 traditional   5           3.0k          500k    MOS
BAPPS (2AFC–Distort)     160.8k         64×64 patch   425 trad. + CNN  continuous  321.6k        349.8k  2AFC
BAPPS (2AFC–Real alg)    26.9k          64×64 patch   alg. outputs     –           53.8k         134.5k  2AFC
BAPPS (JND–Distort)      9.6k           64×64 patch   425 trad. + CNN  continuous  9.6k          28.8k   Same/Not same
Table 1: Dataset comparison. A primary differentiator between our proposed Berkeley-Adobe Perceptual Patch Similarity
(BAPPS) dataset and previous work is the scale of distortion types. We provide human perceptual judgments on a set of
distortions applied to uncompressed images from [7, 10]. Previous datasets have used a small number of distortions at
discrete levels. We use a large number of distortions (created by sequentially composing atomic distortions together)
and sample distortion levels continuously. For each input patch, we corrupt it using two distortions and ask for a few
human judgments (2 for the training set, 5 for the test set) per pair. This enables us to obtain judgments on a large
number of patches. Previous databases summarize their judgments into a mean opinion score (MOS); we simply report
pairwise judgments (two-alternative forced choice). In addition, we provide judgments on outputs from real algorithms,
as well as a same/not same Just Noticeable Difference (JND) perceptual test.
model low-level perceptual similarity surprisingly
well, outperforming previous, widely-used metrics.
• We demonstrate that network architecture alone does
not account for the performance: untrained nets
achieve much lower performance.
• With our data, we can improve performance by “calibrating”
feature responses from a pre-trained network, as sketched below.
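A minimal sketch of what such a calibration could look like, assuming it takes the form of learned per-channel weights
applied to normalized feature differences from a frozen pre-trained backbone; the function and variable names here are
illustrative, not a verbatim specification of our model:

```python
import torch

def calibrated_distance(feats0, feats1, weights, eps=1e-10):
    # feats0 / feats1: lists of activation maps [B, C_l, H_l, W_l], one per
    # layer, extracted from a frozen pre-trained network for two patches.
    # weights: list of learnable per-channel weights [C_l] -- the
    # "calibration" fit to human judgments.
    total = 0.0
    for f0, f1, w in zip(feats0, feats1, weights):
        # unit-normalize each spatial feature vector along the channel axis
        f0 = f0 / (f0.norm(dim=1, keepdim=True) + eps)
        f1 = f1 / (f1.norm(dim=1, keepdim=True) + eps)
        diff = (f0 - f1) ** 2                          # [B, C_l, H_l, W_l]
        weighted = w.view(1, -1, 1, 1) * diff          # channel-wise reweighting
        total = total + weighted.sum(dim=1).mean(dim=(1, 2))  # average over space
    return total                                       # one distance per pair
```

Under this assumption, only the per-channel weights would be trained on our perceptual judgments; the backbone itself
stays fixed.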
Prior work on datasets In order to evaluate existing
similarity measures, a number of datasets have been proposed.
Some of the most popular are the LIVE [51], TID2008 [46],
CSIQ [29], and TID2013 [45] datasets. These datasets are
referred to as Full-Reference Image Quality Assessment
(FR-IQA) datasets and have served as the de facto baselines
for the development and evaluation of similarity metrics. A related
line of work is on No-Reference Image Quality Assessment
(NR-IQA), such as AVA [38] and LIVE In the Wild [18].
These datasets investigate the “quality” of individual images
by themselves, without a reference image. We collect
a new dataset that is complementary to these: it contains a
substantially larger number of distortions, including some
from newer, deep-network-based outputs, as well as
geometric distortions. Our dataset is focused on perceptual
similarity, rather than quality assessment. Additionally, it is
collected on patches as opposed to full images, in the wild,
with a different experimental design (more details in Sec 2).
Prior work on deep networks and human judgments
Recently, advances in DNNs have motivated investigation
of applications in the context of visual similarity and image
quality assessment. Kim and Lee [25] use a CNN to predict
visual similarity by training on low-level differences.
Concurrent work by Talebi and Milanfar [54, 55] trains a
deep network in the context of NR-IQA for image aesthetics.
Gao et al. [16] and Amirshahi et al. [3] propose techniques
that leverage internal activations of deep networks
(VGG and AlexNet, respectively) along with additional
multiscale post-processing. In this work, we conduct
a more in-depth study across different architectures and
training signals, on a new, large-scale, highly varied dataset.
Recently, Berardino et al. [6] train networks on perceptual
similarity and, importantly, assess the ability of deep
networks to make predictions on a separate task – predicting
the most and least perceptually noticeable directions of
distortion. Similarly, we not only assess image patch similarity
on parameterized distortions, but also test generalization to
real algorithms, as well as generalization to a separate
perceptual task – just noticeable differences.
2. Berkeley-Adobe Perceptual Patch Similarity
(BAPPS) Dataset
To evaluate the performance of different perceptual metrics,
we collect a large-scale, highly diverse dataset of perceptual
judgments using two approaches. Our main data
collection employs a two-alternative forced choice (2AFC)
test that asks which of two distortions is more similar to a
reference. This is validated by a second experiment where
we perform a just noticeable difference (JND) test, which
asks whether two patches – one reference and one distorted
– are the same or different. These judgments are collected
over a wide space of distortions and real algorithm outputs.
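To make these two judgment types concrete, the sketch below shows how a generic patch distance could be checked
against a single 2AFC triplet and a single JND pair; the distance function and argument names are illustrative
assumptions, not part of the dataset format:

```python
import numpy as np

def dist(p0, p1):
    # Placeholder patch distance (mean squared RGB difference); any
    # perceptual metric with the same signature could be swapped in.
    return np.mean((p0.astype(np.float64) - p1.astype(np.float64)) ** 2)

def two_afc_agreement(ref, patch0, patch1, human_frac_1):
    # 2AFC: the metric "votes" for whichever distorted patch it deems closer
    # to the reference; human_frac_1 is the fraction of human judges who
    # preferred patch1. One natural score credits the metric with the
    # fraction of humans it agrees with.
    metric_prefers_1 = dist(ref, patch1) < dist(ref, patch0)
    return human_frac_1 if metric_prefers_1 else 1.0 - human_frac_1

def jnd_same(ref, distorted, threshold):
    # JND: declare "same" when the distance falls below a threshold that
    # would be tuned against the human same/not-same labels.
    return dist(ref, distorted) <= threshold
```

Averaging the per-triplet agreement over the full set of judgments would then yield a single number summarizing how
well a given metric matches human 2AFC responses.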
2.1. Distortions
Traditional distortions We create a set of “traditional”
distortions consisting of common operations performed on
the input patches, listed in Table 2 (left). In general, we
use photometric distortions, random noise, blurring, spatial
shifts and corruptions, and compression artifacts. We show
qualitative examples of our traditional distortions in
Figure 2. The severity of each perturbation is parameterized -