TABLE 2
Commonly used domain generalization datasets.
Benchmark | # samples | # domains | Task | Description
Rotated MNIST [53] | 70,000 | 6 | Handwritten digit recognition | Rotation degree ∈ {0, 15, 30, 45, 60, 75}
Digits-DG [31] | 24,000 | 4 | Handwritten digit recognition | Combination of MNIST [54], MNIST-M [13], SVHN [55], and SYN [13]
VLCS [56] | 10,729 | 4 | Object recognition | Combination of Caltech101 [39], LabelMe [40], PASCAL [57], and SUN09 [58]
Office-31 [10] | 4,652 | 3 | Object recognition | Domain ∈ {amazon, webcam, dslr}
OfficeHome [59] | 15,588 | 4 | Object recognition | Domain ∈ {art, clipart, product, real}
PACS [33] | 9,991 | 4 | Object recognition | Domain ∈ {photo, art, cartoon, sketch}
DomainNet [60] | 586,575 | 6 | Object recognition | Domain ∈ {clipart, infograph, painting, quickdraw, real, sketch}
miniDomainNet [61] | 140,006 | 4 | Object recognition | A smaller and less noisy version of DomainNet; domain ∈ {clipart, painting, real, sketch}
ImageNet-Sketch [51] | 50,000 | 2 | Object recognition | Domain shift between real and sketch images
VisDA-17 [62] | 280,157 | 3 | Object recognition | Synthetic-to-real generalization
CIFAR-10-C [8] | 60,000 | - | Object recognition | Test data corrupted by 15 corruption types (each with 5 intensity levels) drawn from 4 categories (noise, blur, weather, and digital)
CIFAR-100-C [8] | 60,000 | - | Object recognition | Same corruption protocol as CIFAR-10-C
ImageNet-C [8] | ≈1.3M | - | Object recognition | Same corruption protocol as CIFAR-10-C
Visual Decathlon [63] | 1,659,142 | 10 | Object/action/handwritten digit recognition | Combination of 10 datasets
IXMAS [64] | 1,650 | 5 | Action recognition | 5 camera views; 10 subjects; 5 actions (see [27])
UCF-HMDB [65], [66] | 3,809 | 2 | Action recognition | 12 overlapping actions (see [67])
SYNTHIA [68] | 2,700 | 15 | Semantic segmentation | 4 locations; 5 weather conditions (see [43])
GTA5-Cityscapes [69], [70] | 29,966 | 2 | Semantic segmentation | Synthetic-to-real generalization
TerraInc [71] | 24,788 | 4 | Animal classification | Captured at different geographical locations
Market-Duke [72], [73] | 69,079 | 2 | Person re-identification | Cross-dataset re-ID; heterogeneous DG
Face [36] | >5M | 9 | Face recognition | Combination of 9 face datasets
COMI [74], [75], [76], [77] | ≈8,500 | 4 | Face anti-spoofing | Combination of 4 face anti-spoofing datasets
style is close to the target image style (both sharing the same visual cues), the performance tends to be higher (e.g., photo→painting, where both rely on colors and textures); otherwise, if the source image style is drastically different from the target image style, the performance tends to be poor (e.g., photo→quickdraw, where the latter relies strongly on shape information while requiring no color information at all). This observation also applies to unsupervised domain adaptation. For instance, the performance on the quickdraw domain of DomainNet is usually the lowest among all target domains [61], [81], [82].
3) A couple of recent DG studies [83], [84] have investigated, from a transfer learning perspective, how to preserve the knowledge learned via large-scale pre-training when training on abundant labeled synthetic data for synthetic-to-real applications. The experiments were carried out on VisDA-17 [62]. This is an important yet under-studied topic in DG: given only sufficient synthetic data, how can we avoid over-fitting to synthetic images by leveraging the initialization weights learned on real images? Such a setting is particularly useful for problems where manual labels are difficult/expensive to obtain.
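To make this setting concrete, the sketch below regularizes fine-tuned weights toward their pre-trained initialization, in the spirit of L2-SP-style transfer learning. It is a generic illustration only, not the specific method of [83] or [84]; the penalty coefficient `coeff` and the `training_step` helper are assumptions for exposition.

```python
# Minimal sketch: retain real-image pre-trained knowledge while fine-tuning on
# synthetic data by penalizing deviation from the initialization (L2-SP style).
# Not the method of [83]/[84]; `coeff` is an assumed hyperparameter.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet50(pretrained=True)  # weights learned on real images
# Snapshot the pre-trained weights to serve as the regularization anchor.
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}

def l2_sp_penalty(model, anchor, coeff=1e-3):
    # Squared distance between current weights and the pre-trained initialization.
    return coeff * sum(((p - anchor[n]) ** 2).sum() for n, p in model.named_parameters())

def training_step(x_syn, y_syn):
    # x_syn, y_syn: a batch of labeled synthetic images (e.g., the VisDA-17 source split).
    logits = model(x_syn)
    return F.cross_entropy(logits, y_syn) + l2_sp_penalty(model, anchor)
```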
4) Synthetic image corruptions like Gaussian noise and motion blur have also been used to simulate domain shift by Hendrycks and Dietterich [8]. In their proposed datasets, i.e. CIFAR-10-C, CIFAR-100-C and ImageNet-C, a model is trained on the original clean images but tested on the corrupted images. This research is largely motivated by adversarial attacks [85], and aims to evaluate model robustness under common image perturbations for safety-critical applications.
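The evaluation protocol can be sketched as follows for CIFAR-10-C: a classifier trained on clean CIFAR-10 is scored on every corruption/severity pair. The sketch assumes the layout of the publicly released .npy files (one file per corruption with 10,000 test images stacked per severity level) and uses a placeholder `predict` function, so it should be read as an illustration rather than the official evaluation code.

```python
# Sketch of the CIFAR-10-C robustness protocol: test a clean-trained classifier
# on each corruption type at each of the 5 severity levels.
import numpy as np

CORRUPTIONS = ["gaussian_noise", "motion_blur"]  # 2 of the 15 corruption types

def evaluate_cifar10c(predict, root="CIFAR-10-C"):
    # labels.npy holds the 10,000 test labels tiled across the 5 severity levels.
    labels = np.load(f"{root}/labels.npy")      # assumed shape (50000,)
    errors = {}
    for name in CORRUPTIONS:
        images = np.load(f"{root}/{name}.npy")  # assumed shape (50000, 32, 32, 3)
        for severity in range(1, 6):            # 5 intensity levels
            lo, hi = (severity - 1) * 10000, severity * 10000
            preds = predict(images[lo:hi])      # classifier trained on clean CIFAR-10
            errors[(name, severity)] = float(np.mean(preds != labels[lo:hi]))
    return errors  # per-corruption, per-severity test error
```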
5) Lastly, a hybrid dataset initially proposed for multi-domain/task learning, i.e. Visual Decathlon [63], has also been employed for evaluating heterogeneous DG [34], [37]. However, due to both the changes in label space and the use of target data for training SVM classifiers, this setup overlaps with transfer learning [86].
Action Recognition. Learning generalizable models is critical for action recognition because the test data typically contain actions performed by new subjects in new environments. IXMAS [64] has been widely used as a cross-view action recognition benchmark [27], [37]; it contains action videos collected from five different views, and the common practice is to train on four views and test on the remaining one. In addition to view changes, differences in subjects and environments might also cause failure. Intuitively, different persons can perform the same action in