Table 2: Improvement in accuracy when pre-training on the public ImageNet-21k
dataset over the “standard” ILSVRC-2012. Both models are ResNet152x4.

                      ILSVRC-2012  CIFAR-10  CIFAR-100  Pets   Flowers  VTAB-1k (19 tasks)
BiT-S (ILSVRC-2012)   81.30        97.51     86.21      93.97  89.89    66.87
BiT-M (ImageNet-21k)  85.39        98.91     92.17      94.46  99.30    70.64
Improvement           +4.09        +1.40     +5.96      +0.49  +9.41    +3.77
Downstream Fine-Tuning. To attain a low per-task adaptation cost, we do
not perform any hyperparameter sweeps downstream. Instead, we present
BiT-HyperRule, a heuristic to determine all hyperparameters for fine-tuning.
Most hyperparameters are fixed across all datasets, but the schedule, resolution,
and usage of MixUp depend on the task's image resolution and training set size.
For all tasks, we use SGD with an initial learning rate of 0.003, momentum
0.9, and batch size 512. We resize input images with area smaller than 96 × 96
pixels to 160 × 160 pixels, and then take a random crop of 128 × 128 pixels. We
resize larger images to 448 × 448 and take a 384 × 384-sized crop.¹ We apply
random crops and horizontal flips for all tasks, except those for which cropping
or flipping destroys the label semantics, see Supplementary section F for details.
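To make the resolution rule above concrete, here is a minimal sketch in Python; the helper name and return convention are our own illustration, not the authors' released code.

# Illustrative sketch of the BiT-HyperRule resolution rule described above.
# The helper name resize_and_crop_sizes is hypothetical, not from the BiT code release.

def resize_and_crop_sizes(height, width):
    """Map an input image size to square (resize_to, crop_to) sizes in pixels."""
    if height * width < 96 * 96:
        # Small images: resize to 160 x 160, then take a random 128 x 128 crop.
        return 160, 128
    # Larger images: resize to 448 x 448, then take a random 384 x 384 crop.
    # (Per the footnote, the largest R152x4 uses 512 -> 480 instead.)
    return 448, 384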
For the schedule length, we define three scale regimes based on the number of
examples: we call small tasks those with fewer than 20k labeled examples, medium
those with fewer than 500k, and any larger dataset a large task. We fine-tune
BiT for 500 steps on small tasks, for 10k steps on medium tasks, and for 20k
steps on large tasks. During fine-tuning, we decay the learning rate by a factor of
10 at 30%, 60%, and 90% of the training steps. Finally, we use MixUp [67], with
α = 0.1, for medium and large tasks; a compact sketch of the full rule follows
below. See Supplementary section A for details.
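The schedule portion of BiT-HyperRule is equally mechanical. The sketch below (our own naming, not the released code) collects the fixed optimizer settings, the size-dependent step budgets, the 30/60/90% learning-rate drops, and the MixUp rule in one place.

# Sketch of the BiT-HyperRule schedule described above. All names here are
# illustrative; this is not the authors' released API.

BASE_LR = 0.003  # SGD with momentum 0.9 and batch size 512, fixed for all tasks

def hyperrule_schedule(num_examples):
    """Return (total_steps, mixup_alpha) given the training-set size."""
    if num_examples < 20_000:    # small task
        return 500, 0.0          # no MixUp
    if num_examples < 500_000:   # medium task
        return 10_000, 0.1
    return 20_000, 0.1           # large task

def learning_rate(step, total_steps):
    """Piecewise-constant LR: divide BASE_LR by 10 at 30%, 60%, and 90% of training."""
    n_drops = sum(step >= int(total_steps * f) for f in (0.3, 0.6, 0.9))
    return BASE_LR / (10 ** n_drops)

For example, CIFAR-10, with 50k training images, falls into the medium regime: 10k steps with MixUp enabled.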
3.4 Standard Computer Vision Benchmarks
We evaluate BiT-L on standard benchmarks and compare its performance to the
current state-of-the-art results (Table 1). We separate models that perform task-
independent pre-training (“general” representations), from those that perform
task-dependent auxiliary training (“specialist” representations). The specialist
methods condition on a particular task, for example ILSVRC-2012, then train
using a large support dataset, such as JFT-300M [38] or Instagram-1B [63]. See
discussion in Section 5. Specialist representations are highly effective, but require
a large training cost per task. By contrast, general representations require
large-scale training only once, followed by a cheap adaptation phase.
BiT-L outperforms previously reported generalist SOTA models as well as,
in many cases, the SOTA specialist models. Inspired by strong results of BiT-L
trained on JFT-300M, we also train models on the public ImageNet-21k dataset.
¹ For our largest R152x4, we increase resolution to 512 × 512 and crop to 480 × 480.