Mix-and-Match Tuning for Self-Supervised Semantic Segmentation
Xiaohang Zhan Ziwei Liu Ping Luo Xiaoou Tang Chen Change Loy
Department of Information Engineering, The Chinese University of Hong Kong
{zx017, lz013, pluo, xtang, ccloy}@ie.cuhk.edu.hk
Abstract
Deep convolutional networks for semantic image segmentation typically require large-scale labeled data, e.g., ImageNet and MS COCO, for network pre-training. To reduce annotation efforts, self-supervised semantic segmentation has recently been proposed to pre-train a network without any human-provided labels. The key to this new form of learning is to design a proxy task (e.g., image colorization) from which a discriminative loss can be formulated on unlabeled data. Many proxy tasks, however, lack the critical supervision signals that would induce a discriminative representation for the target image segmentation task, so the performance of self-supervision still falls far short of supervised pre-training. In this study, we overcome this limitation by incorporating a ‘mix-and-match’ (M&M) tuning stage into the self-supervision pipeline. The proposed approach is readily pluggable into many self-supervision methods and does not use more annotated samples than the original process. Yet, it is capable of boosting the performance of the target image segmentation task to surpass that of its fully-supervised pre-trained counterpart. The improvement is made possible by better harnessing the limited pixel-wise annotations in the target dataset. Specifically, we first introduce the ‘mix’ stage, which sparsely samples and mixes patches from the target set to reflect the rich and diverse local patch statistics of target images. A ‘match’ stage then forms a class-wise connected graph, from which a strong triplet-based discriminative loss can be derived for fine-tuning the network. Our paradigm follows the standard practice in existing self-supervised studies, and no extra data or labels are required. With the proposed M&M approach, for the first time, a self-supervision method achieves comparable or even better performance than its ImageNet pre-trained counterpart on both the PASCAL VOC2012 and CityScapes datasets.
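To make the ‘match’ stage concrete, below is a minimal PyTorch sketch of a triplet-based patch loss of the kind described above. It assumes patch embeddings have already been extracted for patches sampled in the ‘mix’ stage and labeled by their dominant pixel class; the margin value, the one-triplet-per-class sampling, and the function name are illustrative assumptions rather than the paper’s exact formulation.

import random
import torch.nn.functional as F

def match_triplet_loss(features, labels, margin=0.2):
    # features: (N, D) embeddings of sampled patches; labels: (N,) patch
    # classes, assumed to come from the 'mix' stage. For each class with at
    # least two patches and at least one patch of another class, draw one
    # (anchor, positive, negative) triplet and apply a margin-based loss.
    lab = labels.tolist()
    by_class = {}
    for i, c in enumerate(lab):
        by_class.setdefault(c, []).append(i)
    loss, count = features.new_zeros(()), 0
    for c, idxs in by_class.items():
        negatives = [i for i, c2 in enumerate(lab) if c2 != c]
        if len(idxs) < 2 or not negatives:
            continue
        a, p = random.sample(idxs, 2)
        n = random.choice(negatives)
        d_ap = F.pairwise_distance(features[a:a + 1], features[p:p + 1])
        d_an = F.pairwise_distance(features[a:a + 1], features[n:n + 1])
        loss = loss + F.relu(d_ap - d_an + margin).squeeze()
        count += 1
    return loss / max(count, 1)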
Introduction
Semantic image segmentation is a classic computer vision task that aims at assigning a class label, such as “chair”, “person”, or “dog”, to each pixel in an image. It enjoys a wide spectrum of applications, such as scene understanding (Li, Socher, and Fei-Fei 2009; Lin et al. 2014; Li et al. 2017b) and autonomous driving (Geiger et al. 2013; Cordts et al. 2016; Li et al. 2017a). Deep convolutional neural network
(CNN) is now the state-of-the-art technique for semantic image segmentation (Long, Shelhamer, and Darrell 2015; Liu et al. 2015; Zhao et al. 2017; Liu et al. 2017). The excellent performance, however, comes at the price of expensive and laborious label annotation. In most existing pipelines, a network is usually first pre-trained on millions of class-labeled images, e.g., ImageNet (Russakovsky et al. 2015) and MS COCO (Lin et al. 2014), and subsequently fine-tuned with thousands of pixel-wise annotated images.
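As a concrete illustration of this standard pre-train-then-fine-tune pipeline, the PyTorch sketch below initializes a segmentation network from an ImageNet-pretrained backbone and fine-tunes it with a pixel-wise cross-entropy loss. The torchvision FCN model, the weights flag, and the helper name are assumptions chosen for brevity; the experiments discussed in this paper use different architectures (e.g., VGG-16-based networks).

import torch
import torch.nn as nn
from torchvision.models.segmentation import fcn_resnet50

# ImageNet-pretrained backbone, randomly initialized segmentation head
# (21 classes for PASCAL VOC: 20 object classes + background).
model = fcn_resnet50(weights_backbone="IMAGENET1K_V1", num_classes=21)
criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255 = unlabeled pixels in VOC
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def finetune_step(images, masks):
    # images: (N, 3, H, W) floats; masks: (N, H, W) integer class indices.
    logits = model(images)["out"]
    loss = criterion(logits, masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()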
Self-supervised learning¹ is a new paradigm proposed for learning deep representations without extensive annotations. This new technique has been applied to the task of image segmentation (Zhang, Isola, and Efros 2016a; Larsson, Maire, and Shakhnarovich 2016; 2017). In general,
self-supervised image segmentation can be divided into two stages: a proxy stage and a fine-tuning stage. The proxy stage needs no labeled data, but requires one to design a proxy (or pretext) task with self-derived supervisory signals on unlabeled data. For instance, learning by colorization (Larsson, Maire, and Shakhnarovich 2017) exploits the fact that a natural image is composed of a luminance channel and chrominance channels. The proxy task is formulated with a cross-entropy loss that predicts an image’s chrominance from its luminance. In the fine-tuning stage, the learned representations are used to initialize the target semantic segmentation network, which is then fine-tuned with pixel-wise annotations. It has been shown that, even without large-scale class-labeled pre-training, semantic image segmentation can still achieve encouraging performance compared to random initialization or from-scratch training.
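For concreteness, here is a minimal PyTorch sketch of such a colorization proxy loss: the ab chrominance channels are quantized into discrete bins, and a per-pixel classifier is trained with cross-entropy to predict the bin from the L (luminance) channel alone. The bin count, the backbone interface, and all names are illustrative assumptions; the actual formulation of Larsson, Maire, and Shakhnarovich (2017) (e.g., its hue/chroma binning) differs in detail.

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_BINS = 16  # assumption: 16 bins per ab channel -> 256 color classes

def quantize_ab(ab):
    # Map continuous ab chrominance in [-1, 1] to one bin index per pixel.
    idx = ((ab + 1.0) / 2.0 * (NUM_BINS - 1)).round().long()  # (N, 2, H, W)
    return idx[:, 0] * NUM_BINS + idx[:, 1]                   # (N, H, W)

class ColorizationModel(nn.Module):
    # Per-pixel classifier over chrominance bins, on top of any backbone
    # that maps (N, 1, H, W) luminance to (N, feat_dim, h, w) features.
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Conv2d(feat_dim, NUM_BINS * NUM_BINS, kernel_size=1)

    def forward(self, luminance):
        return self.classifier(self.backbone(luminance))

def proxy_loss(model, luminance, ab):
    # Cross-entropy between predicted bin logits and quantized ab targets,
    # with targets resized to the (possibly strided) output resolution.
    logits = model(luminance)                                  # (N, 256, h, w)
    ab_small = F.interpolate(ab, size=logits.shape[-2:],
                             mode="bilinear", align_corners=False)
    return F.cross_entropy(logits, quantize_ab(ab_small))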
Though promising, the performance of self-supervised learning is still far from that achieved by supervised pre-training. For instance, a VGG-16 network trained with the self-supervised method of (Larsson, Maire, and Shakhnarovich 2017) achieves 56.0% mean Intersection over Union (mIoU) on the PASCAL VOC 2012 segmentation benchmark (Everingham et al. 2010), higher than a randomly initialized network, which yields only 35.0% mIoU. However, an identical network pre-trained on ImageNet achieves 64.2% mIoU. A considerable gap thus remains between self-supervised and purely supervised pre-training.
We believe that the performance discrepancy is mainly
¹ Project page: http://mmlab.ie.cuhk.edu.hk/projects/M&M/