Fully Convolutional Networks for Semantic Segmentation
Jonathan Long∗ Evan Shelhamer∗ Trevor Darrell
UC Berkeley
{jonlong,shelhamer,trevor}@cs.berkeley.edu
Abstract
Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [22], the VGG net [34], and GoogLeNet [35]) into fully convolutional networks and transfer their learned representations by fine-tuning [5] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (a 20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.
1. Introduction
Convolutional networks are driving advances in recognition. Convnets are not only improving for whole-image classification [22, 34, 35], but also making progress on local tasks with structured output. These include advances in bounding box object detection [32, 12, 19], part and keypoint prediction [42, 26], and local correspondence [26, 10].
The natural next step in the progression from coarse to fine inference is to make a prediction at every pixel. Prior approaches have used convnets for semantic segmentation [30, 3, 9, 31, 17, 15, 11], in which each pixel is labeled with the class of its enclosing object or region, but with shortcomings that this work addresses.
∗Authors contributed equally
Figure 1. Fully convolutional networks can efficiently learn to make dense predictions for per-pixel tasks like semantic segmentation.
We show that a fully convolutional network (FCN) trained end-to-end, pixels-to-pixels on semantic segmentation exceeds the state-of-the-art without further machinery. To our knowledge, this is the first work to train FCNs end-to-end (1) for pixelwise prediction and (2) from supervised pre-training. Fully convolutional versions of existing networks predict dense outputs from arbitrary-sized inputs. Both learning and inference are performed whole-image-at-a-time by dense feedforward computation and backpropagation. In-network upsampling layers enable pixelwise prediction and learning in nets with subsampled pooling.
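To make the in-network upsampling concrete, the following is a minimal PyTorch sketch of ours, not the paper's released code. It assumes a convolutionalized VGG-style net whose final features have stride 32 and 4096 channels; a 1x1 convolution scores the 21 PASCAL VOC classes, and a stride-32 transposed convolution, initialized to bilinear interpolation, upsamples the coarse scores so a pixelwise loss can be applied. Layer names and sizes here are illustrative assumptions.

import torch
import torch.nn as nn

def bilinear_kernel(channels, k):
    # Build a (channels, channels, k, k) filter that performs bilinear
    # interpolation independently per channel.
    factor = (k + 1) // 2
    center = factor - 1 if k % 2 == 1 else factor - 0.5
    og = torch.arange(k, dtype=torch.float32)
    filt = 1 - (og - center).abs() / factor
    kern = filt[:, None] * filt[None, :]
    weight = torch.zeros(channels, channels, k, k)
    for c in range(channels):
        weight[c, c] = kern
    return weight

num_classes = 21                      # PASCAL VOC: 20 classes + background
score = nn.Conv2d(4096, num_classes, kernel_size=1)   # coarse class scores

# In-network upsampling: a learnable transposed convolution, initialized
# to bilinear interpolation, brings stride-32 scores back toward input
# resolution; because it is a layer, it trains end-to-end by backprop.
upsample = nn.ConvTranspose2d(num_classes, num_classes,
                              kernel_size=64, stride=32, bias=False)
with torch.no_grad():
    upsample.weight.copy_(bilinear_kernel(num_classes, 64))

features = torch.randn(1, 4096, 10, 10)   # dummy coarse feature map
dense = upsample(score(features))          # per-pixel class scores
print(dense.shape)                         # torch.Size([1, 21, 352, 352])

(The actual nets also crop the upsampled output to match the input size exactly; this sketch omits that bookkeeping.)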
This method is efficient, both asymptotically and absolutely, and precludes the need for the complications in other works. Patchwise training is common [30, 3, 9, 31, 11], but lacks the efficiency of fully convolutional training. Our approach does not make use of pre- and post-processing complications, including superpixels [9, 17], proposals [17, 15], or post-hoc refinement by random fields or local classifiers [9, 17]. Our model transfers recent success in classification [22, 34, 35] to dense prediction by reinterpreting classification nets as fully convolutional and fine-tuning from their learned representations. In contrast, previous works have applied small convnets without supervised pre-training [9, 31, 30].
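As an illustration of this reinterpretation (a sketch under assumed layer sizes, not the authors' code): a classifier's fully connected layers are equivalent to convolutions whose kernels cover their entire input regions, so their weights can be copied into convolutional layers that then accept inputs of any size and output a spatial grid of scores.

import torch
import torch.nn as nn

# Toy VGG-style head: fully connected layers over a 7x7x512 feature map.
fc6 = nn.Linear(512 * 7 * 7, 4096)
fc7 = nn.Linear(4096, 4096)

# The same weights, reinterpreted: fc6 becomes a 7x7 convolution and fc7
# a 1x1 convolution, turning the classifier head fully convolutional.
conv6 = nn.Conv2d(512, 4096, kernel_size=7)
conv7 = nn.Conv2d(4096, 4096, kernel_size=1)
with torch.no_grad():
    conv6.weight.copy_(fc6.weight.view(4096, 512, 7, 7))
    conv6.bias.copy_(fc6.bias)
    conv7.weight.copy_(fc7.weight.view(4096, 4096, 1, 1))
    conv7.bias.copy_(fc7.bias)

# On a 7x7 input the two forms agree; on larger inputs the convolutional
# form produces a map of outputs instead of a single vector.
x = torch.randn(1, 512, 7, 7)
dense = conv7(torch.relu(conv6(x)))
flat = fc7(torch.relu(fc6(x.flatten(1))))
print(torch.allclose(dense.flatten(1), flat, atol=1e-4))  # True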
Semantic segmentation faces an inherent tension between semantics and location: global information resolves what while local information resolves where. Deep feature hierarchies encode location and semantics in a nonlinear