DCNNs at multiple image resolutions and then employ a
segmentation tree to smooth the prediction results. More
recently, [21] propose to use skip layers and concatenate the
computed intermediate feature maps within the DCNNs for
pixel classification. Further, [51] propose to pool the inter-
mediate feature maps by region proposals. These works still
employ segmentation algorithms that are decoupled from
the DCNN classifier’s results, thus risking commitment to
premature decisions.
The third family of works uses DCNNs to directly provide
dense category-level pixel labels, which makes it possible to
even discard segmentation altogether. The segmentation-
free approaches of [14], [52] directly apply DCNNs to the
whole image in a fully convolutional fashion, transforming
the last fully connected layers of the DCNN into convolu-
tional layers. In order to deal with the spatial localization
issues outlined in the introduction, [14] upsample and con-
catenate the scores from intermediate feature maps, while
[52] refine the prediction result from coarse to fine by propa-
gating the coarse results to another DCNN. Our work builds
on these works, and as described in the introduction extends
them by exerting control on the feature resolution, introduc-
ing multi-scale pooling techniques and integrating the
densely connected CRF of [22] on top of the DCNN. We
show that this leads to significantly better segmentation
results, especially along object boundaries. The combination
of DCNN and CRF is of course not new, but previous works
only tried locally connected CRF models. Specifically, [53]
use CRFs as a proposal mechanism for a DCNN-based
reranking system, while [39] treat superpixels as nodes for a
local pairwise CRF and use graph-cuts for discrete inference.
As such their models were limited by errors in superpixel
computations or ignored long-range dependencies. Our
approach instead treats every pixel as a CRF node receiving
unary potentials by the DCNN. Crucially, the Gaussian CRF
potentials in the fully connected CRF model of [22] that we
adopt can capture long-range dependencies and at the same
time the model is amenable to fast mean field inference. We
note that mean field inference had been extensively studied
for traditional image segmentation tasks [54], [55], [56], but
these older models were typically limited to short-range con-
nections. In independent work, [57] use a very similar
densely connected CRF model to refine the results of DCNN
for the problem of material classification. However, the
DCNN module of [57] was only trained by sparse point
supervision instead of dense supervision at every pixel.
Since the first version of this work was made publicly
available [38], the area of semantic segmentation has pro-
gressed drastically. Multiple groups have made important
advances, significantly raising the bar on the PASCAL VOC
2012 semantic segmentation benchmark, as reflected in the high level of activity in the benchmark's leaderboard¹ [17], [40], [58], [59], [60], [61], [62], [63]. Interestingly, most top-performing methods have adopted one or both of the key
ingredients of our DeepLab system: Atrous convolution for
efficient dense feature extraction and refinement of the raw
DCNN scores by means of a fully connected CRF. We outline
below some of the most important and interesting advances.
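To make the first of these ingredients concrete, atrous convolution can be sketched in one dimension: the filter taps are applied `rate` samples apart, enlarging the receptive field without downsampling the signal. The function name, the 'same'-padding choice, and the toy inputs below are our own illustrative assumptions, not the DeepLab implementation:

```python
import numpy as np

def atrous_conv1d(x, w, rate):
    """1-D atrous (dilated) convolution: filter taps are applied `rate`
    samples apart, enlarging the receptive field without any
    downsampling of the input signal."""
    k = len(w)
    pad = (k - 1) * rate // 2            # 'same' padding for odd k
    xp = np.pad(x, pad)
    out = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(k):
            out[i] += w[j] * xp[i + j * rate]
    return out

impulse = np.array([0., 0., 1., 0., 0., 0., 0.])
box = np.array([1., 1., 1.])
print(atrous_conv1d(impulse, box, rate=1))  # taps at offsets -1, 0, +1
print(atrous_conv1d(impulse, box, rate=2))  # taps at offsets -2, 0, +2
```

With rate 1 this reduces to ordinary convolution; with rate 2 the same three-tap filter covers a span of five samples at identical cost, which is the mechanism that lets a DCNN compute dense features without aggressive striding.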
End-to-end training for structured prediction has more
recently been explored in several related works. While we
employ the CRF as a post-processing method, [40], [59],
[62], [64], [65] have successfully pursued joint learning of
the DCNN and CRF. In particular, [59], [65] unroll the CRF
mean-field inference steps to convert the whole system into
an end-to-end trainable feed-forward network, while [62]
approximates one iteration of the dense CRF mean field
inference [22] by convolutional layers with learnable filters.
Another fruitful direction pursued by [40], [66] is to learn
the pairwise terms of a CRF via a DCNN, significantly
improving performance at the cost of heavier computation.
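For reference, the mean-field update that these works unroll into network layers can be sketched as follows, in the spirit of [22]. This is a toy version: the dense N x N kernel matrix stands in for the fast high-dimensional (bilateral) filtering used in practice, and the variable names are ours:

```python
import numpy as np

def meanfield_step(Q, unary, K, mu):
    """One mean-field update for a densely connected pairwise CRF.
      Q:     (N, L) current label marginals, rows summing to 1
      unary: (N, L) negative log unary potentials (e.g. DCNN scores)
      K:     (N, N) pairwise affinities with zero diagonal (toy dense kernel)
      mu:    (L, L) label compatibility, e.g. Potts: 1 - I
    """
    msg = K @ Q                    # message passing: filter the marginals
    pairwise = msg @ mu            # compatibility transform
    logits = -unary - pairwise     # combine with the unary term
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    Q_new = np.exp(logits)
    return Q_new / Q_new.sum(axis=1, keepdims=True)  # normalize per pixel
```

Each of the four steps (filtering, compatibility transform, unary addition, normalization) is differentiable, which is precisely what allows [59], [62], [65] to express iterations of this update as layers of a feed-forward network.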
In a different direction, [63] replace the bilateral filtering
module used in mean field inference with a faster domain
transform module [67], improving the speed and lowering
the memory requirements of the overall system, while [18],
[68] combine semantic segmentation with edge detection.
Weaker supervision has been pursued in a number of
papers, relaxing the assumption that pixel-level semantic
annotations are available for the whole training set [58],
[69], [70], [71], achieving significantly better results than
weakly-supervised pre-DCNN systems such as [72]. In
another line of research, [49], [73] pursue instance segmen-
tation, jointly tackling object detection and semantic
segmentation.
Fig. 1. Model illustration. A deep convolutional neural network such as VGG-16 or ResNet-101 is employed in a fully convolutional fashion, using
atrous convolution to reduce the degree of signal downsampling (from 32x down to 8x). A bilinear interpolation stage enlarges the feature maps to the
original image resolution. A fully connected CRF is then applied to refine the segmentation result and better capture the object boundaries.
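The bilinear interpolation stage of the caption can be sketched as below. The helper name and the corner-aligned sampling convention are illustrative assumptions; frameworks provide optimized equivalents:

```python
import numpy as np

def bilinear_upsample(score, factor):
    """Bilinearly enlarge an (H, W, C) score map by an integer factor,
    sampling with aligned corners. Readable reference only."""
    H, W, C = score.shape
    ys = np.linspace(0, H - 1, H * factor)
    xs = np.linspace(0, W - 1, W * factor)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None, None]     # vertical interpolation weights
    wx = (xs - x0)[None, :, None]     # horizontal interpolation weights
    top = score[y0][:, x0] * (1 - wx) + score[y0][:, x1] * wx
    bot = score[y1][:, x0] * (1 - wx) + score[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

coarse = np.random.rand(4, 4, 21)          # e.g. 21 classes at 1/8 resolution
print(bilinear_upsample(coarse, 8).shape)  # (32, 32, 21)
```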
1. http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6
836 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 40, NO. 4, APRIL 2018