proaches incorporate probabilistic graphical models, such
as Conditional Random Fields (CRFs) and Markov Random
Fields (MRFs), into DL architectures.
Chen et al. [38] proposed a semantic segmentation algo-
rithm based on the combination of CNNs and fully connected
CRFs (Figure 10). They showed that responses from the final
layer of deep CNNs are not sufficiently localized for accurate
object segmentation (due to the invariance properties that
make CNNs good for high level tasks such as classification).
To overcome the poor localization property of deep CNNs,
they combined the responses at the final CNN layer with a
fully-connected CRF. They showed that their model is able to
localize segment boundaries with higher accuracy than was
possible with previous methods.
Fig. 10. A CNN+CRF model. The coarse score map of a CNN is up-
sampled via bilinear interpolation, and fed to a fully-connected CRF
to refine the segmentation result. From [38].
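The up-sampling step of this pipeline can be illustrated with a minimal NumPy sketch. The function name and shapes below are our own illustration, not the authors' code, and the fully-connected CRF refinement (mean-field inference over learned pairwise potentials between all pixel pairs) is omitted; we only show how a coarse class-score map is brought back to input resolution by bilinear interpolation before refinement:

```python
import numpy as np

def bilinear_upsample(scores, factor):
    """Up-sample a coarse (h, w, c) class-score map by an integer
    factor using bilinear interpolation."""
    h, w, c = scores.shape
    H, W = h * factor, w * factor
    out = np.empty((H, W, c))
    for i in range(H):
        for j in range(W):
            # Map each output pixel back to fractional input coordinates.
            y, x = i / factor, j / factor
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            out[i, j] = ((1 - dy) * (1 - dx) * scores[y0, x0]
                         + (1 - dy) * dx * scores[y0, x1]
                         + dy * (1 - dx) * scores[y1, x0]
                         + dy * dx * scores[y1, x1])
    return out

# Per-pixel labels from the up-sampled scores; in the full model the
# fully-connected CRF would refine the scores before this argmax.
coarse = np.random.rand(4, 4, 3)            # 4x4 coarse map, 3 classes
fine = bilinear_upsample(coarse, factor=8)  # 32x32 map
labels = fine.argmax(axis=-1)
```

Because bilinear interpolation is a convex combination of neighboring scores, the up-sampled map stays within the value range of the coarse map; the CRF is what then sharpens this smooth map along image edges.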
Schwing and Urtasun [39] proposed a fully-connected
deep structured network for image segmentation. They
presented a method that jointly trains CNNs and fully-
connected CRFs for semantic image segmentation, and
achieved encouraging results on the challenging PASCAL
VOC 2012 dataset. In [40], Zheng et al. proposed a similar
semantic segmentation approach integrating CRF with CNN.
In another relevant work, Lin et al. [41] proposed an
efficient algorithm for semantic segmentation based on
contextual deep CRFs. They explored “patch-patch” context
(between image regions) and “patch-background” context to
improve semantic segmentation through the use of contex-
tual information.
Liu et al. [42] proposed a semantic segmentation algorithm
that incorporates rich information into MRFs, including high-
order relations and mixture of label contexts. Unlike previous
works that optimized MRFs using iterative algorithms, they
proposed a CNN model, namely a Parsing Network, which
enables deterministic end-to-end computation in a single
forward pass.
3.3 Encoder-Decoder Based Models
Another popular family of deep models for image seg-
mentation is based on the convolutional encoder-decoder
architecture. Most DL-based segmentation works use some
kind of encoder-decoder model. We group these works
into two categories, encoder-decoder models for general
segmentation, and for medical image segmentation (to better
distinguish between applications).
3.3.1 Encoder-Decoder Models for General Segmentation
Noh et al. [43] published an early paper on semantic
segmentation based on deconvolution (a.k.a. transposed
convolution). Their model (Figure 11) consists of two parts:
an encoder using convolutional layers adopted from the
VGG 16-layer network and a deconvolutional network that
takes the feature vector as input and generates a map of
pixel-wise class probabilities. The deconvolution network
is composed of deconvolution and unpooling layers, which
identify pixel-wise class labels and predict segmentation
masks. This network achieved promising performance on the
PASCAL VOC 2012 dataset, and obtained the best accuracy
(72.5%) among the methods trained with no external data at
the time.
Fig. 11. Deconvolutional semantic segmentation. Following a convolution
network based on the VGG 16-layer net, is a multi-layer deconvolution
network to generate the accurate segmentation map. From [43].
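The deconvolution (transposed-convolution) operation at the heart of this decoder can be made concrete with a minimal single-channel NumPy sketch (function name and shapes are our own illustration): each input activation scatters a scaled copy of the kernel into a larger output map, so the spatial resolution grows with the stride.

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Naive single-channel 2-D transposed convolution: every input
    value scatters kernel * x[i, j] into the output, up-sampling the
    map by `stride` (no padding, no bias)."""
    h, w = x.shape
    k = kernel.shape[0]
    H, W = (h - 1) * stride + k, (w - 1) * stride + k
    out = np.zeros((H, W))
    for i in range(h):
        for j in range(w):
            out[i*stride:i*stride + k, j*stride:j*stride + k] += x[i, j] * kernel
    return out

feat = np.random.rand(8, 8)          # coarse feature map
kern = np.random.rand(4, 4)          # learned end-to-end in practice
up = transposed_conv2d(feat, kern)   # 18x18 output
```

This is the exact adjoint of a strided convolution's forward pass, which is why stacking such layers can invert the encoder's spatial down-sampling while the kernels remain learnable.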
In another promising work known as SegNet, Badri-
narayanan et al. [44] proposed a convolutional encoder-
decoder architecture for image segmentation (Figure 12).
Similar to the deconvolution network, the core trainable
segmentation engine of SegNet consists of an encoder net-
work, which is topologically identical to the 13 convolutional
layers in the VGG16 network, and a corresponding decoder
network followed by a pixel-wise classification layer. The
main novelty of SegNet is in the way the decoder upsamples
its lower resolution input feature map(s); specifically, it
uses pooling indices computed in the max-pooling step
of the corresponding encoder to perform non-linear up-
sampling. This eliminates the need for learning to up-sample.
The (sparse) up-sampled maps are then convolved with
trainable filters to produce dense feature maps. SegNet is also
significantly smaller in the number of trainable parameters
than other competing architectures. A Bayesian version of
SegNet was also proposed by the same authors to model the
uncertainty inherent to the convolutional encoder-decoder
network for scene segmentation [45].
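SegNet's index-based up-sampling can be sketched in a few lines of NumPy (the helper names below are our own, not SegNet's code): the encoder records where each pooling-window maximum came from, and the decoder scatters values back to exactly those positions, producing the sparse maps that trainable filters then densify.

```python
import numpy as np

def max_pool_with_indices(x, size=2):
    """Non-overlapping max-pooling over a 2-D map that also records,
    for every window, the flat position of its maximum in x."""
    h, w = x.shape
    ph, pw = h // size, w // size
    pooled = np.zeros((ph, pw))
    indices = np.zeros((ph, pw), dtype=np.int64)
    for i in range(ph):
        for j in range(pw):
            window = x[i*size:(i+1)*size, j*size:(j+1)*size]
            di, dj = np.unravel_index(window.argmax(), window.shape)
            pooled[i, j] = window[di, dj]
            indices[i, j] = (i*size + di) * w + (j*size + dj)
    return pooled, indices

def max_unpool(pooled, indices, out_shape):
    """SegNet-style unpooling: place each pooled value back at its
    recorded max location; all other positions stay zero (sparse map)."""
    out = np.zeros(out_shape)
    out.reshape(-1)[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[1., 9., 2., 3.],
              [4., 5., 8., 6.],
              [0., 7., 1., 2.],
              [3., 1., 4., 5.]])
p, idx = max_pool_with_indices(x)
sparse = max_unpool(p, idx, x.shape)  # maxima restored at original spots
```

Since only the integer indices need to be kept, this up-sampling is parameter-free, which is one reason SegNet has far fewer trainable parameters than decoders built on transposed convolutions.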
Several other works adopt transposed convolutions, or
encoder-decoders for image segmentation, such as Stacked
Deconvolutional Network (SDN) [46], Linknet [47], W-Net
[48], and locality-sensitive deconvolution networks for RGB-
D segmentation [49].
Fig. 12. SegNet has no fully-connected layers; hence, the model is fully
convolutional. A decoder up-samples its input using the pooling indices
transferred from its encoder to produce sparse feature map(s). From [44].