Deep-Learning-Driven Image Segmentation Techniques Explained


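"The paper 'Understanding Deep Learning Techniques for Image Segmentation', written by Swarnendu Ghosh, Nibaran Das, Ishita Das, and Ujjwal Maulik and dated July 16, 2019, examines the role deep learning plays in image segmentation. It analyzes a range of deep learning methods, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), adversarial networks (GANs), and autoencoders, with particular attention to how these techniques have been applied to and developed for image segmentation."

In computer vision, deep learning has become an effective tool for complex tasks such as object detection, localization, recognition, and image segmentation in unconstrained environments. Among the variants of deep neural networks, CNNs have attracted wide attention for their strength in extracting image features. Through successive convolution and pooling operations, a CNN progressively extracts local features of the image and builds a high-level semantic representation, which is essential for image segmentation.

The paper first reviews traditional image segmentation methods, such as thresholding, region growing, and edge detection, and then moves on to deep learning methods. For example, the fully convolutional network (FCN) was one of the earliest deep learning models used for pixel-level prediction; it replaces the fully connected layers of a classification network with convolutional layers so that the network produces pixel-level segmentation output. Later architectures such as U-Net introduced skip connections, which preserve more detail and improve segmentation accuracy.

Recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) perform well on sequential data; in image segmentation they are mainly applied to image sequences with temporal dependencies, such as video segmentation. Adversarial networks (GANs) use adversarial training to make the generated segmentation results more realistic. Autoencoders contribute to image denoising and anomaly detection, and their compress-then-decompress architecture has also inspired the use of low-rank representations in image segmentation.

The paper also analyzes the distinct contributions of individual deep learning techniques. Attention-based models, for instance, guide the network to focus on specific regions of the image and so improve segmentation accuracy, and deep reinforcement learning has shown potential in some pixel-level decision tasks.

Through its systematic analysis of deep learning in image segmentation, the paper gives readers an in-depth understanding of the field and helps them visualize how these complex processes work. It is a valuable reference for researchers and practitioners alike and can support further exploration and innovation in this fast-moving area.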

Figure 6: The SharpMask Network

using convolutional refinements at every step to generate high-resolution masks (refer to Fig. 6). SharpMask scored an average recall of 39.3 on the MS COCO Segmentation Dataset, beating DeepMask, which scored 36.6.

4.1.2 Region proposal networks

Another related line of work that developed alongside image segmentation was object localization. Tasks of this kind involve locating specific objects in images. The expected output for such problems is normally a set of bounding boxes corresponding to the queried objects. Strictly speaking, some of these algorithms do not address image segmentation problems; however, their approaches are relevant to this domain.

RCNN (Region-based Convolutional Neural Networks). The introduction of CNNs raised many new questions in the domain of computer vision, a primary one being whether a network like AlexNet can be extended to detect the presence of more than one object. Region-based CNN [70], more commonly known as R-CNN, used the selective search technique to propose probable object regions and performed classification on each cropped window to verify sensible localization based on the output probability distribution. The selective search technique [198, 200] analyses aspects such as texture, color, and intensity to cluster pixels into objects. The bounding boxes corresponding to these segments are passed through classifying networks to short-list the most sensible boxes. Finally, tighter coordinates can be obtained with a simple linear regression network. The main downside of the technique is its computational cost: the network needs to compute a forward pass for every bounding box proposal. Computation could not be shared across boxes because the boxes were of different sizes, so uniformly sized features were not achievable. The upgraded Fast R-CNN [69] proposed ROI (Region of Interest) pooling, in which regions of interest are dynamically pooled to obtain a fixed-size feature output. From then on, the network was mainly bottlenecked by the selective search technique for candidate region proposal. In Faster R-CNN [175], instead of depending on external features, the intermediate activation maps were used to propose bounding boxes, thus speeding up the feature extraction process. Bounding boxes represent the location of an object; however, they do not provide pixel-level segments. The Faster R-CNN network was extended as Mask R-CNN [76] with a parallel branch that performs pixel-level, object-specific binary classification to provide accurate segments. Mask R-CNN attained an average precision of 35.7 on the COCO [122] test images. The family of R-CNN algorithms is depicted in Fig. 7. Region proposal networks have often been combined with other networks [118, 44] to give instance-level segmentations. R-CNN was further improved under the name HyperNet [99] by using features from multiple layers of the feature extractor. Region proposal networks have also been implemented for instance-specific segmentation. As mentioned before, the object detection capabilities of approaches like R-CNN are often coupled with segmentation models to generate different masks for different instances of the same object [43].
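The ROI pooling idea mentioned above, turning boxes of different sizes into a fixed-size feature output, can be illustrated with a minimal pure-Python sketch. It is not the Fast R-CNN implementation: the function name `roi_max_pool`, the integer bin edges, and the toy 6x6 feature map are all assumptions made for illustration.

```python
def roi_max_pool(feature_map, box, output_size=(2, 2)):
    """Max-pool one region of a 2-D feature map into a fixed-size grid.

    feature_map: list of rows (list of lists of numbers).
    box: (row0, col0, row1, col1) with exclusive end coordinates (assumed convention).
    """
    r0, c0, r1, c1 = box
    height, width = r1 - r0, c1 - c0
    out_h, out_w = output_size
    if height < out_h or width < out_w:
        raise ValueError("region must be at least as large as the output grid")
    # Integer bin edges that split the region into out_h x out_w cells.
    row_edges = [r0 + (i * height) // out_h for i in range(out_h + 1)]
    col_edges = [c0 + (j * width) // out_w for j in range(out_w + 1)]
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect every activation that falls inside cell (i, j).
            cell = [feature_map[r][c]
                    for r in range(row_edges[i], row_edges[i + 1])
                    for c in range(col_edges[j], col_edges[j + 1])]
            row.append(max(cell))
        pooled.append(row)
    return pooled


# Toy feature map whose entry at (r, c) is 6 * r + c.
feature_map = [[6 * r + c for c in range(6)] for r in range(6)]
small = roi_max_pool(feature_map, (0, 0, 4, 4))  # 4x4 region
large = roi_max_pool(feature_map, (1, 2, 6, 5))  # 5x3 region
```

Both calls return a 2x2 grid even though the two regions differ in size, which is the property that lets a single forward pass over the image be shared across all proposals.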

4.1.3 DeepLab

While pixel-level segmentation was effective, two complementary issues were still affecting performance. Firstly, smaller kernel sizes failed to capture contextual information. In classification problems, this is handled using pooling layers that increase the sensory area of the kernels with respect to the original image, but in segmentation pooling reduces the sharpness of the segmented output. The alternative of using larger kernels tends to be slower because of the significantly larger number of trainable parameters. To handle this issue, the DeepLab [30, 32] family of algorithms demonstrated the use of methodologies such as atrous convolutions [211], spatial pyramid pooling [77], and fully connected conditional random fields [100] to perform image segmentation with great efficiency. The DeepLab algorithm attained a mean IoU of 79.7 on the PASCAL VOC 2012 dataset [54].
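For readers unfamiliar with the mean IoU metric quoted above, the following is a minimal sketch of one common way to compute it for flat lists of class labels. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions; benchmark implementations typically accumulate counts over the whole dataset and report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class intersection-over-union over flat label lists."""
    ious = []
    for cls in range(num_classes):
        # Pixels labeled cls in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels labeled cls in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # assumption: ignore classes that appear in neither
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 1, 2]
target = [0, 1, 1, 1, 2, 2]
score = mean_iou(pred, target, num_classes=3)  # each class has IoU 0.5
```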

Atrous/Dilated Convolution. The size of the convolution kernels in any layer determines the sensory response area of the network. While smaller kernels extract local information, larger kernels try to focus on more contextual information. However, larger kernels normally come with a larger number of parameters.
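The trade-off between kernel size and parameter count can be made concrete with a small sketch of a dilated (atrous) convolution in one dimension, written in pure Python. It assumes the usual definition in which gaps of `dilation - 1` samples are inserted between kernel taps; the function names and the toy signal are hypothetical and not taken from the DeepLab code.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input samples spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D convolution (cross-correlation form) with spaced kernel taps."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Tap k reads the sample k * dilation positions after the window start.
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


signal = list(range(10))
kernel = [1, 1, 1]  # three weights in both cases
dense = dilated_conv1d(signal, kernel, dilation=1)   # each output sees 3 samples
atrous = dilated_conv1d(signal, kernel, dilation=2)  # each output sees a span of 5
```

With the same three weights, the dilated version covers a span of five input samples instead of three, so the sensory area grows while the number of parameters stays fixed.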
