A PREPRINT - DECEMBER 24, 2019
a total of 2,688 images. These and other relatively primitive image sets have been mostly abandoned in the
semantic segmentation literature due to their limited resolution and low volume.
2.1.2 Urban Street Semantic Segmentation Image Sets
•
Cityscapes [23]: This is a large-scale image set with a focus on the semantic understanding of urban street
scenes. It contains annotations for high-resolution images from 50 different cities, taken at different hours of
the day and in all seasons of the year, and also with varying background and scene layout. The annotations
are carried out at two quality levels: fine for 5,000 images and coarse for 20,000 images. There are 30 different
class labels, some of which also have instance annotations (vehicles, people, riders, etc.). Consequently, there
are two challenges with separate public leaderboards³: one for pixel-level semantic segmentation, and a second
for instance-level semantic segmentation. With more than 100 entries, it is the most popular challenge for
semantic segmentation of urban street scenes.
•
Other Urban Street Semantic Segmentation Image Sets: There are a number of alternative image sets for urban
street semantic segmentation, such as CamVid [24], KITTI [25], and SYNTHIA [26]. These are generally
overshadowed by the Cityscapes image set [23] for several reasons. Principally, their scale is relatively low.
Only the SYNTHIA image set [26] can be considered large-scale (with more than 13k annotated images);
however, it is an artificially generated image set, which is considered a major limitation for security-critical
systems like driverless cars.
2.2 Performance Evaluation
There are two main criteria for evaluating the performance of semantic segmentation: accuracy, or in other words, the
success of an algorithm; and computational complexity in terms of speed and memory requirements. In this section we
analyse these two criteria separately.
2.2.1 Accuracy
Measuring the performance of segmentation can be complicated, mainly because there are two distinct values to measure.
The first is classification, which is simply determining the pixel-wise class labels; and the second is localisation, or
finding the correct set of pixels that enclose the object. Different metrics can be found in the literature to measure one or
both of these values. The following is a brief explanation of the principal measures most commonly used in evaluating
semantic segmentation performance.
•
ROC-AUC: ROC stands for the Receiver Operating Characteristic curve, which summarises the trade-off
between the true positive rate and the false positive rate of a predictive model over different probability thresholds,
whereas AUC stands for the area under this curve, which is 1 at maximum. This tool is useful in the
interpretation of binary classification problems, and is appropriate when observations are balanced between
classes. However, since most semantic segmentation image sets [14, 15, 16, 17, 18, 19, 23] are not balanced
between the classes, this metric is no longer used by the most popular challenges.
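To make the definition above concrete, the following is a minimal NumPy sketch (the scores and labels are invented toy values) that sorts predictions by score, traces the ROC curve over the implied thresholds, and integrates it with the trapezoidal rule:

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve for binary labels (1 = positive class)."""
    # Sort by descending score so TPR and FPR grow as the threshold is lowered.
    order = np.argsort(-scores)
    labels = labels[order]
    tpr = np.cumsum(labels) / labels.sum()            # true positive rate
    fpr = np.cumsum(1 - labels) / (1 - labels).sum()  # false positive rate
    # Prepend the (0, 0) point and integrate with the trapezoidal rule.
    x = np.r_[0.0, fpr]
    y = np.r_[0.0, tpr]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Toy example: a perfect ranking (all positives scored above all negatives)
# yields the maximum AUC of 1.
scores = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 1, 0, 0, 0])
print(roc_auc(scores, labels))  # 1.0
```

Note that this sketch assumes untied scores; a production implementation would also handle ties between positive and negative scores.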
•
Pixel Accuracy: Also known as global accuracy [27], pixel accuracy (PA) is a very simple metric which
calculates the ratio between the number of properly classified pixels and the total number of pixels. Mean pixel
accuracy (mPA) is a version of this metric which computes the ratio of correct pixels on a per-class basis.
mPA is also referred to as class average accuracy [27].
PA = \frac{\sum_{j=1}^{k} n_{jj}}{\sum_{j=1}^{k} t_j}, \qquad mPA = \frac{1}{k} \sum_{j=1}^{k} \frac{n_{jj}}{t_j} \qquad (1)
where n_{jj} is the total number of pixels both classified and labelled as class j; in other words, n_{jj} corresponds
to the total number of True Positives for class j, and t_j is the total number of pixels labelled as class j.
•
Intersection over Union (IoU): Also known as the Jaccard Index, IoU is a statistic used for comparing the
similarity and diversity of sample sets. In semantic segmentation, it is the ratio of the intersection of the
pixel-wise classification results with the ground truth, to their union.
IoU = \frac{\sum_{j=1}^{k} n_{jj}}{\sum_{j=1}^{k} \left( n_{ij} + n_{ji} + n_{jj} \right)}, \quad i \neq j \qquad (2)
³ https://www.cityscapes-dataset.com/benchmarks/