useful information such as their purpose, number of classes,
data format, and training/validation/testing splits.
3.1 2D Datasets
Throughout the years, semantic segmentation has been
mostly focused on two-dimensional images. For that reason,
2D datasets are the most abundant ones. In this section
we describe the most popular 2D large-scale datasets for
semantic segmentation, considering as 2D any dataset that
contains any kind of two-dimensional representation, such
as gray-scale or Red Green Blue (RGB) images.
• PASCAL Visual Object Classes (VOC) [27]^1: this
challenge consists of a ground-truth annotated
dataset of images and five different competitions:
classification, detection, segmentation, action classi-
fication, and person layout. The segmentation one is
especially interesting since its goal is to predict the
object class of each pixel for each test image. There
are 21 classes categorized into vehicles, household,
animals, and other: aeroplane, bicycle, boat, bus, car,
motorbike, train, bottle, chair, dining table, potted
plant, sofa, TV/monitor, bird, cat, cow, dog, horse,
sheep, and person. Background is also considered if
the pixel does not belong to any of those classes.
The dataset is divided into two subsets: training
and validation, with 1464 and 1449 images respectively.
The test set is kept private for the challenge. This
dataset is arguably the most popular for semantic
segmentation, so almost every remarkable method in
the literature is submitted to its performance
evaluation server to be validated against the private
test set. Methods can be trained either using only
the dataset or using additional information.
Furthermore, its leaderboard is public and can be
consulted online^2.
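The per-pixel prediction task and its scoring can be illustrated with a toy sketch: a simplified per-class intersection over union (the metric used for VOC segmentation) computed on made-up label maps. The helper below is ours, not the official evaluation code.

```python
import numpy as np

# Simplified per-class intersection over union (IoU) between a predicted
# and a ground-truth label map; class 0 is background. A toy sketch only,
# not the official PASCAL VOC evaluation protocol.
def per_class_iou(pred, gt, num_classes):
    ious = {}
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious[c] = inter / union
    return ious

gt = np.array([[0, 1], [1, 1]])
pred = np.array([[0, 1], [1, 0]])
ious = per_class_iou(pred, gt, num_classes=21)
```

Averaging these per-class scores over the 21 classes present in the evaluation yields the mean IoU commonly reported on the leaderboard.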
• PASCAL Context [28]^3: this dataset is an extension
of the PASCAL VOC 2010 detection challenge which
contains pixel-wise labels for all training images
(10103). It contains a total of 540 classes – includ-
ing the original 20 classes plus background from
PASCAL VOC segmentation – divided into three
categories (objects, stuff, and hybrids). Despite the
large number of classes, only the 59 most frequent
ones are usually considered: since its classes follow
a power-law distribution, many of them are too
sparse throughout the dataset. For this reason, the
subset of the 59 most frequent classes is usually
selected to conduct studies on this dataset, relabeling
the rest of them as background.
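The relabeling step just described can be sketched as follows (a minimal example on made-up label maps; the helper name and the toy data are ours, not part of the PASCAL Context tooling):

```python
import numpy as np

# Hypothetical sketch: keep only the k most frequent classes across a set
# of label maps and relabel every other pixel as background (class 0), as
# is commonly done with the 59-class subset of PASCAL Context.
def relabel_to_frequent(label_maps, k=59, background=0):
    # Count the pixel frequency of every class over the whole dataset.
    counts = np.bincount(np.concatenate([m.ravel() for m in label_maps]))
    # The k most frequent classes, excluding background itself.
    frequent = set(np.argsort(counts)[::-1][:k]) - {background}
    # Map every pixel of a rare class to background.
    return [np.where(np.isin(m, list(frequent)), m, background)
            for m in label_maps]

maps = [np.array([[1, 1, 2], [2, 2, 2]]),
        np.array([[2, 3, 3], [3, 3, 0]])]
out = relabel_to_frequent(maps, k=2)
```

With `k=2` the toy classes 2 and 3 are kept and the sparse class 1 collapses into background, mirroring how rare PASCAL Context classes are folded into the background label.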
• PASCAL Part [29]^4: this database is an extension of
the PASCAL VOC 2010 detection challenge which
goes beyond that task to provide per-pixel segmen-
tation masks for each part of the objects (or at least
silhouette annotation if the object does not have a
consistent set of parts). The original classes of PAS-
CAL VOC are kept, but their parts are introduced,
e.g., bicycle is now decomposed into back wheel,
chain wheel, front wheel, handlebar, headlight, and
saddle. It contains labels for all training and valida-
tion images from PASCAL VOC as well as for the
9637 testing images.
1. http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
2. http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6
3. http://www.cs.stanford.edu/~roozbeh/pascal-context/
4. http://www.stat.ucla.edu/~xianjie.chen/pascal_part_dataset/pascal_part.html
• Semantic Boundaries Dataset (SBD) [30]^5: this
dataset is an extended version of the aforementioned
PASCAL VOC which provides semantic segmenta-
tion ground truth for those images that were not
labelled in VOC. It contains annotations for 11355
images from PASCAL VOC 2011. Those annotations
provide both category-level and instance-level infor-
mation, apart from boundaries for each object. Since
the images are obtained from the whole PASCAL
VOC challenge (not only from the segmentation one),
the training and validation splits diverge. In fact,
SBD provides its own training (8498 images) and
validation (2857 images) splits. Due to its increased
amount of training data, this dataset is often used as
a substitute for PASCAL VOC for deep learning.
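When SBD is used this way, a usual precaution is to keep the augmented training set disjoint from the VOC validation split. A minimal sketch, with made-up image identifiers and a hypothetical helper:

```python
# Hypothetical sketch: build an augmented training set from the VOC and
# SBD training images while excluding anything held out for VOC
# validation, since the two datasets' splits diverge. The identifiers
# below are made up for illustration.
def augmented_train_ids(voc_train, voc_val, sbd_train):
    # Union of VOC and SBD training images, minus the VOC validation set.
    return sorted((set(voc_train) | set(sbd_train)) - set(voc_val))

ids = augmented_train_ids(
    voc_train=["2007_000032", "2007_000039"],
    voc_val=["2007_000033"],
    sbd_train=["2007_000033", "2007_000042"],
)
```

The set difference is what prevents validation images from leaking into training when the two sources are merged.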
• Microsoft Common Objects in Context (COCO)
[31]^6: this is another large-scale image recognition,
segmentation, and captioning dataset. It features
various challenges, the detection one being the most
relevant for this field since one of its parts is focused
on segmentation. That challenge, which features more
than 80 classes, provides more than 82783 images
for training and 40504 for validation, while its test
set consists of more than 80000 images. In particular,
the test set is divided into four different subsets or
splits: test-dev (20000 images), for additional valida-
tion and debugging; test-standard (20000 images),
the default test data for the competition and the
one used to compare state-of-the-art methods; test-
challenge (20000 images), the split used when sub-
mitting to the evaluation server for the challenge;
and test-reserve (20000 images), a split used to
protect against possible overfitting in the challenge
(if a method is suspected of having made too many
submissions or of having trained on the test data, its
results will be compared with the reserve split). Its popularity
and importance have ramped up since its appearance
thanks to its large scale. In fact, the results of the
challenge are presented yearly at a joint workshop
at the European Conference on Computer Vision
(ECCV)^7, together with those of ImageNet.
• SYNTHetic Collection of Imagery and Annotations
(SYNTHIA) [32]^8: this is a large-scale collection of
photo-realistic renderings of a virtual city, seman-
tically segmented, whose purpose is scene under-
standing in the context of driving or urban scenarios.
The dataset provides fine-grained pixel-level annota-
tions for 11 classes (void, sky, building, road, side-
walk, fence, vegetation, pole, car, sign, pedestrian,
and cyclist). It
5. http://home.bharathh.info/home/sbd
6. http://mscoco.org/
7. http://image-net.org/challenges/ilsvrc+coco2016
8. http://synthia-dataset.net/