A Survey of Deep Learning-Driven Semantic Segmentation: Applications and Challenges


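This article is a survey of deep learning techniques applied to semantic segmentation, written by A. Garcia-Garcia et al. As computer vision and machine learning research advances, demand for image semantic segmentation is growing, particularly in autonomous driving, indoor navigation, and virtual reality, where it is essential for scene understanding and for partitioning images into regions accurately and efficiently. The rise of deep learning in these applications, and for semantic segmentation in particular, has made the field a focus of attention.

The authors first define semantic segmentation and its core concepts within computer vision so that readers have a clear grasp of the basic terminology. This includes pixel-level classification, that is, identifying which object class each pixel in an image belongs to. They then describe the main semantic segmentation datasets, such as Cityscapes, PASCAL VOC, and COCO, which provide training and evaluation benchmarks and help researchers pick the dataset best suited to their needs and goals.

The article then examines existing deep learning methods, such as Fully Convolutional Networks (FCN), U-Net, SegNet, and DeepLab, discussing their innovations, strengths, and limitations. Each method is tied to its application scenarios: FCN, for example, replaces fully connected layers with convolutional layers to produce pixel-level predictions, while U-Net combines downsampling and upsampling paths to retain high-resolution features. The authors also highlight how these methods have improved segmentation accuracy and efficiency and moved the field forward.

Finally, the authors compare the methods through quantitative results and experiments across tasks, giving researchers a practical reference. The article also discusses future research directions, such as multimodal fusion, lightweight model design, and self-supervised learning, as well as how to handle data imbalance and real-time constraints. The survey covers both the fundamentals of deep learning-based semantic segmentation and recent results, and is a useful reference for newcomers and experienced researchers alike.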

useful information such as their purpose, number of classes, data format, and training/validation/testing splits.

3.1 2D Datasets

Throughout the years, semantic segmentation has mostly focused on two-dimensional images. For that reason, 2D datasets are the most abundant. In this section we describe the most popular large-scale 2D datasets for semantic segmentation, treating as 2D any dataset that contains two-dimensional representations such as gray-scale or Red Green Blue (RGB) images.

• PASCAL Visual Object Classes (VOC) [27]: this challenge consists of a ground-truth annotated dataset of images and five different competitions: classification, detection, segmentation, action classification, and person layout. The segmentation competition is especially interesting since its goal is to predict the object class of each pixel for each test image. There are 21 classes: 20 object classes grouped into vehicles, household, animals, and other (aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, TV/monitor, bird, cat, cow, dog, horse, sheep, and person), plus background, which is assigned when a pixel does not belong to any of those classes. The dataset is divided into two subsets, training and validation, with 1464 and 1449 images respectively. The test set is private for the challenge. This dataset is arguably the most popular for semantic segmentation, so almost every notable method in the literature is submitted to its performance evaluation server to validate against the private test set. Methods can be trained either using only the dataset or using additional information. Furthermore, its leaderboard is public and can be consulted online.

• PASCAL Context [28]: this dataset is an extension of the PASCAL VOC 2010 detection challenge which contains pixel-wise labels for all training images (10103). It contains a total of 540 classes, including the original 20 classes plus background from PASCAL VOC segmentation, divided into three categories (objects, stuff, and hybrids). Despite the large number of categories, only the 59 most frequent are commonly used. Because the classes follow a power-law distribution, many of them are too sparse throughout the dataset. For this reason, this subset of 59 classes is usually selected to conduct studies on this dataset, with the rest relabeled as background.

• PASCAL Part [29]: this database is an extension of the PASCAL VOC 2010 detection challenge which goes beyond that task to provide per-pixel segmentation masks for each part of the objects (or at least silhouette annotation if the object does not have a consistent set of parts). The original classes of PASCAL VOC are kept, but their parts are introduced, e.g., bicycle is now decomposed into back wheel, chain wheel, front wheel, handlebar, headlight, and saddle. It contains labels for all training and validation images from PASCAL VOC as well as for the 9637 testing images.

1. http://host.robots.ox.ac.uk/pascal/VOC/voc2012/

2. http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6

3. http://www.cs.stanford.edu/~roozbeh/pascal-context/

4. http://www.stat.ucla.edu/~xianjie.chen/pascal_part_dataset/pascal_part.html

• Semantic Boundaries Dataset (SBD) [30]: this dataset is an extended version of the aforementioned PASCAL VOC which provides semantic segmentation ground truth for those images that were not labelled in VOC. It contains annotations for 11355 images from PASCAL VOC 2011. Those annotations provide both category-level and instance-level information, as well as boundaries for each object. Since the images are drawn from the whole PASCAL VOC challenge (not only from the segmentation competition), its training and validation splits differ from those of VOC. In fact, SBD provides its own training (8498 images) and validation (2857 images) splits. Due to its larger amount of training data, this dataset is often used as a substitute for PASCAL VOC for deep learning.

• Microsoft Common Objects in Context (COCO) [31]: is another large-scale dataset for image recognition, segmentation, and captioning. It features various challenges, the detection challenge being the most relevant for this field since one of its parts is focused on segmentation. That challenge, which features more than 80 classes, provides more than 82783 images for training and 40504 for validation, and its test set consists of more than 80000 images. In particular, the test set is divided into four different subsets or splits: test-dev (20000 images) for additional validation and debugging; test-standard (20000 images), the default test data for the competition and the one used to compare state-of-the-art methods; test-challenge (20000 images), the split used for the challenge when submitting to the evaluation server; and test-reserve (20000 images), a split used to protect against possible overfitting in the challenge (if a method is suspected to have made too many submissions or trained on the test data, its results will be compared with the reserve split). Its popularity and importance have ramped up since its appearance thanks to its large scale. In fact, the results of the challenge are presented yearly at a joint workshop at the European Conference on Computer Vision (ECCV) together with those of ImageNet.

• SYNTHetic Collection of Imagery and Annotations (SYNTHIA) [32]: is a large-scale collection of photo-realistic renderings of a virtual city, semantically segmented, whose purpose is scene understanding in the context of driving or urban scenarios. The dataset provides fine-grained pixel-level annotations for 11 classes (void, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, and cyclist). It

5. http://home.bharathh.info/home/sbd

6. http://mscoco.org/

7. http://image-net.org/challenges/ilsvrc+coco2016

8. http://synthia-dataset.net/
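The relabeling convention described above for PASCAL Context (keep only the most frequent classes and map every other class to background) can be illustrated with a small sketch. This is not the dataset authors' tooling: the function names, the background id of 0, the toy label maps, and the choice to measure frequency by pixel count are all assumptions made for illustration. On the real dataset the cutoff would be k=59.

```python
from collections import Counter

BACKGROUND = 0  # assumed id for the background label (illustrative convention)


def most_frequent_classes(label_maps, k):
    """Return the set of the k non-background class ids covering the most pixels."""
    counts = Counter()
    for label_map in label_maps:
        for row in label_map:
            counts.update(row)
    counts.pop(BACKGROUND, None)  # background is always kept, so it is not ranked
    return {class_id for class_id, _ in counts.most_common(k)}


def relabel_rare_classes(label_map, kept_classes):
    """Map every pixel whose class is not in kept_classes to BACKGROUND."""
    return [
        [c if c in kept_classes else BACKGROUND for c in row]
        for row in label_map
    ]


# Toy 3x4 label maps with class ids 0..4 (made-up data, not PASCAL Context).
toy_maps = [
    [[1, 1, 2, 0],
     [1, 3, 2, 0],
     [1, 1, 2, 4]],
    [[2, 2, 1, 1],
     [0, 3, 1, 1],
     [0, 0, 1, 1]],
]

kept = most_frequent_classes(toy_maps, k=2)
relabeled = [relabel_rare_classes(m, kept) for m in toy_maps]
print(sorted(kept))
for m in relabeled:
    print(m)
```

In the toy data, classes 3 and 4 cover only a few pixels, so they are folded into background while classes 1 and 2 survive. Ranking by the number of images in which a class appears, rather than by pixel count, would be an equally plausible reading of "most frequent"; the source does not say which is used.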
