Generating Synthetic Images from 3D Models to Improve Semantic Segmentation Performance

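Updated 2024-08-27, PDF, 633 KB

"Synthesizing Training Images for Semantic Segmentation Using 3D Models"

In computer vision, semantic segmentation is a core task: assigning every pixel of an image to one of a set of predefined classes, such as pedestrian, vehicle, or building. In recent years, convolutional neural networks (CNNs) have made significant progress on this problem, learning rich features and producing precise region boundaries. However, the strong performance of CNNs typically depends on a large number of training images with pixel-level annotations, which are time-consuming and expensive to obtain.

To address this, the paper proposes using 3D models to automatically generate synthetic images with pixel-level annotations. The advantage of this approach is that randomly sampling the rendering parameters yields synthetic images with highly varied object appearance and background clutter. For example, lighting, texture, and viewpoint can be varied so that the generated images are diverse and closer to real-world complexity. Adding random background patterns simulates the range of environments found in real scenes and further increases realism.

The synthetic images are generated as follows. First, a suitable 3D model library is chosen, and models and their rendering parameters are selected at random for rendering. Second, the resulting synthetic images are combined with publicly available real-world images to supplement the training set. This data augmentation strategy enlarges the training set and enriches the training samples, helping the CNN generalize to unseen scenes.

Experiments show that, on the PASCAL VOC 2012 dataset, CNNs trained on a training set that includes the synthetic images perform better on semantic segmentation. PASCAL VOC 2012 is a standard benchmark widely used to evaluate semantic segmentation models and contains 20 object classes. The comparison on this benchmark indicates that the synthetic images help improve the model's generalization.

Generating synthetic images from 3D models is therefore an effective way to reduce the manual annotation burden while improving CNN performance on semantic segmentation. It offers a way to narrow the gap between the amount of training data deep learning models need and the amount actually available. Future work may explore how to improve the quality of the synthetic images and how to combine them with more real-world data for better model performance.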

Synthesizing Training Images for Semantic Segmentation

Yunhui Zhang, Zizhao Wu, Zhiping Zhou, Yigang Wang

Digital Media Interactive Simulation Lab, Hangzhou Dianzi University, Hangzhou ZJ 310018, China

School of Computer Science, Hangzhou Dianzi University, Hangzhou ZJ 310018, China

Abstract. Semantic segmentation is one of the key problems in computer vision. Recently, Convolutional Neural Networks (CNNs) have achieved significant performance on the semantic segmentation task. However, CNNs require a sufficient number of annotated training images, which are challenging to obtain because of the extensive human labour involved. In this paper, we propose to use 3D models to automatically generate synthetic images with pixel-level annotations. We take advantage of 3D models to generate synthetic images with high diversity in object appearance and background clutter, by randomly sampling rendering parameters and adding random background patterns. Then, we use the synthetic images to augment the training samples for semantic segmentation by combining them with publicly available real-world images. Experimental results demonstrate that CNNs trained with our synthetic images improve performance on the semantic segmentation task on the PASCAL VOC 2012 dataset.

Keywords: semantic segmentation, synthesizing training images, CNN, augmentation, generate synthetic images
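The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The parameter names and ranges, the noise background, the flat-coloured stand-in for a rendered object, the class ID, and all function names are assumptions made for illustration. A real system would pass the sampled parameters to a 3D renderer, which would return the object image and its coverage mask.

```python
import numpy as np

def sample_render_params(rng):
    """Draw one random rendering configuration (ranges are illustrative assumptions)."""
    return {
        "azimuth_deg": float(rng.uniform(0.0, 360.0)),    # camera angle around the object
        "elevation_deg": float(rng.uniform(-10.0, 60.0)),  # camera height angle
        "light_intensity": float(rng.uniform(0.3, 1.5)),   # relative light strength
    }

def random_background(rng, height, width):
    """Stand-in for a random background pattern: uniform RGB noise in [0, 1)."""
    return rng.random((height, width, 3))

def composite_with_labels(fg_rgb, fg_alpha, bg_rgb, class_id, threshold=0.5):
    """Paste a rendered object over a background and derive pixel-level labels.

    fg_rgb:   (H, W, 3) floats in [0, 1], the rendered object
    fg_alpha: (H, W) floats in [0, 1], renderer coverage (1 = object pixel)
    bg_rgb:   (H, W, 3) floats in [0, 1], the background
    """
    alpha = fg_alpha[..., None]
    image = alpha * fg_rgb + (1.0 - alpha) * bg_rgb
    # The annotation comes for free: object pixels get class_id, the rest 0 (background).
    labels = np.where(fg_alpha >= threshold, class_id, 0).astype(np.uint8)
    return image, labels

def mix_datasets(real, synthetic, rng):
    """Combine real and synthetic samples into one shuffled training list."""
    combined = list(real) + list(synthetic)
    order = rng.permutation(len(combined))
    return [combined[i] for i in order]

rng = np.random.default_rng(0)
params = sample_render_params(rng)  # a real renderer would consume these

# Fake "render": a flat-coloured 2x3 rectangle on a 4x6 canvas.
h, w = 4, 6
fg = np.full((h, w, 3), 0.8)
alpha = np.zeros((h, w))
alpha[1:3, 2:5] = 1.0

bg = random_background(rng, h, w)
image, labels = composite_with_labels(fg, alpha, bg, class_id=9)
train = mix_datasets(["real_a", "real_b"], ["synthetic_a"], rng)
```

Because the label map is derived from the same coverage mask used for compositing, the image and its annotation stay aligned without any manual labeling.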

1 Introduction

Semantic image segmentation is the problem of labeling each pixel in an image with a semantic class. This is a fundamental problem in computer vision with many applications in scene understanding [4], autonomous driving [7], video surveillance [23], etc. This problem has been addressed in the past using various traditional computer vision techniques [18, 2, 20, 11].
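As a concrete illustration of labeling each pixel with a semantic class, the following minimal sketch assumes a model has already produced one score per class per pixel and picks the highest-scoring class at each location. The array layout, the function name, and the toy scores are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def scores_to_label_map(scores):
    """Convert (num_classes, H, W) class scores to an (H, W) map of class indices."""
    return np.argmax(scores, axis=0)

# Toy scores for 3 classes on a 2x2 image.
scores = np.zeros((3, 2, 2))
scores[0] = 0.1          # weak score for class 0 everywhere
scores[1, 0, 0] = 0.9    # class 1 wins at the top-left pixel
scores[2, 1, :] = 0.7    # class 2 wins on the bottom row

label_map = scores_to_label_map(scores)
```

The result has the same height and width as the input image, with one class index per pixel.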

In recent years, tremendous progress has been made through the use of deep Convolutional Neural Networks (CNNs), owing to their rich hierarchical features and end-to-end trainable framework [26, 27, 5, 1]. For example, the Fully Convolutional Network (FCN) method proposed by Long et al. [13] showed that convolutional network architectures originally developed for image classification can be successfully repurposed for dense prediction, surpassing the prior state of the art by a large margin in accuracy and sometimes even in efficiency. However, CNNs require
