

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

Adam Paszke

Faculty of Mathematics, Informatics and Mechanics

University of Warsaw, Poland

a.paszke@students.mimuw.edu.pl

Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello

Electrical and Computer Engineering

Purdue University, USA

{aabhish, sangpilkim, euge}@purdue.edu

Abstract

The ability to perform pixel-wise semantic segmentation in real-time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long run-times that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low latency operation. ENet is up to 18× faster, requires 75× fewer FLOPs, has 79× fewer parameters, and provides accuracy similar to or better than that of existing models. We have tested it on the CamVid, Cityscapes and SUN datasets and report on comparisons with existing state-of-the-art methods, and on the trade-offs between accuracy and processing time of a network. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster.

1 Introduction

Recent interest in augmented reality wearables, home-automation devices, and self-driving vehicles has created a strong need for semantic-segmentation (or visual scene-understanding) algorithms that can operate in real-time on low-power mobile devices. These algorithms label each and every pixel in the image with one of the object classes. In recent years, the availability of larger datasets and computationally-powerful machines has helped deep convolutional neural networks (CNNs) [1, 2, 3, 4] surpass the performance of many conventional computer vision algorithms [5, 6, 7]. Even though CNNs are increasingly successful at classification and categorization tasks, they provide coarse spatial results when applied to pixel-wise labeling of images. Therefore, they are often cascaded with other algorithms to refine the results, such as color based segmentation [8] or conditional random fields [9], to name a few.
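To make the notion of pixel-wise labeling concrete, the following is a minimal sketch, not taken from the paper, of how a map of per-class scores could be turned into a label map by picking the highest-scoring class at every pixel. The function name label_pixels, the 2x2 image, and the three class names are hypothetical and serve only as illustration.

```python
def label_pixels(scores):
    """Assign each pixel the index of its highest-scoring class.

    scores[r][c] is a list of per-class scores for the pixel at
    row r, column c. Ties go to the lowest class index.
    """
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]


# Hypothetical 2x2 image with 3 classes (say 0 = road, 1 = car, 2 = sky).
scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.3, 0.3, 0.4], [0.9, 0.05, 0.05]],
]
label_map = label_pixels(scores)
print(label_map)  # [[0, 1], [2, 0]]
```

The output has the same spatial size as the input, with one class index per pixel. This is what distinguishes segmentation from whole-image classification, which would return a single label.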

In order to both spatially classify and finely segment images, several neural network architectures have been proposed, such as SegNet [10, 11] or fully convolutional networks [12]. All these works are based on a VGG16 [13] architecture, which is a very large model designed for multi-class classification. These references propose networks with huge numbers of parameters, and long inference times. In these conditions, they become unusable for many mobile or battery-powered applications, which require processing images at rates higher than 10 fps.
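The 10 fps requirement translates directly into a per-frame time budget: at 10 frames per second, each image must be processed in at most 100 ms. The sketch below, which is an illustration and not part of the paper, computes this budget and checks a measured inference time against it. The helper names and the example timings are hypothetical.

```python
def frame_budget_ms(fps):
    """Maximum per-frame processing time (in ms) that sustains `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def meets_rate(inference_ms, fps=10.0):
    """True if one forward pass fits within the per-frame budget."""
    return inference_ms <= frame_budget_ms(fps)


print(frame_budget_ms(10))   # 100.0
print(meets_rate(50.0))      # True: 50 ms per frame allows 20 fps
print(meets_rate(250.0))     # False: 250 ms per frame is only 4 fps
```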

In this paper, we propose a new neural network architecture optimized for fast inference and high accuracy. Examples of images segmented using ENet are shown in Figure 1. In our work, we chose

arXiv:1606.02147v1 [cs.CV] 7 Jun 2016
