SBNet: Sparse Blocks Network for Fast Inference
Mengye Ren∗1,2, Andrei Pokrovsky∗1, Bin Yang∗1,2, Raquel Urtasun1,2
1 Uber Advanced Technologies Group   2 University of Toronto
{mren3,andrei,byang10,urtasun}@uber.com
Abstract
Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers, which incurs a high computational cost for real-time applications. For
many problems such as object detection and semantic
segmentation, we are able to obtain a low-cost computation
mask, either from a priori problem knowledge, or from a
low-resolution segmentation network. We show that such
computation masks can be used to reduce computation
in the high-resolution main network. Variants of sparse
activation CNNs have previously been explored on small-
scale tasks and showed no degradation in terms of object
classification accuracy, but often measured gains in terms
of theoretical FLOPs without realizing a practical speed-
up when compared to highly optimized dense convolution
implementations. In this work, we leverage the sparsity
structure of computation masks and propose a novel
tiling-based sparse convolution algorithm. We verified the
effectiveness of our sparse CNN on LiDAR-based 3D object
detection, and we report significant wall-clock speed-ups
compared to dense convolution without noticeable loss of
accuracy.
1. Introduction
Deep convolutional neural networks (CNNs) have led to major breakthroughs in many computer vision tasks [21]. While model accuracy consistently improves with the number of layers [11], current standard networks use over a hundred convolution layers, and the amount of computation involved in deep CNNs can be prohibitively expensive for real-time applications such as autonomous driving.
∗Equal contribution. Code available at https://github.com/uber/sbnet

Figure 1: Our proposed tiled sparse convolution module (gather → convolution → scatter).

Spending an equal amount of computation at all spatial locations is a tremendous waste, since spatial sparsity is ubiquitous in many applications: in autonomous driving, only the areas on the road matter for object detection; in video segmentation, only occluded and fast-moving pixels require recomputation; in 3D object classification [34], sparsity is directly encoded in the inputs as voxel occupancy. In these examples, spatial sparsity can be represented as binary computation masks, where ones indicate active locations that need more computation and zeros indicate inactive locations. In cases where such masks are not directly available from the inputs, we can predict them in the form of visual saliency [16] or objectness prior [20] by using another relatively cheap network or even a part of the main network itself [4, 25].
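For illustration, such a computation mask could be obtained by thresholding the score map produced by a cheap auxiliary network and resizing it to the main network's feature resolution. The NumPy sketch below is our own illustrative example, not the paper's pipeline; the function name, the 0.5 threshold, and the nearest-neighbor upsampling are assumptions.

```python
import numpy as np

def compute_mask_from_scores(scores, threshold=0.5, out_hw=None):
    """Binarize a low-resolution score map (e.g., saliency or objectness)
    into a computation mask, optionally upsampled to the main network's
    feature-map resolution by nearest-neighbor repetition."""
    mask = (scores > threshold).astype(np.float32)
    if out_hw is not None:
        rep_h = out_hw[0] // mask.shape[0]
        rep_w = out_hw[1] // mask.shape[1]
        mask = np.repeat(np.repeat(mask, rep_h, axis=0), rep_w, axis=1)
    return mask

# Example: a 4x4 objectness map thresholded and upsampled to a 16x16 mask.
scores = np.random.rand(4, 4)
mask = compute_mask_from_scores(scores, threshold=0.5, out_hw=(16, 16))
```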
These binary computation masks can be efficiently in-
corporated into the computation of deep CNNs: instead of
convolving the input features at every location, we propose
to use the masks to guide the convolutional filters. Computation masks can also be viewed as a form of attention mechanism in which the attention weights are binary. While attention in computer vision has predominantly been used to improve model interpretability and prediction accuracy, our work highlights its benefit for inference speed-up.
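To make the intended semantics concrete, the following NumPy sketch (ours, not the authors' implementation) evaluates a 2D convolution only where the binary mask is one and leaves inactive locations at zero. A naive per-pixel loop like this realizes no practical speed-up on its own, which is exactly the gap the blockwise approach described next targets.

```python
import numpy as np

def masked_conv2d(x, w, mask):
    """Reference semantics of mask-guided convolution.

    x: (H, W, C_in) input, w: (k, k, C_in, C_out) filters, mask: (H, W)
    binary array. 'SAME' zero padding, stride 1. Outputs are computed only
    where mask == 1; inactive locations stay zero.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], x.shape[1], w.shape[3]), dtype=x.dtype)
    for i, j in zip(*np.nonzero(mask)):            # visit active locations only
        patch = xp[i:i + k, j:j + k, :]            # k x k receptive field
        out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

# Example: 32x32 input, 3x3 conv from 3 to 8 channels, ~10% active mask.
x = np.random.randn(32, 32, 3).astype(np.float32)
w = np.random.randn(3, 3, 3, 8).astype(np.float32)
y = masked_conv2d(x, w, np.random.rand(32, 32) > 0.9)
```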
In this work, we leverage the structured sparsity patterns of computation masks and propose the Sparse Blocks Network (SBNet), which computes convolution on a blockwise decomposition of the mask. We implemented our proposed sparse convolution kernels (fragments of parallel code) on the graphics processing unit (GPU), and we report wall-clock time speed-ups compared against state-of-the-art GPU dense convolution implementations.
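The sketch below illustrates the gather → convolution → scatter pattern of Figure 1 in NumPy: the mask is reduced to block granularity, only blocks containing active locations are gathered (with a halo so the convolution inside each block is exact), convolved densely, and scattered back into a zero-initialized output. This is a minimal reference sketch under our own assumptions (block size, any-active-pixel block selection, naive dense convolution); the paper's actual kernels are custom GPU implementations.

```python
import numpy as np

def tiled_sparse_conv(x, w, mask, block=8):
    """Blockwise sparse convolution: gather active blocks, convolve, scatter.

    x: (H, W, C_in), w: (k, k, C_in, C_out), mask: (H, W) binary.
    'SAME' zero padding, stride 1. Block size and the any-active-pixel
    selection rule are illustrative choices.
    """
    k = w.shape[0]
    halo = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((halo, halo), (halo, halo), (0, 0)))
    out = np.zeros((H, W, w.shape[3]), dtype=x.dtype)

    def dense_conv(tile):
        # Plain 'VALID' convolution over a gathered tile (exposition only).
        th, tw = tile.shape[0] - 2 * halo, tile.shape[1] - 2 * halo
        res = np.empty((th, tw, w.shape[3]), dtype=tile.dtype)
        for i in range(th):
            for j in range(tw):
                res[i, j] = np.tensordot(tile[i:i + k, j:j + k], w,
                                         axes=([0, 1, 2], [0, 1, 2]))
        return res

    for bi in range(0, H, block):
        for bj in range(0, W, block):
            if mask[bi:bi + block, bj:bj + block].any():   # active block?
                # Gather: the block plus a halo from the padded input.
                tile = xp[bi:bi + block + 2 * halo, bj:bj + block + 2 * halo]
                # Convolve the gathered tile densely, then scatter it back.
                out[bi:bi + block, bj:bj + block] = dense_conv(tile)
    return out
```

Because sparsity is handled at block rather than pixel granularity, each gathered tile runs through an ordinary dense convolution, which is what allows highly optimized dense kernels to be reused on only the active portion of the feature map.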