SPLATNet: A Deep Learning Architecture for Efficient Point Cloud Processing on Sparse Lattices


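SPLATNet (Sparse Lattice Networks for Point Cloud Processing) is a deep learning method for point cloud data, presented at the 2018 Conference on Computer Vision and Pattern Recognition (CVPR). The authors are Hang Su, Subhransu Maji and Evangelos Kalogerakis (UMass Amherst), Varun Jampani, Deqing Sun and Jan Kautz (NVIDIA), and Ming-Hsuan Yang (UC Merced). The core contribution is a network architecture that operates directly on point clouds represented in a sparse, high-dimensional lattice. Applying a conventional convolutional neural network (CNN) to a dense lattice runs into memory and compute limits, because storage and computation grow steeply as the lattice resolution and dimensionality increase. SPLATNet addresses this by using sparse bilateral convolution layers as its basic building block. These layers rely on an indexing structure so that computation touches only the occupied parts of the lattice, which substantially reduces both computational cost and memory use.

A second key property of these layers is flexibility: the lattice structure can be specified freely, which supports hierarchical and spatially aware feature learning as well as joint 2D and 3D reasoning. As a result, SPLATNet can handle point-based and image-based representations together and can be trained end to end, from input data to final prediction. SPLATNet performs well on 3D segmentation tasks, indicating that it is both efficient and accurate on large and complex point cloud datasets. Potential applications include autonomous driving, robot navigation and environment perception, and the architecture gives later work a framework for building more efficient and more flexible point cloud processing algorithms.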

Figure 2: Bilateral Convolution Layer (panels: Input, Splat, Convolve, Slice, Segmentation). Splat: BCL first interpolates input features $F$ onto a $d_l$-dimensional permutohedral lattice defined by the lattice features $L$ at input points. Convolve: BCL then does $d_l$-dimensional convolution over this sparsely populated lattice. Slice: The filtered signal is then interpolated back onto the input signal. For illustration, input and output are shown as point cloud and the corresponding segmentation labels.

ways be desirable in man-made object segmentation and classification tasks where large deformations may change the underlying shape or part functionalities and semantics. We refer to Bronstein et al. [7] for an excellent review of spectral, patch- and graph-based methods.

Joint 2D-3D networks. FusionNet [18] combines shape classification scores from a volumetric and a multi-view network, yet this fusion happens at a late stage, after the final fully connected layer of these networks, and does not jointly consider their intermediate local and global feature representations. In our case, the 2D and 3D feature representations are mapped onto the same lattice, enabling end-to-end learning from both types of input representations.

3. Bilateral Convolution Layer

In this section, we briefly review the Bilateral Convolution Layer (BCL) that forms the basic building block of our SPLATNet architecture for point clouds. BCL provides a way to incorporate sparse high-dimensional filtering inside neural networks. In [22, 25], BCL was proposed as a learnable generalization of bilateral filtering [43, 2], hence the name 'Bilateral Convolution Layer'. Bilateral filtering involves a projection of a given 2D image into a higher-dimensional space (e.g., space defined by position and color) and is traditionally limited to hand-designed filter kernels. BCL provides a way to learn filter kernels in high-dimensional spaces for bilateral filtering. BCL is also shown to be useful for information propagation across video frames [21]. We observe that BCL has several favorable properties to filter data that is inherently sparse and high-dimensional, like point clouds. Here, we briefly describe how a BCL works and then discuss its properties.

3.1. Inputs to BCL

Let $F \in \mathbb{R}^{n \times d_f}$ be the given input features to a BCL, where $n$ denotes the number of input points and $d_f$ denotes the dimensionality of input features at each point. For 3D point clouds, input features can be low-level features such as color, position, etc., and can also be high-level features such as features generated by a neural network.

One of the interesting characteristics of BCL is that it allows a flexible specification of the lattice space in which the convolution operates. This is specified as lattice features at each input point. Let $L \in \mathbb{R}^{n \times d_l}$ denote lattice features at input points with $d_l$ denoting the dimensionality of the feature space in which convolution operates. For instance, the lattice features can be point position and color (XYZRGB) that define a 6-dimensional filtering space for BCL. For standard 3D spatial filtering of point clouds, $L$ is given as the position (XYZ) of each point. Thus BCL takes input features $F$ and lattice features $L$ of input points and performs $d_l$-dimensional filtering of the points.
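
As a rough illustration of the two inputs described above, the sketch below builds an input feature matrix F and two alternative lattice feature matrices L for a toy point cloud. The point values and the helper names `input_features` and `lattice_features` are hypothetical and are not taken from the paper; plain Python lists stand in for the matrices.

```python
# Hypothetical toy point cloud: each point is (x, y, z, r, g, b).
points = [
    (0.0, 0.0, 0.0, 0.9, 0.1, 0.1),
    (0.1, 0.0, 0.0, 0.8, 0.2, 0.1),
    (1.0, 1.0, 0.5, 0.1, 0.1, 0.9),
]

def input_features(points):
    """F: one row per point. Here raw color is used as the feature, so d_f = 3."""
    return [list(p[3:6]) for p in points]

def lattice_features(points, use_color=False):
    """L: XYZ for 3D spatial filtering (d_l = 3),
    or XYZRGB for a 6-dimensional filtering space (d_l = 6)."""
    return [list(p) if use_color else list(p[:3]) for p in points]

F = input_features(points)                           # n x d_f
L_xyz = lattice_features(points)                     # n x 3
L_xyzrgb = lattice_features(points, use_color=True)  # n x 6

print("n =", len(F), "d_f =", len(F[0]))
print("d_l (XYZ) =", len(L_xyz[0]), "d_l (XYZRGB) =", len(L_xyzrgb[0]))
```

Note that F and L are chosen independently: the same features can be filtered in a purely spatial lattice or in a joint position-and-color lattice simply by swapping L.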

3.2. Processing steps in BCL

As illustrated in Figure 2, BCL has three processing steps, splat, convolve and slice, that work as follows.

Splat. BCL first projects the input features $F$ onto the $d_l$-dimensional lattice defined by the lattice features $L$, via barycentric interpolation. Following [1], BCL uses a permutohedral lattice instead of a standard Euclidean grid for efficiency purposes. The size of lattice simplices or space between the grid points is controlled by scaling the lattice features $\Lambda L$, where $\Lambda$ is a diagonal $d_l \times d_l$ scaling matrix.
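
The effect of the scaling matrix can be sketched with a small example. This is only an illustration under a simplifying assumption: a regular integer grid is substituted for the permutohedral lattice, and the function names and point values are hypothetical. Scaling the lattice features by a larger factor makes the cells finer relative to the data, so more distinct cells become occupied.

```python
import math

def scale_lattice_features(L, diag):
    """Apply a diagonal scaling: multiply each lattice dimension by its own scale."""
    return [[s * v for s, v in zip(diag, row)] for row in L]

def occupied_cells(L_scaled):
    """Integer grid cells holding at least one point.
    Assumption: a regular grid stands in for the permutohedral lattice."""
    return {tuple(math.floor(v) for v in row) for row in L_scaled}

# Hypothetical XYZ lattice features for three points.
L = [
    [0.05, 0.12, 0.03],
    [0.15, 0.13, 0.02],
    [0.82, 0.75, 0.41],
]

coarse = occupied_cells(scale_lattice_features(L, [1.0, 1.0, 1.0]))
fine = occupied_cells(scale_lattice_features(L, [10.0, 10.0, 10.0]))

print("occupied cells at scale 1: ", len(coarse))
print("occupied cells at scale 10:", len(fine))
```

With the coarse scale all three points share one cell, so their features are mixed during splatting; with the fine scale each point lands in its own cell.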

Convolve. Once the input points are projected onto the $d_l$-dimensional lattice, BCL performs $d_l$-dimensional convolution on the splatted signal with learnable filter kernels. Just like in standard spatial CNNs, BCL allows an easy specification of filter neighborhood in the $d_l$-dimensional space.
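
A minimal sketch of convolution over a sparsely populated lattice follows. It rests on assumptions not stated in the text: the lattice is modeled as a regular integer grid stored in a dictionary keyed by vertex coordinates, the filter neighborhood is a cube of offsets, outputs are evaluated only at occupied vertices, and the weights are fixed rather than learned. The names `neighborhood` and `sparse_convolve` are hypothetical.

```python
from itertools import product

def neighborhood(dims, radius=1):
    """All integer offsets within `radius` along each of `dims` lattice axes."""
    return list(product(range(-radius, radius + 1), repeat=dims))

def sparse_convolve(signal, weights):
    """signal: {vertex (tuple of ints): value}, occupied vertices only.
    weights: {offset (tuple of ints): filter weight}.
    Unoccupied neighbors contribute zero and are never stored."""
    out = {}
    for vertex in signal:
        acc = 0.0
        for offset, w in weights.items():
            nb = tuple(a + b for a, b in zip(vertex, offset))
            acc += w * signal.get(nb, 0.0)
        out[vertex] = acc
    return out

# Three occupied vertices in a 2-D lattice; everything else is empty.
signal = {(0, 0): 1.0, (0, 1): 2.0, (5, 5): 4.0}
offsets = neighborhood(dims=2)                        # 3 x 3 neighborhood
weights = {off: 1.0 / len(offsets) for off in offsets}  # box filter as a placeholder

filtered = sparse_convolve(signal, weights)
for vertex, value in sorted(filtered.items()):
    print(vertex, round(value, 4))
```

The work done scales with the number of occupied vertices times the neighborhood size, not with the full extent of the lattice, which is the point of keeping the representation sparse.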

Slice. The filtered signal is then mapped back to the input points via barycentric interpolation. The resulting signal can be passed on to other BCLs for further processing. This step is called 'slicing'. BCL allows slicing the filtered signal onto a different set of points other than the input points. This is achieved by specifying a different set of lattice features $L^{out} \in \mathbb{R}^{m \times d_l}$ at $m$ output points of interest.

All the above three processing steps in BCL can be written as matrix multiplications:

$$\hat{F}_c = S_{slice} B_{conv} S_{splat} F_c, \quad (1)$$

where $F_c$ denotes the $c$-th column/channel of the input feature $F$ and $\hat{F}_c$ denotes the corresponding filtered signal.
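
The matrix form can be made concrete with a small sketch. It is a simplified stand-in, not the paper's implementation: a 1-D regular lattice replaces the permutohedral lattice (so barycentric interpolation reduces to linear interpolation between two neighboring vertices), the filter weights are fixed instead of learned, and all function names and numeric values are hypothetical. The output points differ from the input points, which also illustrates slicing with a separate set of output lattice features.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def interp_matrix(coords, num_vertices):
    """Rows are points, columns are lattice vertices. Each row holds the two
    linear (1-D barycentric) weights of its point, which sum to 1."""
    W = [[0.0] * num_vertices for _ in coords]
    for i, x in enumerate(coords):
        lo = min(int(math.floor(x)), num_vertices - 2)
        t = x - lo
        W[i][lo] = 1.0 - t
        W[i][lo + 1] = t
    return W

def conv_matrix(kernel, num_vertices):
    """Banded matrix that applies a centered 1-D filter over the lattice vertices."""
    r = len(kernel) // 2
    B = [[0.0] * num_vertices for _ in range(num_vertices)]
    for i in range(num_vertices):
        for k, w in enumerate(kernel):
            j = i + k - r
            if 0 <= j < num_vertices:
                B[i][j] = w
    return B

V = 5                           # number of lattice vertices
coords_in = [0.25, 0.5, 2.75]   # scaled lattice features of n = 3 input points
coords_out = [1.0, 3.0]         # lattice features of m = 2 output points
F_c = [[1.0], [2.0], [4.0]]     # one feature channel as an n x 1 column

S_splat = transpose(interp_matrix(coords_in, V))  # V x n: points -> lattice
B_conv = conv_matrix([0.25, 0.5, 0.25], V)        # V x V: filtering on the lattice
S_slice = interp_matrix(coords_out, V)            # m x V: lattice -> output points

F_hat_c = matmul(S_slice, matmul(B_conv, matmul(S_splat, F_c)))
print(F_hat_c)
```

Because the splat weights of each point sum to 1, the total feature mass placed on the lattice equals the sum of the input channel, and the number of rows in the result is set by the output points rather than the input points.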

3.3. Properties of BCL

There are several properties of BCL that make it particularly convenient for point cloud processing. Here, we mention some of those properties:

• The input points to BCL need not be ordered or lie on a grid as they are projected onto a $d_l$-dimensional grid defined by lattice features $L$
