[Figure 2 panel labels: Input, Splat, Convolve, Slice, Segmentation]
Figure 2: Bilateral Convolution Layer. Splat: BCL first interpolates input features $F$ onto a $d_l$-dimensional permutohedral lattice defined by the lattice features $L$ at input points. Convolve: BCL then does $d_l$-dimensional convolution over this sparsely populated lattice. Slice: The filtered signal is then interpolated back onto the input signal. For illustration, input and output are shown as point cloud and the corresponding segmentation labels.
ways be desirable in man-made object segmentation and
classification tasks where large deformations may change
the underlying shape or part functionalities and semantics.
We refer to Bronstein et al. [7] for an excellent review of
spectral, patch- and graph-based methods.
Joint 2D-3D networks. FusionNet [18] combines shape classification scores from a volumetric and a multi-view network, yet this fusion happens at a late stage, after the final fully connected layer of these networks, and does not jointly consider their intermediate local and global feature representations. In our case, the 2D and 3D feature representations are mapped onto the same lattice, enabling end-to-end learning from both types of input representations.
3. Bilateral Convolution Layer
In this section, we briefly review the Bilateral Convolution Layer (BCL), which forms the basic building block of our SPLATNet architecture for point clouds. BCL provides a way to incorporate sparse high-dimensional filtering inside neural networks. In [22, 25], BCL was proposed as a learnable generalization of bilateral filtering [43, 2], hence the name ‘Bilateral Convolution Layer’. Bilateral filtering involves projecting a given 2D image into a higher-dimensional space (e.g., the space defined by position and color) and is traditionally limited to hand-designed filter kernels. BCL provides a way to learn filter kernels in high-dimensional spaces for bilateral filtering. BCL has also been shown to be useful for information propagation across video frames [21]. We observe that BCL has several favorable properties for filtering data that is inherently sparse and high-dimensional, such as point clouds. Here, we briefly describe how a BCL works and then discuss its properties.
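For context, the hand-designed filtering that BCL generalizes can be illustrated with a classical bilateral filter using fixed Gaussian kernels. The brute-force 1-D version below is only a minimal sketch, not the implementation used in [22, 25]; the function name and the parameters `sigma_space` and `sigma_range` are names chosen for this example.

```python
import math


def bilateral_filter_1d(signal, sigma_space, sigma_range):
    """Brute-force 1-D bilateral filter with hand-designed Gaussian kernels.

    Each output sample is a weighted average of all input samples. The weight
    is a Gaussian in the joint (position, value) space, so samples that are
    far away in position OR in value contribute little.
    """
    filtered = []
    for i, value_i in enumerate(signal):
        weighted_sum = 0.0
        weight_total = 0.0
        for j, value_j in enumerate(signal):
            spatial_term = ((i - j) ** 2) / (2.0 * sigma_space ** 2)
            range_term = ((value_i - value_j) ** 2) / (2.0 * sigma_range ** 2)
            weight = math.exp(-(spatial_term + range_term))
            weighted_sum += weight * value_j
            weight_total += weight
        filtered.append(weighted_sum / weight_total)
    return filtered


# A step edge: a small sigma_range keeps the edge sharp, a large one blurs it.
edge = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
print(bilateral_filter_1d(edge, sigma_space=1.0, sigma_range=1.0))
print(bilateral_filter_1d(edge, sigma_space=1.0, sigma_range=100.0))
```

Here the kernel shape is fixed by the two Gaussians; in a BCL the kernel weights in the high-dimensional space would instead be learned.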
3.1. Inputs to BCL
Let $F \in \mathbb{R}^{n \times d_f}$ be the given input features to a BCL, where $n$ denotes the number of input points and $d_f$ denotes the dimensionality of input features at each point. For 3D point clouds, input features can be low-level features such as color, position, etc., and can also be high-level features such as features generated by a neural network.
One of the interesting characteristics of BCL is that it allows a flexible specification of the lattice space in which the convolution operates. This is specified as lattice features at each input point. Let $L \in \mathbb{R}^{n \times d_l}$ denote lattice features at input points, with $d_l$ denoting the dimensionality of the feature space in which convolution operates. For instance, the lattice features can be point position and color (XYZRGB) that define a 6-dimensional filtering space for BCL. For standard 3D spatial filtering of point clouds, $L$ is given as the position (XYZ) of each point. Thus BCL takes input features $F$ and lattice features $L$ of input points and performs $d_l$-dimensional filtering of the points.
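The distinction between input features and lattice features can be made concrete with a toy point cloud. The following is a minimal sketch; the data, the helper name `build_inputs`, and the choice of color as the input feature are assumptions made for illustration only.

```python
# Toy point cloud: each point has a position (x, y, z) and a color (r, g, b).
points = [
    {"xyz": (0.0, 0.0, 0.0), "rgb": (1.0, 0.0, 0.0)},
    {"xyz": (0.5, 0.1, 0.0), "rgb": (0.9, 0.1, 0.0)},
    {"xyz": (2.0, 1.0, 0.3), "rgb": (0.0, 0.0, 1.0)},
]


def build_inputs(points, lattice="xyz"):
    """Return (F, L) as lists of rows, one row per point.

    F holds the input features (here: color, so d_f = 3).
    L holds the lattice features that define the filtering space:
      "xyz"    -> d_l = 3 (plain 3D spatial filtering)
      "xyzrgb" -> d_l = 6 (position and color)
    """
    F = [list(p["rgb"]) for p in points]
    if lattice == "xyz":
        L = [list(p["xyz"]) for p in points]
    elif lattice == "xyzrgb":
        L = [list(p["xyz"]) + list(p["rgb"]) for p in points]
    else:
        raise ValueError("unknown lattice specification: " + lattice)
    return F, L


F, L = build_inputs(points, lattice="xyzrgb")
print(len(F), len(F[0]))  # n and d_f
print(len(L), len(L[0]))  # n and d_l
```

The same features $F$ can therefore be filtered in different spaces simply by changing which columns go into $L$.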
3.2. Processing steps in BCL
As illustrated in Figure 2, BCL has three processing
steps, splat, convolve and slice, that work as follows.
Splat. BCL first projects the input features $F$ onto the $d_l$-dimensional lattice defined by the lattice features $L$, via barycentric interpolation. Following [1], BCL uses a permutohedral lattice instead of a standard Euclidean grid for efficiency purposes. The size of lattice simplices or space between the grid points is controlled by scaling the lattice features $\Lambda L$, where $\Lambda$ is a diagonal $d_l \times d_l$ scaling matrix.
Convolve. Once the input points are projected onto the $d_l$-dimensional lattice, BCL performs $d_l$-dimensional convolution on the splatted signal with learnable filter kernels. Just like in standard spatial CNNs, BCL allows an easy specification of filter neighborhood in the $d_l$-dimensional space.
Slice. The filtered signal is then mapped back to the input points via barycentric interpolation, a step called ‘slicing’. The resulting signal can be passed on to other BCLs for further processing. BCL also allows slicing the filtered signal onto a set of points different from the input points. This is achieved by specifying a different set of lattice features $L_{out} \in \mathbb{R}^{m \times d_l}$ at $m$ output points of interest.
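As a rough illustration of the three steps, the following minimal sketch runs splat, convolve, and slice on a 1-D lattice with unit-spaced vertices, where barycentric interpolation reduces to linear interpolation between the two enclosing vertices. This is a simplification and not the permutohedral lattice of [1]. The function names, the scalar `scale` standing in for $\Lambda$, the fixed (non-learned) kernel, and the dictionary used to store only populated vertices are all assumptions made for the example.

```python
import math
from collections import defaultdict


def barycentric_1d(lattice_feature, scale):
    """Return [(vertex, weight), (vertex + 1, weight)] for one point in 1-D."""
    x = scale * lattice_feature
    lower = math.floor(x)
    upper_weight = x - lower
    return [(lower, 1.0 - upper_weight), (lower + 1, upper_weight)]


def splat(features, lattice_features, scale):
    """Accumulate point features onto the enclosing lattice vertices."""
    lattice = defaultdict(float)  # sparse: only touched vertices are stored
    for feature, lattice_feature in zip(features, lattice_features):
        for vertex, weight in barycentric_1d(lattice_feature, scale):
            lattice[vertex] += weight * feature
    return lattice


def convolve(lattice, kernel):
    """Filter the splatted signal; kernel[k] applies to offset k - radius."""
    radius = len(kernel) // 2
    filtered = defaultdict(float)
    for vertex, value in lattice.items():
        for k, kernel_weight in enumerate(kernel):
            filtered[vertex + (k - radius)] += kernel_weight * value
    return filtered


def slice_points(lattice, out_lattice_features, scale):
    """Interpolate the lattice signal at the requested output points."""
    return [
        sum(weight * lattice.get(vertex, 0.0)
            for vertex, weight in barycentric_1d(lattice_feature, scale))
        for lattice_feature in out_lattice_features
    ]


features = [1.0, 2.0, 4.0]           # one channel of F, n = 3
lattice_in = [0.0, 0.5, 2.0]         # L with d_l = 1
lattice_out = [1.0, 2.5]             # L_out with m = 2
kernel = [0.25, 0.5, 0.25]           # stand-in for a learned filter

filtered = convolve(splat(features, lattice_in, scale=1.0), kernel)
print(slice_points(filtered, lattice_in, scale=1.0))   # back at input points
print(slice_points(filtered, lattice_out, scale=1.0))  # at different points
```

Increasing `scale` spreads the points over more lattice vertices, which corresponds to a finer lattice relative to the data.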
All the above three processing steps in BCL can be written as matrix multiplications:

$$\hat{F}_c = S_{slice} \, B_{conv} \, S_{splat} \, F_c, \tag{1}$$

where $F_c$ denotes the $c^{th}$ column/channel of the input feature $F$ and $\hat{F}_c$ denotes the corresponding filtered signal.
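To make the matrix form of Eq. (1) concrete, the sketch below builds the three matrices explicitly for a toy 1-D lattice with $k$ unit-spaced vertices and applies them to one feature channel. The shapes used here ($S_{splat}$ of size $k \times n$, $B_{conv}$ of size $k \times k$, $S_{slice}$ of size $m \times k$) are assumptions consistent with Eq. (1), not taken from the text. The dense matrices, the helper names, the fixed kernel, and the zero-padded boundary are illustrative choices only; an actual BCL operates on a sparse permutohedral lattice.

```python
import math


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def interp_matrix(lattice_features, num_vertices, scale=1.0):
    """One row per point, one column per vertex, holding 1-D barycentric
    (linear interpolation) weights. Scaled features must lie on the lattice."""
    W = [[0.0] * num_vertices for _ in lattice_features]
    for i, lattice_feature in enumerate(lattice_features):
        x = scale * lattice_feature
        if not 0.0 <= x <= num_vertices - 1:
            raise ValueError("scaled lattice feature is outside the toy lattice")
        lower = min(math.floor(x), num_vertices - 2)
        W[i][lower] = 1.0 - (x - lower)
        W[i][lower + 1] = x - lower
    return W


def conv_matrix(kernel, num_vertices):
    """Banded matrix B with B[u][v] = kernel[u - v + radius], zero elsewhere."""
    radius = len(kernel) // 2
    B = [[0.0] * num_vertices for _ in range(num_vertices)]
    for u in range(num_vertices):
        for v in range(num_vertices):
            k = u - v + radius
            if 0 <= k < len(kernel):
                B[u][v] = kernel[k]
    return B


num_vertices = 4
lattice_in = [0.0, 0.5, 2.0]          # L, n = 3, d_l = 1
lattice_out = [1.0, 2.5]              # L_out, m = 2
F_c = [[1.0], [2.0], [4.0]]           # one channel of F as a column vector

S_splat = transpose(interp_matrix(lattice_in, num_vertices))   # k x n
B_conv = conv_matrix([0.25, 0.5, 0.25], num_vertices)          # k x k
S_slice = interp_matrix(lattice_out, num_vertices)             # m x k

F_hat_c = matmul(S_slice, matmul(B_conv, matmul(S_splat, F_c)))
print(F_hat_c)  # filtered channel at the m output points
```

Using `interp_matrix(lattice_in, num_vertices)` as $S_{slice}$ instead returns the filtered signal at the original input points.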
3.3. Properties of BCL
Several properties of BCL make it particularly convenient for point cloud processing. Here, we mention some of those properties:
• The input points to BCL need not be ordered or lie on a grid as they are projected onto a $d_l$-dimensional grid defined by lattice features $L_{in}$.