Published as a conference paper at ICLR 2018
As mentioned before, the output of the spherical correlation is a function on
SO(3)
. This is perhaps
somewhat counterintuitive, and indeed the conventional definition of spherical convolution gives
as output a function on the sphere. However, as shown in Appendix B, the conventional definition
effectively restricts the filter to be circularly symmetric about the Z axis, which would greatly limit
the expressive capacity of the network.
Rotation of SO(3) Signals
We defined the rotation operator
L
R
for spherical signals (eq. 1), and
used it to define spherical cross-correlation (eq. 4). To define the
SO(3)
correlation, we need to
generalize the rotation operator so that it can act on signals defined on
SO(3)
. As we will show,
naively reusing eq. 1 is the way to go. That is, for f : SO(3) → R
K
, and R, Q ∈ SO(3):
[L
R
f](Q) = f (R
−1
Q). (5)
Note that while the argument
R
−1
x
in Eq. 1 denotes the rotation of
x ∈ S
2
by
R
−1
∈ SO(3)
, the
analogous term R
−1
Q in Eq. 5 denotes to the composition of rotations (i.e. matrix multiplication).
Rotation Group Correlation
Using the same analogy as before, we can define the correlation of
two signals on the rotation group, f, ψ : SO(3) → R
K
, as follows:
[ψ ? f](R) = hL
R
ψ, fi =
Z
SO(3)
K
X
k=1
ψ
k
(R
−1
Q)f
k
(Q)dQ. (6)
The integration measure
dQ
is the invariant measure on
SO(3)
, which may be expressed in ZYZ-Euler
angles as dα sin(β)dβdγ/(8π
2
) (see Appendix A).
Equivariance
As we have seen, correlation is defined in terms of the rotation operator
L
R
. This
operator acts naturally on the input space of the network, but what justification do we have for using
it in the second layer and beyond?
The justification is provided by an important property, shared by all kinds of convolution and
correlation, called equivariance. A layer
Φ
is equivariant if
Φ ◦ L
R
= T
R
◦ Φ
, for some operator
T
R
.
Using the definition of correlation and the unitarity of L
R
, showing equivariance is a one liner:
[ψ ? [L
Q
f]](R) = hL
R
ψ, L
Q
fi = hL
Q
−1
R
ψ, fi = [ψ ? f](Q
−1
R) = [L
Q
[ψ ? f]](R).
(7)
The derivation is valid for spherical correlation as well as rotation group correlation.
4 FAST SPHERICAL CORRELATION WITH G-FFT
It is well known that correlations and convolutions can be computed efficiently using the Fast Fourier
Transform (FFT). This is a result of the Fourier theorem, which states that
[
f ∗ ψ =
ˆ
f ·
ˆ
ψ
. Since the
FFT can be computed in O(n log n) time and the product · has linear complexity, implementing the
correlation using FFTs is asymptotically faster than the naive O(n
2
) spatial implementation.
For functions on the sphere and rotation group, there is an analogous transform, which we will refer
to as the generalized Fourier transform (GFT) and a corresponding fast algorithm (GFFT). This
transform finds it roots in the representation theory of groups, but due to space constraints we will
not go into details here and instead refer the interested reader to Sugiura (1990) and Folland (1995).
Conceptually, the GFT is nothing more than the linear projection of a function onto a set of orthogonal
basis functions called “matrix element of irreducible unitary representations”. For the circle (
S
1
) or
line (
R
), these are the familiar complex exponentials
exp(inθ)
. For
SO(3)
, we have the Wigner D-
functions
D
l
mn
(R)
indexed by
l ≥ 0
and
−l ≤ m, n ≤ l
. For
S
2
, these are the spherical harmonics
3
Y
l
m
(x) indexed by l ≥ 0 and −l ≤ m ≤ l.
Denoting the manifold (
S
2
or
SO(3)
) by
X
and the corresponding basis functions by
U
l
(which is
either vector-valued (
Y
l
) or matrix-valued (
D
l
)), we can write the GFT of a function
f : X → R
as
ˆ
f
l
=
Z
X
f(x)U
l
(x)dx. (8)
3
Technically,
S
2
is not a group and therefore does not have irreducible representations, but it is a quotient of
groups SO(3)/ SO(2) and we have the relation Y
l
m
= D
l
m0
|
S
2
4