ClusterNet: Deep Hierarchical Cluster Network with Rigorously Rotation-Invariant Representation for Point Cloud Analysis

Chao Chen¹, Guanbin Li¹*, Ruijia Xu¹, Tianshui Chen¹,², Meng Wang³, Liang Lin¹,²

¹Sun Yat-sen University   ²DarkMatter AI Research   ³Hefei University of Technology

chench227@mail2.sysu.edu.cn, liguanbin@mail.sysu.edu.cn, xurj3@mail2.sysu.edu.cn,
tianshuichen@gmail.com, wangmeng@hfut.edu.cn, linliang@ieee.org

*Corresponding author is Guanbin Li. This work was supported in part by the State Key Development Program under Grant 2016YFB1001004, in part by the National Natural Science Foundation of China under Grants No. 61602533 and No. 61702565, and in part by the Fundamental Research Funds for the Central Universities under Grant No. 18lgpy63. This work was also sponsored by SenseTime Research Fund.
Abstract

Current neural networks for 3D object recognition are vulnerable to 3D rotation. Existing works mostly rely on massive amounts of rotation-augmented data to alleviate the problem, which lacks a solid guarantee of 3D rotation invariance. In this paper, we address the issue by introducing a novel point cloud representation that can be mathematically proven to be rigorously rotation-invariant, i.e., identical point clouds in different orientations are unified into a unique and consistent representation. Moreover, the proposed representation is conditionally information-lossless: it retains all information of the point cloud except its orientation. In addition, the proposed representation is complementary to existing network architectures for point clouds and fundamentally improves their robustness against rotation transformations. Finally, we propose a deep hierarchical cluster network called ClusterNet to better adapt to the proposed representation. We employ hierarchical clustering to explore and exploit the geometric structure of the point cloud, which is embedded in a hierarchical structure tree. Extensive experimental results show that our proposed method greatly outperforms the state of the art in rotation robustness on rotation-augmented 3D object classification benchmarks.
1. Introduction
Rotation transformation is natural and ubiquitous in the 3D world; however, it poses an intractable challenge for 3D recognition. Theoretically, since SO(3)¹ is an infinite group, a 3D object possesses rotated clones in infinitely many attitudes, so a machine learning model is obliged to extract features from an extremely large input space. For example, in the 3D object classification task, the category label of an object is invariant under arbitrary rotation transformations in most situations. However, from the perspective of a classification model, an object and its rotated clone are distinct in the input metric space. Hence the model, e.g., a neural-network-based method, must have enough capacity to learn rotation invariance from data and thereby approximate a complex function that maps identical objects in infinitely many attitudes to similar features in the feature metric space.

¹The 3D rotation group, denoted SO(3), contains all rotation transformations in R^3 under the operation of composition.
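To make the gap concrete, consider the following minimal sketch (our illustration, assuming NumPy and SciPy; it is not code from the paper). A point cloud and its randomly rotated clone are far apart under the raw coordinate metric, even though rotation-invariant quantities such as point norms and the Gram matrix of inner products coincide exactly:

```python
# Illustrative sketch: a cloud and its rotated clone are distinct raw
# inputs, while norm- and angle-based quantities are untouched by rotation.
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
P = rng.normal(size=(1024, 3))          # toy point cloud, N x 3
R = Rotation.random().as_matrix()       # a uniformly random element of SO(3)
P_rot = P @ R.T                         # rotated clone of the same object

# The two inputs are distant in the raw coordinate space ...
print(np.linalg.norm(P - P_rot))        # large: distinct inputs

# ... yet rotation-invariant quantities coincide, e.g. point norms and
# all inner products between points (the Gram matrix).
print(np.allclose(np.linalg.norm(P, axis=1),
                  np.linalg.norm(P_rot, axis=1)))   # True
print(np.allclose(P @ P.T, P_rot @ P_rot.T))        # True
```

The last two checks hint at the direction taken in this paper: quantities built only from norms and relative angles are unchanged by any rotation, which is exactly the kind of property a rigorously rotation-invariant representation formalizes.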
To alleviate the curse of rotation, a straightforward method is to design a model with high capacity, such as a deep neural network with many layers, and feed it large amounts of rotation-augmented data [1] produced by a well-designed augmentation pipeline. Although data augmentation is effective to some extent, it is computationally expensive in the training phase and lacks a solid guarantee of rotation robustness. [11, 18] apply a spatial transformer network [5] to canonicalize the input data before feature extraction, which improves the rotation robustness of the model but still inherits all the defects of data augmentation. [16] proposes a rotation-equivariant network for 3D point clouds using a special convolutional operation with local rotation invariance as its basic block. The method attempts to equip the neural network with rotation symmetry. However, it is hard to guarantee that such a network has enough capacity to satisfy the rotation-equivariance constraints in every layer.
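For concreteness, the augmentation baseline amounts to something like the following sketch (our hedged illustration, not the specific pipeline of [1]): every training cloud is perturbed by a rotation drawn uniformly from SO(3), so the network must learn robustness over many attitudes of every object.

```python
# A hedged sketch of rotation augmentation for point cloud training data.
import numpy as np
from scipy.spatial.transform import Rotation

def augment_batch(batch):
    """Rotate each (N, 3) cloud in a (B, N, 3) batch by an independent,
    uniformly random element of SO(3)."""
    R = Rotation.random(num=len(batch)).as_matrix()   # (B, 3, 3)
    # For each cloud b: rotated[b, n] = R[b] @ batch[b, n]
    return np.einsum('bij,bnj->bni', R, batch)
```

Each epoch thus presents the model with a different finite sample of an infinite group, which is why this approach is costly in training and offers no formal invariance guarantee.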
We address the issue by introducing a novel Rigorous Rotation-Invariant (RRI) representation of point clouds. Identical objects in different orientations are unified into a consistent representation, which implies that the input space is drastically reduced and 3D recognition tasks become much easier. It can be mathematically proven that the proposed representation is rigorously rotation-invariant and, under a mild condition, information-lossless. Given any data point in the point cloud and a non-collinear neighbor ar-
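Although the formal construction is developed later in the paper, the flavor of a rigorously rotation-invariant representation can be conveyed with toy per-point features. The feature choice below (point norm, nearest-neighbor norm, and the angle between the two points) is our simplified assumption for illustration, not the actual RRI definition:

```python
# Toy rotation-invariant per-point features (illustrative only; the
# paper's RRI representation is defined differently).
import numpy as np
from scipy.spatial.transform import Rotation

def toy_invariant_features(P, eps=1e-8):
    """P: (N, 3) point cloud -> (N, 3) rotation-invariant features."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    nn = d.argmin(axis=1)                       # nearest neighbor index
    r = np.linalg.norm(P, axis=1)               # ||p_i||
    r_nn = r[nn]                                # ||p_j|| for neighbor j
    cos = (P * P[nn]).sum(axis=1) / np.maximum(r * r_nn, eps)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))  # angle between p_i and p_j
    return np.stack([r, r_nn, theta], axis=1)

P = np.random.default_rng(1).normal(size=(256, 3))
R = Rotation.random().as_matrix()
# Identical features for the cloud and any rotated clone of it.
assert np.allclose(toy_invariant_features(P), toy_invariant_features(P @ R.T))
```

Unlike these toy features, which discard most of the geometry, the proposed RRI representation is conditionally information-lossless: it retains all information of the point cloud except its orientation.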