Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud
Weijing Shi and Ragunathan (Raj) Rajkumar
Carnegie Mellon University
Pittsburgh, PA 15213
{weijings, rajkumar}@cmu.edu
Abstract
In this paper, we propose a graph neural network to
detect objects from a LiDAR point cloud. Towards this
end, we encode the point cloud efficiently in a fixed ra-
dius near-neighbors graph. We design a graph neural net-
work, named Point-GNN, to predict the category and shape
of the object that each vertex in the graph belongs to. In
Point-GNN, we propose an auto-registration mechanism to
reduce translation variance, and also design a box merg-
ing and scoring operation to combine detections from mul-
tiple vertices accurately. Our experiments on the KITTI
benchmark show the proposed approach achieves leading
accuracy using the point cloud alone and can even sur-
pass fusion-based algorithms. Our results demonstrate the
potential of using the graph neural network as a new ap-
proach for 3D object detection. The code is available at
https://github.com/WeijingShi/Point-GNN.
1. Introduction
Understanding the 3D environment is vital in robotic perception. A point cloud, which comprises a set of points in space, is a widely-used format for 3D sensors such as LiDAR. Detecting objects accurately from a point cloud is crucial in applications such as autonomous driving.
Convolutional neural networks that detect objects from
images rely on the convolution operation. While the con-
volution operation is efficient, it requires a regular grid as
input. Unlike an image, a point cloud is typically sparse and
not spaced evenly on a regular grid. Placing a point cloud on
a regular grid generates an uneven number of points in the
grid cells. Applying the same convolution operation on such
a grid leads to potential information loss in the crowded
cells or wasted computation in the empty cells.
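To make the unevenness concrete, here is a minimal, illustrative sketch (not from the paper) that voxelizes a toy point cloud and counts points per occupied cell; the function name and cell size are assumptions for illustration only.

```python
import numpy as np

def voxel_counts(points, cell_size):
    """Count the points falling into each occupied grid cell (illustrative only)."""
    ids = np.floor(points / cell_size).astype(np.int64)   # integer cell index per point
    cells, counts = np.unique(ids, axis=0, return_counts=True)
    return cells, counts

# A tiny 2D point cloud: a dense cluster plus one isolated point.
pts = np.array([[0.1, 0.1], [0.2, 0.3], [0.4, 0.2], [3.5, 3.5]])
cells, counts = voxel_counts(pts, cell_size=1.0)
# One occupied cell holds 3 points, another holds 1; all other cells are empty,
# which is exactly the crowded-vs-empty imbalance described above.
```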
Recent breakthroughs in using neural networks [3][22]
allow an unordered set of points as input. Studies take
advantage of this type of neural network to extract point
cloud features without mapping the point cloud to a grid.
Figure 1. Three point cloud representations and their common processing methods.

However, they typically need to sample and group points iteratively to create a point set representation. The repeated grouping and sampling on a large point cloud can be computationally costly. Recent 3D detection approaches
[10][21][16] often take a hybrid approach, using a grid and a set representation in different stages. Although they show some promising results, such hybrid strategies may suffer from the shortcomings of both representations.
In this work, we propose to use a graph as a compact
representation of a point cloud and design a graph neural
network called Point-GNN to detect objects. We encode
the point cloud natively in a graph by using the points as the
graph vertices. The edges of the graph connect neighbor-
hood points that lie within a fixed radius, which allows fea-
ture information to flow between neighbors. Such a graph
representation adapts to the structure of a point cloud di-
rectly without the need to make it regular. A graph neural
network reuses the graph edges in every layer, and avoids
grouping and sampling the points repeatedly.
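The fixed-radius near-neighbors graph described above can be sketched as follows. This is a minimal illustration using a k-d tree, not the paper's implementation; the function name, radius value, and toy points are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_fixed_radius_graph(points, radius):
    """Connect each point to all neighbors within `radius` (illustrative sketch,
    not the paper's implementation)."""
    tree = cKDTree(points)
    pairs = tree.query_pairs(r=radius)  # set of (i, j) pairs with i < j
    # Duplicate each pair in both directions so feature information
    # can flow between neighbors along directed edges.
    edges = [(i, j) for i, j in pairs] + [(j, i) for i, j in pairs]
    return np.array(edges, dtype=np.int64)

# Toy point cloud: three nearby points and one distant point.
pts = np.array([[0.0, 0.0, 0.0],
                [0.5, 0.0, 0.0],
                [0.0, 0.6, 0.0],
                [5.0, 5.0, 5.0]])
edges = build_fixed_radius_graph(pts, radius=1.0)
# The three nearby points are mutually connected; the distant point is isolated.
```

Because the edge set is built once from the point positions, a graph neural network can reuse it in every layer rather than re-sampling and re-grouping the points per layer.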
Studies [15][9][2][17] have looked into using graph neural networks for the classification and semantic segmentation of a point cloud. However, little research has looked into using a graph neural network for 3D object detection in a point cloud. Our work demonstrates the feasibility of using a GNN for highly accurate object detection in a point cloud.
Our proposed graph neural network Point-GNN takes the point graph as its input. It outputs the category and bounding box of the object to which each vertex belongs. Point-GNN is a one-stage detection method that detects multiple objects in a single shot. To reduce the translation variance in a graph neural network, we introduce an