HDNET: Exploiting HD Maps for 3D Object Detection

Bin Yang¹,², Ming Liang¹, Raquel Urtasun¹,²
¹Uber Advanced Technologies Group, ²University of Toronto
{byang10, ming.liang, urtasun}@uber.com
Abstract: In this paper we show that High-Definition (HD) maps provide strong
priors that can boost the performance and robustness of modern 3D object detec-
tors. Towards this goal, we design a single-stage detector that extracts geometric
and semantic features from the HD maps. As maps might not be available every-
where, we also propose a map prediction module that estimates the map on the fly
from raw LiDAR data. We conduct extensive experiments on KITTI [1] as well as
a large-scale 3D detection benchmark containing 1 million frames, and show that
the proposed map-aware detector consistently outperforms the state-of-the-art in
both mapped and un-mapped scenarios. Importantly, the whole framework runs at
20 frames per second.
Keywords: 3D Object Detection, HD Maps, Autonomous Driving
1 Introduction
Autonomous vehicles have the potential of providing cheaper and safer transportation. A typical
autonomous system is composed of the following functional modules: perception, prediction, plan-
ning and control [2]. Perception is concerned with detecting the objects of interest (e.g., vehicles)
in the scene and tracking them over time. The prediction module estimates the intentions and trajectories
of all actors into the future. Motion planning is responsible for producing a trajectory that is safe,
while control outputs the commands necessary for the self-driving vehicle to execute such trajectory.
3D object detection is a fundamental task in perception systems. Modern 3D object detectors [3, 4]
exploit LiDAR as input as it provides good geometric cues and eases 3D localization when compared
to camera-only approaches. In the context of real-time applications, single-shot detectors [5, 6, 7]
have been shown to be more promising than proposal-based methods [8, 4], as they are efficient
and produce accurate estimates. However, object detection is far from solved, as many challenges
remain, such as dealing with occlusion and the sparsity of LiDAR returns at long range.
Most self-driving systems have access to High-Definition (HD) maps that contain geometric and
semantic information about the environment. While HD maps are widely used by motion planning
systems [9, 10], they are largely ignored by perception systems [11]. In this paper we argue that
HD maps provide strong priors that can boost the performance and robustness of modern object
detectors. Towards this goal, we derive an efficient and effective single-stage detector that operates
in Bird’s Eye View (BEV) and fuses LiDAR information with rasterized maps. Bird’s eye view is
a good representation for 3D LiDAR as it is amenable to efficient inference and retains the metric
space. Since HD maps might not be available everywhere, we also propose a map prediction module
that estimates the map geometry and semantics from a single online LiDAR sweep.
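To make the fusion concrete, below is a minimal PyTorch sketch (not the authors' implementation) of early-fusing rasterized HD map channels with a LiDAR BEV grid before a single-stage backbone. The channel counts, grid resolution, backbone depth, and output parameterization are illustrative assumptions; the map channels stand in for rasterized geometric and semantic priors (e.g., ground height and a road mask).

```python
# Minimal sketch (assumptions, not the paper's exact architecture) of
# early fusion between a LiDAR BEV grid and rasterized HD map priors.
import torch
import torch.nn as nn

class MapAwareBEVDetector(nn.Module):
    def __init__(self, lidar_channels: int, map_channels: int):
        super().__init__()
        in_channels = lidar_channels + map_channels
        # Placeholder backbone: two conv layers standing in for the full
        # single-stage detection network.
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Dense per-cell outputs: 1 confidence score + 6 box parameters
        # (x, y, w, l, sin/cos of heading), a common BEV parameterization.
        self.header = nn.Conv2d(128, 7, kernel_size=1)

    def forward(self, lidar_bev: torch.Tensor, map_bev: torch.Tensor):
        # lidar_bev: (B, lidar_channels, H, W) occupancy/height slices
        # map_bev:   (B, map_channels, H, W) rasterized map priors
        x = torch.cat([lidar_bev, map_bev], dim=1)  # early fusion
        return self.header(self.backbone(x))

# Example: a 70.4 m x 80 m region at 0.2 m/cell gives a 352 x 400 grid.
det = MapAwareBEVDetector(lidar_channels=32, map_channels=2)
out = det(torch.zeros(1, 32, 352, 400), torch.zeros(1, 2, 352, 400))
print(out.shape)  # torch.Size([1, 7, 352, 400])
```

When the HD map is unavailable, the same interface holds: the map prediction module would produce `map_bev` from the online LiDAR sweep (e.g., a small segmentation network regressing ground height and a road mask), so the detector itself is unchanged between mapped and un-mapped scenarios.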
Our experiments on the public KITTI BEV object detection benchmark [1] and a large-scale 3D
object detection benchmark TOR4D [3, 12] show that we can achieve significant Average Precision
(AP) gains on top of a state-of-the-art detector by exploiting HD maps. On TOR4D, where HD maps
are available, we achieve 2.42%, 3.43% and 5.49% AP gains over the 0-70 m, 30-50 m
and 50-70 m ranges, respectively. On KITTI, where HD maps are unavailable, we show that when using
a pre-trained map prediction module (trained on a different continent) we can still get 2.87% AP
gain, surpassing all competing methods including those which also exploit cameras. Importantly,
the proposed map-aware detector runs at 20 frames per second.
2nd Conference on Robot Learning (CoRL 2018), Zürich, Switzerland.