Multi-View 3D Object Detection Network for Autonomous Driving
Xiaozhi Chen¹, Huimin Ma¹, Ji Wan², Bo Li², Tian Xia²
¹Department of Electronic Engineering, Tsinghua University
²Baidu Inc.
{chenxz12@mails., mhmpub@}tsinghua.edu.cn, {wanji, libo24, xiatian}@baidu.com
Abstract
This paper aims at high-accuracy 3D object detection in autonomous driving scenarios. We propose Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes. We encode the sparse 3D point cloud with a compact multi-view representation. The network is composed of two subnetworks: one for 3D object proposal generation and another for multi-view feature fusion. The proposal network generates 3D candidate boxes efficiently from the bird's eye view representation of the 3D point cloud. We design a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths. Experiments on the challenging KITTI benchmark show that our approach outperforms the state-of-the-art by around 25% and 30% AP on the tasks of 3D localization and 3D detection. In addition, for 2D detection, our approach obtains 10.3% higher AP than the state-of-the-art among the LIDAR-based methods on the hard data.
1. Introduction
3D object detection plays an important role in the visual perception system of autonomous driving cars. Modern self-driving cars are commonly equipped with multiple sensors, such as LIDAR and cameras. Laser scanners provide accurate depth information, while cameras preserve much more detailed semantic information. Fusing LIDAR point clouds and RGB images should therefore yield higher performance and safety for self-driving cars.
The focus of this paper is on 3D object detection utilizing both LIDAR and image data. We aim at highly accurate 3D localization and recognition of objects in road scenes. Recent LIDAR-based methods place 3D windows in 3D voxel grids to score the point cloud [26, 7] or apply convolutional networks to the front view point map in a dense box prediction scheme [17]. Image-based methods [4, 3] typically first generate 3D box proposals and then perform region-based recognition using the Fast R-CNN [10] pipeline. Methods based on LIDAR point clouds usually achieve more accurate 3D locations, while image-based methods have higher accuracy in terms of 2D box evaluation. [11, 8] combine LIDAR and images for 2D detection by employing early or late fusion schemes. However, for the more challenging task of 3D object detection, a well-designed model is required to exploit the strengths of multiple modalities.
In this paper, we propose a Multi-View 3D object detection network (MV3D) which takes multimodal data as input and predicts the full 3D extent of objects in 3D space. The main idea for utilizing multimodal information is to perform region-based feature fusion. We first propose a multi-view encoding scheme to obtain a compact and effective representation for sparse 3D point clouds. As illustrated in Fig. 1, the multi-view 3D detection network consists of two parts: a 3D Proposal Network and a Region-based Fusion Network. The 3D proposal network utilizes a bird's eye view representation of the point cloud to generate highly accurate 3D candidate boxes. The benefit of 3D object proposals is that they can be projected to any view in 3D space. The multi-view fusion network extracts region-wise features by projecting the 3D proposals onto the feature maps from multiple views.
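
As a concrete illustration of this projection step, the following is a minimal sketch of mapping a 3D box onto a bird's-eye-view feature map to obtain a region of interest. The detection range, grid resolution, and feature-map stride below are illustrative assumptions, not the paper's exact settings.

    # Minimal sketch (illustrative, not the paper's exact code): project the
    # corners of a 3D box onto a bird's-eye-view (BEV) feature map as an ROI.
    import numpy as np

    # Assumed detection range, grid resolution, and backbone stride.
    BEV_X_RANGE = (0.0, 70.0)    # metres ahead of the sensor
    BEV_Y_RANGE = (-40.0, 40.0)  # metres to either side
    RESOLUTION = 0.1             # metres per BEV cell
    FEAT_STRIDE = 8              # downsampling factor of the conv backbone

    def box3d_to_bev_roi(corners_xyz):
        """Map the 8 corners of a 3D box (8 x 3 array, LIDAR coordinates)
        to an axis-aligned (x1, y1, x2, y2) ROI on the BEV feature map."""
        # Discretize metric x/y coordinates into BEV pixel indices; the
        # height (z) axis is flattened out in the bird's eye view.
        px = (corners_xyz[:, 0] - BEV_X_RANGE[0]) / RESOLUTION
        py = (corners_xyz[:, 1] - BEV_Y_RANGE[0]) / RESOLUTION
        # Take the tightest axis-aligned box around the projected corners and
        # rescale from BEV pixels to feature-map cells for ROI pooling.
        return (px.min() / FEAT_STRIDE, py.min() / FEAT_STRIDE,
                px.max() / FEAT_STRIDE, py.max() / FEAT_STRIDE)

Because the proposals live in 3D, the same corners can equally be projected into the front view or, given the camera calibration, onto the image plane.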
views. We design a deep fusion approach to enable inter-
actions of intermediate layers from different views. Com-
bined with drop-path training [15] and auxiliary loss, our
approach shows superior performance over the early/late fu-
sion scheme. Given the multi-view feature representation,
the network performs oriented 3D box regression which
predict accurate 3D location, size and orientation of objects
in 3D space.
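
The difference between the fusion schemes can be made concrete with a minimal sketch: early fusion joins the views once at the input, late fusion once at the output, while deep fusion joins intermediate features at every stage so the per-view paths can interact. The element-wise mean join, placeholder layers, and depth below are illustrative assumptions rather than the paper's exact architecture.

    # Minimal numpy sketch contrasting early, late, and deep fusion over a
    # list of same-shaped per-view feature arrays (assumed setup).
    import numpy as np

    def layer(x):
        # Stand-in for a learned per-view transformation (e.g., a conv block).
        return np.maximum(x, 0.0)

    def early_fusion(views, depth=3):
        x = np.mean(views, axis=0)           # join once, at the input
        for _ in range(depth):
            x = layer(x)
        return x

    def late_fusion(views, depth=3):
        outs = list(views)
        for _ in range(depth):
            outs = [layer(v) for v in outs]  # independent per-view paths
        return np.mean(outs, axis=0)         # join once, at the output

    def deep_fusion(views, depth=3):
        xs = list(views)
        for _ in range(depth):
            xs = [layer(x) for x in xs]      # per-view transformations
            joint = np.mean(xs, axis=0)      # join intermediate features and
            xs = [joint] * len(xs)           # feed the result to every path
        return xs[0]

Drop-path training [15] regularizes such a multi-path network by randomly dropping individual paths during training.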
We evaluate our approach on the tasks of 3D proposal generation, 3D localization, 3D detection, and 2D detection on the challenging KITTI [9] object detection benchmark. Experiments show that our 3D proposals significantly outperform the recent 3D proposal methods 3DOP [4] and Mono3D [3]. In particular, with only 300 proposals, we obtain 99.1% and 91% 3D recall at Intersection-over-Union (IoU) thresholds of 0.25 and 0.5, respectively. The LIDAR-