Depth Enhancement: Pseudo-LiDAR-Based 3D Object Detection for Autonomous Driving

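This paper, published at ICLR 2020, studies accurate 3D object detection for autonomous driving using pseudo-LiDAR. The research team, from Cornell University and The Ohio State University, improves depth estimation from stereo images in order to raise pseudo-LiDAR performance, especially the detection accuracy for faraway objects, and thereby narrow the gap to expensive LiDAR sensors. They also explore correcting the estimated depth with extremely sparse but accurate measurements from a cheap LiDAR.

In autonomous driving, 3D object detection is an indispensable technology, used mainly to recognize targets such as vehicles and pedestrians. Conventional solutions rely on costly LiDAR (light detection and ranging) sensors to obtain accurate depth. Pseudo-LiDAR has recently emerged as a cheaper alternative: it generates LiDAR-like depth information from stereo images. Despite its promise, its accuracy on faraway objects remains insufficient, and this is the main open challenge.

To address this, the authors adapt the stereo network architecture and the loss function so that they are better aligned with accurate depth estimation, in particular for faraway objects. They then refine the resulting depth map with a sparse LiDAR signal, which further narrows the gap between pseudo-LiDAR and real LiDAR. Together, these improvements strengthen 3D object detection at long range and point toward more economical autonomous driving systems.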

Published as a conference paper at ICLR 2020

[Figure 5 diagram. Top (SDN): the left and right images pass through a weight-sharing feature extractor; the left and right feature maps form a disparity cost volume C_disp, which is converted to a depth cost volume C_depth and processed by 3D convolutions and a softmax to produce the dense predicted depth map. Bottom (GDC): the dense pseudo-LiDAR point cloud and a sparse LiDAR point cloud are linked through a KNN graph for depth correction, yielding the corrected pseudo-LiDAR point cloud.]

Figure 5: The whole pipeline of improved stereo depth estimation: (top) the stereo depth network (SDN) constructs a depth cost volume from left-right images and is optimized for direct depth estimation; (bottom) the graph-based depth correction algorithm (GDC) refines the depth map by leveraging sparser LiDAR signals. The gray arrows indicate the observer's viewpoint. We superimpose the (green) ground-truth 3D box of a car, the same one as in Figure 1. The corrected points (blue; bottom right) are perfectly located inside the ground-truth box.

throughout, which is clearly violated by the reciprocal depth-to-disparity relation (Figure 2). For example, it may be completely appropriate to locally smooth two neighboring pixels with disparity 85 and 86 (changing the depth by a few cm to smooth out a surface), whereas applying the same kernel for two pixels with disparity 5 and 6 could easily move the 3D points by 10 m or more.
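The disparity pairs in this example can be checked numerically with the standard stereo relation Z = f_U × b / D. The sketch below is only illustrative: the focal length and baseline are assumed KITTI-like values that do not appear in the text above, and the function names are hypothetical.

```python
# Assumed, KITTI-like calibration (not given in the text above).
FOCAL_PX = 721.5    # horizontal focal length f_U, in pixels
BASELINE_M = 0.54   # stereo baseline b, in meters


def disparity_to_depth(disparity_px, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Depth in meters from disparity in pixels: Z = f_U * b / D."""
    return focal_px * baseline_m / disparity_px


def depth_shift(disp_a, disp_b):
    """Absolute depth change when a pixel's disparity moves from disp_a to disp_b."""
    return abs(disparity_to_depth(disp_a) - disparity_to_depth(disp_b))


near_shift = depth_shift(85, 86)  # nearby surface: a few centimeters
far_shift = depth_shift(5, 6)     # faraway surface: more than ten meters
print(f"disparity 85 -> 86: {near_shift:.3f} m")
print(f"disparity 5 -> 6:   {far_shift:.3f} m")
```

Under these assumed values the same one-pixel change in disparity moves a nearby point by centimeters and a faraway point by more than 10 m, which is the asymmetry the paragraph describes.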

Taking this insight and the central assumption of convolutions (that all neighborhoods can be operated upon in an identical manner) into account, we propose to instead construct the depth cost volume $C_{\text{depth}}$, in which $C_{\text{depth}}(u, v, z, :)$ will encode features describing how likely the depth $Z(u, v)$ of pixel $(u, v)$ is $z$. The subsequent 3D convolutions will then operate on the grid of depth, rather than disparity, affecting neighboring depths identically, independent of their location. The resulting 3D tensor $S_{\text{depth}}$ is then used to predict the pixel depth similar to Equation 3:

$$Z(u, v) = \sum_{z} \text{softmax}(-S_{\text{depth}}(u, v, z)) \times z.$$
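The depth prediction described above (a softmax over negated costs along the depth axis, used to weight the depth values) can be sketched in NumPy. This is a minimal illustration and not the authors' implementation; the tensor layout `[Z, H, W]` and the function name `expected_depth` are assumptions.

```python
import numpy as np


def expected_depth(s_depth, depth_grid):
    """Soft-argmin depth per pixel.

    s_depth:    cost tensor of shape [Z, H, W] (lower cost = more likely depth).
    depth_grid: depth values in meters, shape [Z].
    Returns an [H, W] depth map: sum_z softmax(-s_depth)[z] * depth_grid[z].
    """
    logits = -s_depth
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)              # softmax over depth axis
    return np.tensordot(depth_grid, prob, axes=(0, 0))   # weighted sum over z


# Toy usage: a cost volume whose minimum sits at the 40 m bin for every pixel.
depth_grid = np.arange(1.0, 81.0)     # 1 m, 2 m, ..., 80 m
s_depth = np.full((80, 2, 3), 10.0)
s_depth[39] = 0.0                     # index 39 corresponds to 40 m
depth_map = expected_depth(s_depth, depth_grid)
```

Because the output is an expectation rather than a hard argmin, it is continuous in depth and differentiable with respect to the costs.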

We construct the new depth volume, $C_{\text{depth}}$, based on the intuition that $C_{\text{depth}}(u, v, z, :)$ and $C_{\text{disp}}\left(u, v, \frac{f_U \times b}{z}, :\right)$ should lead to equivalent "cost". To this end, we apply a bilinear interpolation to construct $C_{\text{depth}}$ from $C_{\text{disp}}$ using the depth-to-disparity transform in Equation 2. Specifically, we consider disparity in the range of [0, 191] following PSMNet (Chang & Chen, 2018), and consider depth in the range of [1 m, 80 m] and set the grid of depth in $C_{\text{depth}}$ to be 1 m. Figure 5 (top) depicts our stereo depth network (SDN) pipeline. Crucially, all convolution operations are applied to $C_{\text{depth}}$ exclusively. Figure 4 compares the median values of absolute depth estimation errors using the disparity cost volume (i.e., PSMNet) and the depth cost volume (SDN) (see subsection D.5 for detailed numbers). As expected, for faraway depth, SDN leads to drastically smaller errors with only marginal increases in the very near range (which disparity-based methods over-optimize). See the appendix for the detailed setup and more discussions.
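The resampling from a disparity grid to a depth grid can be illustrated as follows. This is a simplified sketch under stated assumptions, not the paper's implementation: it interpolates linearly along the disparity axis only, clamps disparities that fall outside [0, 191], uses an assumed KITTI-like value for f_U × b, and the function name and the `[D, H, W]` layout are hypothetical.

```python
import numpy as np


def disparity_to_depth_volume(c_disp, depths, fu_times_b):
    """Resample a disparity-indexed volume onto a depth grid.

    c_disp:     array of shape [D, ...], indexed by integer disparity 0..D-1.
    depths:     depth grid in meters, shape [Z].
    fu_times_b: focal length (pixels) times baseline (meters).
    Returns an array of shape [Z, ...] where slice z is read from c_disp at
    the fractional disparity fu_times_b / depths[z].
    """
    num_disp = c_disp.shape[0]
    disp = fu_times_b / np.asarray(depths, dtype=float)
    disp = np.clip(disp, 0.0, num_disp - 1)        # assumption: clamp out-of-range
    lo = np.floor(disp).astype(int)                # lower integer disparity
    hi = np.minimum(lo + 1, num_disp - 1)          # upper integer disparity
    w = (disp - lo).reshape((-1,) + (1,) * (c_disp.ndim - 1))
    return (1.0 - w) * c_disp[lo] + w * c_disp[hi]


# Toy usage with the ranges quoted above: disparity 0..191, depth 1..80 m.
FU_TIMES_B = 721.5 * 0.54                          # assumed KITTI-like value
depths = np.arange(1.0, 81.0)
c_disp = np.arange(192, dtype=float).reshape(192, 1, 1) * np.ones((1, 4, 5))
c_depth = disparity_to_depth_volume(c_disp, depths, FU_TIMES_B)
```

In the toy volume each disparity slice stores its own index, so every depth slice of the result simply holds the (clamped) disparity it was read from, which makes the mapping easy to inspect.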

4 DEPTH CORRECTION

Our SDN significantly improves depth estimation and more precisely renders the object contours (see Figure 3). However, there is a fundamental limitation in stereo because of the discrete nature of pixels: the disparity, being the difference in the horizontal coordinate between corresponding pixels, has to be quantized at the level of individual pixels while the depth is continuous. Although the quantization error can be alleviated with higher resolution images, the computational cost of depth prediction scales cubically with resolution, pushing the limits of GPUs in autonomous vehicles.
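To make the quantization limit concrete, the depth gap between two adjacent disparity levels can be worked out from the standard stereo relation $Z = f_U b / D$. The disparity step $\delta$ is a symbol introduced here for illustration, with $\delta = 1$ pixel for integer disparities:

$$
\Delta Z = \frac{f_U b}{D} - \frac{f_U b}{D + \delta} = \frac{f_U b\,\delta}{D\,(D + \delta)} = \frac{Z^2\,\delta}{f_U b + Z\,\delta} \approx \frac{Z^2\,\delta}{f_U b} \quad (Z\,\delta \ll f_U b).
$$

The gap therefore grows roughly quadratically with depth. Assuming a KITTI-like $f_U b \approx 390$ pixel-meters (a value not given in the text), one disparity step corresponds to about 0.25 m at $Z = 10$ m but more than 10 m at $Z = 70$ m.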

We therefore explore a hybrid approach by leveraging a cheap LiDAR with extremely sparse (e.g., 4 beams) but accurate depth measurements to correct this bias. We note that such sensors are too
