深度学习SLAM：MagicPoint与MagicWarp详解

需积分: 15 134 浏览量更新于2024-08-05 收藏 7.44MB PDF 举报

“MagicPoint-2017.pdf”是一篇顶会论文，主要探讨了深度学习在几何SLAM（Simultaneous Localization and Mapping）中的应用，由Magic Leap公司的研究人员Daniel DeTone、Tomasz Malisiewicz和Andrew Rabinovich共同撰写。正文：这篇论文是“superglue三部曲”的第一部分，关注的是2017年提出的MagicPoint系统。该系统采用深度卷积神经网络来处理单个图像，提取出显著的2D关键点。这些关键点被称为“SLAM-ready”，因为它们在设计上是孤立的，并且在整个图像中均匀分布。这一特性对于SLAM算法至关重要，因为它允许算法更有效地理解和重建环境。在论文中，作者对比了MagicPoint网络与传统的点检测器，发现在图像噪声存在的情况下，深度学习方法具有显著的性能优势。传统的点检测方法可能会受到噪声的严重影响，而MagicPoint网络则能更好地抵抗这种干扰，提取出稳定的关键点。接着，作者介绍了第二个网络——MagicWarp，它是专门用于处理由MagicPoint网络生成的点图像对，估计它们之间的同构变换。这个变换引擎的独特之处在于它不依赖局部点描述符，仅依靠点的位置信息进行变换估计。这种方法简化了变换的计算过程，可能提高了鲁棒性和准确性。训练这两个网络时，研究人员使用了简单的合成数据，这降低了实际数据采集的复杂性，同时也使得网络能够在各种环境中泛化。通过这种方式，MagicPoint和MagicWarp可以协同工作，为SLAM系统提供更稳定、更准确的点跟踪和几何估计。这篇论文为深度学习在计算机视觉领域的应用开辟了新的道路，特别是在SLAM技术中。MagicPoint和MagicWarp的结合不仅提高了关键点检测的效率，还降低了对传统特征描述符的依赖，为增强现实（AR）和机器人导航等领域提供了强大的工具。

VGG-like

Encoder

Softmax

Drop Last Dim

Reshape

8*8+1

M/8

N/8

8*8+1

8*8

M/8

N/8

M/8

N/8

Input Image

MagicPoint

Heatmap

Figure 2: MagicPoint architecture. The MagicPoint network operates on grayscale images and outputs a

“point-ness” probability for each pixel. We use a VGG-style encoder combined with an explicit decoder. Each

spatial location in the ﬁnal 15x20x65 tensor represents a probability distribution over a local 8x8 region plus a

single dustbin channel which represents no point being detected (8 ∗ 8 + 1 = 65). The network is trained using

a standard cross entropy loss, using point supervision from the 2D shape renderer (see examples in Figure 3).

3 Deep Point-Based Tracking Overview

The general architecture of our Deep Point-Based Tracking system is shown in Figure 1. There are

two convolutional neural networks that perform the majority of computation in the tracking system:

MagicPoint and MagicWarp. We discuss these two models in detail below.

3.1 MagicPoint Overview

MagicPoint Motivation. The ﬁrst step in most sparse SLAM pipelines is to detect stable 2D interest

point locations in the image. This step is traditionally performed by computing corner-like gradient

response maps such as the second moment matrix [15] or difference of Gaussians [16] and detecting

local maxima. The process is typically repeated at various image scales. Additional steps may

be performed to evenly distribute detections throughout the image, such as requiring a minimum

number of corners within an image cell [6]. This process typically involves a high amount of domain

expertise and hand engineering, which limits generalization and robustness. Ideally, interest points

should be detected in high sensor noise scenarios and low light. Lastly, we should get a conﬁdence

score for each point we detect that can be used to help reject spurious points and up-weigh conﬁdent

points later in the SLAM pipeline.

MagicPoint Architecture. We designed a custom convolutional network architecture and training

data pipeline to help meet the above criteria. Ultimately, we want to map an image I to a point

response image P with equivalent resolution, where each pixel of the output corresponds to a prob-

ability of “corner-ness” for that pixel in the input. The standard network design for dense prediction

involves an encoder-decoder pair, where the spatial resolution is decreased via pooling or strided

convolution, and then upsampled back to full resolution via upconvolution operations, such as done

in [17]. Unfortunately, upsampling layers tend to add a high amount of compute, thus we designed

the MagicPoint with an explicit decoder

to reduce the computation of the model. The convolu-

tional neural network uses a VGG style encoder to reduce the dimensionality of the image from

120x160 to 15x20 cell grid, with 65 channels for each spatial position. In our experiments we chose

the QQVGA resolution of 120x160 to keep the computation small. The 65 channels correspond to

local, non-overlapping 8x8 grid regions of pixels plus an extra dustbin channel which corresponds

to no point being detected in that 8x8 region. The network is fully convolutional, using 3x3 con-

volutions followed by BatchNorm normalization and ReLU non-linearity. The ﬁnal conv layer is a

1x1 convolution and more details are shown in Figure 2.

MagicPoint Training. What parts of an image are interest points? They are typically deﬁned by

computer vision and SLAM researchers as uniquely identiﬁable locations in the image that are stable

across a variety of viewpoint, illumination, and image noise variations. Ultimately, when used as a

preprocessing step for a Sparse SLAM system, they must detect points that work well for a given

SLAM system. Designing and choosing hyper parameters of point detection algorithms requires

expert and domain speciﬁc knowledge, which is why we have not yet seen a single dominant point

extraction algorithm persisting across many SLAM systems.

There is no large database of interest point labeled images that exists today. To avoid an expensive

data collection effort, we designed a simple renderer based on available OpenCV [19] functions.

We render simple geometric shapes such as triangles, quadrilaterals, stars, lines, checkerboards, 3D

Our decoder has no parameters, and is known as “sub-pixel convolution” [18] or “depth to space” inside

TensorFlow.

剩余13页未读，继续阅读

图灵动力

粉丝: 13
资源: 5

深度学习SLAM：MagicPoint与MagicWarp详解

基于cnn卷积神经网络的特征点提取与相机估计研究.pdf

程序设计模式实例分析

txt2tags text converter-开源

superpoint-pytorch:Superpoint https的Pytorch实现

李白高力士脱靴李白贺知章告别课本剧.pptx

Spring Cloud 学习过程记录，含多方面知识及系列教程.zip

C语言项目之超级万年历系统源码.zip

Jupyter_OReilly书的代码存储库.zip

51单片机加减乘除计算器系统设计（proteus8.17,keil5），复制粘贴就可以运行

《中国房地产统计年鉴》面板数据资源-精心整理.zip

最新资源