K.-N. Lianos, J.L. Schönberger, M. Pollefeys, and T. Sattler
or to obtain richer map representations [39, 42]. Conversely, VO can be used to
improve object detection [11,15,25,38]. Most similar to our approach are object-
based SLAM [3, 5, 7, 15, 40, 43] and Structure-from-Motion [4, 16] approaches.
They use object detections as higher-level semantic features to improve camera
pose tracking [4, 7, 15, 16] and/or to detect and handle loop closures [3, 5, 15, 43].
While some approaches rely on a database of specific objects that are detected
online [7, 15,43], others use generic object detectors [3,5,16]. The former require
that all objects are known and mapped beforehand. The latter need to solve a
data association problem to resolve the ambiguities arising from detecting the
same object class multiple times in an image. Bowman et al. were the first to
jointly optimize over continuous camera poses, 3D point landmarks, and object
landmarks (represented by bounding volumes [5, 16]) as well as over discrete
data associations [5]. They use a probabilistic association model to avoid the
need for hard decisions. In contrast, our approach avoids discrete data
associations by considering continuous distances to object boundaries rather
than individual object detections. By focusing on the boundaries of semantic objects,
we are able to handle a larger corpus of semantic object classes. Specifically,
we can use both convex objects and semantic classes that cannot be described
by bounding boxes, such as street, sky, and building. Compared
to [5], who focus on handling loop closures, our approach aims at reducing drift
through medium-term continuous data associations.
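The difference between hard and probabilistic data association can be illustrated with a small sketch: instead of committing a landmark to its single nearest detection, every detection of the matching class receives a weight. The function name, the Gaussian weighting, and the use of detection centers below are illustrative assumptions, not the exact formulation of [5].

```python
import numpy as np

def soft_association_weights(landmark_proj, detections, sigma=10.0):
    """Soft data association: weight each candidate detection instead of
    picking one. `landmark_proj` is the (2,) projected image location of a
    3D landmark; `detections` is (N, 2) centers of N detections of the
    same object class. (Illustrative assumption, not the model of [5].)"""
    # Squared pixel distances between the projection and each detection.
    d2 = np.sum((np.asarray(detections) - np.asarray(landmark_proj)) ** 2,
                axis=1)
    # Gaussian likelihoods normalized to a distribution over detections
    # (a softmax over negative squared distances).
    logits = -d2 / (2.0 * sigma**2)
    logits -= logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()

weights = soft_association_weights([100.0, 50.0],
                                   [[102.0, 51.0], [300.0, 60.0]])
```

Keeping all weights avoids a hard, possibly wrong commitment; a far-away detection simply receives negligible influence on the optimization.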
Semantic image-to-model alignment methods use semantics to align images
with 3D models [8, 45, 50, 51]. Cohen et al. stitch visually disconnected models
by measuring the quality of an alignment using 3D point projections into a
semantically segmented image. Taneja et al. estimate an initial alignment between
a panorama and a 3D model based on semantic segmentation [50]. They then
alternate between improving the segmentation and the alignment. Most closely
related to our approach is concurrent work by Toft et al. [51], who project
semantically labeled 3D points into semantically segmented images. Similar to us,
they construct error maps for each class via distance fields. Given an initial guess
for the camera pose, the errors associated with the 3D points are then used to
refine the pose. They apply their approach to visual localization and thus assume
a pre-built and pre-labeled 3D model. In contrast, our approach is designed for
VO and optimizes camera poses via a semantic error term while simultaneously
constructing a labeled 3D point cloud. Toft et al. incrementally include more
classes in the optimization and fix parts of the pose at some point. In contrast,
our approach directly considers all classes.
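The per-class error maps via distance fields, shared by [51] and our approach, can be sketched as follows: for each class, a map stores at every pixel the distance to the nearest pixel of that class, so a labeled 3D point projected into the image incurs a continuous cost. The function name and the use of a plain Euclidean distance transform are assumptions for illustration; the papers' exact error definitions may differ.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def semantic_error_maps(seg, classes):
    """Build one error map per class from a (H, W) integer label image.
    Each map holds, at every pixel, the Euclidean distance to the nearest
    pixel of that class. (Illustrative sketch of a distance-field error.)"""
    # For class c, pixels where seg != c are nonzero, so the EDT returns
    # their distance to the nearest seg == c pixel; class-c pixels get 0.
    return {c: distance_transform_edt(seg != c) for c in classes}

# Toy segmentation: bottom three rows labeled class 1 (e.g. "road").
seg = np.zeros((5, 5), dtype=int)
seg[2:, :] = 1
maps = semantic_error_maps(seg, classes=[0, 1])
```

A 3D point labeled "road" that projects onto a road pixel then incurs zero error, while the error grows smoothly as the projection drifts away from the road region, yielding a differentiable cost without per-detection correspondences.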
3 Visual Semantic Odometry
The goal of this paper is to reduce drift in visual odometry by establishing
continuous medium-term correspondences. Since both direct and indirect VO
approaches are typically unable to track a point continuously over long periods
of time, we use scene semantics to establish such correspondences.