Online Scene Coordinate Regression: Improving Real-Time Relocalisation Performance for RGB-D Cameras

Updated 2024-06-29 · 4.71 MB PDF

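This document is titled "Let's Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation". The research focuses on online camera relocalisation in computer vision, and in particular on applying scene coordinate regression (SCoRe) networks to real-time 3D (RGB-D) camera localisation. In many applications, such as robot navigation, augmented reality (AR) or autonomous driving, a camera must be able to localise itself accurately in a scene online, without expensive offline training data, especially in complex and unknown environments.

Traditional keyframe-matching methods can achieve online localisation to some extent, but their performance tends to degrade away from the training trajectory, and they struggle to provide stable matches in sparsely textured regions. By contrast, SCoRe methods have attracted attention for their ability to generalise to novel poses and for the robustness they gain from dense correspondences. Recent work has shown how to adapt SCoRe models between scenes so as to improve their online performance. However, these SCoRe methods rely mainly on hand-crafted features designed for indoor environments, which leads to poor performance in harder outdoor scenes.

To address this, the paper proposes a new approach that aims to improve the adaptability of SCoRe networks so that they cope better with the challenges of outdoor environments, including illumination changes, complex backgrounds and dynamic elements. The authors propose a strategy that may use deep learning to learn and extract more general features, or may develop an adaptive framework that dynamically adjusts model parameters in live scenes, in order to improve the accuracy and stability of outdoor localisation.

The paper's main contributions may include:

1. **Scene-adaptive SCoRe**: an algorithm that lets the network adapt automatically to new outdoor environments, extending existing knowledge from indoor training to outdoor scenes through learning or transfer learning.
2. **Feature extraction and fusion**: an exploration of how to combine deep-learning feature extraction, such as convolutional neural networks (CNNs), with traditional hand-crafted features to strengthen the model's ability to generalise.
3. **Online learning and updating**: an online learning mechanism that lets the model keep receiving new data during actual use and thereby keep improving, so that it maintains high localisation accuracy even when it encounters unseen scenes or changes.
4. **Evaluation and experiments**: a detailed description of the experimental setup and benchmarks, verifying the effectiveness and practicality of the new method by comparing its performance with current state-of-the-art methods.

In summary, the paper offers a new solution to the problem of online RGB-D camera localisation in outdoor scenes. By improving the adaptability and feature extraction of SCoRe networks, it promises to advance robustness and accuracy in practical computer vision applications. This matters for future localisation technology in self-driving vehicles, drones and augmented reality devices operating in complex environments.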

single scene from one of our datasets (see §3). Further details about the architecture and precisely how we train our networks can be found in the supplementary material.

2.3. Online ScoreNet Prediction Adaptation

Problem Formulation. A ScoreNet trained offline on an RGB-D sequence of a scene, as in §2.2, can later be used to relocalise new images in the same scene. This targets an offline formulation of the relocalisation problem, in which both training and testing are performed on the same scene, and there are no constraints on the time available for training. However, this formulation does not take into account the practical requirements on a camera relocaliser for live scenarios such as interactive dense SLAM [53], in which it is infeasible to spend hours or even days training a relocaliser on the scene of interest; rather, a relocaliser must be trained online as the user moves around the scene, and then be usable immediately when camera tracking fails.

To address such scenarios, we target the alternative online formulation of the relocalisation problem proposed by Cavallari et al. [13], in which there are three stages: offline training ('pre-training'), online training and testing. Offline training is performed on sequences of RGB-D frames (with known poses) from one or more scenes, generally other than the target scene. Online training is then performed on a single RGB-D sequence (again with known poses, e.g. as produced by a camera tracker) from the target scene. Finally, testing is performed on a single RGB or RGB-D image whose pose is to be determined. (For interactive SLAM, the idea is that a user will move around the scene at online training time, either training a new relocaliser online, or adapting a pre-trained relocaliser online to function in the target scene. If and when camera tracking fails, the trained relocaliser can then be used to recover the camera pose.)

Cavallari et al. [13, 12] described their online training stage as 'adaptation' because they were adapting a pre-trained regression forest to relocalise in the target scene. In particular, they showed that the branching structure of a scene coordinate regression forest can be seen as a scene-independent way of clustering the pixels in an image based on their appearance. Based on this insight, they adapted a pre-trained forest to a new scene by emptying the reservoirs in its leaves and refilling them with points from the new scene at online training time, and then using the forest to look up the reservoirs again to provide correspondences at test time. Inspired by this approach, we show in this paper how to adapt the predictions of a ScoreNet so as to allow these relocalisers too to be deployed in an online context.
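
The empty-and-refill idea can be illustrated with a toy sketch. This is not the implementation from [13, 12]: the `route` function stands in for a pre-trained forest's branching structure, and the class name, the fixed `capacity` and the use of reservoir sampling to cap each leaf are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class AdaptableLeafReservoirs:
    """Toy stand-in for a pre-trained forest whose leaf reservoirs can be refilled."""

    def __init__(self, route, capacity=4, seed=0):
        self.route = route            # fixed, scene-independent: feature -> leaf id
        self.capacity = capacity      # hypothetical cap on points kept per leaf
        self.rng = random.Random(seed)
        self.reservoirs = defaultdict(list)
        self.seen = defaultdict(int)  # number of points offered to each leaf so far

    def empty(self):
        """Discard all points from the previous scene; the routing is untouched."""
        self.reservoirs.clear()
        self.seen.clear()

    def add(self, feature, world_point):
        """Online training: file a 3D point from the new scene under its leaf."""
        leaf = self.route(feature)
        self.seen[leaf] += 1
        bucket = self.reservoirs[leaf]
        if len(bucket) < self.capacity:
            bucket.append(world_point)
        else:
            # Reservoir sampling: every point offered so far is kept with equal probability.
            j = self.rng.randrange(self.seen[leaf])
            if j < self.capacity:
                bucket[j] = world_point

    def lookup(self, feature):
        """Test time: return candidate 3D points for a pixel with this feature."""
        return list(self.reservoirs.get(self.route(feature), []))
```

In this sketch, adapting to a new scene amounts to calling `empty()` and then `add()` for each pixel of each online training frame; `route` itself never changes, which mirrors the observation that the forest's branching structure is scene-independent.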

Reservoir Prediction. The adaptation scheme described in [13, 12] was highly effective, but relied on the fact that their forest does not predict points in any particular scene directly, but instead predicts leaves containing reservoirs of points, which can then be used to generate the needed correspondences. These reservoirs can be refilled with points from the new scene, which is what allowed their method to work, but it is not straightforward to see how it can be transferred to ScoreNets that directly predict individual points in the pre-training scene. To achieve this, we thus propose a new scheme that, rather than clustering pixels into leaves based on routing their associated feature vectors down a regression forest, clusters them into cells in a grid placed over their associated predictions in the pre-training scene (see Figure 1). Note that this implicitly clusters pixels in the input image based on their predicted pre-training scene locations, rather than directly based on their appearance. Intuitively, a ScoreNet, which has been deliberately trained to map similar-looking pixels in an image to similar 3D points in the pre-training scene, can in practice do this for images of any scene, not just the one on which it was trained, and hence pre-training scene location can be used as a reasonable proxy for appearance (see §3.2 for a discussion).

As mentioned in §2.2, our ScoreNets take an RGB image of size $w \times h$ as input, and produce as output a $w/8 \times h/8 \times 3$ tensor that contains a predicted 3D point (in the scene on which the ScoreNet was trained) for a regularly-spaced subset of pixels in the image. We initially map each of these predicted points, $p = (p_x, p_y, p_z) \in \mathbb{R}^3$, to a grid cell index as follows. First, we imagine placing a bounded regular cubic grid, with cells of side length $\ell$ and an overall side length of $C\ell$, over the pre-training scene, as shown in Figure 1. (The $C$ and $\ell$ values we use can be found in the supplementary material.) Next, for each dimension $k \in \{x, y, z\}$, we compute an index $g(p_k) \in [0\,..\,C)$ via

$$g(p_k) = \mathrm{clamp}\left([\ldots],\; 0,\; C - 1\right). \tag{1}$$

Finally, we combine these three dimension-wise indices into a grid cell index, $G(p)$, via

$$G(p) = C^2\, g(p_z) + C\, g(p_y) + g(p_x). \tag{2}$$
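
The mapping from predicted points to grid cell indices can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions rather than the authors' code: it assumes the grid is centred on the origin of the pre-training scene (so each coordinate is shifted by half the grid's extent before clamping) and that x varies fastest in the flattened index. The function name and the parameter values in the usage note are hypothetical; the paper's actual $C$ and $\ell$ are given in its supplementary material.

```python
import numpy as np


def grid_cell_indices(points, cell_size, cells_per_side):
    """Map an (N, 3) array of predicted 3D points to flat grid cell indices.

    ASSUMPTION: the grid is centred on the origin, so the per-dimension index
    is floor(p_k / cell_size + C / 2) before clamping.
    ASSUMPTION: x varies fastest in the flattened index, then y, then z.
    """
    points = np.asarray(points, dtype=np.float64)
    C = cells_per_side
    # Per-dimension index; points outside the grid are clamped into a boundary cell.
    g = np.floor(points / cell_size + C / 2).astype(np.int64)
    g = np.clip(g, 0, C - 1)
    gx, gy, gz = g[:, 0], g[:, 1], g[:, 2]
    # Raster-style flattening into the range [0, C**3).
    return C * C * gz + C * gy + gx
```

A network output of shape `(h // 8, w // 8, 3)` could be passed in as `prediction.reshape(-1, 3)`. With the hypothetical values `cell_size=1.0` and `cells_per_side=4`, the point `(0, 0, 0)` falls in cell `(2, 2, 2)` along each axis and therefore receives the flat index 42.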

This initial raster-based mapping produces grid cell indices in the range $[0\,..\,C^3)$, but in practice, it is undesirable for memory reasons to try to allocate a reservoir for every cell in the grid. Each reservoir may need to store many point clusters, and must be allocated upfront on the GPU with a fixed size. As a result, if every cell in the grid must have a reservoir, then $C$ must be kept small to avoid exceeding the available GPU memory, limiting the size of scene we can handle with our approach.
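
To get a feel for why a dense allocation limits the grid size, the following back-of-the-envelope sketch computes the memory that one fixed-size reservoir per cell would require. The figures used (50 clusters per reservoir, 64 bytes per cluster) are hypothetical placeholders chosen for illustration, not values from the paper.

```python
def dense_reservoir_bytes(cells_per_side, clusters_per_reservoir, bytes_per_cluster):
    """Memory needed if every one of the C**3 grid cells owns a fixed-size reservoir."""
    num_cells = cells_per_side ** 3
    return num_cells * clusters_per_reservoir * bytes_per_cluster


# Hypothetical reservoir layout: 50 clusters of 64 bytes each.
for C in (32, 64, 128, 256):
    gib = dense_reservoir_bytes(C, clusters_per_reservoir=50, bytes_per_cluster=64) / 2**30
    print(f"C = {C:3d}: {gib:8.2f} GiB")
```

Whatever the per-reservoir size, the total grows with the cube of the number of cells per side, so doubling it multiplies the memory by eight.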

Fortunately, however, there is no need for every grid cell to have a reservoir: as noted by [54], most cells in a scene are empty in practice, and we can exploit this observation to store a sparse set of reservoirs for only those cells that contain predicted points. To achieve this, rather than using the grid cell indices produced as above directly, we instead allocate a fixed-size buffer of N reservoirs upfront, and construct a lookup table T during online training that can be
