Bags of Binary Words for Fast Place Recognition in
Image Sequences
Dorian Gálvez-López and Juan D. Tardós, Member, IEEE
Abstract—We propose a novel method for visual place recognition using
bag of words obtained from FAST+BRIEF features. For the first time, we
build a vocabulary tree that discretizes a binary descriptor space, and
use the tree to speed up correspondences for geometrical verification.
We present competitive results with no false positives in very different
datasets, using exactly the same vocabulary and settings. The whole
technique, including feature extraction, requires 22 ms per frame in a sequence with 26 300 images, one order of magnitude faster than previous approaches.
Index Terms—Place Recognition, Bag of Words, SLAM, Computer
Vision.
I. INTRODUCTION
One of the most significant requirements for long-term visual
SLAM (Simultaneous Localization and Mapping) is robust place
recognition. After an exploratory period, when areas that have not been observed for a long time are revisited, standard matching algorithms fail. When loop closures are robustly detected, they provide correct data association to obtain consistent maps. The same methods used for loop detection can be used for robot relocalization after tracking is lost, due for example to sudden motions, severe occlusions or motion blur. In [1]
we concluded that, for small environments, map-to-image methods achieve good performance, but for large environments, image-to-image (or appearance-based) methods such as FAB-MAP [2] scale better. The basic technique consists of building a database from the images collected online by the robot, so that the most similar one can be retrieved when a new image is acquired. If the two images are similar enough, a loop closure is detected.
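For concreteness, the following sketch shows this matching loop in its simplest form; the ImageDatabase class, the cosine similarity score and the alpha threshold are illustrative placeholders rather than the representation and scoring actually used in this paper.

```python
# Minimal sketch of appearance-based loop detection. The class name,
# the cosine similarity score and the alpha threshold are illustrative
# placeholders, not the exact representation used in this paper.
import numpy as np

class ImageDatabase:
    def __init__(self, alpha=0.3):
        self.vectors = []      # bag-of-words vectors of previously seen images
        self.alpha = alpha     # minimum similarity to report a loop candidate

    def query_and_add(self, v):
        """Return the most similar stored image and whether it passes the
        similarity threshold, then store the new vector."""
        best_idx, best_score = -1, 0.0
        for i, w in enumerate(self.vectors):
            score = float(np.dot(v, w)) / (np.linalg.norm(v) * np.linalg.norm(w) + 1e-12)
            if score > best_score:
                best_idx, best_score = i, score
        self.vectors.append(v)
        return best_idx, best_score, best_score >= self.alpha
```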
In recent years, many algorithms that exploit this idea have appeared [2]–[6], basing the image matching on comparing images as numerical vectors in the bag-of-words space [7]. Bags of words result in very effective and quick image matchers [8], but they are not a perfect solution for closing loops, due mainly to perceptual aliasing [6]. For this reason, a verification step is performed afterwards by checking that the matching images are geometrically consistent, which requires feature correspondences. The bottleneck of loop closure algorithms is usually the extraction of features, which is around ten times more expensive in computation cycles than the rest of the steps. This may cause SLAM algorithms to run in two decoupled threads: one to perform the main SLAM functionality, and the other just to detect loop closures, as in [5].
In this paper, we present a novel algorithm to detect loops and establish point correspondences between images in real time, with
a conventional CPU and a single camera. Our approach is based on
bag of words and geometrical check, with several important novelties
that make it much faster than current approaches. The main speed
improvement comes from the use of a slightly modified version of
the BRIEF descriptor [9] with FAST keypoints [10], as explained in
Section III. The BRIEF descriptor is a binary vector where each bit
is the result of an intensity comparison between a given pair of pixels
around the keypoint.
This research has been partly funded by the European Union under project RoboEarth FP7-ICT-248942, the Dirección General de Investigación of Spain under projects DPI2009-13710 and DPI2009-07130, and the Ministerio de Educación (scholarship FPU-AP2008-02272).
The authors are with the Instituto de Investigación en Ingeniería de Aragón (I3A), Universidad de Zaragoza, María de Luna 1, 50018 Zaragoza, Spain. {dorian, tardos}@unizar.es.
Although BRIEF descriptors are hardly invariant to scale and rotation, our experiments show that they are very robust
for loop closing with planar camera motions, the usual case in mobile
robotics, offering a good compromise between distinctiveness and
computation time.
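To illustrate how such a descriptor is computed, the sketch below builds a BRIEF-like binary string from a fixed random test pattern; the pattern generation, the patch handling and the absence of pre-smoothing are simplifications with respect to [9], and the keypoint is assumed to lie far from the image border.

```python
# BRIEF-like descriptor sketch: each bit encodes one intensity comparison
# between a pair of pixels around the keypoint. The random pattern and the
# absence of Gaussian pre-smoothing are simplifications of [9].
import numpy as np

def make_test_pattern(n_bits=256, patch=31, seed=0):
    rng = np.random.default_rng(seed)
    # n_bits pairs of pixel offsets (x1, y1, x2, y2) inside the patch
    return rng.integers(-(patch // 2), patch // 2 + 1, size=(n_bits, 4))

def brief_descriptor(image, kp_x, kp_y, pattern):
    # bit i = 1 iff the intensity at the first point of pair i is smaller
    # than at the second (keypoint assumed far enough from the border)
    bits = np.empty(len(pattern), dtype=np.uint8)
    for i, (x1, y1, x2, y2) in enumerate(pattern):
        bits[i] = image[kp_y + y1, kp_x + x1] < image[kp_y + y2, kp_x + x2]
    return np.packbits(bits)

def hamming_distance(d1, d2):
    # distance between two binary descriptors: XOR followed by a bit count
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())
```

Comparing two such descriptors reduces to a bitwise XOR followed by a bit count, which is what makes matching binary features so cheap.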
We introduce a bag of words that discretizes a binary space, and
augment it with a direct index, in addition to the usual inverse index,
as explained in Section IV. To the best of our knowledge, this is the
first time a binary vocabulary is used for loop detection. The inverse
index is used for fast retrieval of images potentially similar to a given
one. We show a novel use of the direct index to efficiently obtain
point correspondences between images, speeding up the geometrical
check during the loop verification.
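The following sketch illustrates both index structures with deliberately simple data types; the vocabulary itself is reduced to a word_of stub (the real one is a tree that discretizes the binary descriptor space), the inverse-index weights are plain counts, and the direct index is keyed by leaf words for simplicity.

```python
# Sketch of the inverse and direct indices used by the image database.
# The vocabulary is a stub with word_of(), and the weights are plain
# counts; both are simplifications of the structures used in the paper.
from collections import defaultdict

class BinaryBoWDatabase:
    def __init__(self, vocabulary):
        self.vocab = vocabulary                  # object with word_of(descriptor) -> word id
        self.inverse_index = defaultdict(list)   # word id -> [(image id, weight)]
        self.direct_index = {}                   # image id -> {word id: [feature indices]}

    def add(self, image_id, descriptors):
        words = defaultdict(list)
        for feat_idx, d in enumerate(descriptors):
            words[self.vocab.word_of(d)].append(feat_idx)
        for w, feats in words.items():
            self.inverse_index[w].append((image_id, len(feats)))
        self.direct_index[image_id] = dict(words)

    def candidate_images(self, descriptors):
        """Inverse index: visit only images that share at least one word."""
        votes = defaultdict(int)
        for d in descriptors:
            for image_id, weight in self.inverse_index[self.vocab.word_of(d)]:
                votes[image_id] += weight
        return sorted(votes, key=votes.get, reverse=True)

    def candidate_correspondences(self, image_a, image_b):
        """Direct index: compare only features that fell into the same word,
        instead of matching all features against all."""
        da, db = self.direct_index[image_a], self.direct_index[image_b]
        return [(i, j) for w in set(da) & set(db) for i in da[w] for j in db[w]]
```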
The complete loop detection algorithm is detailed in Section V.
Similarly to our previous work [5,6], to decide that a loop has been
closed, we verify the temporal consistency of the image matches
obtained. One of the novelties in this paper is a technique to prevent images collected in the same place from competing with each other when the database is queried. We achieve this by grouping together, during the matching, those images that depict the same place.
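A simplified version of this grouping is sketched below: query results whose frame indices are close in time are merged into a single group that is scored as a whole; the max_gap criterion is an illustrative stand-in for the actual grouping rule described in Section V.

```python
# Sketch of grouping query results that were collected close in time, so
# that images of the same place reinforce one group instead of competing
# with each other. The max_gap criterion is illustrative only.
def group_matches(matches, max_gap=3):
    """matches: list of (frame index, score) returned by the database query.
    Returns one dict per group with its frame interval and accumulated score."""
    groups = []
    for frame, score in sorted(matches):
        if groups and frame - groups[-1]["last"] <= max_gap:
            g = groups[-1]
            g["last"] = frame
            g["score"] += score
            if score > g["best_score"]:
                g["best"], g["best_score"] = frame, score
        else:
            groups.append({"first": frame, "last": frame, "score": score,
                           "best": frame, "best_score": score})
    return groups
```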
Section VI contains the experimental evaluation of our work, in-
cluding a detailed analysis of the relative merits of the different parts
in our algorithm. We present comparisons between the effectiveness
of BRIEF and two versions of SURF features [11], the descriptor
most used for loop closing. We also analyze the performance of the
temporal and geometrical consistency tests for loop verification. We
finally present the results achieved by our technique after evaluating it in five public datasets with trajectories between 0.7 and 4 km long. We demonstrate that we can run the whole loop detection procedure, including the feature extraction, in 52 ms at most in a sequence with 26 300 images (22 ms on average), outperforming previous techniques by more than one order of magnitude.
A preliminary version of this work was presented in [12]. In the
current paper we enhance the direct index technique and extend the
experimental evaluation of our approach. We also report results in
new datasets and make a comparison with the state-of-the-art FAB-
MAP 2.0 algorithm [13].
II. RELATED WORK
Place recognition based on appearance has received great attention
in the robotics community because of the excellent results achieved
[4,5,13,14]. An example of this is the FAB-MAP system [13], which
detects loops with an omnidirectional camera, obtaining recall rates of 48.4% and 3.1%, with no false positives, in trajectories 70 km and 1000 km long, respectively. FAB-MAP represents images with a bag
of words, and uses a Chow Liu tree to learn offline the words’
co-visibility probability. FAB-MAP has become the gold standard for loop detection, but its robustness decreases when the
images depict very similar structures for a long time, which can
be the case when using frontal cameras [5]. In the work of Angeli
et al. [4], two visual vocabularies (for appearance and color) are
created online in an incremental fashion. The two bag-of-words
representations are used together as input of a Bayesian filter that
estimates the matching probability between two images, taking into
account the matching probability of previous cases. In contrast to
these probabilistic approaches, we rely on a temporal consistency
check to consider previous matches and enhance the reliability of
the detections. This technique has proven successful in our previous
works [5,6]. Our work also differs from those above in that we use a bag of binary words for the first time, and in that we propose a technique to prevent images collected close in time and depicting the same place from competing with each other during the matching, so that we can work at a higher frequency.
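As a rough illustration of such a temporal consistency check (the parameters and bookkeeping are placeholders, not the exact test of Section V), a candidate is only accepted when the most recent queries also matched frames close to it:

```python
# Sketch of a temporal consistency check: a loop candidate is accepted
# only if the previous k queries matched nearby frames as well. The
# parameters k and max_drift are illustrative placeholders.
def temporally_consistent(history, candidate_frame, k=3, max_drift=5):
    """history: matched frame indices of the most recent queries (or None
    when a query returned no acceptable match)."""
    recent = history[-k:]
    if len(recent) < k or any(m is None for m in recent):
        return False
    # require the recent matches to form a sequence close to the candidate
    frames = recent + [candidate_frame]
    return all(abs(a - b) <= max_drift for a, b in zip(frames, frames[1:]))
```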