
Object Recognition from Local Scale-Invariant Features
David G. Lowe
Computer Science Department
University of British Columbia
Vancouver, B.C., V6T 1Z4, Canada
lowe@cs.ubc.ca
Proc. of the International Conference on Computer Vision, Corfu (Sept. 1999)

Abstract
An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection.
These features share similar properties with neurons in in-
ferior temporal cortex that are used for object recognition
in primate vision. Features are efficiently detected through
a staged filtering approach that identifies stable points in
scale space. Image keys are created that allow for local ge-
ometric deformations by representing blurred image gradi-
ents in multiple orientation planes and at multiple scales.
The keys are used as input to a nearest-neighbor indexing
method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual
least-squares solution for the unknown model parameters.
Experimental results show that robust object recognition
can be achieved in cluttered partially-occluded images with
a computation time of under 2 seconds.
1. Introduction
Object recognition in cluttered real-world scenes requires
local image features that are unaffected by nearby clutter or
partial occlusion. The features must be at least partially in-
variant to illumination,3D projective transforms, and com-
mon object variations. On the other hand, the features must
also be sufficiently distinctive to identify specific objects
among many alternatives. The difficulty of the object recognition problem is due in large part to the lack of success in
finding such image features. However, recent research on
the use of dense local features (e.g., Schmid & Mohr [19])
has shown that efficient recognition can often be achieved
by using local image descriptors sampled at a large number
of repeatable locations.
This paper presents a new method for image feature gen-
eration called the Scale Invariant Feature Transform (SIFT).
This approach transforms an image into a large collection
of local feature vectors, each of which is invariant to image
translation, scaling, and rotation, and partially invariant to
illumination changes and affine or 3D projection. Previous
approaches to local feature generation lacked invariance to
scale and were more sensitive to projective distortion and
illumination change. The SIFT features share a number of
properties in common with the responses of neurons in infe-
rior temporal (IT) cortex in primate vision. This paper also
describes improved approaches to indexing and model ver-
ification.
The scale-invariant features are efficiently identified by
using a staged filtering approach. The first stage identifies
key locations in scale space by looking for locations that
are maxima or minima of a difference-of-Gaussian function.
Each point is used to generate a feature vector that describes
the local image region sampled relative to its scale-space co-
ordinate frame. The features achieve partial invariance to
local variations, such as affine or 3D projections, by blur-
ring image gradient locations. This approach is based on a
model of the behavior of complex cells in the cerebral cor-
tex of mammalian vision. The resulting feature vectors are
called SIFT keys. In the current implementation, each im-
age generates on the order of 1000 SIFT keys, a process that
requires less than 1 second of computation time.
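The first filtering stage described above can be sketched as follows. This is a minimal illustration of finding extrema of a difference-of-Gaussian function in scale space, not the paper's implementation; the parameter values (`sigma`, `k`, `n_levels`) and the brute-force neighborhood scan are assumptions made for clarity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, sigma=1.6, k=2 ** 0.5, n_levels=4):
    """Locate maxima/minima of a difference-of-Gaussian stack (a sketch).

    Returns (y, x, level) triples where a pixel is a strict extremum
    of its 3x3x3 neighborhood across space and scale.
    """
    # Gaussian-blurred copies of the image at increasing scale.
    blurred = [gaussian_filter(image.astype(float), sigma * k ** i)
               for i in range(n_levels)]
    # Difference-of-Gaussian layers: adjacent blur levels subtracted.
    dog = [blurred[i + 1] - blurred[i] for i in range(n_levels - 1)]

    keys = []
    for level in range(1, len(dog) - 1):
        d = dog[level]
        for y in range(1, d.shape[0] - 1):
            for x in range(1, d.shape[1] - 1):
                # 3x3x3 neighborhood spanning the adjacent scales.
                patch = np.stack([dog[level + dl][y - 1:y + 2, x - 1:x + 2]
                                  for dl in (-1, 0, 1)])
                neighbors = np.delete(patch.ravel(), 13)  # drop center
                v = d[y, x]
                if v > neighbors.max() or v < neighbors.min():
                    keys.append((y, x, level))
    return keys
```

Requiring a strict extremum means flat regions produce no keys, while corners and blobs that persist across neighboring scales do.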
The SIFT keys derived from an image are used in a nearest-neighbor approach to indexing to identify candidate object models. Collections of keys that agree on a potential model pose are first identified through a Hough transform hash table, and then through a least-squares fit to a final
estimate of model parameters. When at least 3 keys agree
on the model parameters with low residual, there is strong
evidence for the presence of the object. Since there may be
dozens of SIFT keys in the image of a typical object, it is
possible to have substantial levels of occlusion in the image
and yet retain high levels of reliability.
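The nearest-neighbor indexing step can be sketched with a standard k-d tree. The tree here is a stand-in of my own choosing for the paper's indexing structure, and the `match_keys` interface is an illustrative assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_keys(model_keys, image_keys):
    """Nearest-neighbor candidate matching of SIFT key vectors (a sketch).

    model_keys, image_keys: (N, D) arrays of descriptor vectors.
    Returns (image_index, model_index) pairs, one candidate match
    per image key.
    """
    tree = cKDTree(model_keys)          # index the model's key vectors
    _, idx = tree.query(image_keys, k=1)  # nearest model key for each
    return [(i, int(j)) for i, j in enumerate(idx)]
```

Each image key votes for the model key it matches; the Hough-transform stage described above then clusters these votes by the pose they imply.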
The current object models are represented as 2D loca-
tions of SIFT keys that can undergo affine projection. Suf-
ficient variation in feature location is allowed to recognize
perspective projection of planar shapes at up to a 60 degree
rotation away from the camera or to allow up to a 20 degree
rotation of a 3D object.
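The final verification step, solving for the unknown affine model parameters with low residual, amounts to a linear least-squares problem. The sketch below fits the six parameters of a 2D affine transform from matched key locations; the formulation is a standard one and the variable names are my own, not the paper's.

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine pose from matched key locations (a sketch).

    Solves image = A @ model + t for the six affine parameters.
    At least 3 non-collinear matches are required, matching the
    3-key agreement threshold noted in the text.
    """
    model_pts = np.asarray(model_pts, float)
    image_pts = np.asarray(image_pts, float)
    n = len(model_pts)
    # Each match contributes two rows of M p = b,
    # with p = (a11, a12, a21, a22, tx, ty).
    M = np.zeros((2 * n, 6))
    b = image_pts.reshape(-1)
    M[0::2, 0:2] = model_pts   # x equations
    M[0::2, 4] = 1.0
    M[1::2, 2:4] = model_pts   # y equations
    M[1::2, 5] = 1.0
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    return p[:4].reshape(2, 2), p[4:]
```

A match hypothesis is accepted when the residual of this fit is small, which is what "agree on the model parameters with low residual" means operationally.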