
Additional discussion of the details of prior related work is reported in Sect. B.2.
2 Contributions
If SIFT is written as (1), then DSP-SIFT is given by
$$h_{\mathrm{DSP}}(\theta \mid I)[x] = \int h_{\mathrm{SIFT}}(\theta \mid I, \sigma)[x]\, E_s(\sigma)\, d\sigma, \qquad x \in \Lambda \qquad (2)$$
where s > 0 is the size-pooling scale and E_s is an exponential or other unilateral density function. This is our main contribution. The process is visualized in Fig. 1. Unlike SIFT, which is computed on a scale-selected lattice Λ(σ̂), DSP-SIFT is computed on a regularly sampled lattice Λ. Computed on a different lattice, the above can be considered a recipe for DSP-HOG [11]. Computed on a tree, it can be used to extend deformable-parts models (DPM) [16] to DSP-DPM. Replacing h_SIFT with another histogram-based descriptor "X" (for instance, SURF [2]), the above yields DSP-X. Applied to a hidden layer of a convolutional network, it yields a DSP-CNN, or a DSP-Deep-Fisher-Network [39]. The details of the implementation are in Sect. 3.
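To make the pooling operation in (2) concrete, the following is a minimal sketch (not the authors' code) that replaces the integral with a weighted sum over sampled domain sizes. The helper `hist_fn` is a hypothetical callable standing in for whatever histogram-based descriptor "X" is being pooled (SIFT, HOG, SURF, or a hidden CNN layer).

```python
import numpy as np

def dsp_pool(hist_fn, sigmas, weights):
    """Domain-size pooling, a sketch of Eq. (2): aggregate histograms
    computed at several domain sizes into a single descriptor.

    hist_fn : callable mapping a domain size sigma to the un-normalized
              histogram h_X(. | I, sigma)[x] as a 1-D array (hypothetical).
    sigmas  : sampled domain sizes (samples along the scale semi-orbit).
    weights : values of the pooling density E_s at those sizes.
    """
    weights = np.asarray(weights, dtype=float)
    hists = np.stack([np.asarray(hist_fn(s), dtype=float) for s in sigmas])
    # Weighted average approximates the integral over sigma in Eq. (2).
    return (weights[:, None] * hists).sum(axis=0) / weights.sum()
```

With uniform weights, this reduces to a plain average of the raw histograms across domain sizes, which is the choice adopted in Sect. 3.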
While the implementation of DS pooling is straightforward, its justification is less so. We report the summary highlights in Sect. 5, which represent contributions to the understanding of pooling and to the design and learning of local descriptors. The detailed derivation is described in Sect. B. It provides a theoretical justification for DS pooling and explicit conditions under which the resulting descriptors are valid. Nevertheless, one cannot forgo empirical validation on real images, where such conditions are routinely violated. In Sect. 4 we compare DSP-SIFT to alternative approaches.
Motivated by the experiments of [33, 34], which compare local descriptors on wide-baseline matching benchmarks and show SIFT to be a clear winner, we choose SIFT as a paragon and compare it to DSP-SIFT on the standard benchmark [33]. Motivated by [17], which compares SIFT to both supervised and unsupervised CNNs trained on Imagenet and Flickr respectively, with the latter emerging as the clear winner on the same benchmark [33], we submit DSP-SIFT to the same evaluation protocol. We also run the test on the new synthetic dataset introduced by [17], which yields the same qualitative assessment. It should be noted that the comparison is unfair in favor of the CNNs, due to their increased dimension compared to SIFT and DSP-SIFT. Moreover, the best performance of a CNN is obtained using its fourth-layer responses, which contain 8192 coefficients, a 64-fold complexity increase, even before accounting for the cost of learning, of which DSP-SIFT requires none.
Clearly, DS pooling of under-sampled semi-orbits cannot outperform fine sampling, so if we were to retain all the scale samples instead of aggregating them, performance would improve further. However, computing a large collection of SIFT descriptors across different scales would incur significantly increased computational and storage cost. To contain the latter, [22] assume that descriptors at different scales populate a linear subspace and fit a high-dimensional hyperplane. The resulting Scale-less SIFT (SLS) outperforms ordinary SIFT, as shown in Fig. 5. However, the linear-subspace assumption breaks down under large scale changes, so SLS is outperformed by DSP-SIFT despite the considerable difference in (memory and time) complexity.
3 Implementation and Parameters
Following common practice in evaluation protocols, we use maximally-stable extremal regions (MSER) [30] to detect
candidate regions, affine-normalize them, align them to the dominant orientation, and re-scale them for comparison
with [17]. For a detected scale σ̂, DSP-SIFT samples N_σ̂ scales within a neighborhood (λ_1 σ̂, λ_2 σ̂) around it. For each scale-sampled patch, a single-scale un-normalized SIFT descriptor (1) is computed on the SIFT scale-space octave corresponding to the detected scale. By choosing E_s to be a uniform density, these raw histograms of gradient orientations at different scales are accumulated and normalized (see footnote 6) to produce DSP-SIFT (2), which is compared to several descriptors. In the following evaluation, we use λ_1 = 1/6, λ_2 = 4/3, and N_σ̂ = 15. These parameters are
empirically selected on the Oxford dataset [32, 33]. Fig. 4(a) shows that mean average precision (defined in Sect. 4.3)
changes over the scale pooling range. An immediate advantage of DS pooling is observed when more than one scale
Footnote 6: We follow the practice of SIFT [29] to normalize, clamp, and re-normalize the histograms to make them more robust to contrast changes. The clamping threshold is set to 0.067 empirically.
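The following is a minimal sketch (under our own assumptions, not the reference implementation) of the scale sampling and post-processing just described, using the reported parameters λ_1 = 1/6, λ_2 = 4/3, N_σ̂ = 15, and the clamping threshold 0.067. The helper `sift_at_scale` is hypothetical: it stands for a routine returning the raw (un-normalized) SIFT histogram of the patch rescaled to a given domain size.

```python
import numpy as np

def dsp_sift(sift_at_scale, sigma_hat, n_samples=15,
             lam1=1.0 / 6.0, lam2=4.0 / 3.0, clamp=0.067):
    """Sketch of DSP-SIFT with a uniform pooling density E_s."""
    # Sample N scales in the neighborhood (lam1 * sigma_hat, lam2 * sigma_hat).
    sigmas = np.linspace(lam1 * sigma_hat, lam2 * sigma_hat, n_samples)
    # Uniform E_s: plain average of the raw histograms across domain sizes.
    h = np.mean([np.asarray(sift_at_scale(s), dtype=float) for s in sigmas],
                axis=0)
    # SIFT-style post-processing (footnote 6): normalize, clamp, re-normalize.
    h = h / (np.linalg.norm(h) + 1e-12)
    h = np.minimum(h, clamp)
    return h / (np.linalg.norm(h) + 1e-12)
```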