Visual Categorization with Keypoints: The Bag of Keypoints Method

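"Visual Categorization with Bags of Keypoints" presents a visual categorization method based on a bag-of-words (BoW) model built from image keypoints. The bag-of-words model is widely used in text processing. In computer vision it is used to represent images and to recognize the objects they contain. The authors are Gabriela Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski and Cédric Bray. Their affiliations include Naver Labs Europe, Nokia Technologies and Xerox Research Centre Europe, and they work on projects related to stereo vision, 3D reconstruction and image classification.

The paper recasts the recognition of object content as a problem of statistical feature representation. Local features are extracted from each image with a keypoint detector and descriptor. The paper uses the Harris affine detector with SIFT (Scale-Invariant Feature Transform) descriptors, and later BoW systems also use alternatives such as SURF (Speeded Up Robust Features). These descriptors capture local shape and texture (gradient) information and are, to a degree, invariant to changes in scale, rotation and illumination.

The descriptors are then passed to a vocabulary-learning step: a clustering algorithm such as k-means groups similar features together. As in vocabulary construction for text, each cluster centre acts as a "word" (a visual word). Each image can then be represented as a "bag of words". This is the frequency or count of the visual words assigned to its keypoint descriptors, and the positions of the keypoints in the image are discarded.

Because this histogram has one bin per visual word, it is already a fixed-length vector that can be fed directly to a classifier. The paper trains Naive Bayes and SVM classifiers on it. Later work replaced the plain histogram with richer encodings such as sparse coding or VLAD (Vector of Locally Aggregated Descriptors), combined with machine learning models such as SVMs or neural networks.

The study reports strong results on visual categorization tasks. These results show that the BoW model captures semantic information about images even when the spatial layout of the keypoints is ignored. The approach has been widely applied to object recognition, scene classification, pedestrian detection and related tasks. It is widely regarded as an influence on later methods, including deep learning approaches such as convolutional neural networks (CNNs).

The GitHub repository https://github.com/rmsalinas/fbow is a third-party bag-of-words library, not the authors' code. Readers can use it to experiment with vocabulary construction and with converting feature descriptors into bag-of-words vectors. Such resources are useful for learning about and researching the Bag-of-Words model in computer vision.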

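The vocabulary-learning step described above can be sketched with Lloyd's k-means in plain Python. This is a minimal illustration under stated assumptions. It is not the paper's implementation or fbow's. The function names, the random initialisation, the fixed iteration cap and the toy 2-D "descriptors" are all hypothetical (real SIFT descriptors are 128-dimensional).

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_vocabulary(descriptors, k, iterations=20, seed=0):
    """Learn k visual words (cluster centres) with Lloyd's k-means (hypothetical helper)."""
    rng = random.Random(seed)
    # Initialise the centres with k descriptors sampled without replacement.
    centres = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        # Assignment step: every descriptor joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            nearest = min(range(k), key=lambda j: squared_distance(d, centres[j]))
            clusters[nearest].append(d)
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centres.append([sum(m[i] for m in members) / len(members) for i in range(dim)])
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre in place
        if new_centres == centres:
            break  # centres stopped moving: converged
        centres = new_centres
    return centres

# Toy 2-D "descriptors" forming two well-separated groups (hypothetical data).
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
vocabulary = kmeans_vocabulary(descriptors, k=2)
print(sorted(vocabulary))  # one centre near (0.1, 0.1), one near (5.03, 5.0)
```

Each returned centre plays the role of one visual word. A larger `k` gives a finer vocabulary.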
the statistics of their patch detector. This elegant approach has a number of limitations. First, the method is not efficient: even when models are restricted to 6 image patches and training images contain only up to 30 patches, days of CPU time are required to learn several categories. Second, the views of objects used for training must be segregated, for instance into cars (rear) and cars (side). This is unsurprising given the use of an explicit 2D model of relative positions.

In section 2 we explain the categorization algorithms and the choice of their components. In section 3 we present results from applying the algorithms to the dataset of Fergus et al. and to a more challenging seven-class dataset. We demonstrate that our approach is robust to background clutter and produces state-of-the-art recognition performance.

2. The method

The main steps of our method are:

• Detection and description of image patches
• Assigning patch descriptors to a set of predetermined clusters (a vocabulary) with a vector quantization algorithm
• Constructing a bag of keypoints, which counts the number of patches assigned to each cluster
• Applying a multi-class classifier, treating the bag of keypoints as the feature vector, and thus determining which category or categories to assign to the image.
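The second and third steps can be illustrated with a short sketch. It assumes the vocabulary has already been learned. Each patch descriptor is assigned to its nearest cluster centre (hard vector quantization under Euclidean distance), and the bag of keypoints is the number of patches assigned to each centre. The helper names, the distance choice and the toy values are hypothetical.

```python
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(descriptor, vocabulary):
    """Hard vector quantization: index of the nearest cluster centre."""
    return min(range(len(vocabulary)), key=lambda j: squared_distance(descriptor, vocabulary[j]))

def bag_of_keypoints(descriptors, vocabulary):
    """Histogram with one bin per cluster centre; patch positions are ignored."""
    counts = Counter(quantize(d, vocabulary) for d in descriptors)
    return [counts[j] for j in range(len(vocabulary))]

# Hypothetical 3-word vocabulary and the descriptors of one image's patches.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_patches = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.3)]
print(bag_of_keypoints(image_patches, vocabulary))  # [1, 2, 1]
```

The histogram always has one entry per visual word. As a result, images with different numbers of detected patches still produce feature vectors of the same length, which is what allows a standard multi-class classifier to use them.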

Ideally, these steps are designed to maximize classification accuracy while minimizing computational effort. Thus, the descriptors extracted in the first step should be invariant to variations that are irrelevant to the categorization task (image transformations, lighting variations and occlusions), but rich enough to be discriminative at the category level. The vocabulary used in the second step should be large enough to distinguish relevant changes in image parts, but not so large as to distinguish irrelevant variations such as noise.

We refer to the quantized feature vectors (cluster centres) as “keypoints” by analogy with “keywords” in text categorization. However, in our case “words” do not necessarily have a repeatable meaning such as “eyes” or “car wheels”, nor is there an obvious best choice of vocabulary. Rather, our goal is to use a vocabulary that allows good categorization performance on a given training dataset. Therefore, the steps involved in training the system allow consideration of multiple possible vocabularies:

• Detection and description of image patches for a set of labeled training images
• Constructing a set of vocabularies: each is a set of cluster centres, with respect to which descriptors are vector quantized
• Extracting bags of keypoints for these vocabularies
• Training multi-class classifiers using the bags of keypoints as feature vectors
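The training procedure above can be sketched as a loop. It trains one classifier per candidate vocabulary and keeps the vocabulary with the best held-out accuracy. The sketch rests on several assumptions, and the paper's own classifier settings may differ. It uses multinomial Naive Bayes with add-alpha (Laplace) smoothing as the multi-class classifier, and an SVM could be substituted. It also assumes a separate validation split and bags of keypoints already computed for each vocabulary. The function names and the toy counts are hypothetical.

```python
import math

def train_naive_bayes(bags, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on bag-of-keypoints histograms.

    Returns log P(class) and, per class, log P(word | class) estimated with
    add-alpha (Laplace) smoothing so unseen words do not get probability zero.
    """
    n_words = len(bags[0])
    log_priors, log_likelihoods = {}, {}
    for c in sorted(set(labels)):
        class_bags = [b for b, y in zip(bags, labels) if y == c]
        log_priors[c] = math.log(len(class_bags) / len(bags))
        totals = [sum(b[w] for b in class_bags) for w in range(n_words)]
        denom = sum(totals) + alpha * n_words
        log_likelihoods[c] = [math.log((t + alpha) / denom) for t in totals]
    return log_priors, log_likelihoods

def predict(model, bag):
    """Return the class with the highest (unnormalised) log-posterior."""
    log_priors, log_likelihoods = model
    def log_posterior(c):
        return log_priors[c] + sum(n * lp for n, lp in zip(bag, log_likelihoods[c]))
    return max(log_priors, key=log_posterior)

def accuracy(model, bags, labels):
    """Fraction of bags whose predicted class matches the label."""
    return sum(predict(model, b) == y for b, y in zip(bags, labels)) / len(labels)

def select_vocabulary(candidates, train_labels, val_labels):
    """Train one classifier per vocabulary and keep the best on held-out data.

    candidates maps a vocabulary name to (train_bags, val_bags), the bags of
    keypoints computed with that vocabulary.
    """
    scores = {}
    for name, (train_bags, val_bags) in candidates.items():
        model = train_naive_bayes(train_bags, train_labels)
        scores[name] = accuracy(model, val_bags, val_labels)
    return max(scores, key=scores.get), scores

# Toy bags for two hypothetical vocabularies (made-up counts for illustration).
train_labels = ["car", "car", "face", "face"]
val_labels = ["car", "face"]
candidates = {
    "k=2": ([[3, 3], [2, 2], [3, 3], [2, 2]], [[3, 3], [2, 2]]),  # words do not separate the classes
    "k=3": ([[4, 1, 0], [3, 1, 0], [0, 1, 4], [0, 2, 3]], [[5, 0, 0], [0, 0, 5]]),
}
best, scores = select_vocabulary(candidates, train_labels, val_labels)
print(best)  # k=3
```

Selecting on a held-out split, rather than on the training bags, reflects the stated goal of choosing the vocabulary that gives good categorization performance instead of one with an a priori "best" meaning.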

