Maximal Margin Feature Mapping via Basic Image
Descriptors for Image Classification
Changchen Zhao (1,2), Chun-Liang Lin (1), Weihai Chen (2)
(1) Department of Electrical Engineering, National Chung Hsing University, Taichung 40227, Taiwan
(2) School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
chunlin@dragon.nchu.edu.tw
Abstract—Computer vision is one of the most important branches of modern industrial technology. Image classification plays an important role in computer vision, as it draws on the most advanced techniques in this area. However, most image classification methods use only the SIFT feature for further processing, which prevents rich and useful low-level image attributes from being captured. This paper proposes a maximal margin feature mapping framework that incorporates basic descriptors into the recognition system. This is achieved by optimizing an objective function that minimizes intra-class distance and reconstruction error while maximizing inter-class distance. An efficient optimization algorithm is proposed to learn the transformation matrix. Experiments are conducted on three publicly available datasets, and the preliminary results show the effectiveness of the proposed approach.
Index Terms—image classification, maximal margin, feature
mapping, non-convex optimization
I. INTRODUCTION
Recent advances in computer vision have benefited industrial technology in areas such as vision-based manipulation [1], video surveillance [2], and industrial imaging [3]. Image classification has been an active computer vision research area over the past few decades, and it is an important application of machine learning and pattern recognition. It combines techniques such as feature extraction, feature encoding, and classifier learning. First, given an input image, various features are extracted to capture basic image attributes such as color, gradient, and intensity. Then, a feature encoding method is employed to generate the image-level representation, which should be as discriminative as possible. Finally, a classifier is trained to assign a category label to a new input image. A minimal sketch of this three-stage pipeline is given below.
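The following Python sketch illustrates this three-stage pipeline under simplifying assumptions: the local descriptors are random placeholders standing in for SIFT-like features, and the codebook size and classifier are illustrative choices rather than the settings used in this paper.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def encode(descriptors, codebook):
    # Bag-of-words encoding: histogram of nearest-codeword assignments.
    words = codebook.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(codebook.n_clusters + 1))
    return hist / max(hist.sum(), 1)  # L1-normalized histogram

# Step 1: extract local descriptors per image (random 128-D placeholders).
train_descs = [np.random.rand(200, 128) for _ in range(10)]
train_labels = np.arange(10) % 2

# Step 2: learn a codebook of basic patterns, then encode each image.
codebook = KMeans(n_clusters=64, n_init=10).fit(np.vstack(train_descs))
X = np.array([encode(d, codebook) for d in train_descs])

# Step 3: train a classifier on the image-level representations.
clf = LinearSVC().fit(X, train_labels)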
Image features play an important role among the aforementioned techniques. A variety of image features have been proposed to capture low-level image attributes such as intensity, illumination, color, and gradient. The scale-invariant feature transform (SIFT) [4] is a well-known local feature used to detect and describe salient points in images. SIFT can robustly identify key points of an object even among clutter and under partial occlusion, because the SIFT descriptor is invariant to uniform scaling and orientation, and partially invariant to affine distortion and illumination changes. It has been widely used in image retrieval [5], object recognition [6], visual tracking [7], and, of course, image classification.
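As a concrete illustration of SIFT in practice, the OpenCV snippet below detects key points and computes their 128-dimensional descriptors; the image path is a placeholder, and opencv-python 4.4 or later is assumed, where SIFT is part of the main module.

import cv2

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each keypoint carries a location, scale, and orientation; each descriptor
# is a 128-dimensional histogram of local gradient orientations.
print(len(keypoints), descriptors.shape)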
Recent works employ dense SIFT as the image feature. Unlike the original SIFT descriptor, dense SIFT is extracted on image patches sampled over a dense regular grid. These algorithms use dense SIFT to capture low-level features of an object for further processing, under the assumption that SIFT features form the preliminary image patterns. These patterns are partitioned into several clusters (usually by K-means clustering), and the cluster centroids are regarded as the basic patterns of images; together they form a dictionary. For a given image, several patterns are activated via a feature encoding method. Popular encoding methods include vector quantization [8], kernel codebook encoding [9], locality-constrained linear coding (LLC) [10], Fisher encoding [11], and supervector encoding [12]. The image representation is generated from these activated patterns, and the manner in which patterns are activated determines the discriminative power of the representation. Hence, image classification relies on the discriminative power of these image representations.
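To make the distinction from standard SIFT concrete, the sketch below computes dense SIFT by placing key points on a regular grid and describing each cell, rather than running the detector; the grid step and patch size are illustrative assumptions.

import cv2

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
step, size = 8, 16  # grid stride and patch scale, chosen for illustration
grid = [cv2.KeyPoint(float(x), float(y), float(size))
        for y in range(0, img.shape[0], step)
        for x in range(0, img.shape[1], step)]
sift = cv2.SIFT_create()
_, dense_descs = sift.compute(img, grid)  # one 128-D descriptor per grid cell
# dense_descs can then be quantized against the dictionary to activate patterns.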
However, the SIFT feature has its own limitations. First, dense SIFT does not necessarily capture image patterns as rich as those captured by other low-level features. SIFT is a key-point descriptor, typically anchored at edges or corners of an object and invariant to translation and rotation, so it is limited in capturing color, gradient, or intensity features; yet these attributes are crucial for identifying objects. Second, dense SIFT can be ambiguous: descriptors extracted from the same image pattern may lie far apart in feature space, while descriptors extracted from different patterns may cluster together. These limitations severely restrict the discriminative power of the image representation if one uses only the SIFT feature as the basic image pattern.
In this paper, we aim at learning a mapping function that maps SIFT features to a high-dimensional feature space in which the basic assumption of machine learning holds, i.e., features extracted from the same pattern aggregate while those from different patterns separate. The mapping function is formulated as a non-convex optimization problem analogous to the auto-encoder. It has three layers of neurons: the first layer takes the SIFT feature as input, the second layer holds the mapped feature, and the output layer reconstructs the input by minimizing the reconstruction error. The main objective is accomplished by imposing constraints on the hidden layer via other low-level image features, e.g., HOG.
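The description above suggests an objective of the following general form; this is a sketch written from the stated goals, with the trade-off weights \lambda_1 and \lambda_2 and the exact scatter terms being our assumptions rather than the paper's final formulation:

\min_{W,b} \sum_i \| x_i - g(f(x_i)) \|_2^2
  + \lambda_1 \sum_{y_i = y_j} \| f(x_i) - f(x_j) \|_2^2
  - \lambda_2 \sum_{y_i \neq y_j} \| f(x_i) - f(x_j) \|_2^2,

where f(x) = \sigma(Wx + b) is the hidden-layer mapping applied to a SIFT feature x, g reconstructs the input from the hidden layer, and y_i denotes the class label. The first term is the auto-encoder reconstruction error, the second pulls same-class features together, and the third pushes different-class features apart, matching the minimize-intra-class, maximize-inter-class criterion stated in the abstract.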