Classification of hyperspectral remote-sensing data with primal SVM for small-sized training dataset problem

Mingmin Chi a,*, Rui Feng a, Lorenzo Bruzzone b

a Department of Computer Science and Engineering, Fudan University, 220 Han Dan Road, Shanghai 200433, China
b Department of Information and Communication Technologies, University of Trento, Italy

Received 1 November 2006; received in revised form 3 February 2008; accepted 6 February 2008
Abstract
With recent technological advances in remote sensing, very high-dimensional (hyperspectral) data are available for better discrimination among complex land-cover classes with similar spectral signatures. However, this large number of bands makes the task of automatic data analysis very complex. In real applications, it is difficult and expensive for experts to acquire enough training samples to learn a classifier, which results in classification problems with small-size training sample sets. Regularization-based algorithms, such as the Support Vector Machine (SVM), are usually proposed to handle such problems. SVMs are typically implemented in the dual form derived from Lagrange theory, but they can also be solved directly in the primal formulation. In this paper, we introduce an alternative implementation technique for SVM to address the classification problem with small-size training sample sets. The effectiveness of the introduced implementation technique has been empirically evaluated on benchmark datasets.
© 2008 COSPAR. Published by Elsevier Ltd. All rights reserved.
Keywords: Primal Support Vector Machine (SVM); Classification; Small-size training dataset problem; Hyperspectral remote-sensing data
1. Introduction
One of the most critical problems relating to the super-
vised classification of remote-sensing images lies in the def-
inition of a proper size of training set for an accurate
learning of classifiers. Since the collection of ground-refer-
ence data is an expensive and complex task, in many cases
the number of training samples is insufficient for a proper
learning of classification systems. This issue is particularly
critical when hyperspectral images are considered. Such
hyperspectral data are generally made of about 100–200
spectral channels of relatively narrow bandwidths (5–
10 nm). Although high-dimensional features are capable
of better discriminating among the complex (sub)classes,
in the real application, it is difficult and expensive for
experts to acquire enough training samples to learn a clas-
sifier. Consequently, it is impossible to meet the requirements on the necessary number of training samples since the size of the training dataset is relatively fixed.
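As an illustration of the small-sample, high-dimensional setting discussed above, the following sketch trains a linear SVM directly in the primal (L2 regularization plus hinge loss, minimized by subgradient descent) on synthetic data with far more features than labeled samples. The data, dimensions, and all parameter values are illustrative assumptions for this sketch, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setting: 20 labeled samples, 100 "spectral" features.
n, d = 20, 100
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)  # labels in {-1, +1}

# Primal SVM objective: (lam/2)*||w||^2 + mean_i max(0, 1 - y_i <w, x_i>),
# minimized directly by subgradient descent (no dual/Lagrange step).
lam, lr, epochs = 0.1, 0.01, 500
w = np.zeros(d)
for _ in range(epochs):
    margins = y * (X @ w)
    active = margins < 1                   # samples violating the margin
    grad = lam * w - (active * y) @ X / n  # subgradient of the objective
    w -= lr * grad

train_acc = np.mean(np.sign(X @ w) == y)
print(f"training accuracy: {train_acc:.2f}")
```

Because the regularization term penalizes ||w||, the classifier can still be fitted when d exceeds n, which is exactly the small-sized training set regime the paper targets.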
When the number of (representative) training samples is
relatively small with respect to the number of features (and
thus of classifier parameters to be estimated), the well-
known problem of the curse of dimensionality (i.e., the Hughes phenomenon; Hughes, 1968)^1 occurs. This results
in the risk of overfitting of the training data and can lead
to poor generalization capabilities of the classifier. Conven-
tional classification methods, such as the Gaussian Maximum Likelihood algorithm, cannot be applied to
hyperspectral data due to the high dimensionality of the
doi:10.1016/j.asr.2008.02.012
Expanded version of a talk presented at COSPAR on terrestrial phenomena and land products from space: validation, application and perspectives (Beijing, China, July 2006).
* Corresponding author. Tel.: +86 21 5566228. E-mail addresses: mmchi@fudan.edu.cn (M. Chi), fengrui@fudan.edu.cn (R. Feng), lorenzo.bruzzone@ing.unitn.it (L. Bruzzone).
1 With more discriminative features, classification performance improves as the number of labeled samples increases; if the number of labeled samples is fixed, however, performance eventually decreases as more features are added.
Advances in Space Research 41 (2008) 1793–1799