multi-class scene classification methods.
III. A SURVEY ON REMOTE SENSING IMAGE SCENE CLASSIFICATION METHODS
During the last few decades, considerable efforts have been made
to develop various methods for the task of scene classification
using satellite or aerial images. As scene classification is usually
carried out in feature space, effective feature representation
plays an important role in constructing high-performance scene
classification methods. We can generally divide existing scene
classification methods into three main categories according to
the features they use: handcrafted feature based methods,
unsupervised feature learning based methods, and deep feature
learning based methods. It should be noted that these three
categories are not mutually exclusive, and the same method may
belong to more than one category.
A. Handcrafted Feature Based Methods
Early works on scene classification are mainly based on
handcrafted features [22, 23, 27, 38, 44, 51, 56, 62, 80, 82,
99-103]. These methods rely on considerable engineering skill
and domain expertise to design various human-engineered
features, such as color, texture, shape, spatial, and spectral
information, or their combinations, which capture the primary
characteristics of a scene image and hence carry information
useful for scene classification. Here, we briefly review the most
representative handcrafted features, including color histograms
[99], texture descriptors [104-106], GIST [107], the
scale-invariant feature transform (SIFT) [108], and histograms
of oriented gradients (HOG) [109].
1) Color histograms: Among all handcrafted features, the
global color histogram [99] is one of the simplest, yet it is an
effective visual feature commonly used in image retrieval and
scene classification [38, 56, 80, 82, 99]. A major advantage of
color histograms, apart from being easy to compute, is that they
are invariant to translation and to rotation about the viewing
axis. However, color histograms cannot convey spatial
information, so it is very difficult to distinguish images that
share the same colors but differ in their spatial color
distributions. In addition, color histograms are sensitive to small
illumination changes and to quantization errors.
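To make this concrete, the following minimal sketch computes a global joint RGB histogram with NumPy; the bin count, quantization scheme, and L1 normalization are illustrative choices rather than details prescribed by [99].

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram: quantize each RGB channel into `bins`
    levels and count joint occurrences over all pixels. `image` is an
    (H, W, 3) uint8 array; the result is a flat, L1-normalized vector
    of length bins**3."""
    quantized = (image.astype(np.int32) * bins) // 256  # per-channel bin index in [0, bins)
    codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()
```

Because the histogram is a pure per-pixel statistic, shuffling the pixels of an image leaves the descriptor unchanged, which illustrates both its invariance properties and its blindness to spatial layout.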
2) Texture descriptors: Texture features, such as the grey-level
co-occurrence matrix (GLCM) [104], Gabor features [105], and
local binary patterns (LBP) [84, 106, 110], are widely used
for analyzing aerial or satellite images [51, 56, 62, 100-102].
Texture features are commonly computed by placing primitives
in local image subregions and analyzing their relative
differences, so they are particularly useful for identifying
textural scene images.
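As a small illustration, the sketch below builds an LBP texture histogram with scikit-image; the neighborhood size, radius, and use of "uniform" patterns are common defaults, not values fixed by the works cited above.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, p=8, r=1.0):
    """Texture descriptor in the spirit of LBP: each pixel of the 2-D
    grayscale array `gray` is coded by thresholding its `p` circular
    neighbors at radius `r` against the center pixel, and the image is
    summarized by the normalized histogram of the resulting 'uniform'
    pattern codes (p + 2 bins)."""
    codes = local_binary_pattern(gray, P=p, R=r, method="uniform")
    hist, _ = np.histogram(codes, bins=np.arange(p + 3), density=True)
    return hist
```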
3) GIST: The GIST descriptor was initially proposed in [107];
it provides a global description of the spatial structure of the
dominant scales and orientations of a scene. It is based on
computing statistics of the outputs of local feature detectors
over spatially distributed subregions. Specifically, in standard
GIST, the image is first convolved with a bank of steerable
pyramid filters. The image is then divided into a 4×4 grid, from
which orientation histograms are extracted. Note that the GIST
descriptor is similar in spirit to the local SIFT descriptor [108].
Owing to its simplicity and efficiency, GIST is popularly used
for scene representation [111-113].
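The following simplified sketch conveys the idea; it substitutes a small Gabor filter bank for the steerable pyramid (a common approximation) and averages filter energy over a 4×4 grid, so it should be read as illustrative rather than as the exact pipeline of [107].

```python
import numpy as np
from skimage.filters import gabor

def gist_like(gray, frequencies=(0.1, 0.2, 0.3), n_orient=4, grid=4):
    """GIST-style global descriptor: filter the 2-D grayscale image
    with a Gabor bank over several frequencies and orientations, then
    average each filter's energy within a grid x grid spatial layout.
    Output length: len(frequencies) * n_orient * grid * grid."""
    h, w = gray.shape
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)
    feats = []
    for f in frequencies:
        for k in range(n_orient):
            real, imag = gabor(gray, frequency=f, theta=k * np.pi / n_orient)
            energy = np.hypot(real, imag)  # magnitude of the complex response
            for i in range(grid):
                for j in range(grid):
                    feats.append(energy[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean())
    return np.asarray(feats)
```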
4) SIFT: The SIFT feature [108] describes subregions by
gradient information around identified keypoints. Standard
SIFT, also known as sparse SIFT, combines keypoint detection
with histogram-based gradient representation. It generally
involves four steps, namely scale-space extrema search,
sub-pixel keypoint refinement, dominant orientation
assignment, and feature description. Apart from the sparse SIFT
descriptor, there is also dense SIFT, which is computed over
uniformly and densely sampled local regions, as well as several
extensions such as PCA-SIFT [114] and speeded-up robust
features (SURF) [115]. The SIFT feature and its variants are
highly distinctive and invariant to changes in scale, rotation,
and illumination.
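As a hedged illustration, both sparse and dense SIFT descriptors can be extracted with OpenCV (version 4.4 or later, where SIFT is part of the main package); the file path and grid stride below are illustrative.

```python
import cv2

# Sparse SIFT: detect keypoints and describe them.
gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # illustrative path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)  # descriptors: (N, 128) float32

# Dense SIFT: skip detection and describe a uniform grid of keypoints.
step = 8  # grid stride in pixels, an illustrative choice
grid = [cv2.KeyPoint(float(x), float(y), float(step))
        for y in range(step // 2, gray.shape[0], step)
        for x in range(step // 2, gray.shape[1], step)]
_, dense_descriptors = sift.compute(gray, grid)
```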
5) HOG: The HOG feature was first proposed in [109] to
represent objects by computing the distribution of gradient
intensities and orientations over spatially distributed subregions,
and it has been acknowledged as one of the best features for
capturing the edge and local shape information of objects. It has
shown great success in many scene classification methods [22,
23, 27, 44, 103, 116, 117]. In addition, several extensions have
been developed to further improve the descriptive ability of
HOG for remote sensing images [118-121].
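For illustration, a HOG descriptor can be computed with scikit-image as sketched below; the cell size, block size, and nine orientation bins are the familiar Dalal-Triggs defaults rather than values mandated by [109].

```python
from skimage.feature import hog

def hog_descriptor(gray):
    """HOG in the spirit of [109]: gradient orientation histograms over
    8x8-pixel cells, contrast-normalized in overlapping 2x2-cell blocks,
    concatenated into one feature vector for the 2-D grayscale image."""
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")
```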
These human-engineered features have their own advantages
and disadvantages [56, 90, 101, 102]. In brief, color histograms,
texture descriptors, and the GIST feature are global features that
describe the overall statistical properties of an entire scene
image in terms of particular spatial cues such as color [56, 99],
texture [104-106], or spatial structure [107], so they can be fed
directly to classifiers for scene classification. In contrast, the
SIFT descriptor and the HOG feature are local features that
represent local structure [108] and shape information [109],
respectively. To represent an entire scene image, they are
generally used as building blocks for constructing global image
features, such as the well-known bag-of-visual-words (BoVW)
models [6, 8, 9, 14, 19, 29, 36, 38, 39, 55, 93, 101, 122, 123]
and HOG feature-based part models [22, 23, 27, 103]. In
addition, a number of improved feature encoding/pooling
methods have been proposed in the past few years, such as
Fisher vector coding [10, 14, 84, 86], spatial pyramid matching
(SPM) [124], and probabilistic topic models (PTM) [11, 40, 42,
43, 92, 123].
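To illustrate how local descriptors are aggregated into a global representation, the sketch below implements a minimal BoVW pipeline with scikit-learn: k-means learns a visual vocabulary from the pooled local descriptors (e.g., SIFT), and each image becomes a normalized histogram of visual-word assignments. The vocabulary size is an illustrative choice, not a value prescribed by the cited works.

```python
import numpy as np
from sklearn.cluster import KMeans

def bovw_histograms(descriptor_sets, n_words=256, seed=0):
    """Minimal bag-of-visual-words pipeline. `descriptor_sets` is a
    list with one (N_i, D) array of local descriptors per image; the
    output stacks one L1-normalized n_words-bin histogram per image."""
    vocab = KMeans(n_clusters=n_words, random_state=seed, n_init=10)
    vocab.fit(np.vstack(descriptor_sets))  # learn the visual vocabulary
    hists = []
    for desc in descriptor_sets:
        words = vocab.predict(desc)  # assign each descriptor to a word
        hist = np.bincount(words, minlength=n_words).astype(np.float64)
        hists.append(hist / max(hist.sum(), 1.0))
    return np.vstack(hists)
```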
In real-world applications, scene information is usually
conveyed by multiple cues, including spectral, color, texture,
and shape cues. Each individual cue captures only one aspect of
the scene, so a single type of feature is often inadequate to
represent the content of an entire scene image. Accordingly,
combining multiple complementary features for scene
classification [8, 9, 11, 12, 20, 30, 33, 85, 88, 89, 92, 125] is
considered a promising strategy for improving performance.
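As a toy illustration of this strategy (and not of any specific method cited in this subsection), complementary feature vectors can be normalized and concatenated before being passed to a classifier:

```python
import numpy as np

def fuse_features(*feature_vectors):
    """Naive feature-level fusion: L2-normalize each complementary
    feature vector (e.g., color, texture, and BoVW histograms) and
    concatenate them, so that no single cue dominates the classifier
    merely because of its scale."""
    normed = [f / (np.linalg.norm(f) + 1e-12) for f in feature_vectors]
    return np.concatenate(normed)
```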
For example, Zhao et al. [11] presented a Dirichlet-derived
multiple topic model to combine three types of features at a
topic level for scene classification. Zhu et al. [8] proposed a