
Deep Learning-Based Classification
of Hyperspectral Data
Yushi Chen, Member, IEEE, Zhouhan Lin, Xing Zhao, Student Member, IEEE,
Gang Wang, Member, IEEE, and Yanfeng Gu, Member, IEEE
Abstract—Classification is one of the most popular topics in hyperspectral remote sensing. In the last two decades, a huge number of methods were proposed to deal with the hyperspectral data classification problem. However, most of them do not hierarchically extract deep features. In this paper, the concept of deep learning is introduced into hyperspectral data classification for the first time. First, we verify the eligibility of stacked autoencoders by following classical spectral information-based classification. Second, a new way of classifying with spatial-dominated information is proposed. We then propose a novel deep learning framework to merge the two features, from which we can get the highest classification accuracy. The framework is a hybrid of principal component analysis (PCA), deep learning architecture, and logistic regression. Specifically, as a deep learning architecture, stacked autoencoders are aimed at getting useful high-level features. Experimental results with widely used hyperspectral data indicate that classifiers built in this deep learning-based framework provide competitive performance. In addition, the proposed joint spectral–spatial deep neural network opens a new window for future research, showcasing the huge potential of deep learning-based methods for accurate hyperspectral data classification.

Index Terms—Autoencoder (AE), deep learning, feature extraction, hyperspectral data classification, logistic regression, stacked autoencoder (SAE), support vector machine (SVM).
I. INTRODUCTION
BY COMBINING imaging and spectroscopy technology, hyperspectral remote sensing can get spatially and spectrally continuous data simultaneously. Hyperspectral data are becoming a valuable tool for monitoring the Earth's surface [1], [2], and are used in a wide array of applications. An incomplete list includes agriculture [3], mineralogy [4], surveillance [5], physics [6], astronomy [7], chemical imaging [8], and environmental sciences [9], [10]. A common technique in these applications is the classification of each pixel in hyperspectral data. If successfully exploited, hyperspectral data can yield higher classification accuracies and more detailed class taxonomies [11]. However, there are several critical problems in the classification of hyperspectral data: 1) the curse of dimensionality, because of the high number of spectral channels; 2) the limited number of labeled training samples; and 3) the large spatial variability of spectral signatures [12].
Many different classification methods have been proposed to deal with hyperspectral data classification. Traditional hyperspectral data classification methods use spectral information only, and the classification algorithms typically include parallelepiped classification, k-nearest neighbors, maximum likelihood, minimum distance, and logistic regression [13]. The majority of these algorithms suffer greatly from the "curse of dimensionality." To deal with the high dimensionality and limited training samples of hyperspectral data [14], some dimensionality reduction-based classification methods were proposed. Transformation is one method available to deal with high dimensionality [15]–[17]. Band selection is another method available to mitigate this "curse" [18]–[20].
In [21], a promising classification method, the support vector machine (SVM), was introduced for hyperspectral data classification. The SVM exhibits low sensitivity to high dimensionality and is unlikely to suffer from the Hughes phenomenon [22]. In most cases, SVM-based classifiers can obtain better classification accuracy than other widely used pattern recognition techniques [14], [22]. For a long time, these classifiers were the state-of-the-art methods [23].
Spatial information has been growing more and more important for hyperspectral data classification in recent years [30]. Spatial–spectral classification methods provide significant advantages in terms of improving performance [10]. To deal with the spatial variability of spectral signatures, some recent approaches take spatial information into consideration [31]–[33].
In [32], the proposed method, based on the fusion of morphological information and original data followed by an SVM, provides good classification results. In [33], a new classification framework is proposed to exploit the spatial and spectral information using loopy belief propagation and active learning. In recent years, sparse representation-based methods have been widely used in many fields. In [34], spatial–spectral kernel sparse representation is proposed to deal with hyperspectral data classification.
Considering the machine learning task of classification, classifiers like linear SVM and logistic regression can be regarded as single-layer classifiers, whereas decision trees or SVMs with kernels are believed to have two layers [24]. As is confirmed in neuroscience, the human brain performs well in tasks like object recognition because of its multiple stages of processing from
retina to cortex [25]. Similarly, machine learning systems with multiple layers of processing extract more abstract, invariant features of data, and thus are believed to have the ability of yielding higher classification accuracy than those traditional, shallower classifiers. These deep architectures have been shown to yield promising performance in many fields, including classification and regression tasks that involve images [26], [27], [47], language [28], and speech [29].

Manuscript received October 14, 2013; revised April 24, 2014; accepted May 30, 2014. Date of publication June 25, 2014; date of current version August 01, 2014. This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant HIT.NSRIF.2013028 and in part by the National Natural Science Foundation of China under Grant 61301206 and Grant 61371180. (Corresponding author: Yushi Chen.)
Y. Chen, Z. Lin, X. Zhao, and Y. Gu are with the Institute of Image and Information Technology, Harbin Institute of Technology, Harbin 150001, China (e-mail: chenyushi@hit.edu.cn; lin.zhouhan@gmail.com; zhaoxing@hit.edu.cn; yfgu@hit.edu.cn).
G. Wang is with the School of Electrical and Electronic Engineering, Nanyang Technological University, 639798 Singapore (e-mail: wanggang@ntu.edu.sg).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSTARS.2014.2329330
1939-1404 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
2094 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 7, NO. 6, JUNE 2014
In this paper, we introduce deep learning-based feature extraction for hyperspectral data classification for the first time. Our work focuses on applying the autoencoder (AE), one of the deep architecture-based models, to learn deep features of hyperspectral data in an unsupervised manner. Our methods exploit a single-layer AE and a multilayer stacked AE (SAE) to learn shallow and deep features of hyperspectral data, respectively. Furthermore, we propose a new way of extracting spatial-dominated information for classification. Finally, we propose a novel classification framework dealing with joint spectral–spatial information, which utilizes all of the features extracted in the former two parts.
The rest of this paper is organized into six sections. Section II is a description of the deep learning, AE, and SAE models used in this paper. In Section III, we focus on classifying with spectral features, whereas Section IV details a new way of incorporating spatial information by extracting spatial-dominated features. In Section V, we further merge the former two spectral and spatial approaches and propose a novel joint spectral–spatial deep learning framework, which yields the highest classification accuracy. Experimental results are shown in Section VI. Section VII summarizes the observations and concludes this paper.
II. DEEP LEARNING, AE, AND SAE
A. Deep Learning
As early as 1989, the universal expressive power of three-layer nets was proved via bumps and Fourier ideas [35]. The proof showed, surprisingly, that any continuous function from input to output can be implemented by a three-layer net, given a sufficient number of hidden units and proper nonlinearities in the activation function and weights. However, due to the lack of proper training algorithms in the early years, people could not harness this powerful model until Hinton proposed his deep learning idea in 2006 [27].
Deep learning involves a class of models that try to hierarchically learn deep features of input data with very deep neural networks, typically deeper than three layers. The network is first initialized layer-wise via unsupervised training and then tuned in a supervised manner. In this scheme, high-level features can be learned from low-level ones, and the proper features can be formulated for pattern classification in the end. Deep models can potentially lead to progressively more abstract and complex features at higher layers, and more abstract features are generally invariant to most local changes of the input. According to some recent papers [36], [37], deep models can give better approximation to nonlinear functions than shallow models.
Typical deep neural network architectures include deep belief networks (DBNs) [38], deep Boltzmann machines (DBMs) [39], SAEs [40], and stacked denoising AEs (SDAEs) [41].
The layer-wise training models have a number of alternatives, such as restricted Boltzmann machines (RBMs) [42], pooling units [43], convolutional neural networks (CNNs) [44], AEs, and denoising AEs (DAEs) [40]. In this paper, we adopt one of the above deep learning models, the AE, for hyperspectral data classification and choose SAEs as the corresponding deep architecture.
B. Autoencoders

An AE has one visible layer of $d$ inputs, one hidden layer of $h$ units, one reconstruction layer of $d$ units, and an activation function $f$ (Fig. 1).

During training, it first maps the input $x \in \mathbb{R}^d$ to the hidden layer and produces the latent activity $y \in \mathbb{R}^h$. The network corresponding to this step is shown in the boxed part of Fig. 1 and is called an "encoder." Then, $y$ is mapped by a "decoder" to an output layer that has the same size as the input layer, which is called the "reconstruction." The reconstructed values are denoted as $z \in \mathbb{R}^d$. Mathematically, these two steps can be formulated as

$$y = f(W_1 x + b_1)$$
$$z = f(W_2 y + b_2)$$

where $W_1$ and $W_2$ denote the input-to-hidden and the hidden-to-output weights, respectively, $b_1$ and $b_2$ denote the biases of the hidden and output units, and $f$ denotes the activation function. Conventionally, the nonlinearity is provided by $f$. There are many alternatives for $f$, such as the sigmoid function, the hyperbolic tangent, and the rectified linear function.

In our paper, the following constraint holds:

$$W_2 = W_1^{\top}$$

We say that the AE has tied weights, which helps to halve the number of model parameters. Thus, we have three groups of parameters remaining to learn: $W_1$, $b_1$, and $b_2$.

The goal of training is to minimize the "error" between the input and the reconstruction, i.e.,

$$\min_{W_1, b_1, b_2} E(x, z)$$

where $z$ depends on the parameters $W_1$, $b_1$, $b_2$, while $x$ is given. $E$ stands for the "error," which can be defined in a variety of ways. Thus, the weight updating rule can be defined as (where $\alpha$ denotes the learning rate)

$$W_1 \leftarrow W_1 - \alpha\,\frac{\partial E}{\partial W_1}, \qquad b_1 \leftarrow b_1 - \alpha\,\frac{\partial E}{\partial b_1}, \qquad b_2 \leftarrow b_2 - \alpha\,\frac{\partial E}{\partial b_2}$$

Fig. 1. Single-layer AE for hyperspectral data classification. The model learns a hidden feature "$y$" from input "$x$" by reconstructing it on "$z$." Corresponding parameters are denoted in the network.
After the network is trained, the reconstruction layer, together with its parameters, is removed. The learned feature lies in the hidden layer, which can subsequently be used for classification or used as the input of a higher layer to produce a deeper feature.
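The tied-weight AE and its gradient-descent update can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the sigmoid activation, the squared-error choice for the error term, and all class and variable names are assumptions made for the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class TiedAutoencoder:
    """Single-layer AE with tied weights: y = f(Wx + b1), z = f(W^T y + b2)."""

    def __init__(self, d, h, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, size=(h, d))  # shared encoder/decoder weights
        self.b1 = np.zeros(h)  # hidden-layer bias
        self.b2 = np.zeros(d)  # reconstruction-layer bias

    def encode(self, x):
        return sigmoid(self.W @ x + self.b1)

    def train_step(self, x, lr=0.1):
        # Forward pass: encode, then reconstruct.
        y = self.encode(x)
        z = sigmoid(self.W.T @ y + self.b2)
        # Squared reconstruction error E = 0.5 * ||z - x||^2.
        dz = (z - x) * z * (1.0 - z)        # delta at the reconstruction layer
        dy = (self.W @ dz) * y * (1.0 - y)  # delta back-propagated to the hidden layer
        # Tied weights: W collects gradients from both its encoder and decoder roles.
        self.W -= lr * (np.outer(dy, x) + np.outer(y, dz))
        self.b1 -= lr * dy
        self.b2 -= lr * dz
        return 0.5 * np.sum((z - x) ** 2)
```

Training such an AE on spectra scaled to [0, 1] and keeping only `encode` afterward mirrors the procedure described above: the decoder is discarded and the hidden activity serves as the learned feature.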
The power of the AE lies in this form of reconstruction-oriented training. Note that during reconstruction, it only uses the information in the hidden-layer activity $y$, which is encoded as features from the input. If the model can recover the original input perfectly from $y$, it means that $y$ retains enough information about the input, and the learned nonlinear transformation, defined by those weights and biases, can be deemed a good feature-extraction step. So, stacking the encoders trained in this manner minimizes information loss. In the meantime, they preserve abstract and invariant information in the deeper features. This is the reason why we choose AEs to progressively extract deep features for hyperspectral data.
C. Stacked AE

Stacking the input and hidden layers of AEs together layer by layer constructs a SAE. The model is used to generate deep features of hyperspectral data. Fig. 2 shows a typical instance of a SAE connected with a subsequent logistic regression classifier.

The first AE maps inputs in the 0th layer to a feature in the first layer. It is trained using the same method introduced in Section II-B. After we finish training the first-layer AE, each subsequent AE is trained on the output of its previous layer. For example, when we are training the AE between the second and third layers, we try to reconstruct the output of the second layer from the activity of the third layer. After this stage of training, the decoder of the third-layer AE is cast away, and only the input-to-hidden parameters are retained as weights between the second and the third layer.
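The greedy layer-wise procedure described above can be sketched as follows. Again, this is an illustrative NumPy sketch under the same assumptions as before (sigmoid units, squared reconstruction error), not the paper's code; `train_ae` and `pretrain_sae` are hypothetical helper names.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_ae(X, h, lr=0.1, epochs=50, seed=0):
    """Train one tied-weight AE on the rows of X; return (W, b1) and hidden codes."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(h, d))
    b1, b2 = np.zeros(h), np.zeros(d)
    for _ in range(epochs):
        for x in X:
            y = sigmoid(W @ x + b1)
            z = sigmoid(W.T @ y + b2)
            dz = (z - x) * z * (1.0 - z)
            dy = (W @ dz) * y * (1.0 - y)
            W -= lr * (np.outer(dy, x) + np.outer(y, dz))
            b1 -= lr * dy
            b2 -= lr * dz
    H = sigmoid(X @ W.T + b1)  # hidden activities; the decoder (b2) is discarded
    return (W, b1), H

def pretrain_sae(X, layer_sizes):
    """Greedy layer-wise pretraining: each AE reconstructs the previous layer's output."""
    params, H = [], X
    for h in layer_sizes:
        (W, b1), H = train_ae(H, h)
        params.append((W, b1))  # keep only the input-to-hidden parameters
    return params, H
```

A call such as `params, deep_feat = pretrain_sae(X, [20, 10])` returns the stacked encoder parameters and the deepest feature, ready for a classifier or for supervised fine-tuning.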
If the subsequent classifier is also implemented as a neural network, parameters throughout the whole network can be adjusted slightly while we are training the classifier. This step is called fine-tuning. For logistic regression, the training is simply backpropagation, searching for a minimum in a neighborhood of the parameters initialized by the former step.
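As a stand-in for the output layer placed on top of the SAE, a minimal multinomial logistic regression trained by gradient descent might look like the sketch below. The mean cross-entropy loss and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(A):
    E = np.exp(A - A.max(axis=1, keepdims=True))  # shift for numerical stability
    return E / E.sum(axis=1, keepdims=True)

def train_logreg(H, labels, n_classes, lr=0.5, epochs=200):
    """Multinomial logistic regression on feature rows H (e.g., SAE codes)."""
    n, h = H.shape
    W = np.zeros((h, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]  # one-hot targets
    for _ in range(epochs):
        P = softmax(H @ W + b)     # class probabilities
        G = (P - Y) / n            # gradient of the mean cross-entropy w.r.t. logits
        W -= lr * H.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

def predict(H, W, b):
    return softmax(H @ W + b).argmax(axis=1)
```

Because this classifier is itself differentiable, the same gradient can be propagated further down into the stacked encoder, which is exactly the fine-tuning step described above.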
III. CLASSIFYING WITH SPECTRAL FEATURES
There are several motivations for extracting robust deep spectral features. First, because of the complex lighting conditions in a large scene, objects of the same class show different spectral characteristics in different locations. For example, a lawn exposed to direct sunlight shows different spectral characteristics from a similar lawn shaded from the sunlight by a tall building. Also, scattering from other peripheral ground objects tilts the spectra of the lawn and changes its characteristics too. Other factors include rotations of the sensor, different atmospheric scattering conditions, and so on. Owing to these factors, the probability distribution of a certain class is unlikely to be one-hot and varies along multiple directions in the feature space. These complex variations of spectra make it hopeless to analyze, pixel by pixel, how each pixel is affected by its neighboring pixels in complicated real situations; thus, they demand more robust and invariant features. It is believed that deep architectures can potentially lead to progressively more abstract features at higher layers, and more abstract features are generally invariant to most local changes of the input [24].
To get more generally invariant features and tackle these problems, a deep spectral feature of hyperspectral data can be learned progressively, layer by layer, with the aforementioned AE models. Generally speaking, we first compute features via a SAE and deem them the features of the data, then construct a logistic regression classifier on top of the neural network to finish the classification phase. By adjusting the number of layers of AEs, both shallow and deep features can be learned. Fig. 3 shows a typical instance of the deep architecture used in our paper. The training procedure will be detailed below.
A. Hierarchical Pretraining

The first stage is to learn a deep feature of spectra via pretraining a SAE in a hierarchical manner, which is outlined in
Fig. 2. Instance of a SAE connected with a logistic regression layer. It has five
layers: one input layer, three hidden layers, and an output layer.
Fig. 3. Classifying with spectral feature. The classification scheme shown here
has five layers: one input layer, three hidden layers of AEs, and an output layer of
logistic regression. If we want to learn a shallower feature set, we just remove the
higher layers of AE.