ways. Thus, the weight-updating rule can be defined as

$$\theta \leftarrow \theta - \eta \frac{\partial E}{\partial \theta}$$

where $\eta$ denotes the learning rate, $\theta$ collectively denotes the weights and biases, and $E$ is the reconstruction error.
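To make this concrete, the following minimal NumPy sketch performs one such gradient-descent update for a single-hidden-layer AE with sigmoid units and squared reconstruction error; the variable names, layer sizes, and learning rate are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n_in, n_hid, eta = 200, 60, 0.1   # e.g., 200 spectral bands, 60 hidden units (illustrative)
W1, b1 = 0.01 * rng.standard_normal((n_in, n_hid)), np.zeros(n_hid)  # encoder
W2, b2 = 0.01 * rng.standard_normal((n_hid, n_in)), np.zeros(n_in)   # decoder (reconstruction layer)

x = rng.random((32, n_in))        # a mini-batch of spectra scaled to [0, 1]

# Forward pass: encode to the hidden activity, then reconstruct the input.
h = sigmoid(x @ W1 + b1)
x_hat = sigmoid(h @ W2 + b2)

# Gradients of E = 0.5 * ||x_hat - x||^2 with respect to each parameter.
d_out = (x_hat - x) * x_hat * (1 - x_hat)
d_hid = (d_out @ W2.T) * h * (1 - h)

# Weight-updating rule: each parameter moves against its gradient, scaled by eta.
W2 -= eta * h.T @ d_out / len(x); b2 -= eta * d_out.mean(axis=0)
W1 -= eta * x.T @ d_hid / len(x); b1 -= eta * d_hid.mean(axis=0)
```

Repeating this update over many mini-batches trains the AE; only the encoder parameters (here W1 and b1) are kept afterwards, as described next.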
After training the network, the reconstruction layer together with its parameters is removed, and the learned feature lies in the hidden layer. It can subsequently be used for classification or as the input of a higher layer to produce a deeper feature.
The power of the AE lies in this form of reconstruction-oriented training. Note that during reconstruction, only the information in the hidden-layer activity is used, which encodes features of the input. If the model can recover the original input perfectly from this activity, the hidden representation retains enough information about the input, and the learned nonlinear transformation, defined by those weights and biases, can be deemed a good feature-extraction step. Stacking encoders trained in this manner therefore minimizes information loss while preserving abstract and invariant information in the deeper features. This is why we choose AEs to progressively extract deep features from hyperspectral data.
C. Stacked AE
Stacking the input and hidden layers of AEs together layer by layer constructs a SAE. The model is used to generate deep features of hyperspectral data. Fig. 2 shows a typical instance of a SAE connected with a subsequent logistic regression classifier.
The first AE maps the input in the 0th layer to the feature in the first layer and is trained using the method introduced in Section II-B. After the first-layer AE is trained, each subsequent AE is trained on the output of the previous layer. For example, when training the AE between the second and third layers, we try to reconstruct the output of the second layer from the activity of the third layer. After this layer is trained, the decoder of the third-layer AE is cast away and only the input-to-hidden parameters are kept as the weights between the second and the third layer.
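A minimal sketch of this greedy layer-wise procedure is given below, under illustrative assumptions (toy full-batch training, random data, and arbitrary layer sizes rather than the paper's configuration).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_ae(data, n_hid, eta=0.1, epochs=50, seed=0):
    """Train one AE to reconstruct `data` and return only its encoder (W, b)."""
    rng = np.random.default_rng(seed)
    n_in = data.shape[1]
    W1, b1 = 0.01 * rng.standard_normal((n_in, n_hid)), np.zeros(n_hid)
    W2, b2 = 0.01 * rng.standard_normal((n_hid, n_in)), np.zeros(n_in)
    for _ in range(epochs):
        h = sigmoid(data @ W1 + b1)                   # hidden activity
        x_hat = sigmoid(h @ W2 + b2)                  # reconstruction of the layer below
        d_out = (x_hat - data) * x_hat * (1 - x_hat)
        d_hid = (d_out @ W2.T) * h * (1 - h)
        W2 -= eta * h.T @ d_out / len(data); b2 -= eta * d_out.mean(axis=0)
        W1 -= eta * data.T @ d_hid / len(data); b1 -= eta * d_hid.mean(axis=0)
    return W1, b1                                     # the decoder (W2, b2) is cast away

# Greedy layer-wise construction of a SAE: each AE reconstructs the output of
# the layer below it, and its encoder becomes the weights to the next layer.
X = np.random.default_rng(1).random((500, 200))       # 500 pixels, 200 bands (illustrative)
encoders, inp = [], X
for n_hid in [100, 60, 30]:                           # three hidden layers
    W, b = train_ae(inp, n_hid)
    encoders.append((W, b))
    inp = sigmoid(inp @ W + b)                        # deeper feature fed to the next AE
```

After the loop, `inp` holds the deepest feature and `encoders` holds the parameters copied into the stacked network.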
If the subsequent classifier is also implemented as a neural network, the parameters of the whole network can be adjusted slightly while the classifier is being trained. This step is called fine-tuning. For logistic regression, the training is simply backpropagation, searching for a minimum in a small region around the parameters initialized by the former step.
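A hedged sketch of this fine-tuning step is given below (using PyTorch for brevity): the stacked encoders are copied into one network, a logistic-regression output layer is appended, and all parameters are adjusted by backpropagation with a small learning rate. The "pretrained" weights here are random stand-ins, and the layer sizes, class count, and optimizer settings are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
pretrained = [(0.01 * rng.standard_normal((200, 100)), np.zeros(100)),  # stand-ins for the
              (0.01 * rng.standard_normal((100, 60)), np.zeros(60)),    # encoders obtained
              (0.01 * rng.standard_normal((60, 30)), np.zeros(30))]     # from pretraining
n_classes = 9

layers = []
for W, b in pretrained:                               # initialize from the pretrained encoders
    lin = nn.Linear(W.shape[0], W.shape[1])
    with torch.no_grad():
        lin.weight.copy_(torch.tensor(W.T, dtype=torch.float32))
        lin.bias.copy_(torch.tensor(b, dtype=torch.float32))
    layers += [lin, nn.Sigmoid()]
layers.append(nn.Linear(30, n_classes))               # logistic-regression output layer

net = nn.Sequential(*layers)
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3)   # small steps keep the search near
criterion = nn.CrossEntropyLoss()                        # the pretrained initialization

X = torch.rand(500, 200)                              # 500 labeled pixels, 200 bands
y = torch.randint(0, n_classes, (500,))
for _ in range(20):
    optimizer.zero_grad()
    criterion(net(X), y).backward()                   # backpropagation through every layer
    optimizer.step()
```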
III. CLASSIFYING WITH SPECTRAL FEATURES
There are several motivations for extracting robust deep spectral features. First, because of the complex lighting conditions in a large scene, objects of the same class show different spectral characteristics in different locations. For example, a lawn exposed to direct sunlight shows different spectral characteristics from a similar lawn shaded from the sunlight by a tall building. Scattering from other nearby ground objects also tilts the spectra of the lawn and changes its characteristics. Other factors include rotations of the sensor, different atmospheric scattering conditions, and so on. Because of these factors, the probability distribution of a certain class is unlikely to be concentrated at a single point (one-hot) and instead varies along multiple directions in the feature space. These complex variations of spectra make it hopeless to analyze, pixel by pixel, how each pixel is affected by its adjacent pixels in complicated real situations, and thus they demand more robust and invariant features. It is believed that deep architectures can potentially lead to progressively more abstract features at higher layers, and more abstract features are generally invariant to most local changes of the input [24].
To obtain more generally invariant features and tackle these problems, a deep spectral feature of hyperspectral data can be learned progressively, layer by layer, with the aforementioned AE models. Generally speaking, we first compute features via a SAE and deem them the features of the data, and then construct a logistic regression classifier on top of the neural network to finish the classification phase. By adjusting the number of AE layers, both shallow and deep features can be learned. Fig. 3 shows a typical instance of the deep architecture used in our paper. The training procedure is detailed below.
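To illustrate the point about adjusting the number of layers, the short sketch below composes the first k pretrained encoders to produce either a shallow or a deep spectral feature; the encoder shapes and data are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def spectral_feature(x, encoders, depth):
    """Pass spectra through the first `depth` pretrained encoders:
    depth=1 gives a shallow feature, depth=len(encoders) the deepest one."""
    for W, b in encoders[:depth]:
        x = sigmoid(x @ W + b)
    return x

# Illustrative stand-ins for pretrained encoder parameters and data.
rng = np.random.default_rng(0)
encoders = [(0.01 * rng.standard_normal((200, 100)), np.zeros(100)),
            (0.01 * rng.standard_normal((100, 60)), np.zeros(60)),
            (0.01 * rng.standard_normal((60, 30)), np.zeros(30))]
pixels = rng.random((500, 200))                      # 500 pixels, 200 spectral bands

shallow = spectral_feature(pixels, encoders, 1)      # shape (500, 100)
deep = spectral_feature(pixels, encoders, 3)         # shape (500, 30), fed to the classifier
```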
A. Hierarchical Pretraining
The first stage is to learn a deep feature of spectra via pretraining a SAE in a hierarchical manner, which is outlined in
Fig. 2. Instance of a SAE connected with a logistic regression layer. It has five
layers: one input layer, three hidden layers, and an output layer.
Fig. 3. Classifying with spectral features. The classification scheme shown here has five layers: one input layer, three hidden layers of AEs, and an output layer of logistic regression. If we want to learn a shallower feature set, we just remove the higher layers of AEs.