V. Kumar, R.S. Singh and Y. Dua Signal Processing: Image Communication 101 (2022) 116549
discriminative information in the shallow layers is transferred via these
connections to aid reconstruction and classification tasks. In [48],
the author proposes a two-branch spectral–spatial attention network
for hyperspectral image classification, with one branch dedicated to
spectral attention and the other to spatial attention. Each convolutional
layer includes attention modules, which make the CNN prioritize
discriminative channels and spatial positions while suppressing irrelevant ones.
Furthermore, two-branch results are fused in the classification phase
using an adaptively weighted summation method.
Building on the core problems identified in the literature above and
extending the work of Roy et al. [37], we propose a novel morphologically
dilated convolutional network (MDCNN). MDCNN combines 3D and 2D
convolution layers, morphological feature maps, and both standard and
dilated convolution. In
MDCNN, the principal component analysis (PCA) algorithm is used
to transform high-dimensional input data into low-dimensional data
in order to minimize computational costs. A known weakness of CNNs is
inaccurate boundary localization, which yields partial object shapes.
Information obtained by another type of spatial extractor can therefore
improve the deep network's feature representation [49]. Accordingly,
different mathematical morphology operations are applied to the new
low-dimensional input data to extract discriminative spatial feature maps.
These morphological feature maps are then concatenated with the
low-dimensional input data. Next, input patches are extracted around each
pixel and sent to the network. Both standard 3D convolution
and Dilated-3D convolution are applied to extract spectral–spatial
features at the same time. Dilated-2D convolution is then used to
extract discriminative spatial features. Dilated convolution is applied
in this paper by substituting convolution layers of a traditional CNN
with dilated convolution layers, which expands the receptive field
without adding parameters and thus improves network performance
without increasing network complexity [50–52]. A dilated layer does
not itself have fewer parameters, but it reduces the size of its
output feature map, which in turn lowers the overall number of
parameters in the network. The spectral–spatial features are then passed
to fully connected layers to extract abstract high-level features.
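The relationship between dilation, receptive field, and output size described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming unpadded ("valid") convolution with stride 1; the patch size of 25 and the 3-tap kernel are illustrative values, not settings taken from this paper.

```python
def effective_kernel(k: int, d: int) -> int:
    """Span (in pixels) covered by a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def valid_output_size(n: int, k: int, d: int = 1, stride: int = 1) -> int:
    """Output length along one axis for an unpadded convolution."""
    return (n - effective_kernel(k, d)) // stride + 1


def kernel_weights(k: int, dims: int, in_ch: int = 1, out_ch: int = 1) -> int:
    """Trainable weights (bias excluded) of a k^dims kernel.

    The dilation rate does not appear here: dilation spreads the taps
    apart but does not add or remove any of them.
    """
    return (k ** dims) * in_ch * out_ch


patch = 25  # hypothetical spatial patch size, for illustration only
for d in (1, 2, 3):
    print(
        f"dilation={d}: span={effective_kernel(3, d)}, "
        f"output={valid_output_size(patch, 3, d)}, "
        f"weights per 3x3x3 kernel={kernel_weights(3, dims=3)}"
    )
```

In this sketch the weight count per kernel stays at 27 for every dilation rate, while the output length shrinks from 23 to 21 to 19, so any layer that consumes the flattened feature map receives fewer inputs.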
This paper's primary contributions can be summarized as follows:
1. Mathematical morphological operations are applied to the input
hyperspectral data to extract spatial feature maps.
2. These feature maps are concatenated with the input and fed into the
neural network to reduce the workload of the CNN and provide better
spatial features.
3. The neural network contains both 3D convolutional layers and
a 2D convolutional layer. We use a mix of traditional and
dilated convolution to enlarge the receptive field, reduce
trainable parameters, and reduce overfitting, which in turn
reduces the overall complexity of the model. The
dilated convolution layer's output feature map is slightly
smaller than the output of the traditional convolution layer, which
reduces the overall number of trainable parameters.
4. The model simultaneously extracts the discriminatory spectral–
spatial attributes or properties to achieve high classification
accuracy by utilizing the spectral–spatial relationship.
5. Experiments were performed on three different publicly available
datasets to compare the performance of the MDCNN model with
other state-of-the-art methods.
The rest of this paper is structured as follows. Section 2 gives a
detailed overview of the proposed MDCNN structure. Section 3 discusses
the experimental setup, findings, and interpretation. Finally, some
conclusions are drawn in Section 4.
2. Problem formulation
HSI data is also known as a hypercube. This hypercube can be
represented as $\mathbf{I} \in \mathbb{R}^{H \times W \times C}$, where
$\mathbf{I}$ is the original HSI input, $H$ is the height, $W$ is the
width of the input, and $C$ is the total number of spectral bands. HSI
provides a tremendous amount of information through a large number of
spectral bands, but this high dimensionality increases the computational
burden. We therefore use PCA, which reduces the high-dimensional data to
low-dimensional data with minimal loss of useful information. The reduced
data cube after applying PCA is
$\mathbf{I}_P \in \mathbb{R}^{H \times W \times P}$, where $P$ is the
number of principal components. A binarization operation is applied to
the low-dimensional data and gives an output
$\mathbf{I}_B \in \mathbb{R}^{H \times W \times K}$; only the first $K$
bands are selected for the binarization process. Three mathematical
morphological operations are applied to the binary data cube
$\mathbf{I}_B$, each giving an output of size $H \times W \times K$. The
data cubes $\mathbf{I}_P$, $\mathbf{I}_{Erosion}$, $\mathbf{I}_{Dilation}$,
and $\mathbf{I}_{Gradient}$ are concatenated to form an output
$\mathbf{I}_C \in \mathbb{R}^{H \times W \times B}$, where $B = P + 3K$.
Input patches $\mathbf{I}_{Patch} \in \mathbb{R}^{S \times S \times B}$
are extracted from $\mathbf{I}_C$ and fed to the CNN for deep
spectral–spatial feature extraction and classification.
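The shape bookkeeping in this section can be traced with a small NumPy sketch. It takes the PCA-reduced cube and the binary cube as given and builds the three morphological maps named above, the concatenated cube with B = P + 3K bands, and one S x S x B patch. The 3 x 3 square structuring element, edge replication at the image border, zero padding for patches near the border, and all function names are assumptions made for illustration; the text does not specify them.

```python
import numpy as np


def _shifted_stack(binary: np.ndarray) -> np.ndarray:
    """Stack the 9 neighbours of every pixel under a 3x3 square element.

    binary has shape (H, W, K). Borders use edge replication, which is
    an assumption of this sketch.
    """
    h, w, _ = binary.shape
    padded = np.pad(binary, ((1, 1), (1, 1), (0, 0)), mode="edge")
    return np.stack(
        [padded[di:di + h, dj:dj + w, :] for di in range(3) for dj in range(3)]
    )


def erosion(binary: np.ndarray) -> np.ndarray:
    # A pixel stays 1 only if its whole 3x3 neighbourhood is 1.
    return _shifted_stack(binary).min(axis=0)


def dilation(binary: np.ndarray) -> np.ndarray:
    # A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1.
    return _shifted_stack(binary).max(axis=0)


def gradient(binary: np.ndarray) -> np.ndarray:
    # Morphological gradient: dilation minus erosion (marks boundaries).
    return dilation(binary) - erosion(binary)


def build_cube(ip: np.ndarray, ib: np.ndarray) -> np.ndarray:
    """Concatenate I_P (H, W, P) with three (H, W, K) maps: B = P + 3K."""
    return np.concatenate(
        [ip, erosion(ib), dilation(ib), gradient(ib)], axis=-1
    )


def extract_patch(cube: np.ndarray, row: int, col: int, s: int) -> np.ndarray:
    """Return the S x S x B patch centred on (row, col); s is assumed odd."""
    m = s // 2
    padded = np.pad(cube, ((m, m), (m, m), (0, 0)), mode="constant")
    return padded[row:row + s, col:col + s, :]


# Hypothetical sizes and random stand-in data, for illustration only.
rng = np.random.default_rng(0)
H, W, P, K, S = 12, 10, 6, 3, 5
ip = rng.normal(size=(H, W, P))                      # stand-in for I_P
ib = (rng.random((H, W, K)) > 0.5).astype(np.uint8)  # stand-in for I_B
ic = build_cube(ip, ib)
patch = extract_patch(ic, 0, 0, S)
print(ic.shape, patch.shape)  # (12, 10, 15) (5, 5, 15)
```

Padding before slicing keeps the patch size fixed for pixels on the image border, so every pixel yields an input of the same shape.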
3. Proposed framework
Fig. 1 describes the workflow of the proposed MDCNN model, which
includes the following steps: (a) dimension reduction of the
hyperspectral cube in the spectral direction using principal component
analysis (PCA), (b) binarization of the low-dimensional hyperspectral cube,
(c) mathematical morphological operations on the binarized data, (d)
concatenation of the low-dimensional hyperspectral cube and the
morphological feature maps (Erosion, Closing, and Gradient), (e) extraction
of patches for the input of the convolutional neural network, (f) deep
spectral–spatial feature extraction using standard 3D convolution and
Dilated-3D convolution, (g) discriminative spatial feature extraction
using Dilated-2D convolution, and (h) prediction of the classification map
using a softmax classifier.
3.1. Binarization process
Binarization is the process of converting each value of an input to
either 0 or 1. In the case of HSI data, the pixel values are first
transformed into the range of 0 to 255. The expression for the conversion
is as follows:
$$\Psi_{i,j} = \frac{255 \cdot \left(I^{p}_{i,j} - \min(I^{p})\right)}{I^{p}_{i,j}} \tag{1}$$
where $I^{p}$ is the image and $I^{p}_{i,j}$ is the pixel value at
position $(i, j)$. After the range conversion, a threshold ($\Theta$) is
selected using the following expression:
$$\Theta = \frac{\sum_{i=0}^{H-1} \sum_{j=0}^{W-1} \Psi_{i,j}}{H \times W \times K} \tag{2}$$
where $H$ and $W$ are the height and width of the input cube, and $K$ is
the number of bands selected for binarization. The value at each position
is then set to one if $\Psi_{i,j}$ is greater than or equal to $\Theta$,
and to zero otherwise. The new image formed after thresholding is given by:
$$BI_{i,j} = \begin{cases} 1, & \text{if } \Psi_{i,j} \geq \Theta \\ 0, & \text{if } \Psi_{i,j} < \Theta \end{cases} \tag{3}$$
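Eqs. (1)-(3) can be sketched in NumPy as follows. This is an illustrative reading rather than the authors' implementation, and it rests on three assumptions that the text does not state: the input values are strictly positive, so that the division in Eq. (1) is defined; the minimum in Eq. (1) is taken over the whole K-band cube; and the sum in Eq. (2) also runs over the K selected bands, as the H x W x K normalizer suggests. The function name `binarize` is hypothetical.

```python
import numpy as np


def binarize(ip_k: np.ndarray) -> np.ndarray:
    """Binarize the K selected bands following Eqs. (1)-(3).

    ip_k: array of shape (H, W, K) with strictly positive values
    (an assumption of this sketch, needed for the division in Eq. (1)).
    """
    ip_k = ip_k.astype(np.float64)
    # Eq. (1): range conversion, written exactly as in the text.
    psi = 255.0 * (ip_k - ip_k.min()) / ip_k
    # Eq. (2): threshold; psi.size equals H * W * K.
    theta = psi.sum() / psi.size
    # Eq. (3): 1 where psi >= theta, 0 elsewhere.
    return (psi >= theta).astype(np.uint8)


# Tiny worked example with H = 2, W = 2, K = 1.
cube = np.array([[[1.0], [2.0]], [[4.0], [8.0]]])
print(binarize(cube)[..., 0])
# [[0 0]
#  [1 1]]
```

For this input, Eq. (1) gives 0, 127.5, 191.25, and 223.125, and Eq. (2) gives a threshold of 135.46875, so only the two larger values are set to one.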
3.2. Mathematical morphology
The high spatial resolution of hyperspectral images results in
very few mixed pixels and provides clear boundaries
between different objects in land-cover data. Discriminative
spatial features such as morphological features can therefore provide
results with better accuracy. Serra [53] first introduced morphological
analysis in 1982 and used structuring elements (SEs) to collect
information on the shape, boundary, and skeleton of an image. Acquiring
a morphological feature map is a three-step process: (a) selection
of a structuring element, (b) conversion of the image into a binary image,
since through binary image only the structuring element can follow the