DISTRIBUTED PARALLEL OPTIMIZATION OF HYPERSPECTRAL IMAGE
CLASSIFICATION BASED ON SPATIAL CORRELATION REGULARIZED SPARSE
REPRESENTATION
Junling Shen¹, Zekun Kang¹, Zebin Wu¹,²*, Zhihui Wei¹, Yaoqin Zhu¹
¹ School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
² Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210003, China
ABSTRACT
Hyperspectral images feature wide coverage, a large number of
spectral bands, and a huge amount of data, which makes processing
them computationally time-consuming. Spark is a distributed big data
processing framework with integrated in-memory computation, which
makes it suitable for complex iterative calculations. To classify
massive hyperspectral data efficiently, this paper proposes a Spark
version of the original Spatial Correlation Regularized Sparse
Representation Classification (SCSRC). In Distributed Parallel SCSRC
(DP-SCSRC), the indexes of adjacent hyperspectral pixels are first
stored in the same partition of Spark's RDDs to preserve spatial
correlation. Second, a Joint Distributed Matrix (JDM) is created to
reduce the overhead of data synchronization between computing nodes.
Experimental results on real hyperspectral data demonstrate that
DP-SCSRC achieves a remarkable speedup and scales to larger data
sizes.
Index Terms— hyperspectral classification, Spark, sparse
representation, spatial correlation
1. INTRODUCTION
Hyperspectral image (HSI) classification is one of the most popular
tasks in remote sensing data processing. In an HSI, the recorded
spectra have fine wavelength resolution and cover hundreds of narrow,
contiguous bands, which makes HSI an effective tool for ground object
identification and mineral exploration. Recently, several
sparsity-based methods have been successfully applied to HSI
classification [5]. In addition, a trend for improving classification
accuracy is to include spatial information [7], because HSI pixels
within a local region usually have similar spectral characteristics.
With this in mind, [3] proposes a state-of-the-art method, Spatial
Correlation Regularized Sparse Representation Classification (SCSRC),
which introduces a spatial smoothness regularization into the sparse
representation model. However, SCSRC involves solving an l1-norm
problem with the ADMM method [3], which is highly iterative and
memory-consuming. A single machine can hardly bear the memory and CPU
load imposed by SCSRC, especially when the HSI data is huge.
*Corresponding author. Email: Zebin.wu@gmail.com
This work was supported in part by the National Natural Science
Foundation of China under grant no. 61471199, 91538108, 11431015, the
Fundamental Research Funds for the Central Universities under grant
no. 30917015104, the Research Funds of Jiangsu High Technology
Research Key Laboratory for Wireless Sensor Networks under grant no.
wsnlbkf201507, and the Open Fund of State Key Laboratory of
Intelligent Manufacturing System Technology, qyye1603.
Recently, cloud computing, which provides homogeneous control over
dedicated resources (e.g., networks, storage, applications), has
become much more popular in both research and commercial settings [4].
Cloud computing offers efficient distributed computing models such as
MapReduce [2] and DAG [9]. Among cloud computing tools, Spark is an
open-source distributed big data processing framework proposed by UC
Berkeley's AMPLab in 2009 [1]. It introduces a distributed memory
abstraction called resilient distributed datasets (RDDs) [2], which
allows programmers to perform in-memory computation on a Spark cluster
while retaining data flow models such as MapReduce and DAG. Spark is
well suited to complex iterative calculations because it caches
intermediate results in memory instead of writing the results of each
iteration to disk, which reduces disk I/O time in the next iteration.
Some traditional algorithms have achieved significant speedups after
being ported to Spark [8].
In this paper, Distributed Parallel SCSRC (DP-SCSRC), built on a Spark
cluster, is proposed. Because each subtask of DP-SCSRC is responsible
for one part of the HSI, DP-SCSRC centers on reducing data
synchronization between subtasks. Section 2 briefly introduces the
formulation of SCSRC. Section 3 proposes a partition scheme based on
spatial correlation and a joint distributed matrix to avoid
unnecessary data transmission between subtasks. Section 4 demonstrates
the effectiveness of DP-SCSRC through experimental results on HSI
data. Finally, Section 5 summarizes our work.
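The idea of keeping spatially adjacent pixels in the same partition can be illustrated with a minimal, framework-free Python sketch. It is not the partition scheme of Section 3: the rectangular block layout, the 4-neighbour adjacency, and the function names partition_of, build_partitions, and cross_partition_pairs are all assumptions made for illustration. The sketch assigns each pixel index to a block and counts how many neighbouring pixel pairs straddle two partitions, which is a rough proxy for the data that subtasks would need to exchange.

```python
from collections import defaultdict


def partition_of(row, col, width, block_rows, block_cols):
    """Return the id of the rectangular block that contains pixel (row, col).

    Hypothetical layout: blocks are numbered row by row, left to right.
    """
    blocks_per_row = -(-width // block_cols)  # ceiling division
    return (row // block_rows) * blocks_per_row + (col // block_cols)


def build_partitions(height, width, block_rows, block_cols):
    """Group row-major pixel indexes by the block they fall into."""
    parts = defaultdict(list)
    for r in range(height):
        for c in range(width):
            idx = r * width + c  # row-major linear pixel index
            parts[partition_of(r, c, width, block_rows, block_cols)].append(idx)
    return dict(parts)


def cross_partition_pairs(height, width, block_rows, block_cols):
    """Count 4-neighbour pixel pairs whose two pixels lie in different blocks."""
    crossing = 0
    for r in range(height):
        for c in range(width):
            here = partition_of(r, c, width, block_rows, block_cols)
            # Look only right and down so each neighbouring pair is counted once.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < height and nc < width:
                    there = partition_of(nr, nc, width, block_rows, block_cols)
                    if there != here:
                        crossing += 1
    return crossing


if __name__ == "__main__":
    # A toy 4 x 4 image split into 2 x 2 blocks.
    print(build_partitions(4, 4, 2, 2))
    print(cross_partition_pairs(4, 4, 2, 2))
```

In a Spark setting, a block id of this kind could serve as the key for a custom partitioner, so that most neighbourhood lookups stay local to one partition; that mapping is likewise an assumption rather than something stated in the text.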
2. SPATIAL CORRELATION REGULARIZED
SPARSE REPRESENTATION CLASSIFICATION
Suppose that there are labeled samples of $C$ distinct classes,
and the $c$th class has $J_c$ samples: $A_c = [a_c^1, a_c^2, \cdots, a_c^{J_c}] \in$
IEEE IGARSS