Spark Parallel Optimization: Distributed Spatial Correlation Regularization for Hyperspectral Image Classification


"基于空间相关正则化稀疏表示的高光谱图像分类分布式并行优化" 高光谱图像处理是一项复杂的技术，它涉及到对多波段图像数据的分析，这些图像通常具有广泛的覆盖范围、高维度和海量的数据量。由于这些特性，处理高光谱图像时，计算过程往往非常耗时。为了克服这一挑战，研究人员开始利用分布式并行计算技术，如Apache Spark。 Spark作为一个强大的分布式大数据处理框架，其核心特性是内存计算，能够显著提升大规模数据处理的效率。Spark通过将数据存储在内存中，减少了磁盘I/O操作，加快了数据处理速度，尤其适合需要多次迭代的计算任务，如高光谱图像的分类。本文提出的分布式并行空间相关正则化稀疏表示分类（DP-SCSRC）算法，是针对高光谱图像处理的一种优化策略。SCSRC算法本身是一种基于稀疏表示的分类方法，它通过考虑空间邻近像素间的相关性，提高了分类的准确性。在DP-SCSRC中，关键创新在于如何在Spark的弹性分布式数据集（RDD）上实现这一算法的并行化。首先，为了保留空间相关性，相邻的高光谱图像索引被存储在同一RDD分区中。这样设计的目的是确保在并行计算过程中，同一区域的像素可以在同一计算节点上处理，从而充分利用空间相关性的信息，减少不必要的通信开销。其次，引入了联合分布式矩阵（JDM）的概念，这是一种优化的数据结构，用于减少不同计算节点之间同步数据的成本。通过在节点间高效地分发和共享数据，JDM使得大规模高光谱图像的处理更加高效。实验结果表明，DP-SCSRC在处理实际高光谱数据时，不仅显著提升了分类速度，而且具备良好的可扩展性，能够适应更大的数据量。这种分布式并行优化策略对于处理高光谱图像的实时性和大规模性问题具有重要的实际意义，尤其在遥感、环境监测、军事侦察等应用领域，能够大幅提升数据分析的效率和精度。基于Spark的DP-SCSRC算法通过空间相关性的保留和分布式并行计算，解决了高光谱图像处理中的计算效率问题，为处理大规模高光谱数据提供了新的解决方案。该方法的成功实施依赖于有效的数据分区和通信策略，以及对Spark框架的深入理解和巧妙利用。

DISTRIBUTED PARALLEL OPTIMIZATION OF HYPERSPECTRAL IMAGE CLASSIFICATION BASED ON SPATIAL CORRELATION REGULARIZED SPARSE REPRESENTATION

Junling Shen, Zekun Kang, Zebin Wu 1,2*, Zhihui Wei, Yaoqin Zhu

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China;

Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing, 210003, China

ABSTRACT

Hyperspectral images feature wide coverage, high-dimensional bands, and a huge amount of data, which makes processing them computationally time-consuming. Spark is a distributed big data processing framework with integrated in-memory computation, so it is well suited to complex iterative calculation. To classify massive hyperspectral data efficiently, this paper proposes a Spark version of the original Spatial Correlation Regularized Sparse Representation Classification (SCSRC). In Distributed Parallel SCSRC (DP-SCSRC), firstly, adjacent hyperspectral image indexes are stored in the same partition of Spark's RDDs to preserve spatial correlation. Secondly, a Joint Distributed Matrix (JDM) is created to reduce the data synchronization overhead between computing nodes. Experimental results on real hyperspectral data demonstrate that DP-SCSRC achieves a remarkable speedup and is scalable with larger data sizes.

Index Terms: hyperspectral classification, Spark, sparse representation, spatial correlation

*Corresponding author. Email: Zebin.wu@gmail.com

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61471199, 91538108, and 11431015, the Fundamental Research Funds for the Central Universities under Grant No. 30917015104, the Research Funds of Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks under Grant No. WSNLBKF201507, and the Open Fund of State Key Laboratory of Intelligent Manufacturing System Technology, QYYE1603.

1. INTRODUCTION

Hyperspectral image (HSI) classification is one of the most popular tasks in the remote sensing processing field. For HSI, the recorded spectra have fine wavelength resolution and cover hundreds of narrow, contiguous bands, which makes HSI an effective tool for ground object identification and mineral exploration. Recently, several sparsity-based methods have been successfully applied to HSI classification [5]. In addition, a trend for improving classification accuracy is to include spatial information [7], because HSI data within a local region is usually similar in terms of spectral characteristics. Considering this, [3] proposes a state-of-the-art method, Spatial Correlation Regularized Sparse Representation Classification (SCSRC), which introduces spatial smoothness regularization into the sparse representation model. However, SCSRC involves solving an l1-norm problem with the ADMM [3] method, which is highly iterative and memory-consuming. A single machine can hardly bear the memory and CPU load imposed by SCSRC, especially when the HSI data is huge.
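To give a feel for why an ADMM-based l1 solver is iterative, the following minimal Python sketch runs ADMM on a toy problem. It is not the SCSRC objective: the problem min 0.5*||x - b||^2 + lam*||z||_1 subject to x = z, the function names, and the values of `rho` and `iters` are all assumptions chosen for illustration, because each update then has a closed form.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa * |.|: shrink v toward zero by kappa."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_l1_denoise(b, lam, rho=1.0, iters=200):
    """Toy ADMM for: min 0.5*||x - b||^2 + lam*||z||_1  subject to x = z.

    Hypothetical stand-in for an l1-regularized solver, not the SCSRC model.
    """
    n = len(b)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: closed-form minimizer of the quadratic terms
        x = [(b[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: soft-thresholding handles the l1 term
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # dual update: accumulate the constraint violation x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


result = admm_l1_denoise([3.0, 0.5, -2.0], lam=1.0)
```

For this toy problem the iterates approach the soft-thresholded input, roughly `[2.0, 0.0, -1.0]`. Every pass touches all variables and keeps three working vectors alive, which hints at why a full-size solver on a large HSI can be both iteration-heavy and memory-hungry.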

Recently, cloud computing, which provides homogeneous control over dedicated resources (e.g., networks, storage, applications), has become much more popular in both research and commercial areas [4]. Cloud computing offers efficient distributed computing models such as MapReduce [2] and DAG [9]. Among cloud computing tools, Spark is an open-source distributed big data processing framework proposed by the UC Berkeley AMPLab in 2009 [1]. It introduces a distributed memory abstraction called resilient distributed datasets (RDD) [2], which allows programmers to perform in-memory computing on a Spark cluster while still retaining data flow models like MapReduce or DAG. Spark is extremely suitable for complex iterative calculations: it caches intermediate results in memory instead of writing the results of each iteration to disk, so disk I/O time is reduced in the next iteration. Some traditional algorithms have achieved significant speedups after being ported to Spark [8].

In this paper, the Distributed Parallel SCSRC (DP-SCSRC) based on a Spark cluster is proposed. Because each subtask of DP-SCSRC is responsible for one part of the HSI, DP-SCSRC centers on reducing data synchronization between subtasks. Section 2 briefly introduces the formulation of SCSRC. Section 3 proposes a partition scheme based on spatial correlation and a joint distributed matrix to avoid unnecessary data transmission between subtasks. Section 4 demonstrates the effectiveness of DP-SCSRC through experimental results on HSI. Finally, Section 5 summarizes our work.
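The idea of keeping adjacent pixel indexes in the same partition can be illustrated without a cluster. The Python sketch below is a hypothetical illustration, not the paper's partition scheme: the image size, the 4x4 tiling, and all function names are assumptions. It counts how many 4-neighbour pixel pairs are split across partitions under a tile-based assignment and under a spatially unaware round-robin assignment.

```python
def block_partition(row, col, block_rows, block_cols, blocks_per_row):
    """Map a pixel to the id of the spatial tile that contains it."""
    return (row // block_rows) * blocks_per_row + (col // block_cols)


def round_robin_partition(row, col, width, num_partitions):
    """Spatially unaware baseline: deal pixels out in raster order."""
    return (row * width + col) % num_partitions


def cross_partition_pairs(height, width, assign):
    """Count 4-neighbour pixel pairs whose pixels land in different partitions."""
    crossing = 0
    for r in range(height):
        for c in range(width):
            here = assign(r, c)
            # right-hand neighbour
            if c + 1 < width and assign(r, c + 1) != here:
                crossing += 1
            # neighbour below
            if r + 1 < height and assign(r + 1, c) != here:
                crossing += 1
    return crossing


HEIGHT, WIDTH = 8, 8   # hypothetical image size
BLOCK = 4              # hypothetical 4x4 tiles, giving 4 partitions
NUM_PARTITIONS = (HEIGHT // BLOCK) * (WIDTH // BLOCK)

spatial = cross_partition_pairs(
    HEIGHT, WIDTH,
    lambda r, c: block_partition(r, c, BLOCK, BLOCK, WIDTH // BLOCK))
naive = cross_partition_pairs(
    HEIGHT, WIDTH,
    lambda r, c: round_robin_partition(r, c, WIDTH, NUM_PARTITIONS))
total_pairs = HEIGHT * (WIDTH - 1) + WIDTH * (HEIGHT - 1)

print(spatial, naive, total_pairs)  # 16 56 112
```

In this small example only 16 of the 112 neighbouring pairs straddle a partition boundary under tiling, against 56 under round-robin. Each straddling pair stands for neighbourhood data that would have to be exchanged between subtasks. In a PySpark program, a function like `block_partition` could plausibly serve as the partition function passed to `RDD.partitionBy`, though whether the paper does it this way is not stated in this excerpt.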

2. SPATIAL CORRELATION REGULARIZED SPARSE REPRESENTATION CLASSIFICATION

Suppose that there are labeled samples of $C$ distinct classes, and the $c$th class has $J_c$ samples: $A_c = [a_{c,1}, a_{c,2}, \cdots, a_{c,J_c}] \in$

,((( ,*$566

下载后可阅读完整内容，剩余3页未读，立即下载

weixin_38522795

粉丝: 3
资源: 897

Spark并行优化：高光谱图像分类的分布式空间相关正则化

hsi图像分割matlab代码-MUA_SparseUnmixing:稀疏高光谱分解的快速多尺度空间正则化

基于LASSO算法的稀疏正则化高光谱图像的光谱解混合算法matlab仿真+仿真录像

基于平滑L0正则化的稀疏高光谱分解

使用L1 / 2正则化低秩表示和基于稀疏表示的图割进行光谱空间高光谱图像分类

图正则化非局部高光谱图像去噪方法matlab代码.zip

matlab-(含教程)基于LASSO算法的稀疏正则化高光谱图像的光谱解混合算法matlab仿真

电信设备-基于空间信息约束的增强型稀疏表示高光谱图像分类装置及方法.zip

基于联合正则化的稀疏磁共振图像重构

基于核稀疏表示与半局部空间图正则化的高光谱图像分类

L1/2正则化与稀疏表示：光谱空间高光谱图像分类的新策略

最新资源