available, (3) RS data, especially hyperspectral data, form very large data cubes, whereas many
successful DL algorithms are tuned for small RGB image patches, (4) RS data gathered via light
detection and ranging (LiDAR) have insufficient DL literature (the data are point clouds, not
images), and (5) the best architecture is usually unknown a priori, which means a grid search
(which can be very time consuming) or random methods such as those discussed in Ref. 50 are
required for optimization. Chapter 8 of Ref. 23 discusses optimization techniques for training DL
models. A thorough discussion of these techniques is beyond the scope of this paper; however,
we list some common methods typically used to train DL systems.
Goodfellow et al. (Ref. 23) point out in Sec. 8.5.4 that there is no current consensus on the best
training/optimization algorithm. For the interested reader, the survey paper of Schaul et al.
51
provides results for many optimization algorithms over a large variety of tasks. CNNs are typ-
ically trained using stochastic gradient descent (SGD), SGD with momentum,
52
AdaGrad,
53
RMSProp,
54
and ADAM.
55
For details on the pros and cons of these algorithms, refer to
Secs. 8.3 and 8.5 of Ref. 23. There are also second-order methods, and these are discussed
in Sec. 8.6 of Ref. 23. A good history of DL is provided in Ref. 56, and training is discussed
in Sec. 5.24. Further discussions in this paper can be found in open questions 7 (Sec. 4.7),
8 (Sec. 4.8), and 9 (Sec. 4.9).
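To make two of these first-order update rules concrete, the following sketch implements single SGD-with-momentum and ADAM steps in NumPy and applies them to a toy quadratic. The learning rates and moment coefficients are illustrative defaults, not values prescribed by the cited works:

```python
import numpy as np

def sgd_momentum(w, grad, vel, lr=0.05, beta=0.9):
    """One SGD-with-momentum step: vel accumulates an exponentially
    decaying average of past gradients (Ref. 52 style update)."""
    vel = beta * vel - lr * grad
    return w + vel, vel

def adam(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM step: per-parameter step sizes from bias-corrected
    first (m) and second (v) moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize the toy objective f(w) = w^2 (gradient 2w) with each rule.
w1, vel = 5.0, 0.0
w2, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    w1, vel = sgd_momentum(w1, 2.0 * w1, vel)
    w2, m, v = adam(w2, 2.0 * w2, m, v, t)
print(w1, w2)  # both end up near the minimum at 0
```

In a real CNN the same update is applied elementwise to every weight tensor, with the gradient computed by backpropagation on a mini-batch.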
AEs can be trained with optimization algorithms similar to those used for CNNs. Some special
AEs, such as the marginalized DAE in Ref. 25, have closed-form solutions. DBNs can be trained
using greedy layer-wise training, as shown by Hinton et al. (Ref. 57) and Bengio et al. (Ref. 58).
Salakhutdinov and Hinton (Ref. 59) developed an improved pretraining method for DBNs and
DBMs by doubling or halving the weights (see the paper for more details).
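The greedy layer-wise idea can be sketched compactly: train one shallow unsupervised layer at a time, freeze it, and feed its features to the next layer. The sketch below uses simple denoising AEs with tied weights rather than the RBMs of Refs. 57 and 58, and all layer sizes, noise levels, and learning rates are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_dae_layer(X, n_hidden, noise=0.3, lr=0.1, epochs=50):
    """Train one denoising-AE layer (tied weights, squared error) on X
    and return (W, b_hidden) so its encoder can feed the next layer."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, (n_in, n_hidden))
    bh = np.zeros(n_hidden)
    bo = np.zeros(n_in)
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)  # masking noise
        H = sigmoid(Xn @ W + bh)                # encode corrupted input
        R = sigmoid(H @ W.T + bo)               # decode with tied W
        dZo = (R - X) * R * (1 - R)             # reconstruct the CLEAN X
        dZh = (dZo @ W) * H * (1 - H)
        W -= lr * (Xn.T @ dZh + dZo.T @ H) / len(X)  # tied-weight gradient
        bh -= lr * dZh.sum(0) / len(X)
        bo -= lr * dZo.sum(0) / len(X)
    return W, bh

# Greedy stacking: each layer is trained on the previous layer's output.
X = rng.random((256, 32))
layers, H = [], X
for n_hidden in (16, 8):
    W, bh = train_dae_layer(H, n_hidden)
    layers.append((W, bh))
    H = sigmoid(H @ W + bh)  # frozen features feed the next layer
```

After pretraining, the stacked encoders are typically fine-tuned end-to-end with a supervised objective.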
Computation comparisons are complex and highly dependent on factors such as the training
architecture, the computer system (and GPUs), the way the architectures move data onto and
off of the GPUs, the particular settings of the optimization algorithm (e.g., the mini-batch
size), the learning rate, etc., and, of course, on the data itself. It is very difficult to know
a priori what the complexities will be; this question remains open in current DL knowledge.
The survey paper by Shi et al. (Ref. 60) investigates the performance of common DL tools
(see Sec. 2.3.5), such as Caffe and TensorFlow, to help users understand these tools’ speed,
capabilities, and limitations. They discovered that GPUs are critical to speeding up DL
algorithms, whereas multicore systems do not scale linearly after about 8 cores. The GTX1080
(and now 1080Ti) GPUs performed the best among the GPUs they tested.
RNNs can be difficult to train due to the exploding gradient problem (Ref. 61). To overcome
this issue, Pascanu et al. (Ref. 62) developed a gradient-clipping strategy to more effectively
train RNNs. Martens and Sutskever (Ref. 63) developed a Hessian-free RNN optimization
scheme with damping and tested it on very challenging data sets.
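Gradient clipping itself is a few lines of code. The sketch below rescales the whole gradient when its global L2 norm exceeds a threshold, in the spirit of the strategy of Ref. 62; the threshold value and the toy "exploded" gradient are illustrative:

```python
import numpy as np

def clip_by_global_norm(grads, threshold=5.0):
    """Rescale all gradient arrays together when their global L2 norm
    exceeds the threshold, so the update direction is preserved but
    its magnitude is capped."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if norm > threshold:
        grads = [g * (threshold / norm) for g in grads]
    return grads

# A gradient that "exploded" through a long unrolled RNN is scaled
# back so a single SGD step cannot catapult the weights.
exploded = [np.full((4, 4), 100.0), np.full(4, 100.0)]
clipped = clip_by_global_norm(exploded)
norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm)  # clipped global norm sits at the threshold, 5.0
```

Because only the norm is capped, the relative proportions among the weight gradients are unchanged, which is why clipping works better in practice than truncating each element independently.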
Last, the survey paper of Deng (Ref. 2) discusses DL architectures and gives many references
for training DL systems. The survey paper of Bengio et al. (Ref. 64) on unsupervised FL also
discusses various learning and optimization strategies.
2.3.4 Big data
Every day, approximately 350 million images are uploaded to Facebook (Ref. 45), Wal-Mart
collects approximately 2.5 petabytes of data per day (Ref. 45), and the National Aeronautics
and Space Administration (NASA) is actively streaming 1.73 gigabytes of spacecraft-borne
observation data for active missions alone (Ref. 65). IBM reports that 2.5 quintillion bytes of
data are now generated every day, which means that “90% of the data in the world today has
been created in the last two years alone” (Ref. 66).
The point is that an unprecedented amount of (varying quality) data exists due to
technologies such as RS, smartphones, and inexpensive data storage. In times past, researchers
used tens to hundreds, maybe thousands, of training samples, but nothing on the order of
magnitude available today. In areas such as CV, high data volume and variety are at the heart
of advancements in performance, meaning reported results are a reflection of advances in both
data and machine learning.
To date, a number of approaches have been explored relative to large-scale deep networks
(e.g., hundreds of layers) and big data (e.g., high volume of data). For example, Raina et al.
(Ref. 67)
Ball, Anderson, and Chan: Comprehensive survey of deep learning in remote sensing: theories. . .
Journal of Applied Remote Sensing, Vol. 11(4), 042609-9, Oct–Dec 2017