Computer Physics Communications 211 (2017) 54–60
Parallel 3-dim fast Fourier transforms with load balancing of the
plane waves
Xingyu Gao^{a,b,c}, Zeyao Mo^{a,b,c}, Jun Fang^{b,c}, Haifeng Song^{a,b,c}, Han Wang^{b,c,*}
a Laboratory of Computational Physics, Huayuan Road 6, Beijing 100088, PR China
b Institute of Applied Physics and Computational Mathematics, Fenghao East Road 2, Beijing 100094, PR China
c CAEP Software Center for High Performance Numerical Simulation, Huayuan Road 6, Beijing 100088, PR China
* Corresponding author at: Institute of Applied Physics and Computational Mathematics, Fenghao East Road 2, Beijing 100094, PR China. E-mail address: wang_han@iapcm.ac.cn (H. Wang).
Article info
Article history:
Received 29 January 2016
Received in revised form
23 May 2016
Accepted 2 July 2016
Available online 7 July 2016
Keywords:
First-principles calculation
Kohn–Sham equation
Plane wave
FFT
Load balancing
Abstract
The plane wave method is the most widely used approach for solving the Kohn–Sham equations in first-principles
materials science computations. In this procedure, the three-dimensional (3-dim) fast Fourier transform (FFT)
of the trial wave functions is a routine operation and one of the algorithms that is most demanding to scale
on a parallel machine. We propose a new partitioning algorithm for the 3-dim FFT grid that trades off
communication overhead against load balancing of the plane waves. Qualitative analysis and numerical results
show that our approach allows plane wave first-principles calculations to scale up to a larger number of nodes.
© 2016 Elsevier B.V. All rights reserved.
1. Introduction
In the context of Density Functional Theory (DFT), solving the
Kohn–Sham equation is the most time-consuming part of first-
principles materials science computations [1–3]. The plane wave
method, a widely used numerical approach [4], leads to a
large-scale dense algebraic eigenvalue problem. This problem
is usually solved by iterative diagonalization methods such as
Davidson's [5], RMM-DIIS [3], LOBPCG [6], Chebyshev polynomial
filtering subspace iteration [7], etc. The elementary operation
of these iteration methods is the matrix–vector multiplication.
Since the large-scale dense matrix is not suitable for explicit
assembly, we realize the matrix–vector multiplication by applying
the Hamiltonian operator to trial wave functions. The local term of
the effective potential is one part of the Hamiltonian operator. To
compute its action with lower time complexity, we perform two
3-dim FFTs on each trial wave function in every matrix–vector
multiplication.
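For illustration, a minimal serial sketch of this operation in Python (assuming a numpy-style FFT; the names apply_local_potential, psi_g and v_loc are our own, not taken from any particular plane wave code):

    import numpy as np

    def apply_local_potential(psi_g, v_loc):
        # psi_g : (N, N, N) complex array of plane wave (Fourier) coefficients
        # v_loc : (N, N, N) real array, local effective potential on the FFT grid
        psi_r = np.fft.ifftn(psi_g)   # first 3-dim FFT: reciprocal -> real space
        vpsi_r = v_loc * psi_r        # pointwise product in real space, O(N^3)
        return np.fft.fftn(vpsi_r)    # second 3-dim FFT: back to reciprocal space

The pointwise product in real space costs O(N^3) operations, whereas applying the local potential as a dense matrix on the N^3 plane wave coefficients would cost O(N^6); the two FFTs bring the total down to O(N^3 log N).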
Three features make the trial wave function's FFT one of the
most demanding algorithms to scale on a parallel machine. The
first is the moderately sized FFT grid, rather than a large one:
the ratio of computation to communication in the parallel
3-dim FFT is of order log N, where N, the single dimension of the
FFT grid, is usually O(10^2) in most first-principles calculations
of bulk materials. The second is the communication overhead
accumulated over many executions, one for each of a large number
of wave functions; thousands of FFTs may be executed at each step
of the iterative diagonalization. The third is the all-to-all
communication required by the data transposes. This can limit
the parallel scaling because the large number of small messages in
the network causes contention as well as latency issues.
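As a rough estimate of the first feature (assuming the textbook count of about 5 M log_2 M floating point operations for a complex FFT of M = N^3 points, and a transpose that communicates all N^3 grid values):

    \frac{\text{computation}}{\text{communication}} \sim \frac{5 N^{3} \log_{2} N^{3}}{N^{3}} = 15 \log_{2} N \approx 10^{2} \quad \text{for } N = O(10^{2}),

so the kernel offers little computation to hide its communication behind, in contrast to dense linear algebra whose ratio grows with N.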
It has already been recognized that making fewer and larger
messages can speed up parallel trial wave functions’ FFTs. The
hybrid OpenMP/MPI implementation [8,9] can lead to fewer and
larger messages compared to a pure MPI version. And a blocked
version [9] performs a number of trial wave functions’ FFTs at the
same time to aggregate the message sizes and reduce the latency
problem.
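To illustrate the blocking idea (a schematic sketch, not the implementation of Refs. [8,9]; it assumes mpi4py and a slab layout in which the leading array axis indexes the destination rank):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    P = comm.Get_size()

    def transpose_single(psi):
        # One all-to-all per wave function: P small messages.
        # psi : (P, chunk) complex array; row i is the piece destined for rank i.
        recv = np.empty_like(psi)
        comm.Alltoall(psi, recv)
        return recv

    def transpose_blocked(psi_block):
        # One all-to-all for a block of B wave functions: each message is
        # B times larger, so the per-message latency is paid once, not B times.
        # psi_block : (P, B, chunk) complex array; psi_block[i] goes to rank i
        # and holds the chunks of all B wave functions contiguously.
        recv = np.empty_like(psi_block)
        comm.Alltoall(psi_block, recv)
        return recv

The total communication volume is unchanged; only the number of messages, and hence the latency cost, is reduced.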
In first-principles calculations, we should consider not only
the parallel scaling of the trial wave functions' FFTs, but also the
load balancing of the intensive computations on the plane waves
that expand the wave functions. The workload of these computations
is inhomogeneously distributed on a standard 3-dim FFT grid, so
a greedy algorithm is usually used to optimize the load
balancing. However, this algorithm results in global all-to-all
communications across all the processors, so the latency
overhead grows in proportion to the number of processors.
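The greedy algorithm referred to here is, in essence, longest-processing-time assignment of grid columns to processors; a minimal sketch under that assumption (the function name and data layout are illustrative, not the paper's code):

    import heapq

    def greedy_partition(column_loads, n_procs):
        # Repeatedly give the heaviest unassigned column of the FFT grid
        # to the currently least-loaded processor.
        # column_loads : per-column plane wave counts
        # returns owner[j] = rank of the processor that owns column j
        heap = [(0, rank) for rank in range(n_procs)]  # (current load, rank)
        heapq.heapify(heap)
        owner = [0] * len(column_loads)
        for j in sorted(range(len(column_loads)),
                        key=lambda j: column_loads[j], reverse=True):
            load, rank = heapq.heappop(heap)           # least-loaded processor
            owner[j] = rank
            heapq.heappush(heap, (load + column_loads[j], rank))
        return owner

Because neighboring columns end up scattered across ranks, reassembling the full 3-dim grid for the FFT then requires the global all-to-all communication described above.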