An Auto-Tuning System for Optimizing HDF5 Parallel I/O Performance

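The paper "HDF5 - Taming Parallel IO Complexity with Auto-Tuning (P4091-0713_2)" describes how auto-tuning can optimize the parallel I/O performance of HDF5 applications, and demonstrates the system's effectiveness across platforms, applications, and scales. The authors come from several well-known academic institutions and laboratories, including the University of Illinois at Urbana-Champaign, Rice University, and Lawrence Berkeley National Laboratory.

HDF5 is a popular data storage format, widely used in scientific computing because it supports efficient storage of and access to large volumes of data. As computing power grows and parallel computing environments become common, optimizing the I/O performance of HDF5 applications in a parallel setting has become a challenge. The auto-tuning system proposed in the paper targets this problem.

1. Overview of the auto-tuning system: The system uses a genetic algorithm to search a large space of tunable parameters for well-performing settings at each layer of the parallel I/O stack (HDF5, MPI-IO, and the parallel file system). It applies these settings transparently by intercepting HDF5 calls, so no manual intervention by the user is needed, which simplifies performance optimization.

2. Validation: To validate the system, the researchers chose three I/O benchmarks, VPIC, VORPAL, and GCRM, each of which reproduces the actual I/O behavior of its application. The team evaluated performance under different load conditions using weak-scaling configurations of different sizes (128, 2048, and 4096 CPU cores).

3. Results and contributions: The experiments show that the auto-tuning system can significantly improve the I/O performance of HDF5 applications, especially in large-scale parallel environments. Besides reducing I/O latency, it may also improve overall computational efficiency, because efficient I/O ties up fewer compute resources and lets compute nodes finish their work sooner.

4. Significance: The work matters to researchers dealing with big data and high-performance computing because it offers an automated way to handle the complexity of parallel I/O, letting them focus on the core of their science without sacrificing efficiency.

5. Outlook: Future research may refine the system further, adapt it to more kinds of applications and hardware environments, and introduce additional intelligent algorithms to improve tuning accuracy and speed. The auto-tuning technique may also be extended to other parallel I/O systems and data storage formats.

Overall, the paper shows the potential of auto-tuning for optimizing HDF5 parallel I/O performance and offers a new approach to the I/O complexity problem in high-performance computing.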

Figure 3: A pictorial depiction of the genetic algorithm used in the auto-tuning framework.

performing configuration for a specific I/O benchmark.

3.1 H5Evolve: Sampling the Search Space

As mentioned previously, due to the large size of the parameter space and the possibly long execution time of a trial run, finding optimal parameter sets for writing data of a given size is a nontrivial task. Depending on the granularity with which the parameter values are set, the size of the parameter space can grow exponentially and become unmanageably large for a brute-force, enumerative optimization approach.
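To make this growth concrete, the sketch below counts the configurations in a small parameter space. The parameter names and candidate values are illustrative assumptions, not the values used in the paper; the point is only that the total is the product of the number of choices per parameter.

```python
from math import prod

# Hypothetical tunable parameters and candidate values (illustrative only,
# not taken from the paper).
param_space = {
    "stripe_count": [4, 8, 16, 32, 64, 128],
    "stripe_size_mb": [1, 2, 4, 8, 16, 32, 64, 128],
    "cb_nodes": [1, 2, 4, 8, 16, 32],
    "cb_buffer_size_mb": [1, 2, 4, 8, 16, 32, 64, 128],
    "alignment_kb": [64, 256, 1024, 4096],
    "sieve_buf_size_kb": [64, 128, 256, 512],
}


def space_size(space):
    """Number of distinct configurations: product of per-parameter choices."""
    return prod(len(values) for values in space.values())


total = space_size(param_space)
minutes_per_run = 2  # assumed cost of one trial run
days = total * minutes_per_run / 60 / 24
print(f"{total} configurations, about {days:.1f} days of exhaustive runs")
```

Even with only six parameters and a handful of values each, exhaustive enumeration is impractical at a few minutes per trial, and adding one more parameter with k candidate values multiplies the total by k.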

Exact optimization techniques are not appropriate for sampling the search space, given the nondeterministic nature of the objective function, which is the runtime of a particular configuration. Instead of relying on the simplest approach, manual tweaking, adaptive heuristic search approaches such as genetic evolution algorithms and simulated annealing can traverse the search space in a reasonable amount of time. In H5Evolve, we explore genetic algorithms for sampling the search space.

A genetic algorithm (GA) is a meta-heuristic for approaching an optimization problem, particularly one that is ill-suited for traditional exact or approximation methods. A GA is meant to emulate the natural process of evolution, working with a “population” of potential solutions through successive “generations” (iterations) as they “reproduce” (intermingle portions between two members of the population) and are subject to “mutations” (random changes to portions of the solution). A GA is expected, although it cannot necessarily be shown, to converge to an optimal or near-optimal solution, as strong solutions beget stronger children, while the random mutations offer a sampling of the remainder of the space.
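The population, reproduction, and mutation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the H5Evolve or Pyevolve implementation: the search space, the synthetic `toy_runtime` fitness, the number of elite members, and the choice of parents from the faster half of the population are all assumptions made for the example.

```python
import random

# Hypothetical discrete search space (names and values are illustrative).
SPACE = {
    "stripe_count": [4, 8, 16, 32, 64],
    "stripe_size_mb": [1, 2, 4, 8, 16],
    "cb_nodes": [1, 2, 4, 8],
}
KEYS = list(SPACE)


def toy_runtime(cfg):
    """Synthetic stand-in for a measured runtime; lower is better."""
    return (abs(cfg["stripe_count"] - 32) / 32
            + abs(cfg["stripe_size_mb"] - 8) / 8
            + abs(cfg["cb_nodes"] - 4) / 4)


def random_config(rng):
    return {k: rng.choice(SPACE[k]) for k in KEYS}


def crossover(a, b, rng):
    """The child takes each parameter from one of its two parents."""
    return {k: rng.choice((a[k], b[k])) for k in KEYS}


def mutate(cfg, rng):
    """Replace one randomly chosen parameter with a random legal value."""
    out = dict(cfg)
    k = rng.choice(KEYS)
    out[k] = rng.choice(SPACE[k])
    return out


def evolve(fitness, pop_size=15, generations=40, mutation_rate=0.15,
           n_elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        history.append(fitness(ranked[0]))
        nxt = ranked[:n_elite]                    # elites survive unchanged
        parents = ranked[:max(2, pop_size // 2)]  # assumed selection rule
        while len(nxt) < pop_size:
            child = crossover(*rng.sample(parents, 2), rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness), history


best, history = evolve(toy_runtime)
print(best, history[0], toy_runtime(best))
```

Because the elite members are copied into the next generation unchanged, the best fitness in `history` never gets worse from one generation to the next, which is the mechanism behind "strong solutions beget stronger children"; the occasional mutation keeps the search from being confined to the initial sample.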

Our implementation, dubbed H5Evolve, is shown in Figure 3. It was built in Python using the Pyevolve [20] module, which provides an intuitive framework for performing genetic algorithm experiments in Python.

The workflow of H5Evolve is as follows. For a given benchmark at a specific concurrency and problem size, H5Evolve runs the genetic algorithm (GA). H5Evolve takes a predefined parameter space which contains possible values for the I/O tuning parameters at each layer of the I/O stack. The evolution process starts with a randomly selected initial population. H5Evolve generates an XML file containing the selected I/O parameters (an I/O configuration) that H5Tuner injects into the benchmark. In all of our experiments, the H5Evolve GA uses a population size of 15; this size is a configurable option. Starting with an initial group of configuration sets, the genetic algorithm passes through successive generations. H5Evolve uses the runtime as the fitness evaluation for a given I/O configuration. After each generation has completed, H5Evolve evaluates the fitness of the population and considers the fastest I/O configurations (i.e., the “elite members”) for inclusion in the next generation. Additionally, the entire current population undergoes a series of mutations and crossovers to populate the other member sets in the population of the next generation. This process repeats for each generation. In our experiments, we set the number of generations to 40, meaning that H5Evolve runs a maximum of 600 executions of a given benchmark (15 configurations in each of 40 generations). We used a mutation rate of 15%, meaning that 15% of the population undergoes mutation at each generation. After H5Evolve finishes sampling the search space, the best performing I/O configuration is stored as the tuned parameter set.
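As a rough illustration of the hand-off from H5Evolve to H5Tuner, the sketch below serializes one candidate configuration to XML and computes the upper bound on benchmark runs from the population size and generation count given in the text. The element names, layer names, and parameter values are hypothetical; the excerpt does not show the schema of the real configuration file.

```python
import xml.etree.ElementTree as ET

POPULATION_SIZE = 15  # from the text
GENERATIONS = 40      # from the text
MAX_RUNS = POPULATION_SIZE * GENERATIONS  # upper bound on benchmark executions


def config_to_xml(config):
    """Serialize {layer: {parameter: value}} to an XML string.

    The element names are hypothetical, not H5Tuner's actual schema.
    """
    root = ET.Element("Parameters")
    for layer, params in config.items():
        layer_el = ET.SubElement(root, layer)
        for name, value in params.items():
            ET.SubElement(layer_el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# One candidate I/O configuration (illustrative values only).
example_config = {
    "HDF5": {"alignment": 1048576},
    "MPI_IO": {"cb_nodes": 8},
    "Lustre": {"stripe_count": 64, "stripe_size": 4194304},
}

xml_text = config_to_xml(example_config)
print(xml_text)
print("maximum benchmark runs:", MAX_RUNS)
```

Keeping the configuration in a file separates the search (which decides what to try) from the injection (which applies it), so the benchmark itself never needs to know that it is being tuned.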

3.2 H5Tuner: Setting I/O Parameters at Runtime

The goal of the H5Tuner component is to develop an autonomous parallel I/O parameter injector for scientific applications with minimal user involvement, allowing parameters to be altered without requiring a recompilation of the application. The H5Tuner dynamic library is able to set the parameters of different levels of the I/O stack, namely the HDF5, MPI-IO, and parallel file system levels in our implementation. Assuming all the I/O optimization parameters for the different levels of the stack are in a configuration file, H5Tuner first reads the values of the I/O configuration. When HDF5 calls appear in the code during the execution of a benchmark or application, the H5Tuner library intercepts the HDF5 function calls via dynamic linking. The library reroutes the intercepted HDF5 calls to a new implementation, where the parameters from the configuration are set and then the original HDF5 function is called using the dynamic library package functions. This approach has the added benefit of being completely transparent to the user; the function calls remain exactly the same and all alterations are made without change to the source code. We show an example in Figure 4, where H5Tuner intercepts the H5Fcreate() function call that creates an HDF5 file, applies various I/O parameters, and calls the original H5Fcreate() function.

H5Tuner uses MiniXML [24], a small XML library, to read the XML configuration files. In our implementation, we read the configuration file from the user’s home directory. A user has full flexibility to change the configuration file. Figure 5 shows a sample configuration file with HDF5, MPI-IO, and Lustre parallel file system tunable parameters.
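The intercept-then-delegate pattern can be mimicked in Python by wrapping a function and rebinding its name. This is only an analogy: the real H5Tuner is a dynamic library that interposes on HDF5 calls at link time, whereas the sketch below uses an ordinary wrapper. The `create_file` function, the XML element names, and the parameter values are hypothetical stand-ins.

```python
import functools
import xml.etree.ElementTree as ET

# Hypothetical configuration; the element names are not H5Tuner's real schema.
CONFIG_XML = """<Parameters>
  <MPI_IO><cb_nodes>8</cb_nodes></MPI_IO>
  <Lustre><stripe_count>64</stripe_count></Lustre>
</Parameters>"""


def load_config(text):
    """Parse the XML into {layer: {parameter: value}} (values stay strings)."""
    root = ET.fromstring(text)
    return {layer.tag: {p.tag: p.text for p in layer} for layer in root}


def create_file(name, settings=None):
    """Stand-in for the library's original file-creation routine."""
    return {"name": name, "settings": dict(settings or {})}


def with_tuned_settings(original, config):
    """Return a wrapper that applies the tuned settings, then delegates."""
    @functools.wraps(original)
    def wrapper(name, settings=None):
        merged = dict(settings or {})
        for layer, params in config.items():
            for key, value in params.items():
                merged[f"{layer}.{key}"] = value
        return original(name, merged)
    return wrapper


# "Interpose": rebind the public name, so callers are unchanged.
create_file = with_tuned_settings(create_file, load_config(CONFIG_XML))

handle = create_file("out.h5")  # the caller's code looks exactly as before
print(handle["settings"])
```

The calling code still writes `create_file("out.h5")`, which mirrors the transparency property described in the text: the tuned parameters are applied on the way to the original routine, and the application source is untouched.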

4. EXPERIMENTAL SETUP

We have evaluated the effectiveness of our auto-tuning framework on three HPC platforms using three I/O benchmarks at three different scales. The HPC platforms include Hopper, a Cray XE6 system at the National Energy Research Scientific Computing Center (NERSC); Intrepid, an IBM BlueGene/P (BG/P) system at Argonne Leadership
