Hybrid Parallel Programming with MPI and Unified Parallel C∗
James Dinan
Dept. Comp. Sci. and Eng.
The Ohio State University
2015 Neil Avenue
Columbus, OH U.S.A.
dinan@cse.ohio-state.edu
Pavan Balaji
Math. and Comp. Sci. Division
Argonne National Laboratory
9700 S. Cass Avenue
Argonne, IL U.S.A.
balaji@mcs.anl.gov
Ewing Lusk
Math. and Comp. Sci. Division
Argonne National Laboratory
9700 S. Cass Avenue
Argonne, IL U.S.A.
lusk@mcs.anl.gov
P. Sadayappan
Dept. Comp. Sci. and Eng.
The Ohio State University
2015 Neil Avenue
Columbus, OH U.S.A.
saday@cse.ohio-state.edu
Rajeev Thakur
Math. and Comp. Sci. Division
Argonne National Laboratory
9700 S. Cass Avenue
Argonne, IL U.S.A.
thakur@mcs.anl.gov
ABSTRACT
The Message Passing Interface (MPI) is one of the most widely
used programming models for parallel computing. However, the
amount of memory available to an MPI process is limited by the
amount of local memory within a compute node. Partitioned Global
Address Space (PGAS) models such as Unified Parallel C (UPC)
are growing in popularity because of their ability to provide a shared
global address space that spans the memories of multiple compute
nodes. However, taking advantage of UPC can require a large re-
coding effort for existing parallel applications.
In this paper, we explore a new hybrid parallel programming
model that combines MPI and UPC. This model allows MPI pro-
grammers incremental access to a greater amount of memory, en-
abling memory-constrained MPI codes to process larger data sets.
In addition, the hybrid model offers UPC programmers an opportu-
nity to create static UPC groups that are connected over MPI. As we
demonstrate, the use of such groups can significantly improve the
scalability of locality-constrained UPC codes. This paper presents
a detailed description of the hybrid model and demonstrates its ef-
fectiveness in two applications: a random access benchmark and
the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance: using hybrid UPC groups that span two cluster nodes, random access benchmark performance increases by a factor of 1.33, and using groups that span four cluster nodes, Barnes-Hut achieves a twofold speedup at the expense of a 2% increase in code size.
∗
This work was supported in part by the Office of Advanced Sci-
entific Computing Research, Office of Science, U.S. Department
of Energy under contract DE-AC02-06CH11357; by the National
Science Foundation under grant #0702182; and by a resource grant
from the Ohio Supercomputer Center.
Copyright 2010 Association for Computing Machinery. ACM acknowl-
edges that this contribution was authored or co-authored by an employee,
contractor or affiliate of the U.S. Government. As such, the Government re-
tains a nonexclusive, royalty-free right to publish or reproduce this article,
or to allow others to do so, for Government purposes only.
CF’10, May 17–19, 2010, Bertinoro, Italy.
Copyright 2010 ACM 978-1-4503-0044-5/10/05 ...$10.00.
Categories and Subject Descriptors
D.1.3 [Programming Techniques]: Concurrent Programming—
Parallel programming; D.3.3 [Programming Languages]: Lan-
guage Constructs and Features—Concurrent programming struc-
tures
General Terms
Design, Languages, Performance
Keywords
MPI, UPC, PGAS, Hybrid Parallel Programming
1. INTRODUCTION
The Message Passing Interface (MPI) is considered to be the de
facto standard for parallel programming today [11]. The flexible,
feature-rich interface provided by MPI has successfully allowed
many complex scientific applications to be represented and mapped
efficiently to large-scale high-end computing systems. However,
the amount of memory available to an MPI process is limited by each process's virtual address space; and, for a variety of scientific applications, this space is insufficient to solve emerging problems.
Many scientific applications today are written in MPI using a
one-process-per-core model that partitions memory among the cores.
As systems grow, memory per core remains constant or decreases.
Shared memory hybrid parallel programming with MPI and OpenMP
avoids partitioning of memory and, for some applications, provides
access to a large enough amount of memory to simulate increas-
ingly large problems [18]. For many other applications, however,
the memory requirement grows superlinearly with problem size.
In particular, the simulation of the phenomena in the nucleus of an atom via the Green's function Monte Carlo (GFMC) method has a per-process memory requirement that grows as $2^A \cdot A!$ in the number of nucleons $A$ [17]. Hybridization of this MPI code with OpenMP
has successfully extended it to simulate carbon-12, which requires
roughly 0.5 GB memory per node. For larger atoms, however, the
per-MPI-process memory requirements quickly exceed the avail-
able memory per node. Thus, a new solution is needed.
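To make the scaling concrete, the following back-of-the-envelope comparison (illustrative only; oxygen-16 is chosen here simply as an example of a larger nucleus, and the $2^A \cdot A!$ growth is assumed to hold exactly) shows the factor by which the per-process requirement grows between carbon-12 ($A = 12$) and oxygen-16 ($A = 16$):
$$
\frac{2^{16} \cdot 16!}{2^{12} \cdot 12!} = 2^{4} \cdot 13 \cdot 14 \cdot 15 \cdot 16 = 698{,}880 \approx 7 \times 10^{5}.
$$
Under this assumption, even a baseline footprint of well under a gigabyte grows to hundreds of terabytes, far beyond the memory of any single compute node, regardless of how many OpenMP threads share that node's memory.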
Partitioned global address space (PGAS) models such as Unified Parallel C (UPC) [21] are relative newcomers to large-scale sci-