Computer Physics Communications 182 (2011) 266–269
Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters ✩
Chao-Tung Yang ∗, Chih-Lin Huang, Cheng-Fang Lin
Department of Computer Science, Tunghai University, Taichung City, 40704, Taiwan
Article history:
Received 1 March 2010
Received in revised form 18 June 2010
Accepted 25 June 2010
Available online 16 July 2010
Keywords:
CUDA
GPU
MPI
OpenMP
Hybrid
Parallel programming
Nowadays, NVIDIA’s CUDA is a general-purpose scalable parallel programming model for writing highly
parallel applications. It provides several key abstractions – a hierarchy of thread blocks, shared memory,
and barrier synchronization. This model has proven quite successful at programming multithreaded
many-core GPUs and scales transparently to hundreds of cores: scientists throughout industry and
academia are already using CUDA to achieve dramatic speedups on production and research codes. In this
paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming,
which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster consisting
of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by
CUDA, run by the processor cores in the same computational node.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Nowadays, NVIDIA’s CUDA [1] is a general-purpose scalable
parallel programming model for writing highly parallel applications.
It provides several key abstractions – a hierarchy of thread
blocks, shared memory, and barrier synchronization. This model
has proven quite successful at programming multithreaded many-core
GPUs and scales transparently to hundreds of cores: scientists
throughout industry and academia are already using CUDA [1] to
achieve dramatic speedups on production and research codes.
This paper proposes a solution that not only simplifies the use
of hardware acceleration in conventional general-purpose applications,
but also keeps the application code portable. In this paper,
we propose a parallel programming approach using hybrid CUDA,
OpenMP, and MPI [3] programming, which partitions loop iterations
according to the performance weighting of the multicore [4] nodes in
a cluster. Because iterations assigned to one MPI process are processed
in parallel by OpenMP threads run by the processor cores in
the same computational node, the number of loop iterations allocated
to one computational node at each scheduling step depends
on the number of processor cores in that node.
In this paper, we propose a general approach that uses perfor-
mance functions to estimate performance weights for each node.
To verify the proposed approach, a cluster with hybrid CUDA was
built in our implementation. Empirical results show that in the
hybrid CUDA cluster environment, the proposed approach improved
performance over all previous schemes.

✩ This work is supported in part by the National Science Council, Taiwan, under
grant Nos. NSC 98-2220-E-029-004- and NSC 99-2220-E-029-004-.
∗ Corresponding author. Tel.: +886 4 23590415; fax: +886 4 23591567.
E-mail address: ctyang@thu.edu.tw (C.-T. Yang).
The rest of this paper is organized as follows. In Section 2,
we introduce several typical and well-known parallel programming
schemes. In Section 3, we define our model and describe our ap-
proach. Our system configuration is then specified in Section 4,
and experimental results for three types of application programs
are presented. Concluding remarks and future work are given in
Section 5.
2. Background review
2.1. CUDA programming
CUDA (an acronym for Compute Unified Device Architecture) is
a parallel computing [2] architecture developed by NVIDIA. CUDA
is the computing engine in NVIDIA graphics processing units or
GPUs, accessible to software developers through industry-standard
programming languages. The CUDA architecture supports a
range of computational interfaces, including OpenGL [9] and
DirectCompute. CUDA’s parallel programming model is designed to
overcome the challenge of scaling to many cores while maintaining
a low learning curve for programmers familiar with standard
programming languages such as C. At its core are three key
abstractions – a hierarchy of thread groups, shared memories, and
barrier synchronization – that are simply exposed to the programmer
as a minimal set of language extensions.
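All three abstractions appear in even a minimal kernel. The following sketch is ours, not the authors' code: a hypothetical kernel `blockSum` computes one partial sum per thread block, using the thread hierarchy (block and thread indices), per-block shared memory, and barrier synchronization with `__syncthreads()`.

```cuda
// Sketch of CUDA's three key abstractions (assumed block size 256).
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float cache[256];            // shared memory: one copy per block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;  // thread hierarchy: grid -> block -> thread

    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                        // barrier: all loads into shared memory done

    // Tree reduction within the block; each step halves the active threads.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();                    // barrier before the next step reads results
    }
    if (tid == 0)
        out[blockIdx.x] = cache[0];         // one partial sum per block
}
```

A host would launch it as `blockSum<<<numBlocks, 256>>>(d_in, d_out, n);` and then combine the per-block partial sums; these abstractions are the only language extensions the example relies on.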
doi:10.1016/j.cpc.2010.06.035