High Performance Heterogeneous Computing: An In-Depth Look at the Original English Edition


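"High Performance Heterogeneous Computing" is a volume in the "Wiley Series on Parallel and Distributed Computing", for which Albert Y. Zomaya serves as series editor. The book examines design and implementation strategies for high-performance heterogeneous computing and is aimed mainly at readers in microelectronics. Heterogeneous computing means integrating different types of processors and hardware architectures, such as CPUs, GPUs, and FPGAs, in a single system to achieve higher performance and better energy efficiency. The book may cover the following key topics:

1. Fundamentals of heterogeneous computing: the concept itself, its advantages and challenges, and why it matters more and more in high-performance computing. This may include how mixing processor types can improve execution efficiency for particular workloads.

2. System design: design principles for heterogeneous systems, including how to assign tasks to different processing units, how to build an efficient communication network, and how to account for power consumption and heat dissipation.

3. Parallel algorithms: high-performance computing usually relies on parallel algorithms. The book may describe parallel algorithms suited to heterogeneous environments, such as data parallelism, task parallelism, and hybrid parallelism, and how to tune them for specific hardware.

4. Software frameworks and programming models: programming models such as OpenMP, CUDA, and OpenCL, which let developers use heterogeneous hardware more effectively. The book may also discuss cross-platform programming and interface abstraction.

5. Performance evaluation and optimization: methods for evaluating heterogeneous system performance, such as benchmarking and performance modeling, together with how to use analysis and debugging to optimize code and raise overall system performance.

6. Application case studies: to make the theory easier to follow, the authors may present practical cases of heterogeneous computing in areas such as scientific computing, image processing, and machine learning.

7. Future trends and challenges: the book may look ahead to future directions for heterogeneous computing, such as integration with quantum computing and uses in edge computing, and to the challenges of handling big data and artificial intelligence workloads.

"High Performance Heterogeneous Computing" gives readers a broad basis for understanding heterogeneous high-performance computing. For researchers, engineers, and students alike, it is a valuable reference for building deeper understanding and practical skill in microelectronics and high-performance computing.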


HETEROGENEOUS PLATFORMS AND THEIR USES

Homogeneous distributed memory multiprocessor systems are designed for high-performance parallel computing and are typically used to run a relatively small number of similar parallel applications.

The property of homogeneity is easy to break and may be quite expensive to keep. Any distributed memory multiprocessor system will become heterogeneous if it allows several independent users to simultaneously run their applications on the same set of processors. The point is that, in this case, different identical processors may have different workloads, and hence demonstrate different performances for different runs of the same application depending on external computations and communications.
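The effect described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than a model taken from the text: the function names, the idea that an external load fraction f leaves a share (1 - f) of a processor to the application, and the example loads. It compares an equal split of the work with a split proportional to each processor's effective speed.

```python
def effective_speeds(base_speed, external_loads):
    """Speed left for our application on each (otherwise identical) processor.

    Assumption (not from the text): an external load fraction f in [0, 1)
    leaves a share (1 - f) of the processor to our application.
    """
    return [base_speed * (1.0 - f) for f in external_loads]


def time_equal_split(total_work, speeds):
    """Every processor gets the same share, so the slowest one finishes last."""
    share = total_work / len(speeds)
    return max(share / s for s in speeds)


def time_proportional_split(total_work, speeds):
    """Shares are proportional to speed, so all processors finish together."""
    return total_work / sum(speeds)


# Three identical processors; two of them are busy with other users' jobs.
speeds = effective_speeds(1.0, [0.0, 0.5, 0.75])
t_equal = time_equal_split(105.0, speeds)
t_prop = time_proportional_split(105.0, speeds)
print(speeds, t_equal, t_prop)
```

With these made-up loads, the equal split takes 140 time units because the most heavily loaded processor holds everyone back, while the proportional split takes 60.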

Clusters of commodity processors are seen as cheap alternatives to very expensive vendor homogeneous distributed memory multiprocessor systems. However, they have many hidden costs required to maintain their homogeneity. First, they cannot be used as multitasking computer systems, allowing several independent users to simultaneously run their applications on the same set of processors. Such a usage immediately makes them heterogeneous because of the dynamic change of the performance of each particular processor. Second, to maintain the homogeneity over time, a full replacement of the system would be required, which can be quite expensive.

Thus, distributed memory multiprocessor systems are naturally heterogeneous, and the property of heterogeneity is an intrinsic property of the overwhelming majority of such systems.

In addition to platforms, which are heterogeneous by nature, one interesting trend is heterogeneous hardware designed by vendors for high-performance computing. The said heterogeneous design is mainly motivated by applications and will be briefly outlined in the next section.

Now we would like to classify the platforms in the increasing order of heterogeneity and complexity and briefly characterize each heterogeneous system. The classes are

• vendor-designed heterogeneous systems,
• heterogeneous clusters,
• local networks of computers (LNCs),
• organizational global networks of computers, and
• general-purpose global networks of computers.

1.2 VENDOR-DESIGNED HETEROGENEOUS SYSTEMS

Heterogeneous computing has seen renewed attention with such examples as the general programming of graphical processing units (GPUs), the ClearSpeed (ClearSpeed, 2008, Bristol, UK) Single Instruction Multiple Data (SIMD) attached accelerator, and the IBM (Armonk, NY) Cell architecture (Gschwind et al., 2006).

There has been a marked increase in interest in heterogeneous computing for high performance. Spawned in part by the significant performances demonstrated by special-purpose devices such as GPUs, the idea of finding ways to leverage these industry investments for more general-purpose technical computing has become enticing, with a number of projects mostly in academia as well as some work in national laboratories. However, the move toward heterogeneous computing is driven by more than the perceived opportunity of "low-hanging fruit." Cray Inc. has described a strategy based on their XT3 system (Vetter et al., 2006), derived from Sandia National Laboratories' Red Storm. Such future systems using an AMD Opteron-based and mesh-interconnected Massively Parallel Processing (MPP) structure will provide the means to support accelerators such as a possible future vector-based processor, or even possibly Field Programmable Gate Array (FPGA) devices. The start-up company ClearSpeed has gained much interest in their attached array processor using a custom SIMD processing chip that plugs into the PCI-X slot of otherwise conventional motherboards. For compute-intensive applications, the possibility of a one- to two-order-of-magnitude performance increase with as little as a 10-W increase in power consumption is very attractive.

Perhaps the most exciting advance has been the long-awaited Cell architecture from the partnership of IBM, Sony, and Toshiba (Fig. 1.1). Cell combines the attributes of both multicore and heterogeneous computing. Designed, at least in part, as the breakthrough component to revolutionize the gaming industry in the body of the Sony Playstation 3, both IBM and much of the community look to this part as a major leap in delivered performance. Cell incorporates nine cores, one general-purpose PowerPC architecture and eight special-purpose "synergistic processing element (SPE)" processors that emphasize 32-bit arithmetic, with a peak performance of 204 gigaflop/s in 32-bit arithmetic per chip at 3.2 GHz.

[Figure 1.1: block diagram. Labels: PPE, SPE, LS, MFC, EIB, I/O Interface, Coherent Interface, XDR DRAM Interface.]

Figure 1.1. The IBM Cell, a heterogeneous multicore processor, incorporates one power processing element (PPE) and eight synergistic processing elements (SPEs). (Figure courtesy of Mercury Computer Systems, Inc.)
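The peak figure quoted for the Cell can be reproduced with simple arithmetic. The sketch below rests on two assumptions that the text does not state: each SPE processes four 32-bit values per SIMD instruction, and a fused multiply-add counts as two floating-point operations per lane per cycle. The function name is hypothetical.

```python
def peak_gflops(cores, simd_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak in gigaflop/s: every lane of every core busy each cycle."""
    return cores * simd_lanes * flops_per_lane_per_cycle * clock_ghz


# Eight SPEs at 3.2 GHz (from the text); 4 lanes and 2 flops per lane per
# cycle are assumptions about the SPE's 32-bit SIMD unit.
spe_peak = peak_gflops(cores=8, simd_lanes=4, flops_per_lane_per_cycle=2, clock_ghz=3.2)
print(spe_peak)
```

Under these assumptions the eight SPEs alone give 204.8 gigaflop/s, which is consistent with the rounded figure of 204 in the text.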

Heterogeneous computing, like multicore structures, offers possible new opportunities in performance and power efficiency but imposes significant, perhaps even daunting, challenges on application users and software designers. Partitioning the work among parallel processors has proven hard enough, but having to qualify such partitioning by the nature of the work performed and employing multi-instruction set architecture (ISA) environments aggravates the problem substantially. While the promise may be great, so are the problems that have to be resolved. This year has seen initial efforts to address these obstacles and garner the possible performance wins. Teaming between Intel and ClearSpeed is just one example of new and concerted efforts to accomplish this. Recent work at the University of Tennessee applying an iterative refinement technique has demonstrated that 64-bit accuracy can be achieved at eight times the performance of the normal 64-bit mode of the Cell architecture by exploiting the 32-bit SPEs (Buttari et al., 2007).
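The idea behind mixed-precision iterative refinement can be sketched in a few lines. This is a toy illustration under stated assumptions, not the implementation from the cited work: 32-bit arithmetic is emulated by rounding through `struct`, the helper names are hypothetical, and the 32-bit elimination is simply rerun for each correction, whereas a practical solver would factor the matrix once and reuse the factors. The expensive solve is done in 32-bit precision; only the residual and the update are computed in 64-bit.

```python
import struct


def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_f32(A, b):
    """Gaussian elimination with partial pivoting, every result rounded to 32-bit."""
    n = len(A)
    M = [[f32(v) for v in row] + [f32(b[i])] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = f32(M[r][k] / M[k][k])
            for c in range(k, n + 1):
                M[r][c] = f32(M[r][c] - f32(factor * M[k][c]))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n]
        for c in range(i + 1, n):
            s = f32(s - f32(M[i][c] * x[c]))
        x[i] = f32(s / M[i][i])
    return x


def residual(A, b, x):
    """r = b - A x, computed in 64-bit arithmetic."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]


def refine(A, b, iters=5):
    """Start from a 32-bit solve, then correct it using 64-bit residuals."""
    x = solve_f32(A, b)
    for _ in range(iters):
        d = solve_f32(A, residual(A, b, x))      # cheap 32-bit correction
        x = [xi + di for xi, di in zip(x, d)]    # 64-bit update
    return x


def max_error(x, x_ref):
    return max(abs(a - b) for a, b in zip(x, x_ref))


# Small, well-conditioned example system (made up for illustration).
A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
x_true = [0.1, 0.2, 0.3]
b = [sum(a * x for a, x in zip(row, x_true)) for row in A]

err_single = max_error(solve_f32(A, b), x_true)
err_refined = max_error(refine(A, b), x_true)
print(err_single, err_refined)
```

For a well-conditioned matrix such as this one, the refined solution should be far closer to the true solution than the plain 32-bit solve, which is why most of the work can run on fast 32-bit hardware.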

Japan has undertaken an ambitious program: the "Kei-soku" project to deploy a 10-petaflops scale system for initial operation by 2011. While the planning for this initiative is still ongoing and the exact structure of the system is under study, key activities are being pursued with a new national High Performance Computing (HPC) Institute being established at RIKEN (2008). Technology elements being studied include various aspects of interconnect technologies, both wire and optical, as well as low-power device technologies, some of which are targeted to a 0.045-μm feature size. NEC, Fujitsu, and Hitachi are providing strong industrial support with academic partners, including University of Tokyo, Tokyo Institute of Technology, University of Tsukuba, and Keio University among others. The actual design is far from certain, but there are some indications that a heterogeneous system structure is receiving strong consideration, integrating both scalar and vector processing components, possibly with the addition of special-purpose accelerators such as the MD-Grape (Fukushige et al., 1996). With a possible budget equivalent to over US$1 billion (just under 1 billion euros) and a power consumption of 36 MW (including cooling), this would be the most ambitious computing project yet pursued by the Asian community, and it is providing strong leadership toward inaugurating the Petaflops Age (1–1000 petaflops).

1.3 HETEROGENEOUS CLUSTERS

A heterogeneous cluster (Fig. 1.2) is a dedicated system designed mainly for high-performance parallel computing, which is obtained from the classical homogeneous cluster architecture by relaxing one of its three key properties, thus leading to the situation wherein:
