8/28/97 DRAFT: Parallel Computer Architecture
Parallel Computer Architecture
A Hardware / Software Approach
David Culler
University of California, Berkeley
Jaswinder Pal Singh
Princeton University
with Anoop Gupta
Stanford University
Morgan Kaufmann is pleased to present material from a preliminary draft of Parallel
Computer Architecture; the material is (c) Copyright 1997 Morgan Kaufmann Publishers. This
material may not be used or distributed for any commercial purpose without the express
written consent of Morgan Kaufmann Publishers. Please note that this material is a draft of
forthcoming publication, and as such neither Morgan Kaufmann nor the authors can be held
liable for changes or alterations in the final edition.
Preface
Motivation for the Book
Parallel computing is a critical component of the computing technology of the 90s, and it is likely
to have as much impact over the next twenty years as microprocessors have had over the past
twenty. Indeed, the two technologies are closely linked, as the evolution of highly integrated
microprocessors and memory chips is making multiprocessor systems increasingly attractive.
Already multiprocessors represent the high performance end of almost every segment of the
computing market, from the fastest supercomputers, to departmental compute servers, to the indi-
vidual desktop. In the past, computer vendors employed a range of technologies to provide
increasing performance across their product line. Today, the same state-of-the-art microprocessor
is used throughout. To obtain a significant range of performance, the simplest approach is to
increase the number of processors, and the economies of scale make this extremely attractive.
Very soon, several processors will fit on a single chip.
Although parallel computing has a long and rich academic history, the close coupling with com-
modity technology has fundamentally changed the discipline. The emphasis on radical architec-
tures and exotic technology has given way to quantitative analysis and careful engineering trade-
offs. Our goal in writing this book is to equip designers of the emerging class of multiprocessor
systems, from modestly parallel personal computers to massively parallel supercomputers, with
an understanding of the fundamental architectural issues and the available techniques for
addressing design trade-offs. At the same time, we hope to provide designers of software systems
for these machines with an understanding of the likely directions of architectural evolution and
the forces that will determine the specific path that hardware designs will follow.
The most exciting recent development in parallel computer architecture is the convergence of tra-
ditionally disparate approaches, namely shared-memory, message-passing, SIMD, and dataflow,
on a common machine structure. This is driven partly by common technological and economic
forces, and partly by a better understanding of parallel software. This convergence allows us to
focus on the overriding architectural issues and to develop a common framework in which to
understand and evaluate architectural trade-offs. Moreover, parallel software has matured to the
point where the popular parallel programming models are available on a wide range of machines
and meaningful benchmarks exist. This maturing of the field makes it possible to undertake a
quantitative, as well as qualitative study of hardware/software interactions. In fact, it demands
such an approach. The book follows a set of issues that are critical to all parallel architectures
(communication latency, communication bandwidth, and coordination of cooperative work)
across the full range of modern designs. It describes the set of techniques available in hardware
and in software to address each issue and explores how the various techniques interact. Case
studies provide a concrete illustration of the general principles and demonstrate specific interac-
tions between mechanisms.
Our final motivation comes from the current lack of an adequate textbook for our own courses at
Stanford, Berkeley, and Princeton. Many existing textbooks cover the material in a cursory fash-
ion, summarizing various architectures and research results, but not analyzing them in depth.
Others focus on specific projects, but fail to recognize the principles that carry over to alternative
approaches. The research reports in the area provide a sizable body of empirical data, but it has not
yet been distilled into a coherent picture. By focusing on the salient issues in the context of the
technological convergence, rather than the rich and varied history that brought us to this point,
we hope to provide a deeper and more coherent understanding of the field.
Intended Audience
We believe the subject matter of this book is core material and should be relevant to graduate stu-
dents and practicing engineers in the fields of computer architecture, systems software, and appli-
cations. The relevance for computer architects is obvious, given the growing importance of
multiprocessors. Chip designers must understand what constitutes a viable building block for
multiprocessor systems, while computer system designers must understand how best to utilize
modern microprocessor and memory technology in building multiprocessors.
Systems software, including operating systems, compilers, programming languages, run-time
systems, and performance debugging tools, will need to address new issues and will provide new
opportunities in parallel computers. Thus, an understanding of the evolution and the forces guid-
ing that evolution is critical. Researchers in compilers and programming languages have
addressed aspects of parallel computing for some time. However, the new convergence with com-
modity technology suggests that these aspects may need to be reexamined and perhaps addressed
in very general terms. The traditional boundaries between hardware, operating system, and user
program are also shifting in the context of parallel computing, where communication, schedul-
ing, sharing, and resource management are intrinsic to the program.
Applications areas, such as computer graphics and multimedia, scientific computing, computer
aided design, decision support and transaction processing, are all likely to see a tremendous
transformation as a result of the vast computing power available at low cost through parallel com-
puting. However, developing parallel applications that are robust and provide good speed-up
across current and future multiprocessors is a challenging task, and requires a deep understand-
ing of forces driving parallel computers. The book seeks to provide this understanding, but also
to stimulate the exchange between the applications fields and computer architecture, so that bet-
ter architectures can be designed --- those that make the programming task easier and perfor-
mance more robust.
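The difficulty of obtaining robust speedup across current and future machines can be made concrete with Amdahl's law, the standard bound relating speedup to the inherently serial fraction of a program. A minimal sketch in Python (the function name and the 5% serial fraction are illustrative assumptions, not from the text):

```python
def amdahl_speedup(p, serial_fraction):
    """Upper bound on speedup with p processors when a fixed
    fraction of the work cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# Even a 5% serial fraction caps speedup far below linear:
for p in (8, 64, 1024):
    print(p, round(amdahl_speedup(p, 0.05), 1))
# prints: 8 5.9, 64 15.4, 1024 19.6 -- never above 1/0.05 = 20
```

The point of the bound is exactly the one made above: an application that speeds up well on today's modest processor counts may plateau on tomorrow's larger machines unless its serial fraction also shrinks.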
Organization of the Book
The book is organized into twelve chapters. Chapter 1 begins by motivating why parallel
architectures are inevitable, given technology, architecture, and application trends. It then
briefly introduces the diverse multiprocessor architectures we find today (shared-memory, mes-
sage-passing, data parallel, dataflow, and systolic), and it shows how the technology and architec-
tural trends tell a strong story of convergence in the field. The convergence does not mean the end
to innovation; on the contrary, it implies that we will now see a time of rapid progress in the
field, as designers start talking to each other rather than past each other. Given this convergence,
the last portion of the chapter introduces the fundamental design issues for multiprocessors:
naming, synchronization, latency, and bandwidth. These four issues form an underlying theme
throughout the rest of this book. The chapter ends with a historical perspective, providing a
glimpse into the diverse and rich history of the field.
Chapter 2 provides a brief introduction to the process of parallel programming and the basic
components of popular programming models. It is intended to ensure that the reader has a
clear understanding of hardware/software trade-offs, as well as what aspects of performance can
be addressed through architectural means and what aspects must be addressed either by the com-
piler or the programmer in providing to the hardware a well designed parallel program. The anal-
ogy in sequential computing is that architecture cannot transform an O(n^2) algorithm into an
O(n log n) algorithm, but it can improve the average access time for common memory reference
patterns. The brief discussion of programming issues is not likely to turn you into an expert par-
allel programmer, if you are not already, but it will familiarize you with the issues and what pro-
grams look like in various programming models. Chapter 3 outlines the basic techniques used in
programming for performance and presents a collection of application case studies that serve as
a basis for quantitative evaluation of design trade-offs throughout the book.
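The sequential-computing analogy in the preceding paragraph contrasts an O(n^2) algorithm with an O(n log n) one. As a concrete illustration (my own example, not from the text), the following Python sketch counts key comparisons in insertion sort versus merge sort, making the asymptotic gap visible:

```python
import random

def insertion_sort_comparisons(a):
    """Sort a copy of a by insertion sort; return the comparison count (O(n^2))."""
    a, comparisons = list(a), 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return comparisons

def merge_sort_comparisons(a):
    """Return (sorted list, comparison count) for merge sort (O(n log n))."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, cl = merge_sort_comparisons(a[:mid])
    right, cr = merge_sort_comparisons(a[mid:])
    merged, comparisons, i, j = [], cl + cr, 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, comparisons

data = list(range(2048))
random.shuffle(data)
print(insertion_sort_comparisons(data))   # roughly n^2/4: about a million
print(merge_sort_comparisons(data)[1])    # roughly n log2 n: tens of thousands
```

No memory-system improvement closes a gap of this shape; that is why the book insists the programmer and compiler must supply a well-designed parallel program before architecture can help.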
Chapter 4 takes up the challenging task of performing a solid empirical evaluation of design
trade-offs. Architectural evaluation is difficult even for modern uniprocessors, where we typi-
cally look at variations in pipeline design or memory system design against a fixed set of pro-
grams. In parallel architecture we have many more degrees of freedom to explore, the
interactions between aspects of the design are more profound, the interactions between hardware
and software are more significant and of a wider scope. In general, we are looking at performance
as the machine and the program scale. There is no way to scale one without the other. The
chapter discusses how scaling interacts with various architectural parameters and presents a set
of benchmarks that are used throughout the later chapters.
Chapters 5 and 6 provide a complete understanding of the bus-based multiprocessors, SMPs, that
form the bread-and-butter of modern commercial machines beyond the desktop, and even to
some extent on the desktop. Chapter 5 presents the logical design of “snooping” bus protocols
which ensure that automatically replicated data is coherent across multiple caches. This chapter
provides an important discussion of memory consistency models, which allows us to come to
terms with what shared memory really means to algorithm designers. It discusses the spectrum of
design options and how machines are optimized against typical reference patterns occurring in
user programs and in the operating system. Given this conceptual understanding of SMPs, it
reflects on implications for parallel programming.
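To give a flavor of what a snooping protocol does, here is a deliberately simplified toy sketch of MSI-style invalidation coherence in Python. All class and method names are illustrative assumptions, and each cache holds a single one-word block; a real protocol of the kind Chapter 5 develops must also handle write-backs, bus atomicity, and many more transitions:

```python
# Toy MSI snooping sketch: states are M (modified), S (shared), I (invalid).
# Every cache observes ("snoops") every transaction on the shared bus.

MEMORY = {"x": 0}

class Bus:
    def __init__(self):
        self.caches = []

    def bus_read(self, addr, requester):          # BusRd transaction
        for c in self.caches:
            if c is not requester:
                c.snoop_read(addr)
        return MEMORY[addr]

    def bus_read_exclusive(self, addr, requester):  # BusRdX transaction
        for c in self.caches:
            if c is not requester:
                c.snoop_read_exclusive(addr)

class Cache:
    def __init__(self, bus):
        self.state, self.value, self.bus = "I", None, bus
        bus.caches.append(self)

    def read(self, addr):
        if self.state == "I":                     # read miss: issue BusRd
            self.value = self.bus.bus_read(addr, requester=self)
            self.state = "S"
        return self.value

    def write(self, addr, value):
        if self.state != "M":                     # gain exclusive ownership
            self.bus.bus_read_exclusive(addr, requester=self)
            self.state = "M"
        self.value = value

    # Snoop handlers, invoked for other caches' bus transactions:
    def snoop_read(self, addr):
        if self.state == "M":                     # supply dirty data, downgrade
            MEMORY[addr] = self.value
            self.state = "S"

    def snoop_read_exclusive(self, addr):
        if self.state == "M":
            MEMORY[addr] = self.value
        self.state = "I"                          # invalidate on another's write

bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
c0.write("x", 42)          # c0 moves to M
print(c1.read("x"))        # BusRd forces c0's write-back; prints 42
print(c0.state, c1.state)  # both now S
```

The sketch captures the essential invariant: at most one cache is in M, and a write by any cache invalidates all other copies, so every read returns the latest written value.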
Chapter 6 examines the physical design of bus-based multiprocessors. It digs down into the engi-
neering issues that arise in supporting modern microprocessors with multilevel caches on modern
busses, which are highly pipelined. Although some of this material is contained in more casual
treatments of multiprocessor architecture, the presentation here provides a very complete under-
standing of the design issues in this regime. It is especially important because these small-scale
designs form a building block for large-scale designs and because many of the concepts will
reappear later in the book on a larger scale with a broader set of concerns.
Chapter 7 presents the hardware organization and architecture of a range of machines that are
scalable to large or very large configurations. The key organizational concept is that of a network
transaction, analogous to the bus transaction, which is the fundamental primitive for the designs
in Chapters 5 and 6. However, in large-scale machines the global information and global arbitra-
tion of small-scale designs are lost. Also, a large number of transactions can be outstanding. We
show how conventional programming models are realized in terms of network transactions and
then study a spectrum of important design points, organized according to the level of direct hard-
ware interpretation of the network transaction, including detailed case studies of important com-
mercial machines.
Chapter 8 puts the results of the previous chapters together to demonstrate how to realize a global
shared physical address space with automatic replication on a large scale. It provides a complete
treatment of directory based cache coherence protocols and hardware design alternatives.
Chapter 9 examines a spectrum of alternatives that push the boundaries of possible hardware/
software trade-offs to obtain higher performance, to reduce hardware complexity, or both. It
looks at relaxed memory consistency models, cache-only memory architectures, and software
based cache coherence. At the time of writing, this material is in transition from academic
research to commercial product.
Chapter 10 addresses the design of scalable high-performance communication networks, which
underlie all the large-scale machines discussed in previous chapters; their treatment was deferred
in order to complete our understanding of the processor, memory system, and network interface
designs that drive these networks. The chapter builds a general framework for understanding where
hardware costs, transfer delays, and bandwidth restrictions arise in networks. We then look at a
variety of trade-offs in routing techniques, switch design, and interconnection topology with