8/28/97 DRAFT: Parallel Computer Architecture
Parallel Computer Architecture
A Hardware / Software Approach
David Culler
University of California, Berkeley
Jaswinder Pal Singh
Princeton University
with Anoop Gupta
Stanford University
Morgan Kaufmann is pleased to present material from a preliminary draft of Parallel
Computer Architecture; the material is (c) Copyright 1997 Morgan Kaufmann Publishers. This
material may not be used or distributed for any commercial purpose without the express
written consent of Morgan Kaufmann Publishers. Please note that this material is a draft of
forthcoming publication, and as such neither Morgan Kaufmann nor the authors can be held
liable for changes or alterations in the final edition.
Preface
Motivation for the Book
Parallel computing is a critical component of the computing technology of the 90s, and it is likely
to have as much impact over the next twenty years as microprocessors have had over the past
twenty. Indeed, the two technologies are closely linked, as the evolution of highly integrated
microprocessors and memory chips is making multiprocessor systems increasingly attractive.
Already multiprocessors represent the high performance end of almost every segment of the
computing market, from the fastest supercomputers, to departmental compute servers, to the indi-
vidual desktop. In the past, computer vendors employed a range of technologies to provide
increasing performance across their product line. Today, the same state-of-the-art microprocessor
is used throughout. To obtain a significant range of performance, the simplest approach is to
increase the number of processors, and the economies of scale make this extremely attractive.
Very soon, several processors will fit on a single chip.
Although parallel computing has a long and rich academic history, the close coupling with com-
modity technology has fundamentally changed the discipline. The emphasis on radical architec-
tures and exotic technology has given way to quantitative analysis and careful engineering trade-
offs. Our goal in writing this book is to equip designers of the emerging class of multiprocessor
systems, from modestly parallel personal computers to massively parallel supercomputers, with
an understanding of the fundamental architectural issues and the available techniques for
addressing design trade-offs. At the same time, we hope to provide designers of software systems
for these machines with an understanding of the likely directions of architectural evolution and
the forces that will determine the specific path that hardware designs will follow.
The most exciting recent development in parallel computer architecture is the convergence of tra-
ditionally disparate approaches, namely shared-memory, message-passing, SIMD, and dataflow,
on a common machine structure. This is driven partly by common technological and economic
forces, and partly by a better understanding of parallel software. This convergence allows us to
focus on the overriding architectural issues and to develop a common framework in which to
understand and evaluate architectural trade-offs. Moreover, parallel software has matured to the
point where the popular parallel programming models are available on a wide range of machines
and meaningful benchmarks exist. This maturing of the field makes it possible to undertake a
quantitative, as well as qualitative study of hardware/software interactions. In fact, it demands
such an approach. The book follows a set of issues that are critical to all parallel architectures
(communication latency, communication bandwidth, and coordination of cooperative work)
across the full range of modern designs. It describes the set of techniques available in hardware
and in software to address each issue and explores how the various techniques interact. Case
studies provide a concrete illustration of the general principles and demonstrate specific interac-
tions between mechanisms.
Our final motivation comes from the current lack of an adequate textbook for our own courses at
Stanford, Berkeley, and Princeton. Many existing textbooks cover the material in a cursory fash-
ion, summarizing various architectures and research results, but not analyzing them in depth.
Others focus on specific projects, but fail to recognize the principles that carry over to alternative
approaches. The research reports in the area provide a sizable body of empirical data, but it has not
yet been distilled into a coherent picture. By focusing on the salient issues in the context of the
technological convergence, rather than the rich and varied history that brought us to this point,
we hope to provide a deeper and more coherent understanding of the field.
Intended Audience
We believe the subject matter of this book is core material and should be relevant to graduate stu-
dents and practicing engineers in the fields of computer architecture, systems software, and appli-
cations. The relevance for computer architects is obvious, given the growing importance of
multiprocessors. Chip designers must understand what constitutes a viable building block for
multiprocessor systems, while computer system designers must understand how best to utilize
modern microprocessor and memory technology in building multiprocessors.
Systems software, including operating systems, compilers, programming languages, run-time
systems, and performance debugging tools, will need to address new issues and will provide new
opportunities in parallel computers. Thus, an understanding of the evolution and the forces guid-
ing that evolution is critical. Researchers in compilers and programming languages have
addressed aspects of parallel computing for some time. However, the new convergence with com-
modity technology suggests that these aspects may need to be reexamined and perhaps addressed
in very general terms. The traditional boundaries between hardware, operating system, and user
program are also shifting in the context of parallel computing, where communication, schedul-
ing, sharing, and resource management are intrinsic to the program.
Applications areas, such as computer graphics and multimedia, scientific computing, computer
aided design, decision support and transaction processing, are all likely to see a tremendous
transformation as a result of the vast computing power available at low cost through parallel com-
puting. However, developing parallel applications that are robust and provide good speed-up
across current and future multiprocessors is a challenging task, and requires a deep understand-
ing of forces driving parallel computers. The book seeks to provide this understanding, but also
to stimulate the exchange between the applications fields and computer architecture, so that bet-
ter architectures can be designed --- those that make the programming task easier and perfor-
mance more robust.
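The difficulty of obtaining robust speedup across current and future machines can be made concrete with Amdahl's law, the standard bound relating speedup to the inherently serial fraction of a program. A minimal sketch in Python (the function name and the 5% serial fraction are illustrative assumptions, not from the text):

```python
def amdahl_speedup(p, serial_fraction):
    """Upper bound on speedup with p processors when a fixed
    fraction of the work cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# Even a 5% serial fraction caps speedup far below linear:
for p in (8, 64, 1024):
    print(p, round(amdahl_speedup(p, 0.05), 1))
# prints: 8 5.9, 64 15.4, 1024 19.6 -- never above 1/0.05 = 20
```

The point of the bound is exactly the one made above: an application that speeds up well on today's modest processor counts may plateau on tomorrow's larger machines unless its serial fraction also shrinks.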
Organization of the Book
The book is organized into twelve chapters. Chapter 1 begins by motivating why parallel
architectures are inevitable, given technology, architecture, and application trends. It then
briefly introduces the diverse multiprocessor architectures we find today (shared-memory, mes-
sage-passing, data parallel, dataflow, and systolic), and it shows how the technology and architec-
tural trends tell a strong story of convergence in the field. The convergence does not mean the end
to innovation; on the contrary, it implies that we will now see a time of rapid progress in the
field, as designers start talking to each other rather than past each other. Given this convergence,
the last portion of the chapter introduces the fundamental design issues for multiprocessors:
naming, synchronization, latency, and bandwidth. These four issues form an underlying theme
throughout the rest of this book. The chapter ends with a historical perspective, providing a
glimpse into the diverse and rich history of the field.
Chapter 2 provides a brief introduction to the process of parallel programming and the basic
components of popular programming models. It is intended to ensure that the reader has a
clear understanding of hardware/software trade-offs, as well as what aspects of performance can
be addressed through architectural means and what aspects must be addressed either by the com-
piler or the programmer in providing to the hardware a well designed parallel program. The anal-
ogy in sequential computing is that architecture cannot transform an O(n^2) algorithm into an
O(n log n) algorithm, but it can improve the average access time for common memory reference
patterns. The brief discussion of programming issues is not likely to turn you into an expert par-
allel programmer, if you are not already, but it will familiarize you with the issues and what pro-
grams look like in various programming models. Chapter 3 outlines the basic techniques used in
programming for performance and presents a collection of application case studies that serve as
a basis for quantitative evaluation of design trade-offs throughout the book.
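The sequential-computing analogy in the preceding paragraph contrasts an O(n^2) algorithm with an O(n log n) one. As a concrete illustration (my own example, not from the text), the following Python sketch counts key comparisons in insertion sort versus merge sort, making the asymptotic gap visible:

```python
import random

def insertion_sort_comparisons(a):
    """Sort a copy of a by insertion sort; return the comparison count (O(n^2))."""
    a, comparisons = list(a), 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return comparisons

def merge_sort_comparisons(a):
    """Return (sorted list, comparison count) for merge sort (O(n log n))."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, cl = merge_sort_comparisons(a[:mid])
    right, cr = merge_sort_comparisons(a[mid:])
    merged, comparisons, i, j = [], cl + cr, 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, comparisons

data = list(range(2048))
random.shuffle(data)
print(insertion_sort_comparisons(data))   # roughly n^2/4: about a million
print(merge_sort_comparisons(data)[1])    # roughly n log2 n: tens of thousands
```

No memory-system improvement closes a gap of this shape; that is why the book insists the programmer and compiler must supply a well-designed parallel program before architecture can help.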
Chapter 4 takes up the challenging task of performing a solid empirical evaluation of design
trade-offs. Architectural evaluation is difficult even for modern uniprocessors, where we typi-
cally look at variations in pipeline design or memory system design against a fixed set of pro-
grams. In parallel architecture we have many more degrees of freedom to explore, the
interactions between aspects of the design are more profound, the interactions between hardware
and software are more significant and of a wider scope. In general, we are looking at performance
as the machine and the program scale. There is no way to scale one without the other. The
chapter discusses how scaling interacts with various architectural parameters and presents a set
of benchmarks that are used throughout the later chapters.
Chapters 5 and 6 provide a complete understanding of the bus-based multiprocessors, SMPs, that
form the bread-and-butter of modern commercial machines beyond the desktop, and even to
some extent on the desktop. Chapter 5 presents the logical design of “snooping” bus protocols
which ensure that automatically replicated data is coherent across multiple caches. This chapter
provides an important discussion of memory consistency models, which allows us to come to
terms with what shared memory really means to algorithm designers. It discusses the spectrum of
design options and how machines are optimized against typical reference patterns occurring in
user programs and in the operating system. Given this conceptual understanding of SMPs, it
reflects on implications for parallel programming.
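To give a flavor of what a snooping protocol does, here is a deliberately simplified toy sketch of MSI-style invalidation coherence in Python. All class and method names are illustrative assumptions, and each cache holds a single one-word block; a real protocol of the kind Chapter 5 develops must also handle write-backs, bus atomicity, and many more transitions:

```python
# Toy MSI snooping sketch: states are M (modified), S (shared), I (invalid).
# Every cache observes ("snoops") every transaction on the shared bus.

MEMORY = {"x": 0}

class Bus:
    def __init__(self):
        self.caches = []

    def bus_read(self, addr, requester):          # BusRd transaction
        for c in self.caches:
            if c is not requester:
                c.snoop_read(addr)
        return MEMORY[addr]

    def bus_read_exclusive(self, addr, requester):  # BusRdX transaction
        for c in self.caches:
            if c is not requester:
                c.snoop_read_exclusive(addr)

class Cache:
    def __init__(self, bus):
        self.state, self.value, self.bus = "I", None, bus
        bus.caches.append(self)

    def read(self, addr):
        if self.state == "I":                     # read miss: issue BusRd
            self.value = self.bus.bus_read(addr, requester=self)
            self.state = "S"
        return self.value

    def write(self, addr, value):
        if self.state != "M":                     # gain exclusive ownership
            self.bus.bus_read_exclusive(addr, requester=self)
            self.state = "M"
        self.value = value

    # Snoop handlers, invoked for other caches' bus transactions:
    def snoop_read(self, addr):
        if self.state == "M":                     # supply dirty data, downgrade
            MEMORY[addr] = self.value
            self.state = "S"

    def snoop_read_exclusive(self, addr):
        if self.state == "M":
            MEMORY[addr] = self.value
        self.state = "I"                          # invalidate on another's write

bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
c0.write("x", 42)          # c0 moves to M
print(c1.read("x"))        # BusRd forces c0's write-back; prints 42
print(c0.state, c1.state)  # both now S
```

The sketch captures the essential invariant: at most one cache is in M, and a write by any cache invalidates all other copies, so every read returns the latest written value.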
Chapter 6 examines the physical design of bus-based multiprocessors. It digs down into the engi-
neering issues that arise in supporting modern microprocessors with multilevel caches on modern
busses, which are highly pipelined. Although some of this material is contained in more casual
treatments of multiprocessor architecture, the presentation here provides a very complete under-
standing of the design issues in this regime. It is especially important because these small-scale
designs form a building block for large-scale designs and because many of the concepts will
reappear later in the book on a larger scale with a broader set of concerns.
Chapter 7 presents the hardware organization and architecture of a range of machines that are
scalable to large or very large configurations. The key organizational concept is that of a network
transaction, analogous to the bus transaction, which is the fundamental primitive for the designs
in Chapters 5 and 6. However, in large-scale machines the global information and global arbitra-
tion of small-scale designs are lost. Also, a large number of transactions can be outstanding. We
show how conventional programming models are realized in terms of network transactions and
then study a spectrum of important design points, organized according to the level of direct hard-
ware interpretation of the network transaction, including detailed case studies of important com-
mercial machines.
Chapter 8 puts the results of the previous chapters together to demonstrate how to realize a global
shared physical address space with automatic replication on a large scale. It provides a complete
treatment of directory based cache coherence protocols and hardware design alternatives.
Chapter 9 examines a spectrum of alternatives that push the boundaries of possible hardware/
software trade-offs to obtain higher performance, to reduce hardware complexity, or both. It
looks at relaxed memory consistency models, cache-only memory architectures, and software
based cache coherence. At the time of writing, this material is in transition from academic
research to commercial product.
Chapter 10 addresses the design of scalable high-performance communication networks, which
underlie all the large-scale machines discussed in previous chapters; their treatment was deferred
in order to complete our understanding of the processor, memory system, and network interface
designs that drive these networks. The chapter builds a general framework for understanding where
hardware costs, transfer delays, and bandwidth restrictions arise in networks. We then look at a
variety of trade-offs in routing techniques, switch design, and interconnection topology with