addressed aspects of parallel computing for some time. However, the new convergence with com-
modity technology suggests that these aspects may need to be reexamined and perhaps addressed
in very general terms. The traditional boundaries between hardware, operating system, and user
program are also shifting in the context of parallel computing, where communication, schedul-
ing, sharing, and resource management are intrinsic to the program.
Application areas such as computer graphics and multimedia, scientific computing, computer-aided design, decision support, and transaction processing are all likely to see a tremendous
transformation as a result of the vast computing power available at low cost through parallel com-
puting. However, developing parallel applications that are robust and provide good speed-up
across current and future multiprocessors is a challenging task that requires a deep understanding of the forces driving parallel computers. The book seeks to provide this understanding, and also to stimulate exchange between the application fields and computer architecture, so that better architectures can be designed --- ones that make the programming task easier and performance more robust.
Organization of the Book
The book is organized into twelve chapters. Chapter 1 begins by motivating why parallel architectures are inevitable, based on trends in technology, architecture, and applications. It then
briefly introduces the diverse multiprocessor architectures we find today (shared-memory, mes-
sage-passing, data parallel, dataflow, and systolic), and it shows how the technology and architec-
tural trends tell a strong story of convergence in the field. The convergence does not mean the end
to innovation, but on the contrary, it implies that we will now see a time of rapid progress in the
field, as designers start talking to each other rather than past each other. Given this convergence,
the last portion of the chapter introduces the fundamental design issues for multiprocessors:
naming, synchronization, latency, and bandwidth. These four issues form an underlying theme
throughout the rest of this book. The chapter ends with a historical perspective, providing a
glimpse into the diverse and rich history of the field.
Chapter 2 provides a brief introduction to the process of parallel programming and the basic components of popular programming models. It is intended to ensure that the reader has a
clear understanding of hardware/software trade-offs, as well as what aspects of performance can
be addressed through architectural means and what aspects must be addressed either by the compiler or the programmer in providing the hardware with a well-designed parallel program. The analogy in sequential computing is that architecture cannot transform an O(n^2) algorithm into an O(n log n) algorithm, but it can improve the average access time for common memory reference
patterns. The brief discussion of programming issues is not likely to turn you into an expert par-
allel programmer, if you are not already, but it will familiarize you with the issues and what pro-
grams look like in various programming models. Chapter 3 outlines the basic techniques used in
programming for performance and presents a collection of application case studies that serve as
a basis for quantitative evaluation of design trade-offs throughout the book.
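To make this concrete, here is a minimal sketch --- our illustration, not an excerpt from the book --- of what a program looks like in the shared address space model: POSIX threads sum slices of a shared array and combine their partial results under a lock. In a message-passing model, the same computation would send the partial sums as explicit messages rather than update a shared variable. The array size, thread count, and names are arbitrary.

    /* Shared address space sketch: threads cooperate through ordinary
       loads and stores to shared data, synchronizing with a mutex. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1024
    #define NTHREADS 4

    static double a[N];
    static double global_sum = 0.0;
    static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *partial_sum(void *arg) {
        long id = (long)arg;
        double local = 0.0;
        /* Each thread sums its contiguous slice of the shared array. */
        for (long i = id * (N / NTHREADS); i < (id + 1) * (N / NTHREADS); i++)
            local += a[i];
        /* The shared accumulator is the only point of synchronization. */
        pthread_mutex_lock(&sum_lock);
        global_sum += local;
        pthread_mutex_unlock(&sum_lock);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < N; i++) a[i] = 1.0;
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, partial_sum, (void *)i);
        for (long i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("sum = %f\n", global_sum);  /* expect 1024.0 */
        return 0;
    }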
Chapter 4 takes up the challenging task of performing a solid empirical evaluation of design
trade-offs. Architectural evaluation is difficult even for modern uniprocessors, where we typically look at variations in pipeline design or memory system design against a fixed set of programs. In parallel architecture we have many more degrees of freedom to explore; the interactions between aspects of the design are more profound; and the interactions between hardware
and software are more significant and of a wider scope. In general, we are looking at performance
as the machine and the program scale. There is no way to scale one without the other. The chapter discusses how scaling interacts with various architectural parameters and presents a set of benchmarks that are used throughout the later chapters.
Chapters 5 and 6 provide a complete understanding of the bus-based multiprocessors, SMPs, that
form the bread-and-butter of modern commercial machines beyond the desktop, and even to
some extent on the desktop. Chapter 5 presents the logical design of “snooping” bus protocols
which ensure that automatically replicated data is coherent across multiple caches. This chapter
provides an important discussion of memory consistency models, which allows us to come to
terms with what shared memory really means to algorithm designers. It discusses the spectrum of
design options and how machines are optimized against typical reference patterns occurring in
user programs and in the operating system. Given this conceptual understanding of SMPs, it
reflects on implications for parallel programming.
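The following fragment is a sketch of our own (not the book's) of the kind of reference pattern a consistency model must pin down, assuming POSIX threads: a producer writes a data word and then raises a ready flag, and the question is what value a consumer that has seen the flag may observe.

    #include <pthread.h>
    #include <stdio.h>

    /* 'volatile' keeps the compiler from caching these in registers, but
       does not by itself order the hardware's memory accesses. */
    volatile int data = 0;
    volatile int flag = 0;

    void *producer(void *arg) {
        (void)arg;
        data = 42;   /* write the payload ...           */
        flag = 1;    /* ... then raise the "ready" flag */
        return NULL;
    }

    void *consumer(void *arg) {
        (void)arg;
        while (flag == 0)
            ;        /* spin until the flag is raised */
        /* Sequential consistency guarantees this prints 42; a machine with
           a more relaxed model may reorder the producer's two writes and
           require explicit fences to give the same answer. */
        printf("data = %d\n", data);
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }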
Chapter 6 examines the physical design of bus-based multiprocessors. It digs down into the engi-
neering issues that arise in supporting modern microprocessors with multilevel caches on modern
busses, which are highly pipelined. Although some of this material is contained in more casual
treatments of multiprocessor architecture, the presentation here provides a very complete under-
standing of the design issues in this regime. It is especially important because these small-scale
designs form a building block for large-scale designs and because many of the concepts will
reappear later in the book on a larger scale with a broader set of concerns.
Chapter 7 presents the hardware organization and architecture of a range of machines that are
scalable to large or very large configurations. The key organizational concept is that of a network
transaction, analogous to the bus transaction, which is the fundamental primitive for the designs
in Chapters 5 and 6. However, in large scale machines the global information and global arbitra-
tion of small-scale designs are lost. Also, a large number of transactions can be outstanding. We
show how conventional programming models are realized in terms of network transactions and
then study a spectrum of important design points, organized according to the level of direct hard-
ware interpretation of the network transaction, including detailed case studies of important com-
mercial machines.
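As a rough illustration --- ours, with hypothetical names, not a description of any machine in the chapter --- a network transaction can be pictured as a typed request carrying a source node, an address, and a payload; a remote read in a shared address space then becomes a request/reply pair, and the design points differ mainly in how much of the handler below runs in dedicated hardware rather than software.

    #include <stdint.h>
    #include <stdio.h>

    typedef enum { READ_REQ, READ_REPLY } xact_type_t;

    typedef struct {
        xact_type_t type;     /* how the destination interprets it        */
        int         src;      /* issuing node, so the reply can be routed */
        uint64_t    addr;     /* word address in the destination's memory */
        uint64_t    payload;  /* data carried back by the reply           */
    } net_xact_t;

    static uint64_t remote_memory[16] = { [4] = 42 };  /* node 1's memory */

    /* The destination's handler; the spectrum of designs in the chapter is
       essentially how much of this routine is interpreted by hardware. */
    static net_xact_t handle(net_xact_t req) {
        net_xact_t reply = { READ_REPLY, 1, req.addr,
                             remote_memory[req.addr] };
        return reply;
    }

    int main(void) {
        net_xact_t req = { READ_REQ, 0, 4, 0 };  /* node 0 reads word 4 */
        net_xact_t rep = handle(req);
        printf("remote read returned %llu\n",
               (unsigned long long)rep.payload);
        return 0;
    }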
Chapter 8 puts the results of the previous chapters together to demonstrate how to realize a global
shared physical address space with automatic replication on a large scale. It provides a complete
treatment of directory based cache coherence protocols and hardware design alternatives.
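As a rough sketch of the bookkeeping involved --- ours, with hypothetical names and a deliberately simplified protocol --- a full-bit-vector directory keeps, per memory block, a state field plus one presence bit per processor; on a write, the home node walks the presence bits to find the copies that must be invalidated before exclusive ownership is granted.

    #include <stdint.h>
    #include <stdio.h>

    #define NPROCS 64

    typedef enum { UNCACHED, SHARED, EXCLUSIVE } dir_state_t;

    typedef struct {
        dir_state_t state;    /* who may currently read or write the block   */
        uint64_t    presence; /* bit i set => processor i's cache has a copy */
    } dir_entry_t;

    /* Count the sharers (other than the writer) whose copies must be
       invalidated before the writer can be granted exclusive ownership. */
    static int sharers_to_invalidate(const dir_entry_t *e, int writer) {
        int count = 0;
        for (int i = 0; i < NPROCS; i++)
            if (i != writer && ((e->presence >> i) & 1))
                count++;
        return count;
    }

    int main(void) {
        dir_entry_t e = { SHARED, (1u << 3) | (1u << 7) };  /* caches 3, 7 */
        printf("invalidations needed: %d\n", sharers_to_invalidate(&e, 3));
        return 0;
    }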
Chapter 9 examines a spectrum of alternatives that push the boundaries of possible hardware/
software trade-offs to obtain higher performance, to reduce hardware complexity, or both. It
looks at relaxed memory consistency models, cache-only memory architectures, and software-based cache coherence. At the time of writing, this material is in transition from academic research to commercial products.
Chapter 10 addresses the design of scalable high-performance communication networks, which underlie all the large-scale machines discussed in previous chapters; this discussion was deferred in order to first complete our understanding of the processor, memory system, and network interface designs that drive these networks. The chapter builds a general framework for understanding where hardware costs, transfer delays, and bandwidth restrictions arise in networks. We then look at a variety of trade-offs in routing techniques, switch design, and interconnection topology.