must follow a specific recipe and pay attention to the dependencies between various tasks.
For example, vegetables cannot be sautéed before they are washed, and the eggs cannot be
whisked before they are broken!
Coding Parallel Algorithms. Another important challenge concerns the implementation
and use of parallel algorithms in the real world. The many forms of parallelism, ranging
from small to large scale, and from general to special purpose, have led to many different
programming languages and systems for coding parallel algorithms. These different pro-
gramming languages and systems often target a particular kind of hardware, and even a
particular kind of problem domain. As it turns out, one can easily spend weeks or even
months optimizing a parallel sorting algorithm on specific parallel hardware, such as a
multicore chip, a GPU, or a large-scale massively parallel distributed system.
Maximizing speedup by coding and optimizing an algorithm is not the goal of this book.
Instead, our goal is to cover general design principles for parallel algorithms that can be
applied to essentially all parallel systems, from the data center to the multicore chips on
mobile phones. We will learn to think about parallelism at a high level, learning general
techniques for designing parallel algorithms and data structures, and learning how to
approximately analyze their costs. The focus is on understanding when computations can
run in parallel and when they cannot because of dependencies. There is much more to
learn about parallelism, and we hope you continue studying this subject.
Example 1.5. There are separate systems for coding parallel numerical algorithms on shared
memory hardware, for coding graphics algorithms on Graphical Processing Units (GPUs),
and for coding data-analytics software on a distributed system. Each such system tends to
have its own programming interface, its own cost model, and its own optimizations, mak-
ing it practically impossible to take a parallel algorithm and code it once for all possible
applications. Indeed, it can require a significant effort to implement even a simple
algorithm and optimize it to run well on a particular parallel system.
2 Work, Span, and Parallel Time
This section describes the two measures—work and span—that we use to analyze algo-
rithms. Together these measures capture both the sequential time and the parallelism avail-
able in an algorithm. We typically analyze both of these asymptotically, using for example
the big-O notation.
2.1 Work and Span
Work. The work of an algorithm corresponds to the total number of primitive operations
performed by the algorithm. When running on a sequential machine, the work corresponds
to the sequential running time. On a parallel machine, however, work can be divided among
multiple processors and thus does not necessarily correspond to time.
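To make the notion of work concrete, the following is a minimal sketch in Python (not taken
from the text) of a divide-and-conquer summation; the comments note how each part
contributes to the total work. The function name reduce_sum and the array-based interface
are assumptions made for this illustration.

    def reduce_sum(a, lo, hi):
        """Sum the elements a[lo], ..., a[hi-1] by divide and conquer."""
        # Base case: a single element costs a constant number of operations.
        if hi - lo == 1:
            return a[lo]
        mid = (lo + hi) // 2
        # The two recursive calls are independent of each other, so in a
        # fork-join setting they could run in parallel; here they simply
        # run one after the other.
        left = reduce_sum(a, lo, mid)
        right = reduce_sum(a, mid, hi)
        # Each element is touched a constant number of times overall, so the
        # total work is proportional to the input size, i.e., O(n).
        return left + right

    print(reduce_sum([3, 1, 4, 1, 5, 9], 0, 6))  # prints 23

Because the total number of operations is the same whether the recursive calls run
sequentially or in parallel, the work of this function is O(n) in either setting.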