Note from the Lead Developer of Intel Threading Building Blocks
Starting in 2004, I chaired a study group inside Intel that drafted the initial Threading Building Blocks proposal. Despite an early commitment to a library-only
solution, we drew much of our inspiration from new languages and extensions for
parallel programming. Our goal was to borrow as many good ideas as we could put
into library form. Sticking to a library was a requirement so that Threading Building
Blocks could slip easily into existing C++ programming environments.
C++ makes the library approach practical because it is designed for writing libraries.
Stroustrup cites a 10X reduction in line count for the Booch components written in
C++ versus Ada. Perhaps C++ will be even more powerful in the future. For example, the addition of lambda functions (see Chapter 12) would simplify the mechanics of using the Threading Building Blocks parallel_for.
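As a concrete sketch of that simplification (using the lambda syntax that was still only a proposal when this note was written), the loop body of a parallel_for can be written inline instead of as a hand-coded function object class:

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <cstddef>

    // Scale an array in parallel. The lambda captures 'a' and 'factor'
    // by value and is applied to each subrange the scheduler creates.
    void scale(float a[], std::size_t n, float factor) {
        tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n),
            [=](const tbb::blocked_range<std::size_t>& r) {
                for (std::size_t i = r.begin(); i != r.end(); ++i)
                    a[i] *= factor;
            });
    }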
A library-only solution is not perfect. We had to leave out some features that really
require compiler support. For example, data-parallel array operations such as those in Fortran 90, ZPL, and NESL were deemed impractical because they rely heavily on
optimizing compilers. There have been some C++ libraries such as POOMA that do
some of the optimizations via template metaprogramming, but the complexity of
such libraries is high. Parallel functional programming is another powerful paradigm, but alas, it requires significant compiler support.
Several systems were particularly influential in the development of Threading Building Blocks; these (and others) are listed in the bibliography of this book.
The Chare Kernel (now Charm++) showed the advantages of breaking a program
into many small tasks. In particular, distributing load is simplified. By analogy, it's a lot easier to distribute many small objects evenly among several cases than a few large ones.
Cilk showed the power of combining a scheduling technique called task stealing with
recursive tasks. Recursion is often slower than iteration for serial programming, but
it turns out that recursive parallelism has some big advantages over iterative parallelism with respect to load balancing and cache reuse. Cache reuse is critical because
restructuring for cache sometimes improves program speed by 2X or more, possibly
delivering better improvement than multithreading alone. Fortunately, the Cilk
approach tends to steer programmers to solutions that both are parallel and have
good cache behavior.
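For illustration, here is a sketch of that recursive style. It uses C++ lambdas and the tbb::parallel_invoke convenience, both of which postdate this note, but the shape of the computation is the one a Cilk-style task-stealing scheduler exploits: each half of the split becomes a task an idle thread can steal.

    #include <tbb/parallel_invoke.h>
    #include <cstddef>

    // Divide-and-conquer sum. Below the cutoff (an illustrative grain
    // size), plain serial code runs; above it, the two halves are
    // independent tasks available for stealing.
    float parallel_sum(const float a[], std::size_t n) {
        if (n <= 1000) {
            float s = 0;
            for (std::size_t i = 0; i < n; ++i)
                s += a[i];
            return s;
        }
        float left, right;
        tbb::parallel_invoke(
            [&] { left  = parallel_sum(a, n / 2); },
            [&] { right = parallel_sum(a + n / 2, n - n / 2); });
        return left + right;
    }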
The C++ Standard Template Library (STL) showed how a library could be both
generic and efficient. As we gain experience, we’re learning how to be more generic.
STAPL showed how to bring generic binding of algorithms to containers into the
parallel world, by substituting the fundamentally sequential STL iterator with parallel recursive ranges (pRanges in STAPL). This enabled parallel algorithms to operate
on parallel containers and opened up the ability to apply parallel recursive ranges to
multidimensional spaces (e.g., blocked_range2d), and even reuse (some would say abuse) them to write a parallel quicksort.
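For example, a sketch (again assuming lambdas; the transpose is illustrative, not from the book) of a two-dimensional loop over a blocked_range2d. The range is split recursively in both dimensions, so each task works on a cache-friendly tile:

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range2d.h>
    #include <cstddef>

    // Transpose a rows x cols matrix. blocked_range2d recursively
    // splits the 2D iteration space; each subrange is one tile.
    void transpose(float* dst, const float* src,
                   std::size_t rows, std::size_t cols) {
        tbb::parallel_for(tbb::blocked_range2d<std::size_t>(0, rows, 0, cols),
            [=](const tbb::blocked_range2d<std::size_t>& r) {
                for (std::size_t i = r.rows().begin(); i != r.rows().end(); ++i)
                    for (std::size_t j = r.cols().begin(); j != r.cols().end(); ++j)
                        dst[j * rows + i] = src[i * cols + j];
            });
    }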