CHAPTER 1
Introduction
A multithreaded architecture is one in which a single processor has the ability to follow multiple
streams of execution without the aid of software context switches. If a conventional processor (Fig-
ure 1.1(a)) wants to stop executing instructions from one thread and begin executing instructions
from another thread, it requires software to dump the state of the running thread into memory,
select another thread, and then load the state of that thread into the processor. That would typically
require many thousands of cycles, particularly if the operating system is invoked. A multithreaded
architecture (Figure 1.1(b)), on the other hand, can access the state of multiple threads in, or near,
the processor core. This allows the multithreaded architecture to quickly switch between threads,
and potentially utilize the processor resources more efficiently and effectively.
In order to achieve this, a multithreaded architecture must be able to store the state of multiple
threads in hardware—we refer to this storage as hardware contexts, where the number of hardware
contexts supported defines the level of multithreading (the number of threads that can share the
processor without software intervention).The state of a thread is primarily composed of the program
counter (PC), the contents of general purpose registers, and special purpose and program status
registers. It does not include memory (because that remains in place), or dynamic state that can be
rebuilt or retained between thread invocations (branch predictor, cache, or TLB contents).
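The architectural thread state described above, and the software context switch it implies on a conventional processor, can be sketched as follows. The struct layout (32 general purpose registers, a single status word) and the function names are illustrative assumptions, not a specific ISA or operating system interface:

```c
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 32

/* The per-thread state a hardware context must hold: PC, general
 * purpose registers, and program status. Memory, caches, branch
 * predictor, and TLB contents are deliberately excluded. */
typedef struct {
    uint64_t pc;             /* program counter */
    uint64_t gpr[NUM_GPRS];  /* general purpose registers */
    uint64_t psr;            /* program status register */
} thread_state;

/* A software context switch: every field of the running thread is
 * written out to memory and every field of the next thread is read
 * back in. This save/select/load sequence, plus scheduler overhead
 * once the OS is invoked, is why a software switch costs thousands
 * of cycles, while a multithreaded core that already holds several
 * thread_state copies in hardware can switch in a cycle or less. */
void context_switch(thread_state *cpu, thread_state *save_area,
                    const thread_state *next)
{
    memcpy(save_area, cpu, sizeof *cpu); /* dump the running thread */
    memcpy(cpu, next, sizeof *cpu);      /* load the selected thread */
}
```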
Hardware multithreading is beneficial when there is a mismatch between the hardware support
for instruction level parallelism (ILP) and the level of ILP in an executing thread. More generally, if
we also want to include scalar machines, it addresses the gap between the peak hardware bandwidth
and the achieved software throughput of one thread. Multithreading addresses this gap because
it allows multiple threads to share the processor, making execution resources that a single thread
would not use available to other threads. In this way, insufficient instruction level parallelism is
supplemented by exploiting thread level parallelism (TLP). Just as a moving company strives never
to send a truck cross country with a partial load, instead grouping multiple households onto the
same truck, so a processor not fully utilized by the software represents a wasted resource and an
opportunity cost.
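A back-of-the-envelope calculation makes the gap concrete. The numbers here (a 4-wide issue machine, a per-thread IPC of 1.2) are assumed for illustration, not measurements:

```c
/* Fraction of peak issue slots filled by n identical threads sharing
 * the core, assuming (optimistically) that their demands add until
 * the machine saturates. issue_width is the peak hardware bandwidth
 * in instructions per cycle; thread_ipc is the throughput one thread
 * achieves on its own. */
double utilization(int n_threads, double thread_ipc, double issue_width)
{
    double u = n_threads * thread_ipc / issue_width;
    return u > 1.0 ? 1.0 : u; /* cannot exceed peak bandwidth */
}

/* One thread at IPC 1.2 on a 4-wide machine fills only
 * utilization(1, 1.2, 4.0) = 30% of the issue slots; in the best case,
 * three such threads fill utilization(3, 1.2, 4.0) = 90%, with the
 * extra slots supplied by TLP rather than ILP. */
```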
Hardware multithreading virtualizes the processor, because it makes a processor that might
otherwise look little different than a traditional single core appear to software as a multiprocessor.
When we add a second hardware context to a processor, a thread running on this context appears
to be running on a virtual core that has the hardware capabilities of the original core minus those
resources being used by the first thread, and vice versa. Thus, by constructing additional virtual cores
out of otherwise unused resources, we can achieve the performance of a multiple-core processor at
a fraction of the area and implementation cost. But this approach also creates challenges, as the