SCIENTIFIC PROGRAMMING
Editors: Konstantin Läufer, laufer@cs.luc.edu
Konrad Hinsen, hinsen@cnrs-orleans.fr
Why Modern CPUs Are Starving and What Can Be Done About It

By Francesc Alted

CPUs spend most of their time waiting for data to arrive. Identifying low-level bottlenecks—and how to ameliorate them—can save hours of frustration over poor performance in apparently well-written programs.
A well-documented trend shows that CPU speeds are increasing at a faster rate than memory speeds.1,2 Indeed, CPU performance has now outstripped memory performance to the point that current CPUs are starved for data, as memory I/O becomes the performance bottleneck.
This hasn’t always been the case. Once upon a time, processor and memory speeds evolved in parallel. For example, memory in the early 1980s was clocked at approximately 1 MHz, and memory and CPU speeds increased in tandem to reach 16 MHz by decade’s end. By the early 1990s, however, CPU and memory speeds began to drift apart: memory speed increases began to level off, while CPU clock rates continued to skyrocket to 100 MHz and beyond.
It wasn’t too long before CPU capabilities began to substantially outstrip memory performance. Consider this: a 100 MHz processor consumes a word from memory every clock tick, that is, every 10 nanoseconds. This rate is impossible to sustain even with present-day RAM, let alone with the RAM available when 100 MHz processors were state of the art. To address this mismatch, commodity chipmakers introduced the first on-chip cache.
But CPUs didn’t stop at 100 MHz; by the start of the new millennium, processor speeds reached unparalleled extremes, hitting the magic 1 GHz figure. As a consequence, a huge abyss opened between the processors and the memory subsystem: CPUs had to wait up to 50 clock ticks for each memory read or write operation.
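The arithmetic behind these figures is easy to reproduce. The short Python sketch below assumes round-number DRAM latencies (on the order of 50 ns) purely for illustration; it is not a measurement of any particular system.

# Back-of-the-envelope arithmetic behind the figures above. The DRAM
# latencies used here are assumed round numbers, not measurements.

def stall_ticks(cpu_hz, mem_latency_ns):
    """Clock ticks a CPU waits for one uncached memory access."""
    clock_period_ns = 1e9 / cpu_hz
    return mem_latency_ns / clock_period_ns

print(1e9 / 100e6)            # 100 MHz clock -> 10.0 ns per tick
print(stall_ticks(1e9, 50))   # 1 GHz CPU, ~50 ns DRAM -> ~50 wasted ticks
print(stall_ticks(3e9, 50))   # 3 GHz CPU, ~50 ns DRAM -> ~150 wasted ticks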
During the early and middle 2000s, the strong competition between Intel and AMD continued to push CPU clock rates higher and higher (up to 4 GHz). Again, the increased impedance mismatch with memory speeds forced vendors to introduce a second-level cache in CPUs. In the past five years, the size of this second-level cache has grown rapidly, reaching 12 Mbytes in some instances.
Vendors started to realize that they couldn’t keep raising the frequency forever, however, and thus dawned the multicore age. Programmers began scratching their heads, wondering how to take advantage of those shiny new and apparently innovative multicore machines. Today, the arrival of the Intel i7 and AMD Phenom makes four-core on-chip CPUs the most common configuration. Of course, more processors means more demand for data, and vendors thus introduced a third-level cache.
So, here we are today: memory latency is still much greater than the processor clock period (around 150 times greater or more) and has become an essential bottleneck over the past 20 years. Memory throughput is improving at a better rate than its latency, but it’s also lagging behind processors (it’s about 25 times slower). The result is that current CPUs are suffering from serious starvation: they’re capable of consuming (much!) more data than the system can possibly deliver.
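One way to observe this starvation from ordinary user code is to compare the effective bandwidth of the same operation on a working set that fits in cache against one that must stream from main memory. The following NumPy sketch is only illustrative: the array sizes, the cache levels they are assumed to map to, and the resulting ratio all depend on the machine at hand.

# A minimal micro-benchmark sketching the starvation effect: the same
# reduction runs much faster when its operand fits in cache than when
# the data must stream from RAM. Array sizes (and which cache level they
# map to) are assumptions about a typical machine; the ratio will vary.
import time
import numpy as np

def sum_throughput_gb_s(n_elements, repeats):
    a = np.ones(n_elements, dtype=np.float64)
    a.sum()                                   # warm-up: fault pages in, load caches
    start = time.perf_counter()
    for _ in range(repeats):
        a.sum()
    elapsed = time.perf_counter() - start
    return a.nbytes * repeats / elapsed / 1e9

small = sum_throughput_gb_s(32 * 1024, 10000)       # ~256 KB, cache resident
large = sum_throughput_gb_s(64 * 1024 * 1024, 10)   # ~512 MB, RAM bound
print("cache-resident: %.1f GB/s   RAM-bound: %.1f GB/s" % (small, large))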
The Hierarchical Memory Model
Why, exactly, can’t we improve memory latency and bandwidth to keep up with CPUs? The main reason is cost: it’s prohibitively expensive to manufacture commodity SDRAM that can keep up with a modern processor. To make memory faster, we need motherboards with more wire layers, more complex ancillary logic, and (most importantly) the ability to run at higher frequencies. This additional complexity represents a much higher cost, which few are willing to pay. Moreover, raising the frequency implies pushing more voltage through the circuits, which makes energy consumption skyrocket and generates more heat, requiring huge coolers in user machines. That’s not practical.
To cope with memory bus limitations, computer architects introduced a hierarchy of CPU memory caches.3 Such caches are useful because they’re closer to the processor (normally on the same die), which improves both latency and bandwidth. The faster they run, however, the smaller they must be, due mainly to energy dissipation problems. In response, the industry