Hardware View: Memory Barriers Explained for Software Hackers


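"Memory Barriers: a Hardware View for Software Hackers" is a paper dated August 2010 by Paul McKenney of IBM's Linux Technology Center. It examines why CPU designers working on multiprocessor (SMP) systems introduced memory barriers, and what this means for software developers, particularly those who work on real-time and synchronization code.

A memory barrier is an instruction that constrains the order in which a CPU's memory accesses take effect as seen by other CPUs. It is not the same thing as a cache-coherence protocol, although the paper explains barriers in terms of one (MESI). In modern multiprocessors each core may have its own cache, and CPUs may reorder memory accesses to improve performance. Without barriers in the right places, code that depends on memory ordering, such as the synchronization primitives behind reader-writer locks and semaphores, can observe inconsistent data and behave unpredictably. Programmers therefore insert memory barriers where needed to force memory operations to occur in the order the program logic requires.

McKenney explains how memory barriers work from the hardware's point of view and relates this to his work on real-time Linux projects. He stresses that understanding memory barriers matters for writing efficient and robust multithreaded software, especially in real-time and concurrent settings, where correct use helps avoid both errors and needless performance bottlenecks. The paper offers both the theory and practical guidance for managing concurrency on multicore systems. As of May 2014 it had been cited 31 times and read 2,422 times. It is relevant to anyone working on multithreaded programming, concurrency control, or memory consistency, and to hardware developers and system architects who evaluate and tune multiprocessor systems.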

Transition (d): The CPU does an atomic read-modify-write operation on a data item that was not present in its cache. It transmits a “read invalidate”, receiving the data via a “read response”. The CPU can complete the transition once it has also received a full set of “invalidate acknowledge” responses.

Transition (e): The CPU does an atomic read-modify-write operation on a data item that was previously read-only in its cache. It must transmit “invalidate” messages, and must wait for a full set of “invalidate acknowledge” responses before completing the transition.

Transition (f): Some other CPU reads the cache line, and it is supplied from this CPU’s cache, which retains a read-only copy, possibly also writing it back to memory. This transition is initiated by the reception of a “read” message, and this CPU responds with a “read response” message containing the requested data.

Transition (g): Some other CPU reads a data item in this cache line, and it is supplied either from this CPU’s cache or from memory. In either case, this CPU retains a read-only copy. This transition is initiated by the reception of a “read” message, and this CPU responds with a “read response” message containing the requested data.

Transition (h): This CPU realizes that it will soon need to write to some data item in this cache line, and thus transmits an “invalidate” message. The CPU cannot complete the transition until it receives a full set of “invalidate acknowledge” responses. Alternatively, all other CPUs eject this cache line from their caches via “writeback” messages (presumably to make room for other cache lines), so that this CPU is the last CPU caching it.

Transition (i): Some other CPU does an atomic read-modify-write operation on a data item in a cache line held only in this CPU’s cache, so this CPU invalidates it from its cache. This transition is initiated by the reception of a “read invalidate” message, and this CPU responds with both a “read response” and an “invalidate acknowledge” message.

Transition (j): This CPU does a store to a data item in a cache line that was not in its cache, and thus transmits a “read invalidate” message. The CPU cannot complete the transition until it receives the “read response” and a full set of “invalidate acknowledge” messages. The cache line will presumably transition to “modified” state via transition (b) as soon as the actual store completes.

Transition (k): This CPU loads a data item in a cache line that was not in its cache. The CPU transmits a “read” message, and completes the transition upon receiving the corresponding “read response”.

Transition (l): Some other CPU does a store to a data item in this cache line, but holds this cache line in read-only state due to its being held in other CPUs’ caches (such as the current CPU’s cache). This transition is initiated by the reception of an “invalidate” message, and this CPU responds with an “invalidate acknowledge” message.
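
Transitions (d) through (l) can be summarized as a small lookup table. The following Python sketch is only an illustration, not part of the paper: the names `State`, `Transition`, `TRANSITIONS`, `apply`, and `acks_needed` are hypothetical, the source and destination states are a reading of the descriptions above (treating “read-only” as the shared state and “not in its cache” as invalid), and it assumes a “full set” of acknowledgements means one from every other CPU.

```python
from enum import Enum
from typing import NamedTuple, Tuple


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"    # the text's "read-only" copy
    INVALID = "I"   # line not present in this CPU's cache


class Transition(NamedTuple):
    src: State
    dst: State
    sends: Tuple[str, ...]     # messages this CPU transmits
    receives: Tuple[str, ...]  # messages this CPU receives or waits for


# Assumed mapping of each lettered transition to a state change.
TRANSITIONS = {
    "d": Transition(State.INVALID, State.MODIFIED,
                    ("read invalidate",),
                    ("read response", "invalidate acknowledge")),
    "e": Transition(State.SHARED, State.MODIFIED,
                    ("invalidate",), ("invalidate acknowledge",)),
    "f": Transition(State.MODIFIED, State.SHARED,
                    ("read response",), ("read",)),
    "g": Transition(State.EXCLUSIVE, State.SHARED,
                    ("read response",), ("read",)),
    # (h) may also happen when all other CPUs write the line back.
    "h": Transition(State.SHARED, State.EXCLUSIVE,
                    ("invalidate",), ("invalidate acknowledge",)),
    "i": Transition(State.EXCLUSIVE, State.INVALID,
                    ("read response", "invalidate acknowledge"),
                    ("read invalidate",)),
    "j": Transition(State.INVALID, State.EXCLUSIVE,
                    ("read invalidate",),
                    ("read response", "invalidate acknowledge")),
    "k": Transition(State.INVALID, State.SHARED,
                    ("read",), ("read response",)),
    "l": Transition(State.SHARED, State.INVALID,
                    ("invalidate acknowledge",), ("invalidate",)),
}


def apply(state: State, letter: str) -> State:
    """Return the state after a transition, or raise if it does not apply."""
    t = TRANSITIONS[letter]
    if state is not t.src:
        raise ValueError(f"transition ({letter}) does not start from {state.name}")
    return t.dst


def acks_needed(n_cpus: int) -> int:
    """Assumed size of a 'full set' of acknowledgements: one per other CPU."""
    return n_cpus - 1
```

For instance, starting from `State.INVALID` and applying (k), (e), (f), and (l) in turn models a line that is loaded, atomically updated, read by another CPU, and finally invalidated by another CPU's store.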

Quick Quiz 4: How does the hardware handle the delayed transitions described above?

2.4 MESI Protocol Example

Let’s now look at this from the perspective of a cache line’s worth of data, initially residing in memory at address 0, as it travels through the various single-line direct-mapped caches in a four-CPU system. Table 1 shows this flow of data, with the first column showing the sequence of operations, the second the CPU performing the operation, the third the operation being performed, the next four the state of each CPU’s cache line (memory address followed by MESI state), and the final two columns whether the corresponding memory contents are up to date (“V”) or not (“I”).
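
The column layout just described can be mirrored in a small record type. This Python sketch is an illustration under assumptions: `Row`, `initial_row`, and `format_row` are hypothetical names, the `"-/I"` notation for an empty invalid line is invented here, and the two memory columns are kept as an unlabeled pair because this excerpt does not say which addresses they cover. It encodes only the starting condition, with every cache line invalid and memory up to date.

```python
from dataclasses import dataclass
from typing import List, Optional

N_CPUS = 4  # the example uses a four-CPU system


@dataclass
class Row:
    seq: int             # column 1: position in the sequence of operations
    cpu: Optional[int]   # column 2: CPU performing the operation, if any
    operation: str       # column 3: the operation performed
    cache: List[str]     # next four columns: "address/MESI state" per CPU
    memory: List[str]    # final two columns: "V" (up to date) or "I"


def initial_row() -> Row:
    """Starting condition: all cache lines invalid, memory valid."""
    return Row(0, None, "initial state", ["-/I"] * N_CPUS, ["V", "V"])


def format_row(row: Row) -> str:
    """Render one row with the columns in the order the text lists them."""
    cpu = "-" if row.cpu is None else str(row.cpu)
    return " | ".join([str(row.seq), cpu, row.operation, *row.cache, *row.memory])


print(format_row(initial_row()))
```

Later rows would be built the same way, one per operation, updating only the cache and memory entries that the operation changes.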

Initially, the CPU cache lines in which the data would reside are in the “invalid” state, and the data is valid in memory. When CPU 0 loads the data at
