TBTSO: Temporally Bounded TSO Memory Model and Fence-Free Asymmetric Synchronization

PDF | 633 KB | Updated 2024-08-25

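This document examines a memory model called Temporally Bounded Total Store Order (TBTSO), which enables fence-free solutions to asymmetric synchronization problems such as memory reclamation and biased locking. TBTSO strengthens the Total Store Order (TSO) memory model by bounding how long a write can remain in a store buffer before it is written to memory.

The key to TBTSO is this time bound. Asymmetric synchronization pairs a frequently executed, performance-critical fast path (typically a data access) with a rarely executed slow path (such as memory reclamation or lock revocation) that must still be synchronized correctly with it. The bound lets designers remove the memory fence from the fast path and instead have the slow path wait out the bound, avoiding the cost that fences impose on the common case.

For memory reclamation, the authors present a fence-free variant of hazard pointers, a widely used safe memory reclamation scheme. Under the TBTSO bound, the reader fast path needs no fence, yet an object is still freed only when no thread can be accessing it.

For biased locking, TBTSO enables a fence-free biased locking algorithm that is compatible with unmanaged environments. Traditional biased locking may rely on safepoints or similar mechanisms to revoke a bias; under TBTSO such mechanisms are no longer required.

The paper also discusses implementing TBTSO in hardware. Existing TSO architectures would require modest modifications to guarantee that store buffers drain within a bounded time, but this does not make TBTSO impractical.

In summary, TBTSO offers a new approach to asymmetric synchronization in performance-sensitive settings: by bounding how long writes can stay buffered, it lets developers write fence-free fast paths without giving up correctness on the slow path.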

interact through a memory subsystem. The memory subsys-

tem contains one FIFO store buffer for each thread and is

protected by a global fair lock. The memory subsystem lock

is used to model atomic read-modify-write operations as be-

ing performed by a thread holding the lock. (For simplic-

ity, we use atomic operations directly throughout this paper.)

The machine also has a global clock (initially 0) readable by

the threads.

The execution of the machine proceeds in time units. In each time unit the global clock increases by one. Then, at most one of the following actions can be executed for each thread T if it is valid to do so under the rules below. (This does not mean that a valid action must be executed for T; a scheduler decides the actions in each time unit. Thus, despite the presence of a global clock, the execution is asynchronous.)

The following actions are possible only when the memory subsystem lock is unlocked or held by thread T:

1. The memory subsystem can dequeue T's oldest entry from T's store buffer and write it to memory.
2. T may read: if T reads from an address for which a matching write exists in its store buffer, the read returns the newest corresponding value stored in the buffer. Otherwise, the read returns the value from memory.
3. T may acquire the memory lock if it does not hold it.
4. T may release the memory lock if it holds the lock and its store buffer is empty (if T wishes to release the lock when its store buffer is not empty, the memory subsystem must first empty T's store buffer with #1 actions).

The following are allowed at any time:

5. T can execute a fence if its store buffer is empty (similarly to #4, the memory subsystem must act to empty T's store buffer first).
6. T may write, enqueuing an entry to its store buffer.
7. T may read the global clock.
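The seven actions above can be mirrored in a small Python class. This is a sketch under stated assumptions, not the paper's artifact: the class name `TBTSOMachine`, its method names, and the choice to raise `RuntimeError` for an action that is not valid in the current state are all hypothetical. The caller plays the scheduler, so it must call `tick()` and take at most one action per thread per time unit itself; the ∆ bound is not enforced here.

```python
from collections import deque


class TBTSOMachine:
    """Sketch of the abstract machine: per-thread FIFO store buffers,
    a global memory-subsystem lock, and a global clock (initially 0).
    The caller acts as the scheduler (hypothetical API)."""

    def __init__(self, nthreads):
        self.memory = {}                               # address -> value (unset reads as 0)
        self.buffers = [deque() for _ in range(nthreads)]
        self.lock_owner = None                         # None: the memory lock is free
        self.clock = 0

    @staticmethod
    def _require(cond, why):
        if not cond:
            raise RuntimeError(why)

    def _memory_accessible(self, t):
        # Actions #1-#4 need the lock to be free or held by thread t.
        self._require(self.lock_owner in (None, t), "memory lock held by another thread")

    def tick(self):
        self.clock += 1                                # a new time unit begins

    def drain_one(self, t):                            # action #1
        self._memory_accessible(t)
        self._require(self.buffers[t], "store buffer is empty")
        addr, value, _enqueue_time = self.buffers[t].popleft()
        self.memory[addr] = value

    def read(self, t, addr):                           # action #2
        self._memory_accessible(t)
        for a, value, _ in reversed(self.buffers[t]):  # newest buffered write wins
            if a == addr:
                return value
        return self.memory.get(addr, 0)

    def acquire(self, t):                              # action #3
        self._memory_accessible(t)
        self._require(self.lock_owner != t, "already holds the lock")
        self.lock_owner = t

    def release(self, t):                              # action #4
        self._require(self.lock_owner == t, "does not hold the lock")
        self._require(not self.buffers[t], "store buffer must be drained first")
        self.lock_owner = None

    def fence(self, t):                                # action #5
        self._require(not self.buffers[t], "store buffer must be drained first")

    def write(self, t, addr, value):                   # action #6: enqueue with a timestamp
        self.buffers[t].append((addr, value, self.clock))

    def read_clock(self):                              # action #7
        return self.clock
```

For example, after `m.write(0, "x", 1)`, a call to `m.read(0, "x")` returns 1 by forwarding from T0's own store buffer, while `m.read(1, "x")` returns 0 until `m.drain_one(0)` writes the entry to memory.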

Bounding store buffering time. In the TBTSO[∆] model (where ∆ ≥ 1), we consider only the abstract machine executions in which the following property holds:

A write enqueued to a thread's store buffer (action #6) at global time $t$ is written to memory (action #1) at global time $t' \le t + \Delta$.
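Because the ∆ bound is a property of whole executions, one way to make it concrete is a checker over a recorded trace. In this hypothetical sketch, each write is recorded as a pair of its enqueue time and the time it reached memory (or `None` if it is still buffered when the trace ends at time unit `now`). The strict inequality t < t′ is an inference, not a statement from the text: enqueueing (#6) and draining (#1) are both actions of the same thread, which performs at most one action per time unit.

```python
def satisfies_delta_bound(writes, delta, now):
    """Check the TBTSO[delta] property on a recorded trace (hypothetical helper).

    writes: list of (enqueue_time, memory_time) pairs; memory_time is None
            if the write is still in the store buffer at the end of the trace.
    now:    the last completed global time unit covered by the trace.
    """
    if delta < 1:
        raise ValueError("TBTSO[delta] requires delta >= 1")
    for t, t_mem in writes:
        if t_mem is None:
            # Still buffered: only allowed while the deadline t + delta is in the future.
            if now >= t + delta:
                return False
        # Enqueue and write-back are actions of the same thread, which takes at
        # most one action per time unit, so the write-back is strictly later.
        elif not t < t_mem <= t + delta:
            return False
    return True
```

For instance, `satisfies_delta_bound([(1, 3)], delta=2, now=3)` returns `True`, whereas `satisfies_delta_bound([(1, 4)], delta=2, now=4)` returns `False` because the write reached memory one time unit past its deadline.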

3. TBTSO flag principle

Fence use in TSO often occurs when applying the flag principle [18]. The flag principle says that when two threads, T0 and T1, each "raise a flag" (writing to a variable in memory) and then "look" at the other's flag by reading it from memory, then at least one will see the other's flag raised [18]. Of course, correctly ordering "raising the flag" to be globally visible before "looking at the other flag" requires a memory fence on TSO (and TBTSO):

Flag principle:

    T0                      T1
    flag0 := 1              flag1 := 1
    fence                   fence
    if (flag1)              if (flag0)
        print "saw T1"          print "saw T0"
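To see why the fence is needed on TSO, the litmus test above can be explored exhaustively. The sketch below is an assumption-laden model rather than the paper's: it ignores the clock and the memory lock, treats each instruction or store-buffer drain as one atomic step, encodes flag0 and flag1 as indices 0 and 1, and uses the hypothetical name `tso_outcomes`. Each result is a pair (T0 saw flag1, T1 saw flag0).

```python
def tso_outcomes(with_fences):
    """All (T0 saw flag1, T1 saw flag0) pairs for the flag-principle
    litmus test on an untimed TSO machine, found by exhaustive search."""
    progs = tuple(
        (("w", me),) + ((("fence",),) if with_fences else ()) + (("r", other),)
        for me, other in ((0, 1), (1, 0))
    )
    start = ((0, 0), ((), ()), (0, 0), (None, None))  # pcs, store buffers, memory, saw
    seen, stack, results = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, saw = state
        if pcs == (len(progs[0]), len(progs[1])):
            results.add(saw)
            continue
        for t in (0, 1):
            if bufs[t]:                                # drain T's oldest buffered write
                m, b = list(mem), list(bufs)
                m[b[t][0]] = 1
                b[t] = b[t][1:]
                stack.append((pcs, tuple(b), tuple(m), saw))
            if pcs[t] == len(progs[t]):
                continue
            op = progs[t][pcs[t]]
            if op[0] == "fence" and bufs[t]:           # a fence waits for an empty buffer
                continue
            p, b, s = list(pcs), list(bufs), list(saw)
            if op[0] == "w":                           # writes go to the store buffer
                b[t] = bufs[t] + (op[1],)
            elif op[0] == "r":                         # own buffer first, then memory
                s[t] = op[1] in bufs[t] or mem[op[1]] == 1
            p[t] += 1
            stack.append((tuple(p), tuple(b), mem, tuple(s)))
    return results
```

Without fences, `(False, False)` (neither thread sees the other's flag) is among the results, because both writes can still sit in store buffers when both reads execute; with fences it disappears.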

This section shows a TBTSO variant of the flag principle that is asymmetric: it removes the fence from T0's code and shifts the responsibility of maintaining correct ordering of T0's reads and writes to T1, which does so using the TBTSO ∆ bound. We subsequently apply this asymmetric TBTSO flag principle to remove the fence from the fast path of hazard pointers (§ 4) and biased locks (§ 5).

To devise the TBTSO flag principle, we first use TBTSO's global time to rephrase the original flag principle: if, when T1 reads flag0 at time $t$, T0's write to flag0 does not appear in memory (i.e., is not yet globally visible), then T0 will necessarily see flag1 raised when it reads flag1 at time $t' > t$. This holds because if this happens, then T0 did not yet execute its fence at time $t$, whereas T1's write is already globally visible at time $t$ because of its fence. TBTSO allows us to break this symmetry by removing the fence from T0 and placing the responsibility of guaranteeing the above property on T1, which will wait ∆ time units before reading T0's flag:

TBTSO flag principle:

    T0                      T1
    flag0 := 1              flag1 := 1
                            fence
                            wait ∆ time units
    if (flag1)              if (flag0)
        print "saw T1"          print "saw T0"
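The asymmetric variant can be checked in a small discrete-time model. This is a sketch, not the paper's proof, and every modeling choice is an assumption: each time unit ages every buffered write by one; each thread takes at most one action per unit, with the two threads' actions tried in both orders; an execution is discarded if a write enqueued at time t is still buffered at the end of time unit t + ∆; and T1's "wait ∆ time units" is modeled as reading the clock after the fence and then waiting until ∆ units have passed. Ages are stored relative to the current time, so the state space is finite and the search is exhaustive for this model. The names `explore` and `run_unit` are hypothetical.

```python
from itertools import product


def run_unit(progs, delta, order, acts, pcs, bufs, mem, saw, wait_left):
    """Apply one time unit: at most one action per thread, in `order`.
    Returns the successor state, or None if the choice is not allowed."""
    pcs, bufs, mem, saw = list(pcs), list(bufs), list(mem), list(saw)
    for t in order:
        if acts[t] == "drain":                         # action #1
            if not bufs[t]:
                return None
            mem[bufs[t][0][0]] = 1
            bufs[t] = bufs[t][1:]
        elif acts[t] == "op":
            if pcs[t] == len(progs[t]):
                return None
            op = progs[t][pcs[t]]
            if op[0] == "w":                           # action #6: enqueue with age 0
                bufs[t] = bufs[t] + ((op[1], 0),)
            elif op[0] == "r":                         # action #2: buffer first, then memory
                saw[t] = any(v == op[1] for v, _ in bufs[t]) or mem[op[1]] == 1
            elif op[0] == "fence" and bufs[t]:         # action #5 needs an empty buffer
                return None
            elif op[0] == "clock":                     # action #7, then wait delta units
                wait_left = delta
            elif op[0] == "wait" and wait_left > 0:
                return None
            pcs[t] += 1
    # TBTSO[delta]: no write may still be buffered delta units after its enqueue.
    if any(age >= delta for b in bufs for _, age in b):
        return None
    return (tuple(pcs), tuple(bufs), tuple(mem), tuple(saw), wait_left)


def explore(delta, t1_waits=True):
    """Exhaustively explore the flag-principle litmus test in a small
    TBTSO[delta] model; return all (T0 saw flag1, T1 saw flag0) pairs."""
    progs = (
        (("w", 0), ("r", 1)),                          # T0: flag0 := 1; if (flag1) ...
        (("w", 1), ("fence",))                         # T1: flag1 := 1; fence
        + ((("clock",), ("wait",)) if t1_waits else ())
        + (("r", 0),),                                 # T1: if (flag0) ...
    )
    start = ((0, 0), ((), ()), (0, 0), (None, None), None)
    seen, stack, outcomes = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, saw, wait_left = state
        if pcs == (len(progs[0]), len(progs[1])):
            outcomes.add(saw)
            continue
        # A new time unit: the clock ticks, so every buffered write ages by one.
        bufs = tuple(tuple((v, age + 1) for v, age in b) for b in bufs)
        if wait_left is not None:
            wait_left = max(0, wait_left - 1)
        for order in ((0, 1), (1, 0)):
            for acts in product(("idle", "op", "drain"), repeat=2):
                nxt = run_unit(progs, delta, order, acts, pcs, bufs, mem, saw, wait_left)
                if nxt is not None:
                    stack.append(nxt)
    return outcomes
```

With `delta=3`, dropping T1's wait (`t1_waits=False`) makes `(False, False)` reachable (neither thread sees the other's flag), while keeping the wait rules it out; the fence alone is not enough once T0 has no fence of its own.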

Now, if T0 reads flag1 at time $t$ but T1's flag write is not yet globally visible, T0 is still guaranteed that T1 will see flag0 raised, because in this case T1 has not yet issued its fence at time $t$ and thus will read flag0 at least ∆ time units later, by which time T0's write is globally visible. In the opposite case, if T1 reads flag0 at time $t$ but T0's write is not yet globally visible, then T0 has not written to its flag before $t - \Delta$. However, T1's write is globally visible at $t - \Delta$ since T1 issues a fence, and so T0 must observe flag1 raised.
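The two cases above can be written as chains of inequalities. The notation here is introduced for illustration and is not the paper's: $t_w^0$ and $t_d^0$ are the times at which T0's flag write is enqueued and written to memory, $t_r^0$ is when T0 reads flag1, $t_f$ is when T1's fence executes (so T1's write reached memory before $t_f$), and $t_r^1 \ge t_f + \Delta$ is when T1 reads flag0 after waiting. The steps use only the TBTSO bound $t_d^0 \le t_w^0 + \Delta$, program order $t_w^0 < t_r^0$, and the rule that a thread takes at most one action per time unit.

$$
\begin{aligned}
&\textbf{Case 1: } T_0 \text{ reads flag1 at } t = t_r^0 \text{ and misses it.}\\
&\quad t_f > t \;\Rightarrow\; t_r^1 \ge t_f + \Delta > t + \Delta > t_w^0 + \Delta \ge t_d^0,\\
&\quad \text{so } T_1 \text{ reads flag0 after } T_0\text{'s write is in memory.}\\[4pt]
&\textbf{Case 2: } T_1 \text{ reads flag0 at } t = t_r^1 \text{ and misses it.}\\
&\quad t_d^0 \ge t \;\Rightarrow\; t_w^0 \ge t_d^0 - \Delta \ge t - \Delta \ge t_f \;\Rightarrow\; t_r^0 > t_w^0 \ge t_f,\\
&\quad \text{so } T_0 \text{ reads flag1 after } T_1\text{'s write is in memory.}
\end{aligned}
$$

In Case 1, $t_f > t$ holds because T1's write was not yet in memory at $t$ and its fence can execute only after the buffer has drained; in Case 2, $t - \Delta \ge t_f$ is exactly T1's wait.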

4. Fence-free hazard pointers (FFHP)

This section describes fence-free hazard pointers (FFHP), a nonblocking, fence-free safe memory reclamation (SMR) algorithm for TBTSO. (Although we build on hazard pointers [28], the ideas described here apply equally well to Herlihy et al.'s guards [19], an SMR method that differs from hazard pointers only in how removed objects are stored before being reclaimed.)

4.1 Standard hazard pointers

In the hazard pointers method, each thread maintains several hazard pointers, which it uses to announce the objects it is about to access. Applying hazard pointers in …
