Program 1 The functionality supported by the memory management scheme.
1 node_t *NewNode(int size);
2 void DeleteNode(node_t *node);
3 node_t *DeRefLink(node_t **link);
4 void ReleaseRef(node_t *node);
5 bool CASRef(node_t **link, node_t *old, node_t *_new);
6 void StoreRef(node_t **link, node_t *node);
Program 2 Callback procedures for the memory management.
1 void TerminateNode(block_t *node) {
2   StoreRef(&node->next, NULL);
3 }
4 void CleanUpNode(block_t *node) {
5   block_t *next = DeRefLink(&node->next);
6   block_t *next2 = DeRefLink(&globalTailBlock);
7   CASRef(&node->next, next, next2);
8 }
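The intended call pattern of these operations can be sketched with a small single-threaded mock; the reference-counting bodies below are our own illustrative assumptions (the real scheme is lock-free and considerably more involved), meant only to show the contract that DeRefLink returns a node with a guaranteed reference which must eventually be handed back via ReleaseRef:

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical single-threaded mock of the interface in Program 1.
   The refs field and the counting below are assumptions for
   illustration; the real memory management scheme is lock-free. */
typedef struct node {
    struct node *next;
    int refs;             /* references currently held by callers */
} node_t;

node_t *DeRefLink(node_t **link) {
    node_t *node = *link;
    if (node != NULL)
        node->refs++;     /* caller now holds a guaranteed reference */
    return node;
}

void ReleaseRef(node_t *node) {
    if (node != NULL)
        node->refs--;     /* caller hands its reference back */
}

void StoreRef(node_t **link, node_t *node) {
    *link = node;         /* real version also adjusts counts, atomically */
}

bool CASRef(node_t **link, node_t *old, node_t *_new) {
    if (*link != old)
        return false;     /* real version uses an atomic compare-and-swap */
    *link = _new;
    return true;
}

/* Walk a list hand-over-hand: dereference the successor before
   releasing the current node; returns the number of nodes visited. */
int traverse(node_t **head) {
    int count = 0;
    node_t *node = DeRefLink(head);
    while (node != NULL) {
        node_t *next = DeRefLink(&node->next);
        ReleaseRef(node);
        node = next;
        count++;
    }
    return count;
}
```

The hand-over-hand pattern in traverse is the reason DeRefLink and ReleaseRef come in pairs: a node may only be reclaimed once no thread holds a guaranteed reference to it.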
for increasing scalability besides allowing disjoint Enqueue and Dequeue operations to
execute in parallel.
3 The New Algorithm
The underlying data structure of our algorithmic design is a linked list of arrays, depicted in Figure 1. In this data structure, every array element contains a pointer to some arbitrary value. Both the Enqueue and Dequeue operations use increasing array indices as each array element gets occupied or removed, respectively. To ensure consistency, items are inserted into or removed from each array element using the CAS atomic synchronization primitive. To ensure that an Enqueue operation cannot succeed with a CAS at a lower array index than where the concurrent Dequeue operations are operating, the CAS primitive must be able to distinguish (i.e., avoid the ABA problem) between "used" and "unused" array indices. For this purpose two null pointer values [11] are used; one (NULL) for the empty indices and another (NULL2) for the removed indices. As each array gets fully occupied (or emptied), new array blocks are added to (or removed from) the linked list data structure. Two shared pointers, globalHeadBlock and globalTailBlock, globally indicate the first and last active blocks, respectively. These shared pointers are also concurrently updated using CAS operations as the linked list data structure changes. However, as these updates are done lazily (not atomically together with the addition of a new array block), the actual first or last active block might have to be found by following the next pointers of the linked list.
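The two-null-value technique can be sketched as follows; this is a minimal illustration of the idea, assuming a C11 atomic slot and encoding NULL2 as the address of a private dummy object (the paper does not specify this encoding):

```c
#include <stdbool.h>
#include <stdatomic.h>
#include <stddef.h>

/* Sketch of the two-null-value idea: an empty slot holds NULL and a
   removed slot holds NULL2, so an Enqueue's CAS can never succeed on
   an index that a Dequeue has already passed. The encoding of NULL2
   as the address of a dummy object is an assumption. */
static int null2_dummy;
#define NULL2 ((void *)&null2_dummy)

/* Try to insert: succeeds only while the slot is still empty (NULL). */
bool try_enqueue_slot(_Atomic(void *) *slot, void *item) {
    void *expected = NULL;
    return atomic_compare_exchange_strong(slot, &expected, item);
}

/* Try to remove: succeeds only on a real item, marking the slot NULL2
   so that no later Enqueue can reuse this index. */
void *try_dequeue_slot(_Atomic(void *) *slot) {
    void *item = atomic_load(slot);
    if (item == NULL || item == NULL2)
        return NULL;                   /* empty or already removed */
    if (atomic_compare_exchange_strong(slot, &item, NULL2))
        return item;
    return NULL;                       /* lost the race to another thread */
}
```

Because a removed index holds NULL2 rather than reverting to NULL, a slow Enqueue that still expects NULL is guaranteed to fail its CAS, which is exactly the ABA distinction described above.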
As a successful update of a shared pointer causes a cache miss for the other threads that concurrently access that pointer, the overall strategy for improving the performance and scalability of the new algorithm is to avoid accessing pointers that can be concurrently updated [5]. Moreover, our algorithm achieves fewer updates by not having shared variables with explicit information about which array index is currently the next active one for Enqueue or Dequeue. Instead each thread is storing its