没有合适的资源?快使用搜索试试~ 我知道了~
首页GPU优化的细粒度锁基跳表算法:加速并发计算
本文档深入探讨了一种专为图形处理单元(GPUs)设计的优化的细粒度锁基础跳表算法——GPU-Friendly Skiplist。作者是Nurit Moscovici、Erez Petrank,分别来自以色列特拉维夫的Technion计算机科学系,以及瑞士洛桑联邦理工学院的Computer Science Dept.,作者Nachshon Cohen也参与了这项研究。跳表是一种常用的数据结构,尤其在需要高效查找和插入操作的并发计算中发挥关键作用,而GPU因其并行处理能力常被用于加速这类任务。 传统的并行计算在GPU上遇到的主要挑战之一是数据访问的复杂性和细粒度同步的需求。由于GPU的优势在于内存访问的聚集和执行线程的协同,因此设计一个能够充分利用这些特性的同时处理复杂数据不规则访问的跳表至关重要。该算法的核心创新在于采用基于数组的节点,每个节点由一组称为warps的线程集共同操作,这样可以减少内存冲突和提高执行效率。这种设计允许算法在GPU环境中实现更高效的并发计算。 论文展示了作者团队实施的GPU-Friendly Skiplist的具体实现,并通过实验对比展示了其与传统跳表设计相比,性能提升显著,最高可达到11.6倍。这表明,针对GPU优化的数据结构设计对于提升大规模并行应用的性能具有重要意义。此外,该研究可能对其他需要高效并发处理和内存管理的领域,如数据库索引、图形渲染或机器学习中的梯度计算等,提供了有价值的技术参考。这篇文章为GPU上的并发数据结构设计提供了一种新的解决方案,值得在高性能计算和GPU编程社区进一步研究和探讨。
资源详情
资源推荐
Val 0
Key 0
DATA (N-2 entries)
Val 1
Key 1
Upper
32-bits
Lower
32-bits
Val N-3
Key N-3
Next Ptr
Max Field
NEXT
Lock Field
LOCK
…
…
Figure 2. Format of a chunk of size N
Data Array
20 25 33 … 33
-∞
5 10 … 10
101 570 … …
∞
-∞
10 25 … 25
-∞
570 … …
∞
…
ctr ptr
1
2
12
Head
Array
570 600 810 …
∞
Next
Lock
Figure 3. A chunked skiplist
these requirements by using array-based skiplist nodes and
allowing threads in a warp to cooperate in the execution of
the skiplist operations.
We tackle the problem of scattered memory accesses by
packing consecutive key-value pairs residing in the same
level into large cache-aligned skiplist nodes called chunks,
shown in Fig. 2. Chunks contain a data array, a sorted array
of key-value pairs, along with a LOCK entry and a NEXT
entry consisting of a pointer to the next chunk and a max
field holding the maximum key in the current chunk. hunks
are designed to be read efficiently in the fewest possible
memory transactions.
GFSL consists of several levels of chunked linked lists,
each containing a subset of the keys in the level below, as
seen in Fig. 3. Each chunk’s data array is sorted in rising
order, with empty entries denoted by a special ∞ value and
grouped at the end of the array. In the upper levels the value
field of each entry in the data array points to a chunk in
the level below, and in the bottom level this field will hold
the data element associated with the corresponding key. A
key-value pair in level i + 1 generally points to a chunk
containing the same key in level i, though it may temporarily
point to a chunk containing smaller values during Inserts and
Deletes. The first chunk in each level contains a −∞ key in
the first entry with a pointer to the first chunk in the level
below, and is accessed via a pointer from the Head Array.
The last chunk in every level contains an ∞ value in both
its next-pointer and max fields. ∞ and −∞ are distinct from
keys in the structure.
Threads are divided into groups called teams, which
cooperate to perform the skiplist operations. Teams can be
defined by the user to be either the size of a warp or smaller.
The number of entries in a chunk is equal to the number
of threads in a team, so that the entire chunk is read in a
single kernel instruction (executed in lockstep by the team).
Each thread in a team simultaneously reads data from the
chunk index corresponding to its place within the team (tId).
For a team of size N the first (N-2) threads, called DATA
threads, access the data array, while the last two access the
NEXT and LOCK values respectively. Each thread performs
computations on the value it read then cooperates with the
rest of its team to decide on the next step in the execution
via intra-warp operations.
Structure traversal is similar in spirit to traversal over a
regular skiplist. A team searching for a key k reads the first
chunk in the highest level. Each DATA thread compares k to
the key read from its entry, while the NEXT thread compares
k to the maximum field. The threads share their results and
decide simultaneously how to continue the traversal: either
a lateral step via the next pointer, or a step down to the next
level via a pointer in some DATA field. The team continues
laterally if the searched key is greater than the maximum
and steps down otherwise via the data-entry containing the
largest key smaller or equal to k. If all keys in the chunk are
greater than k then the team must backtrack to the previous
chunk in the level and step down from there.
Insert and Delete operations are likewise performed by an
entire team in tandem while ensuring the chunks remain both
internally and externally sorted. If an insertion occurs when
there is no free space in the data array a split operation
is performed: A new chunk is allocated and added to
the structure after the overflowed chunk. The data array
is divided equally between both chunks, whilst remaining
sorted. Conversely, if a deletion causes a lower bound on
the number of key-value pairs to be crossed then a merge
operation is performed: the chunk is marked as a zombie and
its values are moved to the next chunk in the level. If the
next chunk is too full this operation may cause it to be split.
Pointers are redirected after both split and merge operations
in order to ensure the upper level pointers remain accurate
and to physically remove a zombie from the structure. All
changes to the contents of the skiplist are performed under
the protection of the chunks’ locks, so at most one team can
change the contents of a chunk at any time.
GFSL contains fewer nodes and levels than the classic
skiplist. A single node in GFSL contains several keys, and
so replaces several separate nodes in the classic version.
Thus more keys can be inserted into a level before it
becomes necessary to add a pointer in the level above.
The teams process more data for every memory transaction
than a single thread does in the original algorithm, enabling
faster traversals over the structure, while also causing less
divergence within a warp.
Unlike the classic skiplist algorithm, GFSL does not
predetermine a level for every key inserted. Instead, a key
can be raised to level i + 1 only as a result of a split, i.e.
when a new chunk is added to level i. Raising the key as a
result of insertion of new chunks and not single keys causes
the factor between levels to be tied to the number of entries
in a chunk, aiding in shorter traversals. In an ideal structure
剩余13页未读,继续阅读
weixin_38708841
- 粉丝: 3
- 资源: 945
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- C++多态实现机制详解:虚函数与早期绑定
- Java多线程与异常处理详解
- 校园导游系统:无向图实现最短路径探索
- SQL2005彻底删除指南:避免重装失败
- GTD时间管理法:提升效率与组织生活的关键
- Python进制转换全攻略:从10进制到16进制
- 商丘物流业区位优势探究:发展战略与机遇
- C语言实训:简单计算器程序设计
- Oracle SQL命令大全:用户管理、权限操作与查询
- Struts2配置详解与示例
- C#编程规范与最佳实践
- C语言面试常见问题解析
- 超声波测距技术详解:电路与程序设计
- 反激开关电源设计:UC3844与TL431优化稳压
- Cisco路由器配置全攻略
- SQLServer 2005 CTE递归教程:创建员工层级结构
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功