A GPU-Optimized Fine-Grained Lock-Based Skiplist Algorithm: Accelerating Concurrent Computation


[Figure 2. Format of a chunk of size N. Labels: DATA (N-2 entries) holding Key 0/Val 0 through Key N-3/Val N-3; upper 32 bits and lower 32 bits; Next Ptr and Max Field; Lock Field (LOCK).]

[Figure 3. A chunked skiplist. Labels: Head Array (ctr, ptr), Data Array, Lock; example chunks hold sorted keys between −∞ and ∞.]

these requirements by using array-based skiplist nodes and allowing threads in a warp to cooperate in the execution of the skiplist operations.

We tackle the problem of scattered memory accesses by packing consecutive key-value pairs residing in the same level into large cache-aligned skiplist nodes called chunks, shown in Fig. 2. Each chunk contains a data array (a sorted array of key-value pairs), a LOCK entry, and a NEXT entry consisting of a pointer to the next chunk and a max field holding the maximum key in the current chunk. Chunks are designed to be read efficiently in the fewest possible memory transactions.
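The chunk layout described above can be sketched as a flat array of 64-bit words. This is a minimal illustration, not the paper's code: the choice of the upper half for the key (and for the max field), the `0xFFFFFFFF` sentinels, and the lock encoding are all assumptions, as are the names `pack`, `unpack`, and `make_chunk`.

```python
EMPTY_KEY = 0xFFFFFFFF  # assumed stand-in for the "∞" key that marks an empty entry
NULL_PTR = 0xFFFFFFFF   # assumed stand-in for the "∞" next pointer of a level's last chunk
UNLOCKED = 0            # assumed lock encoding: 0 = free

def pack(upper, lower):
    """Pack two 32-bit fields into one 64-bit word."""
    return ((upper & 0xFFFFFFFF) << 32) | (lower & 0xFFFFFFFF)

def unpack(word):
    """Split a 64-bit word back into its (upper, lower) 32-bit fields."""
    return word >> 32, word & 0xFFFFFFFF

def make_chunk(team_size, pairs, next_ptr=NULL_PTR):
    """Build a chunk of team_size words: team_size-2 DATA entries, then NEXT, then LOCK."""
    data_slots = team_size - 2
    if len(pairs) > data_slots:
        raise ValueError("too many pairs for this chunk size")
    pairs = sorted(pairs)
    # DATA entries: sorted key-value pairs, empty entries grouped at the end.
    words = [pack(k, v) for k, v in pairs]
    words += [pack(EMPTY_KEY, 0)] * (data_slots - len(pairs))
    # NEXT entry: max field plus next pointer; the last chunk in a level holds ∞ in both.
    is_last = next_ptr == NULL_PTR or not pairs
    max_key = EMPTY_KEY if is_last else pairs[-1][0]
    words.append(pack(max_key, next_ptr))
    # LOCK entry.
    words.append(UNLOCKED)
    return words

chunk = make_chunk(8, [(25, 2), (20, 1), (33, 3)], next_ptr=7)
```

Keeping every entry the same width means a chunk for a team of size N is simply N consecutive words, which is what lets each thread read the entry at its own index.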

GFSL consists of several levels of chunked linked lists, each containing a subset of the keys in the level below, as seen in Fig. 3. Each chunk’s data array is sorted in ascending order, with empty entries denoted by a special ∞ value and grouped at the end of the array. In the upper levels the value field of each entry in the data array points to a chunk in the level below, and in the bottom level this field holds the data element associated with the corresponding key. A key-value pair in level i + 1 generally points to a chunk containing the same key in level i, though it may temporarily point to a chunk containing smaller values during Inserts and Deletes. The first chunk in each level contains a −∞ key in the first entry with a pointer to the first chunk in the level below, and is accessed via a pointer from the Head Array. The last chunk in every level contains an ∞ value in both its next-pointer and max fields. The ∞ and −∞ values are distinct from all keys in the structure.

Threads are divided into groups called teams, which cooperate to perform the skiplist operations. Teams can be defined by the user to be either the size of a warp or smaller. The number of entries in a chunk is equal to the number of threads in a team, so that the entire chunk is read in a single kernel instruction (executed in lockstep by the team). Each thread in a team simultaneously reads data from the chunk index corresponding to its place within the team (tId). For a team of size N the first (N-2) threads, called DATA threads, access the data array, while the last two access the NEXT and LOCK values respectively. Each thread performs computations on the value it read, then cooperates with the rest of its team to decide on the next step in the execution via intra-warp operations.
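The division of a team into DATA, NEXT, and LOCK threads can be simulated sequentially. The sketch below is only a model: the loop stands in for threads running in lockstep, and `ballot` is a hypothetical helper that mimics the kind of vote a GPU team might take with a warp-level primitive such as CUDA's `__ballot_sync`. The chunk is simplified to a list holding keys, then the max field, then the lock word.

```python
import math

def thread_role(t_id, team_size):
    """The first team_size-2 threads are DATA threads; the last two handle NEXT and LOCK."""
    if t_id < team_size - 2:
        return "DATA"
    return "NEXT" if t_id == team_size - 2 else "LOCK"

def ballot(votes):
    """Combine per-thread booleans into a bitmask: bit t_id is set if thread t_id voted True."""
    mask = 0
    for t_id, vote in enumerate(votes):
        if vote:
            mask |= 1 << t_id
    return mask

def team_step(chunk, k):
    """Every thread reads chunk[t_id]; DATA threads vote on whether their key is <= k."""
    team_size = len(chunk)
    votes = []
    for t_id in range(team_size):   # on a GPU these iterations would run in lockstep
        mine = chunk[t_id]          # thread t_id reads the entry at its own index
        role = thread_role(t_id, team_size)
        votes.append(role == "DATA" and mine <= k)
    return ballot(votes)

# Team of 8: six DATA keys (three of them empty), the max field, and the lock word.
example_chunk = [20, 25, 33, math.inf, math.inf, math.inf, 33, 0]
mask = team_step(example_chunk, 25)
```

In this model the highest set bit of the mask identifies the DATA entry with the largest key not exceeding `k`, which is the information the team needs to choose its next step.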

Structure traversal is similar in spirit to traversal over a regular skiplist. A team searching for a key k reads the first chunk in the highest level. Each DATA thread compares k to the key read from its entry, while the NEXT thread compares k to the maximum field. The threads share their results and decide simultaneously how to continue the traversal: either a lateral step via the next pointer, or a step down to the next level via a pointer in some DATA field. The team continues laterally if the searched key is greater than the maximum, and steps down otherwise via the data entry containing the largest key smaller than or equal to k. If all keys in the chunk are greater than k then the team must backtrack to the previous chunk in the level and step down from there.
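The traversal rules above can be written as a single-threaded search over a small hand-built structure. This is a sketch under simplifying assumptions: one Python loop replaces the team's cooperative decision, `None` models the ∞ next pointer, and the `Chunk` class and `search` function are illustrative names rather than the paper's API.

```python
import math

class Chunk:
    def __init__(self, keys, vals, next_chunk=None):
        self.keys = keys        # sorted keys; -inf leads the first chunk of a level
        self.vals = vals        # child Chunk in upper levels, payload in the bottom level
        self.next = next_chunk  # None models the "∞" next pointer
        # The last chunk in a level holds ∞ in its max field.
        self.max = math.inf if next_chunk is None else keys[-1]

def search(head, k):
    """Return the payload stored under k, or None. `head` is the first chunk of the top level."""
    chunk, prev = head, None
    while True:
        if k > chunk.max:                      # lateral step via the next pointer
            prev, chunk = chunk, chunk.next
            continue
        candidates = [i for i, key in enumerate(chunk.keys) if key <= k]
        if not candidates:                     # all keys exceed k: backtrack
            chunk = prev
            candidates = list(range(len(chunk.keys)))
        i = candidates[-1]                     # largest key <= k
        target = chunk.vals[i]
        if not isinstance(target, Chunk):      # bottom level reached
            return target if chunk.keys[i] == k else None
        chunk, prev = target, None             # step down one level

# Bottom level, built right to left so each chunk knows its successor.
b3 = Chunk([101, 570], ["v101", "v570"])
b2 = Chunk([20, 25, 33], ["v20", "v25", "v33"], b3)
b1 = Chunk([-math.inf, 5, 10], [None, "v5", "v10"], b2)
# Upper level: each value points to a chunk in the level below.
u2 = Chunk([101], [b3])
top_head = Chunk([-math.inf, 20], [b1, b2], u2)

found = search(top_head, 25)
```

Searching for 25 exercises the backtracking case: the key exceeds the first upper chunk's maximum (20), the next chunk holds only larger keys, so the search returns to the previous chunk and steps down from its largest entry.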

Insert and Delete operations are likewise performed by an entire team in tandem while ensuring the chunks remain both internally and externally sorted. If an insertion occurs when there is no free space in the data array, a split operation is performed: a new chunk is allocated and added to the structure after the overflowed chunk. The data array is divided equally between both chunks, whilst remaining sorted. Conversely, if a deletion causes a lower bound on the number of key-value pairs to be crossed, then a merge operation is performed: the chunk is marked as a zombie and its values are moved to the next chunk in the level. If the next chunk is too full this operation may cause it to be split. Pointers are redirected after both split and merge operations in order to ensure the upper level pointers remain accurate and to physically remove a zombie from the structure. All changes to the contents of the skiplist are performed under the protection of the chunks’ locks, so at most one team can change the contents of a chunk at any time.
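The split path of an insertion can be illustrated with a simplified, host-side model. The sketch assumes distinct keys and that the caller has already located the correct chunk; it uses a `threading.Lock` where a GPU implementation would use the chunk's LOCK entry, and it leaves out merges, zombies, and the redirection of upper-level pointers. `Chunk` and `insert` are hypothetical names.

```python
import bisect
import threading

class Chunk:
    def __init__(self, capacity, pairs=(), next_chunk=None):
        self.capacity = capacity         # number of DATA entries (N-2 for a team of size N)
        self.pairs = sorted(pairs)       # sorted (key, value) pairs
        self.next = next_chunk
        self.lock = threading.Lock()     # stand-in for the chunk's LOCK entry

def insert(chunk, key, val):
    """Insert (key, val) into `chunk`; return the newly created chunk if a split occurred."""
    with chunk.lock:                     # only the lock holder may change the contents
        if len(chunk.pairs) < chunk.capacity:
            bisect.insort(chunk.pairs, (key, val))
            return None
        # No free space: divide the data array equally, keeping both halves sorted.
        half = len(chunk.pairs) // 2
        lower, upper = chunk.pairs[:half], chunk.pairs[half:]
        bisect.insort(lower if key < upper[0][0] else upper, (key, val))
        # Link the new chunk into the level right after the overflowed chunk.
        new_chunk = Chunk(chunk.capacity, upper, chunk.next)
        chunk.pairs, chunk.next = lower, new_chunk
        return new_chunk

c = Chunk(4, [(10, "a"), (20, "b"), (30, "c")])
no_split = insert(c, 40, "d")    # fills the last free entry
split_off = insert(c, 25, "x")   # chunk is full, so this triggers a split
```

Returning the new chunk gives the caller what it would need to update the level above, a step this sketch does not attempt.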

GFSL contains fewer nodes and levels than the classic skiplist. A single node in GFSL contains several keys, and so replaces several separate nodes in the classic version. Thus more keys can be inserted into a level before it becomes necessary to add a pointer in the level above. The teams process more data for every memory transaction than a single thread does in the original algorithm, enabling faster traversals over the structure, while also causing less divergence within a warp.

Unlike the classic skiplist algorithm, GFSL does not predetermine a level for every key inserted. Instead, a key can be raised to level i + 1 only as a result of a split, i.e. when a new chunk is added to level i. Raising the key as a result of insertion of new chunks and not single keys causes the factor between levels to be tied to the number of entries in a chunk, aiding in shorter traversals. In an ideal structure
