A Case Study in Optimizing Patricia Trie Dynamic Data Structures under Restricted Transactional Memory


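This is a research paper published in 2015 by Thomas J. Repetti and Maurice P. Herlihy of the Department of Computer Science at Brown University, titled "A Case Study in Optimizing HTM-Enabled Dynamic Data Structures: Patricia Tries". With the rise of multi-core microprocessors, and in particular the arrival of restricted transactional memory (RTM) with accompanying compiler support, the authors revisit the design of fundamental data structures in order to extract more parallelism. The Patricia trie is a common dynamic data structure used to store sets and dictionaries, especially where efficient find, add and remove operations are required.

The core of the work is a concurrent implementation of a dynamically sized Patricia trie. It uses a lock teleportation RTM fast path for find, add and remove operations, and atomic exchange spinlocks as the slow path. The authors weigh alphabet size against tree depth, the two closely linked parameters of any trie. They also propose a novel way to determine the optimal number of retry attempts for specific operations on dynamically allocated data structures: the retry policy for operations that may interact with the operating system's memory management facilities is kept separate from the policy for read-only operations. The aim is to cut unnecessary synchronization overhead and keep throughput high under contention.

Overall, the paper analyzes how to optimize a dynamic data structure under restricted transactional memory, using a concurrent Patricia trie as the case study. It is relevant to anyone using RTM to improve performance on multi-core systems or designing efficient data structures for concurrent systems.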

A Case Study in Optimizing HTM-Enabled Dynamic Data Structures: Patricia Tries

Thomas J. Repetti, Maurice P. Herlihy

Department of Computer Science

Brown University

{trepetti,mph}@cs.brown.edu

Abstract

The advent of multi-core microprocessors with restricted transactional memory (RTM) and accompanying compiler support allows us to revisit fundamental data structures with an eye to extracting more parallelism. The Patricia trie is one such common data structure used for storing both sets and dictionaries in a variety of contexts. This paper presents a concurrent implementation of a dynamically sized Patricia trie using a lock teleportation RTM fast path for find, add and remove operations, and a slow path based on atomic exchange spinlocks. We weigh the tradeoffs between alphabet size and tree depth inherent to tries and propose a novel means of determining the optimal number of retry attempts for specific operations on dynamically allocated data structures. The strategy proposed separates the retry policy governing operations that potentially interact with the operating system’s memory management facilities from read-only operations, and we find that this transactional trie can support considerably higher multiprogramming levels than its lock-based equivalent. A notable result is that this scheme keeps throughput from collapsing at high thread counts, even when the number of threads interacting with the data structure exceeds the number of hardware contexts available on the system.

Keywords Patricia trie, symmetric multiprocessing, concurrent data structure, hardware transactional memory, restricted transactional memory

1. Introduction

1.1 Transactional Memory

Transactional memory is a synchronization paradigm that effectively extends the atomicity of traditional atomic shared memory operations like compare-and-swap or fetch-and-add to generalized read-modify-write operations on arbitrary regions of memory [12]. Although it was originally conceived as an architectural feature to extend cache coherency protocols in hardware, until recently all implementations were strictly in software [29]. The composability of speculative regions of software execution alone is a benefit to the productivity of programmers writing concurrent software, but before the widespread commercial availability of hardware with transactional memory support, the full performance benefits of the technique could not be brought to bear. Currently available commercial hardware such as Intel’s Haswell processors [14] and IBM’s POWER8, Blue Gene/Q [11] and System z [16] systems all support best-effort hardware transactional memory, meaning there are no guarantees of forward progress, and a transaction may abort for any reason, the cause of which may be opaque to the programmer. Avni and Kuszmaul preface [1] with a good summary of the variety of issues that may trigger an abort for unspecified reasons under Intel’s Transactional Synchronization Extensions (TSX) RTM. Aborts under TSX that have no cause visible to the programmer may be triggered by cache misses, TLB misses and interrupts. For these reasons, it is common to use pre-allocation strategies when investigating data structures under HTM in order to avoid interference from the operating system. The implementation presented here, however, uses dynamic memory allocation at runtime as one would expect from a normal data structure in the field.
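Because best-effort HTM offers no guarantee of forward progress, a transactional fast path is normally paired with a non-transactional fallback and a bounded number of retries. The following is only a control-flow model of that pattern in Python, which has no hardware transactions: `try_fast_path` is a hypothetical callable standing in for a transaction attempt (returning `False` models an abort), `threading.Lock` stands in for a spinlock, and `max_retries` is an assumed tuning parameter. None of these names come from the paper.

```python
import threading


def run_with_fallback(try_fast_path, slow_path, lock, max_retries):
    """Try a best-effort fast path a bounded number of times, then take the lock.

    try_fast_path: hypothetical stand-in for one transaction attempt;
                   returns True on commit, False on abort.
    slow_path:     the same operation performed while holding `lock`.
    Returns "fast" or "slow" so callers can see which path completed.
    """
    for _ in range(max_retries):
        if try_fast_path():
            return "fast"
    # Every attempt aborted: guarantee progress by serializing on the lock.
    with lock:
        slow_path()
    return "slow"


# Usage: a fast path that aborts twice before committing.
outcomes = iter([False, False, True])
fallback_lock = threading.Lock()
path_taken = run_with_fallback(lambda: next(outcomes), lambda: None,
                               fallback_lock, max_retries=4)
print(path_taken)  # prints "fast"
```

A real RTM fast path must also observe the fallback lock inside the transaction so that a thread on the slow path forces concurrent transactions to abort; that detail is omitted from this model.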

1.2 Tries

Tries [9] are tree data structures used to store a set of arbitrary-length keys. The root of such a tree is a node corresponding to the null string key. Each node, including the root, has a number of possible children determined by the number of characters in the alphabet from which strings are composed. A string present in the set will have a succession of non-null pointers to character nodes starting at the root which match each character in its sequence. This assumes the presence of a string termination signifier such as the “\0” character from the C string model or a flag within the node signifying the end of a string. Without a signifier of this kind, it could be inferred that prefixes of strings in the set were themselves keys in the set when in fact they were not [3]. In their simplest form, therefore, tries have exactly one node for every character in a unique suffix of a given string within the set. Shared string prefixes then share the prefix nodes descending from the root since their unique suffixes will only branch at the node corresponding to the character at which the strings themselves diverge.
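The structure described above can be sketched as a minimal, single-threaded trie. This is an illustration rather than the paper's implementation: the class names, the `terminal` flag (playing the role of the end-of-string signifier) and the use of a dictionary in place of a fixed array of child pointers are all assumptions made for brevity.

```python
class TrieNode:
    """One node per character; `terminal` marks the end of a stored key."""

    def __init__(self):
        self.children = {}     # character -> TrieNode (stand-in for a pointer array)
        self.terminal = False  # end-of-string flag


class Trie:
    def __init__(self):
        self.root = TrieNode()  # corresponds to the null string

    def add(self, key):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def find(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return False
        # Without this flag, every prefix of a key would look like a key.
        return node.terminal

    def node_count(self):
        """Number of character nodes, excluding the root."""
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += len(node.children)
            stack.extend(node.children.values())
        return count


trie = Trie()
for word in ("car", "cart", "cat"):
    trie.add(word)
print(trie.find("car"), trie.find("ca"))  # prints "True False"
print(trie.node_count())                  # prints "5": c, a, r, t (cart), t (cat)
```

The three keys share the nodes for "c" and "a" and branch only where they diverge, and the prefix "ca" is correctly reported as absent because its node is not flagged as terminal.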

1.3 Patricia Tries

A Patricia trie [24] (also known as a radix tree or prefix tree) is a compact trie in which any node that is an only child can be eliminated by incorporating it into its parent node. A string with a suffix unique to the set can consequently be stored with a single node regardless of the length of the suffix. By the same logic, common internal substrings need only be represented by a single node as well, which further reduces memory overhead. This modification requires that the data structure keep track of the omitted characters in the compressed portions of the string on a node-by-node basis. Due to their modest time complexity for key lookup, Patricia tries are often used in IP address lookup [26] [30], as well as in natural language processing applications such as approximate string matching [28]. They also often serve as an intermediary lookup data structure for more intricate objects such as the string B-tree [8].
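The compression described above can be sketched as a small, single-threaded radix tree in which each node stores the run of characters it absorbs. This is an illustrative sketch under assumptions, not the paper's node layout: the names `RadixNode`, `RadixTree` and `label` are hypothetical, only `add` and `find` are shown, and no concurrency control is included.

```python
class RadixNode:
    def __init__(self, label="", terminal=False):
        self.label = label        # characters compressed into this node
        self.terminal = terminal  # True if a stored key ends here
        self.children = {}        # first character of child's label -> RadixNode


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def add(self, key):
        node = self.root
        while key:
            child = node.children.get(key[0])
            if child is None:
                # The whole remaining suffix fits in a single new node.
                node.children[key[0]] = RadixNode(key, terminal=True)
                return
            n = common_prefix_len(key, child.label)
            if n < len(child.label):
                # Split: a new node keeps the shared prefix, the old one the rest.
                mid = RadixNode(child.label[:n])
                child.label = child.label[n:]
                mid.children[child.label[0]] = child
                node.children[key[0]] = mid
                child = mid
            key = key[n:]
            node = child
        node.terminal = True

    def find(self, key):
        node = self.root
        while key:
            child = node.children.get(key[0])
            if child is None or not key.startswith(child.label):
                return False
            key = key[len(child.label):]
            node = child
        return node.terminal

    def node_count(self):
        """Number of nodes, excluding the root."""
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += len(node.children)
            stack.extend(node.children.values())
        return count


tree = RadixTree()
for word in ("transaction", "transactional", "trie"):
    tree.add(word)
print(tree.find("trie"), tree.find("trans"))  # prints "True False"
print(tree.node_count())                      # prints "4": tr, ansaction, al, ie
```

A one-node-per-character trie would need 15 nodes for the same three keys, which illustrates the memory saving, at the cost of tracking each node's compressed characters and splitting nodes on insertion.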

1.4 Alphabet Size

The most natural choice of alphabet, in which there are 256 possible characters for each byte, may be efficient in the overall number of pointer references necessary to store a given string, but this incurs an additional memory cost because of the per-node stor-
