k-ary Search Algorithms Using the SIMD Instructions of Modern Processors

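"k-Ary Search on Modern Processors - Computer Science": This paper, co-authored by Benjamin Schlegel, Rainer Gemulla, and Wolfgang Lehner of Technische Universität Dresden and the IBM Almaden Research Center, presents new algorithms that use the SIMD (single instruction, multiple data) instructions of modern processors to perform k-ary search. The algorithms are a natural extension of binary search but are more efficient. Binary search is a classic strategy that performs one comparison per iteration and splits the search space into two halves. k-ary search instead performs k comparisons per iteration and divides the search space into k parts. On traditional processors, k-ary search does not beat binary search because of the extra cost per iteration. On modern processors, however, SIMD instruction sets allow multiple scalar operations to execute simultaneously, which makes k-ary search attractive because it reduces the total number of iterations.

The paper proposes two search algorithms that differ in efficiency and memory access patterns. Both are described first and then analyzed experimentally in different scenarios. The results show how, on modern processors with SIMD support, k-ary search speeds up searching by parallelizing comparisons; the summary notes that the gain is most pronounced on large datasets. The paper also discusses implementation details, including how to use SIMD instructions effectively for data-parallel processing and how to optimize memory access to reduce cache misses. It may also cover complexity analysis, in both time and space, to assess how well the algorithms adapt to different hardware.

In practice, these algorithms may matter for database queries, search engines, and the preprocessing steps of machine learning models. Optimizing such search operations can improve overall system performance, especially in real-time or near-real-time applications that process large amounts of data. The paper offers a new perspective on exploiting the SIMD features of modern processors and provides both a theoretical basis and practical guidance for developing more efficient search algorithms, which makes it a useful resource for researchers and engineers who design high-performance computing solutions.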

k-Ary Search on Modern Processors

Benjamin Schlegel
Technische Universität Dresden
benjamin.schlegel@tu-dresden.de

Rainer Gemulla
IBM Almaden Research Center
rgemull@us.ibm.com

Wolfgang Lehner
Technische Universität Dresden
wolfgang.lehner@tu-dresden.de

ABSTRACT

This paper presents novel tree-based search algorithms that exploit the SIMD instructions found in virtually all modern processors. The algorithms are a natural extension of binary search: While binary search performs one comparison at each iteration, thereby cutting the search space in two halves, our algorithms perform k comparisons at a time and thus cut the search space into k pieces. On traditional processors, this so-called k-ary search procedure is not beneficial because the cost increase per iteration offsets the cost reduction due to the reduced number of iterations. On modern processors, however, multiple scalar operations can be executed simultaneously, which makes k-ary search attractive. In this paper, we provide two different search algorithms that differ in terms of efficiency and memory access patterns. Both algorithms are first described in a platform-independent way and then evaluated on various state-of-the-art processors. Our experiments suggest that k-ary search provides significant performance improvements (factor two and more) on most platforms.

1. INTRODUCTION

Searching is a fundamental operation that is used in almost every domain of computer science. The classical problem is to retrieve from a dataset of disjoint key/value pairs (i) the value for a single given key or (ii) all the pairs within a range of two given keys. For example, suppose that the dataset consists of a set of persons. Then, a single-key query might ask for a person with a given name, while a range query might ask for all persons born between 01/01/1980 and 12/31/1980.

The classical search problem has been studied extensively in the literature. Apart from linear search, one distinguishes sort-based, tree-based, and hash-based search algorithms. Hash-based algorithms are well-suited for single-key lookups, but they require memory over and above that needed to store the base data¹ and, since the distribution of keys to buckets is randomized, they perform poorly in the presence of range or nearest-key queries. In this paper, we restrict our attention to the former two classes, i.e., sort-based and tree-based search. When the dataset is sorted, binary search constitutes the provably best algorithm of the sort-based class of algorithms in terms of time complexity. Each step of a binary search halves the search space by performing one comparison so that the total number of comparisons is logarithmic in the dataset size. Similar arguments hold for tree-based search based on a binary search tree.

The downside of binary search is that it does not make use of the SIMD capabilities of modern processors. On IBM’s Cell processor, for example, the cost of a single 32-bit scalar comparison is identical to the cost of four 32-bit scalar comparisons executed simultaneously. A naive execution of binary search would therefore “waste” three comparisons at each step. A natural idea to boost the speed of searching is, therefore, to not run binary search but k-ary search, k > 2, where each step divides the search space into k parts based on the outcome of k − 1 comparisons (e.g., k = 5 for the Cell with 4-way vector registers). Although this approach does not affect the asymptotic time complexity of search, it might significantly reduce the actual execution cost.

In this paper, we take a closer look at k-ary search on SIMD architectures. Our goal is to determine which SIMD operations are essential for k-ary search to be more efficient than binary search and by what margin the former outperforms the latter on selected processors. Furthermore, we show that an additional reduction in execution time can be achieved by rearranging the dataset in an order more appropriate for k-ary search. Our reordering is based on a linearization of a k-ary search tree.

In what follows, we consider a somewhat idealistic scenario in which the dataset is stored in main memory and is static (or updated infrequently, e.g., on a daily or monthly basis). Although these restrictions do not always hold in practice, they allow us to focus on the advantages of modern architectures over more conventional ones. To see this, suppose that the dataset is not stored in memory but on hard disk. Then, the I/O cost of reading the data would dominate the cost of search and usually outweigh any performance gain due to SIMD instructions. For this reason, one typically keeps as much of the data in memory as possible, in which case k-ary search is attractive. Similarly, if the dataset is updated frequently with respect to the number of searches, the maintenance cost of both the sorted-array representation and the linearized-tree representation becomes substantial.

¹ Although some modern hashing techniques [11] can achieve up to 95-99% occupancy.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Proceedings of the Fifth International Workshop on Data Management on New Hardware (DaMoN 2009), June 28, 2009, Providence, Rhode Island
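The k-ary search step outlined in the introduction (k − 1 comparisons per iteration, splitting the range into k parts) can be illustrated with a small sketch. This is not the authors' implementation: the function names `binary_search` and `kary_search`, the `steps` counter, and the way separator positions are chosen are all assumptions made for illustration. The k − 1 comparisons are emulated one lane at a time in plain Python; on SIMD hardware they would correspond to a single vector comparison.

```python
def binary_search(data, key):
    """Lower-bound binary search: one comparison per iteration.

    Returns (index of key or -1, number of iterations).
    """
    lo, hi = 0, len(data)
    steps = 0
    while lo < hi:
        steps += 1
        mid = lo + (hi - lo) // 2
        if data[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(data) and data[lo] == key
    return (lo if found else -1), steps


def kary_search(data, key, k=5):
    """Lower-bound k-ary search on a sorted list (illustrative sketch).

    Each iteration compares `key` with k - 1 separators and keeps one of
    the k resulting parts. Returns (index of key or -1, iterations).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = 0, len(data)  # invariant: the lower bound lies in [lo, hi]
    steps = 0
    while lo < hi:
        steps += 1
        n = hi - lo
        # k - 1 separator positions that split [lo, hi) into k parts.
        seps = [lo + (n * i) // k for i in range(1, k)]
        # Emulated SIMD step: on vector hardware these k - 1 comparisons
        # would be one instruction; here they run lane by lane.
        less = [data[s] < key for s in seps]
        # The data is sorted, so `less` is a run of True followed by False;
        # the count of True values selects the part to continue in.
        c = sum(less)
        if c > 0:
            lo = seps[c - 1] + 1  # everything up to this separator is < key
        if c < k - 1:
            hi = seps[c]          # this separator is >= key
    found = lo < len(data) and data[lo] == key
    return (lo if found else -1), steps


if __name__ == "__main__":
    keys = list(range(0, 200_000, 2))  # 100,000 sorted even keys
    print("binary:", binary_search(keys, 77_776))
    print("5-ary: ", kary_search(keys, 77_776, k=5))
```

On the 100,000-key example, the 5-ary variant needs at most 8 iterations, whereas the binary variant needs 16 or 17. Whether this translates into lower running time depends on the k − 1 comparisons of one iteration costing about as much as a single scalar comparison, which is the SIMD property the paper relies on.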
