Fast Sorted-Set Intersection Using SIMD Instructions

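The paper "Fast Sorted-Set Intersection using SIMD Instructions" by Schlegel, Willhalm, and Lehner studies how to intersect sorted sets quickly with SIMD (single-instruction-multiple-data) instructions. Sorted-set intersection is a key part of many algorithms, such as RID-list intersection and inverted indexes. Traditional scalar algorithms try to reduce the number of comparisons. The authors instead parallelize the work through speculative execution of comparisons: their approach may perform more comparisons, but it executes fewer instructions and is therefore faster overall. It relies on SIMD instructions that are widely available in modern processors and that process multiple data elements with a single instruction.

The paper provides several intersection algorithms for different integer data types: versions that take uncompressed integer values as input and output, and a version that uses a tailor-made data layout for even faster intersections. In the authors' experiments, the SIMD algorithms achieve speedups of up to 5.3x over popular fast scalar algorithms.

Beyond the first page reproduced below, the paper may also analyze the performance of the algorithms in more detail, including their complexity, memory access patterns, and behavior on different hardware configurations. It may further cover how the speculative comparisons are coordinated across multiple data elements, how boundary conditions are handled, and what the techniques could mean for large-scale data analysis, database query optimization, and machine learning. Overall, the paper shows that parallelization and speculative execution can yield a significant performance gain for sorted-set intersection even though the number of comparisons increases, which matters for any domain that processes large datasets.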


Fast Sorted-Set Intersection using SIMD Instructions

Benjamin Schlegel

TU Dresden

Dresden, Germany

benjamin.schlegel@tu-dresden.de

Thomas Willhalm

Intel GmbH

Munich, Germany

thomas.willhalm@intel.com

Wolfgang Lehner

TU Dresden

Dresden, Germany

wolfgang.lehner@tu-dresden.de

ABSTRACT

In this paper, we focus on sorted-set intersection, which is an important part of many algorithms, e.g., RID-list intersection, inverted indexes, and others. In contrast to traditional scalar sorted-set intersection algorithms that try to reduce the number of comparisons, we propose a parallel algorithm that relies on speculative execution of comparisons. In general, our algorithm requires more comparisons but fewer instructions than scalar algorithms, which translates into a better overall speed. We achieve this by utilizing efficient single-instruction-multiple-data (SIMD) instructions that are available in many processors. We provide different sorted-set intersection algorithms for different integer data types. We propose versions that use uncompressed integer values as input and output as well as a version that uses a tailor-made data layout for even faster intersections. In our experiments, we achieve speedups of up to 5.3x compared to popular fast scalar algorithms.

1. INTRODUCTION

Sorted-set intersection is a fundamental operation in query processing in the area of databases and information retrieval. It is part of many algorithms and often accounts for a large fraction of the overall runtime. Examples are inverted indexes in information retrieval [25], list intersection in frequent-itemset mining [1, 28], and merging of RID-lists in database query processing [19]. In many of these application areas, low latencies are a key concern, so reducing the execution time of set intersection is very important.

A lot of research has been done with the goal of improving sorted-set intersection. Many approaches focus on speeding up sequential intersection [4, 5, 9, 11, 13, 20] by using efficient data structures or improved processing techniques. Other approaches focus on utilizing modern hardware like graphics processors (GPUs) [1, 2, 12, 26, 27] and multi-core CPUs [23, 24] to exploit the parallelism offered by these processors. However, the main focus of these approaches is only on thread-level parallelism. Data-level parallelism has so far not been considered although it is available via SIMD instruction sets in almost all modern CPUs. For this reason, our goal in this paper is to speed up the intersection of sorted sets using SIMD capabilities.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. This article was presented at: The Second International Workshop on Accelerating Data Management Systems using Modern Processor and Storage Architectures (ADMS'11).

SIMD capabilities can be utilized in algorithms either through automatic vectorization by compilers, which is the ideal case, or by inserting SIMD instructions manually. However, many compilers detect vectorization opportunities only for simple loop constructs with few or no data dependencies. In all other cases, hand-tuned assembly or intrinsics must be used. Unfortunately, all sorted-set intersection algorithms have complex data dependencies, so automatic vectorization cannot be applied.
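To make the data dependency concrete, here is a minimal sketch of a textbook scalar merge intersection. It is not code from the paper, and the name `intersect_scalar` is an assumption chosen for illustration. Which index advances in each iteration depends on the comparison made in that same iteration, which is the kind of loop-carried dependency that typically defeats auto-vectorizers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Textbook merge intersection of two sorted, duplicate-free sets.
// Illustrative baseline only; the name is not taken from the paper.
std::vector<std::uint32_t> intersect_scalar(const std::vector<std::uint32_t>& a,
                                            const std::vector<std::uint32_t>& b) {
    // Debug-only check of the precondition that both inputs are sorted.
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::vector<std::uint32_t> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        // The next values of i and j depend on this comparison,
        // so iterations cannot simply be executed side by side.
        if (a[i] < b[j]) {
            ++i;
        } else if (b[j] < a[i]) {
            ++j;
        } else {
            out.push_back(a[i]);  // common element found
            ++i;
            ++j;
        }
    }
    return out;
}
```

Each iteration performs at most two comparisons and advances at least one index, so the loop does little work per step but cannot be unrolled into independent lanes without restructuring the algorithm.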

In this paper, we introduce parallel sorted-set intersection algorithms that rely on speculative execution. Our main intention is to speculatively execute more comparisons than the necessary ones performed by scalar algorithms. To do this efficiently, we use the String and Text Processing New Instructions (STTNI) that are part of the Intel Streaming SIMD Extensions 4.2 (Intel SSE 4.2). These instructions allow a fast full comparison of either eight 16-bit values (= 64 comparisons) or sixteen 8-bit values (= 256 comparisons) with only one instruction. Many of these comparisons are useless and would not have been executed by scalar algorithms. However, since these instructions themselves require only 8 cycles to complete [14], we achieve significant speedups.
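The following is a hedged sketch of how such an all-pairs comparison could drive a block-wise intersection of 16-bit values. It is one plausible reading of the idea, not the authors' implementation. The function names `match_mask8` and `intersect_blocks16` are assumptions. When the compiler targets SSE 4.2, the sketch uses the `_mm_cmpestrm` intrinsic from `<nmmintrin.h>`; otherwise it falls back to a portable loop that performs the same 64 comparisons.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__SSE4_2__)
#include <nmmintrin.h>  // SSE 4.2 intrinsics, including _mm_cmpestrm
#endif

// Bit k of the result is set if a[k] equals any of b[0..7].
// Both pointers must reference at least eight readable 16-bit values.
static unsigned match_mask8(const std::uint16_t* a, const std::uint16_t* b) {
#if defined(__SSE4_2__)
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // One STTNI instruction performs all 8 x 8 = 64 equality comparisons.
    // The result mask is indexed by the elements of the second operand (va).
    const __m128i res = _mm_cmpestrm(
        vb, 8, va, 8, _SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_BIT_MASK);
    return static_cast<unsigned>(_mm_cvtsi128_si32(res)) & 0xFFu;
#else
    // Portable fallback that emulates the same 64 comparisons.
    unsigned mask = 0;
    for (int k = 0; k < 8; ++k)
        for (int l = 0; l < 8; ++l)
            if (a[k] == b[l]) mask |= 1u << k;
    return mask;
#endif
}

// Intersects two sorted, duplicate-free sets of 16-bit values block by block.
std::vector<std::uint16_t> intersect_blocks16(const std::vector<std::uint16_t>& a,
                                              const std::vector<std::uint16_t>& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    std::vector<std::uint16_t> out;
    std::size_t i = 0, j = 0;
    while (i + 8 <= a.size() && j + 8 <= b.size()) {
        const unsigned mask = match_mask8(&a[i], &b[j]);
        for (int k = 0; k < 8; ++k)
            if (mask & (1u << k)) out.push_back(a[i + k]);
        // Advance the block whose largest value is smaller (both on a tie).
        const std::uint16_t a_max = a[i + 7], b_max = b[j + 7];
        if (a_max <= b_max) i += 8;
        if (b_max <= a_max) j += 8;
    }
    // Scalar merge for the remaining elements that do not fill a block.
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) ++i;
        else if (b[j] < a[i]) ++j;
        else { out.push_back(a[i]); ++i; ++j; }
    }
    return out;
}
```

The sketch appends matches one at a time for clarity. A tuned version would more likely compact the matching lanes with a shuffle and store them in one step, but that detail is not covered by the text above.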

In summary, our contributions are as follows:

• We propose fast parallel sorted-set intersection algorithms for 8-bit, 16-bit and 32-bit integer values based on STTNI of Intel SSE 4.2. The algorithms use uncompressed integer values as input and output. We explain in detail the necessary SIMD instructions and steps of the parallel intersection.

• We present a hierarchical data layout that is tailor-made for a fast parallel intersection of integer values with a precision higher than 16 bits.

• We compare our parallel algorithms with two highly efficient scalar versions on synthetic datasets.
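This excerpt does not describe the hierarchical data layout, so the sketch below is only one plausible interpretation and should be read as an assumption: 32-bit values are grouped by their upper 16 bits, and each group stores only the lower 16 bits, so that the inner intersection works on 16-bit values. The names `Partition`, `HierSet`, `build_hier`, and `intersect_hier` are hypothetical, and `std::set_intersection` stands in for whatever 16-bit SIMD kernel would be used in practice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical two-level layout: all values sharing the same upper 16 bits
// form one partition that stores only their lower 16 bits.
struct Partition {
    std::uint16_t high;              // shared upper 16 bits
    std::vector<std::uint16_t> low;  // sorted lower 16 bits
};
using HierSet = std::vector<Partition>;

// Builds the layout from sorted, duplicate-free 32-bit values.
HierSet build_hier(const std::vector<std::uint32_t>& sorted_values) {
    assert(std::is_sorted(sorted_values.begin(), sorted_values.end()));
    HierSet set;
    for (std::uint32_t v : sorted_values) {
        const auto high = static_cast<std::uint16_t>(v >> 16);
        if (set.empty() || set.back().high != high)
            set.push_back(Partition{high, {}});
        set.back().low.push_back(static_cast<std::uint16_t>(v & 0xFFFFu));
    }
    return set;
}

// Merges on the upper 16 bits and intersects the lower 16 bits only for
// partitions that exist in both sets.
std::vector<std::uint32_t> intersect_hier(const HierSet& a, const HierSet& b) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].high < b[j].high) {
            ++i;
        } else if (b[j].high < a[i].high) {
            ++j;
        } else {
            // Stand-in for a 16-bit SIMD intersection kernel.
            std::vector<std::uint16_t> lows;
            std::set_intersection(a[i].low.begin(), a[i].low.end(),
                                  b[j].low.begin(), b[j].low.end(),
                                  std::back_inserter(lows));
            for (std::uint16_t l : lows)
                out.push_back((static_cast<std::uint32_t>(a[i].high) << 16) | l);
            ++i;
            ++j;
        }
    }
    return out;
}
```

For example, `intersect_hier(build_hier(x), build_hier(y))` returns the common values of two sorted vectors `x` and `y`. Partitions whose upper 16 bits occur in only one set are skipped without touching their contents.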

The scope of our paper is as follows. We focus solely on sorted-set intersection of integer values without duplicates. Usually, this is not a limitation since both of our example scenarios, information retrieval and frequent-itemset

Intel and Core are trademarks of Intel Corporation in the U.S. and/or other countries. Other names and brands may be claimed as the property of others.

(The preview ends here; the remaining 7 pages are not included.)
