k-Ary Search on Modern Processors
Benjamin Schlegel
Technische Universität Dresden
benjamin.schlegel@tu-
dresden.de
Rainer Gemulla
IBM Almaden Research Center
rgemull@us.ibm.com
Wolfgang Lehner
Technische Universität Dresden
wolfgang.lehner@tu-
dresden.de
ABSTRACT
This paper presents novel tree-based search algorithms that
exploit the SIMD instructions found in virtually all mod-
ern processors. The algorithms are a natural extension of
binary search: While binary search performs one compar-
ison at each iteration, thereby cutting the search space in
two halves, our algorithms perform k comparisons at a time
and thus cut the search space into k pieces. On traditional
processors, this so-called k-ary search procedure is not ben-
eficial because the cost increase per iteration offsets the cost
reduction due to the reduced number of iterations. On mod-
ern processors, however, multiple scalar operations can be
executed simultaneously, which makes k-ary search attrac-
tive. In this paper, we provide two different search algo-
rithms that differ in terms of efficiency and memory access
patterns. Both algorithms are first described in a platform
independent way and then evaluated on various state-of-the-
art processors. Our experiments suggest that k-ary search
provides significant performance improvements (factor two
and more) on most platforms.
1. INTRODUCTION
Searching is a fundamental operation that is used in al-
most every domain of computer science. The classical prob-
lem is to retrieve from a dataset of disjoint key/value pairs
(i) the value for a single given key or (ii) all the pairs within
a range of two given keys. For example, suppose that the
dataset consists of a set of persons. Then, a single-key query
might ask for a person with a given name, while a range
query might ask for all persons born between 01/01/1980
and 12/31/1980.
The classical search problem has been studied extensively
in the literature. Apart from linear search, one distin-
guishes sort-based, tree-based, and hash-based search algo-
rithms. Hash-based algorithms are well-suited for single-key
lookups, but they require memory over and above that to
store the base data
1
and—since the distribution of keys to
1
Although some modern hashing techniques [11] can achieve
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
Proceedings of the Fifth International Workshop on Data Management on
New Hardware (DaMoN 2009) June 28, 2009, Providence, Rhode-Island
Copyright 2009 ACM 978-1-60558-701-1 ...$10.00.
buckets is randomized—they perform poorly in the presence
of range or nearest-key queries. In this paper, we restrict our
attention to the former two classes, i.e., sort-based and tree-
based search. When the dataset is sorted, binary search
constitutes the provably best algorithm of the sort-based
class of algorithms in terms of time complexity. Each step
of a binary search halves the search space by performing
one comparison so that the total number of comparisons is
logarithmic in the dataset size. Similar arguments hold for
tree-based search based on a binary search tree.
The downside of binary search is that it does not make use
of the SIMD capabilities of modern processors. On IBM’s
Cell processor, for example, the cost of a single 32-bit scalar
comparison is identical to the cost of four 32-bit scalar com-
parisons executed simultaneously. A naive execution of bi-
nary search would therefore “waste” three comparisons at
each step. A natural idea to boost the speed of searching is,
therefore, to not run binary search but k-ary search, k > 2,
where each step divides the search space into k parts based
on the outcome of k − 1 comparisons (e.g., k = 5 for the
Cell with 4-way vector registers). Although this approach
does not affect the asymptotic time complexity of search, it
might significantly reduce the actual execution cost.
In this paper, we take a closer look at k-ary search on
SIMD architectures. Our goal is to determine which SIMD
operations are essential for k-ary search to be more efficient
than binary search and by what margin the former outper-
forms the latter on selected processors. Furthermore, we
show that an additional reduction in execution time can be
achieved by rearranging the dataset in an order more ap-
propriate for k-ary search. Our reordering is based on a
linearization of a k-ary search tree.
In what follows, we consider a somewhat idealistic scenario
in which the dataset is stored in main memory and is static
(or updated infrequently, e.g., on a daily or monthly basis).
Although these restrictions do not always hold in practice,
they allow us to focus on the advantages of modern architec-
tures over more conventional ones. To see this, suppose that
the dataset is not stored in memory but on hard disk. Then,
the I/O cost of reading the data would dominate the cost
of search and usually outweigh any performance gain due
to SIMD instructions. For the former reason, one typically
keeps as much of the data in memory as possible, in which
case k-ary search is attractive. Similarly, if the dataset is
updated frequently with respect to the number of searches,
the maintenance cost of both the sorted-array representation
and the linearized-tree representation become substantial.
up to 95-99% occupancy.