HASHI: An Application-Specific Instruction Set
Extension for Hashing
Oliver Arnold, Sebastian Haas,
Gerhard Fettweis
Vodafone Chair Mobile Communications
Systems
Technische Universit
¨
at Dresden
Dresden, Germany
Benjamin Schlegel
⇤
, Thomas Kissinger,
Tomas Karnagel, Wolfgang Lehner
Database Technology Group
Technische Universit
¨
at Dresden
Dresden, Germany
ABSTRACT
Hashing is one of the most relevant operations within query
processing. Almost all core datab a se operators like group-
by, selections, or di↵erent join implementations rely on
highly efficient hash i mp l ementations. In thi s paper, we
present a way to significantly improve performance and en-
ergy efficiency of hash op era ti o n s using specialized instruc-
tion set extensions for the Ten sil ic a Xtensa LX5 core. To
show the applicability of instruction set extensions, we im-
plemented a bit extraction hashing scheme for 32-bit integer
keys as well as the CityHash fun c t io n for string values. We
identify the individual parts of the algorithms required to
be optimized, we describe our hashing-specific instructi o n
set, and finally give a comprehensive experimental evalua-
tion. We observed th a t the hash implementation usin g the
hashing-specific instruction set (1) is u p to two orders of
magnitudes fas te r than the basic core without extensions,
(2) exhi b it s always better performance compared to hand-
tuned code running on modern high-end general purpose
CPUs, and (3) has a significantly better footprint with re-
spect to energy consumption a s well as chip area. Especially
the third observation has the potential for a higher packing
density and therefore a significantly better overall system
performance.
1. INTRODUCTION
Database systems can be optimized in m a ny di↵erent di-
rections, while the overall challenges are often optimiza-
tions of algorithms or adapting the software to given hard-
ware features like multiple co res , SIMD, and memory hier-
archies. Algorithms deployed in database systems a re there-
fore highly tuned and very often eith e r reach the processor’s
peak performance or they are limited by some system char-
acteristics like memory bandwidth or latency, the number
⇤
This author works now at Oracl e Labs, Belmont, CA, USA.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise,
to republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee. Articles from this volume were invited
to present their results at ADMS’14, a workshop co-located with The 40th
International Conference on Very Large Data Bases, September 1st - 5th
2014, Hangzhou, China.
of ava i la b le cores, or down-scaled core frequencies due to
power and heat dissipation limitations.
Therefore, a high performance database system design
also depends on hardware specializations to achieve even
better performance. Unfortunately, gen era l purpose pro-
cessors are reaching their limits since single-threaded per-
formance has almost sto p ped to in c rea se, because the maxi -
mum c o re frequency is limited by physical constraints. Even
the current solution, to put more and more homogeneous
cores onto a single socket, wi ll also reach physical limita-
tions soon. As the feature size in which processors are man-
ufactured will shrink, the growing numb er of transistors will
increase the occurrence of dark silicon [4, 6]. Dark silicon is
caused by thermal problems, since supplying all transistors
with energy at the same time would overheat the processor.
Having an energy-efficient processor d es ig n and introducing
specialized instruc t io n sets that are usable on-demand, mit-
igates the impact of dark sili c o n . Especially the latter idea
can be implemented o n a fra c t io n of th e chip space, wh ere
the instruction set extensions can be power-gated whenever
needed without compromis in g the overall general purpose
characteristics of the chip itself.
To follow the idea of specialized instruction sets and to
push the envelope in database performance, we strive to
widen the view t owards adjusting processors for today’s and
future query p rocessing needs. Processors them sel ves ca n
be extended to a ll ow for a high er query throughput and
lower late n cy by adding sp ec ia l iz ed instruction sets, opti-
mized for supporting query processing primitives. Special
purpose hardware extensions for database operation s can
improve the performance of the supported a lg o rit h m s sig-
nificantly while – at the same time – saving energy in the
processor, mit ig a t in g the impact of dark silicon and there-
fore allowing a much h ig h er packing density of individual
cores due to goo d elec tri c a l an d th erma l p ro perties.
In previous work, we proposed an instruction set exten-
sion to acce lera t e set-oriented datab a se primitives [2]. In
this paper, we want to go furt h er with this novel way of
optimization by providing a specialize d instru c t io n set ex-
tension for database hashing primitives in combination with
a low-energy processor design . Hashing is widely used in
modern database s for join implementations, agg reg a ti o n op-
erators, as well as indexing. To support al l these op era t ors
with instruction set extensions, we investigate hashing prim-
itives like integer and string ha s h in g , as well as insert and
lookup operations. For all these operations, we use well-