First Name
Phone Number
Last Name
John
Smith
Figure 1: Simple hyperspace hashing in three di-
mensions. Each plane represents a query on a sin-
gle attribute. The plane orthogonal to the axis for
“Last Name” passe s through al l points for last_name
= ‘Smith’, while the other plane passes through all
points for first_name = ‘John’. Together they rep-
resent a line formed by the intersection of the two
search conditions; that is, all phone numbers for
people named “John Smith”. The two cubes show
regions of the space assigned to two different servers.
The que ry for “John Smith” needs to contact only
these servers.
op erations. In HyperDex, a search specifies a set of at-
tributes and the values that they must match (or, in the case
of numeric values, a range they must fall between). Hyper-
Dex returns objects which match the search. Each search
op eration uniquely maps to a hyperplane in the hyperspace
mapping. A search with one attribute specified map s to
a hyperplane that intersects th at attribute’s axis in exactly
one location and intersects all other axes at every point. Al-
ternatively, a search that specifies all attributes maps to
exactly one point in hyperspace. The hyperspace mapping
ensures that each additional search t erm potentially reduces
the number of servers to contact and guarantees that addi-
tional search terms will not increase search complexity.
Clients maintain a copy of th e hyperspace mapping, and
use it to deterministically execute search operations. A
client first maps the search into the hyperspace using the
mapping. It then determines which servers’ regions intersect
the resulting hyperplane, and issues the search request to
only those servers. The client may then collect matching
results from the servers. Because the hyperspace mapping
maps objects and servers into the same hyperspace, it is
never necessary to contact any server whose region does not
intersect the search hyp erplane.
Range queries correspond to extruded hyperplanes. When
an attribute of a search specifies a range of values, the cor-
responding hyperplane will intersect the attribute’s axis at
every point that falls between the lower and upper bounds of
the range. N ote that for such a scheme to work, objects’ rela-
tive orders for the attribute must be preserved when mapped
onto th e hyperspace axis.
Figure 1 illustrates a query for first_name = ‘John’ and
last_name = ‘Smith’. The query for first_name = ‘John’
k
v
1
v
2
v
3
v
4
v
5
. . .
v
D-1
v
D-1
v
D
hyperspace
k
v
1
v
2
v
3
v
4
v
5
. . .
v
D-1
v
D-1
v
D
key subspace
subspace 0 subspace 1 subspace S
Figure 2: HyperDex partitions a high-dim ensional
hyperspace into multiple low-dimension subspaces.
corresponds to a two-dimensional plane which intercepts the
first_name axis at the h ash of ‘John’. Similarly, the query
for last_name = ‘Smith’ creates another plane which inter-
sects the last_name axis. The intersection of the two planes
is the line along which all phone numbers for John Smith re-
side. Since a search for John Smith in a particular area code
defines a line segment, a H yperDex search needs to contact
only those nodes whose regions intersect that segment.
3. DATA PARTITIONING
HyperDex’s Euclidean space construction significantly re-
duces the set of servers that must be contacted to find match-
ing objects.
However, the drawback of coupling the dimensionality of
hyperspace with the number of searchable attributes is that,
for tables with many searchable attributes, the hyperspace
can be very large since its volume grows exponentially with
the number of dimensions. Covering large spaces with a grid
of servers may not be feasible even for large data-center de-
ployments. For example, a table with 9 secondary attributes
may require 2
9
regions or more to support efficient searches.
In general, a D dimensional hyperspace will need O(2
D
)
servers.
HyperDex avoids the problems associated with high-di-
mensionality by partitioning tables with many attributes
into multiple lower-dimensional hyperspaces called subspaces.
Each of these subsp aces uses a sub set of object attributes as
the dimensional axes for an independent hyperspace. Fig-
ure 2 shows how HyperDex can partition a table with D
attributes into multiple independ ent su bspaces. W hen per-
forming a search on a table, clients select the subspace that
contacts the fewest servers, and will issue the search to
servers in exactly one subspace.
Data partitioning increases the efficiency of a search by
reducing the dimensionality of the underlying hyperspace.
In a 9-dimensional hyperspace, a search over 3 attributes
would need to contact 64 regions of the hyperspace (and
thus, 64 servers). If, instead, the same table were parti-
tioned into 3 subspaces of 3 attributes each, the search will
never contact more than 8 servers in the worst case, and
exactly one server in the best case. By partitioning the ta-
ble, HyperDex reduces the worst case behavior, decreases
the number of servers necessary to maintain a table, and
increases the likelihood that a search is ru n on exactly one
server.
Data partitioning forces a trade-off between search gener-
ality and efficiency. On the one hand, a single hyperspace
can accommodate arbitrary searches over its associated at-
tributes. On the other hand, a hyperspace which is too large