2015 IEEE International Conference on Big Data (Big Data)
978-1-4799-9926-2/15/$31.00 ©2015 IEEE 1868
Spatio-temporal Queries in HBase
Xiaoying Chen¹, Chong Zhang², Bin Ge³, Weidong Xiao⁴
Science and Technology on Information Systems Engineering Laboratory
National University of Defense Technology
Changsha 410073, P.R.China
{¹chenxiaoying1991, ²leocheung8286}@yahoo.com, ³gebin1978@gmail.com, ⁴wilsonshaw@vip.sina.com
Abstract—Geoscience gives insights into our surroundings
and benefits many aspects of our life. Nowadays, with massive
numbers of sensors deployed to sense all kinds of environmental
parameters, tens of billions, even trillions, of sensed records
are collected and need to be analyzed for surveillance or other
purposes. Users typically issue queries with specific spatial
and temporal predicates. For these applications, relational
databases are overwhelmed by the large scale and high insertion
rate, and NoSQL databases can be considered a feasible solution.
HBase, a popular key-value store system, is capable of solving
the storage problem, but fails to provide built-in
spatio-temporal querying capability.
Many previous works tackle the problem by designing a
schema, i.e., designing the row key and column key format for
HBase, which we do not believe is an effective solution. In this
paper, we address the problem at the native level of HBase
and propose an index structure as a built-in component for
HBase. STEHIX (Spatio-TEmporal Hbase IndeX) is adapted
to the two-level architecture of HBase and is suitable for
processing spatio-temporal queries in HBase. It is composed of
an index in the meta table (the first level) and a region index
(the second level) that indexes the inner structure of HBase
regions. Based on this structure, we propose algorithms for two
common queries, the range query and the kNN query.
To achieve load balancing and scalable kNN queries, two
optimizations are also presented. We implement STEHIX and
conduct experiments on a real dataset, and the results show that
our design outperforms a previous work in many aspects.
Keywords-spatio-temporal query; HBase; range query; kNN
query; load balancing
I. INTRODUCTION
With the development of geoscience, especially the Global
Positioning System (GPS) and Remote Sensing (RS), the
volume of spatio-temporal data has grown to the TB, or even
PB or EB, scale. After storing spatio-temporal data, users
typically need to query it by specific spatial and temporal
predicates, which requires efficient storage and retrieval
capability. Traditional database management systems (DBMSs)
have advantages in data organization and are equipped with
multi-dimensional index structures. However, when dealing with
large-scale data, they cannot sustain high-rate insertion
and real-time queries. On the other hand, HBase [1], a
key-value store system, can effectively support large-scale data
operations, but does not natively support multi-attribute
indexes, which limits rich query applications.
*This work is supported by NSF of China grant 61303062
A. Motivation
Our motivation is to adapt HBase to efficiently process
spatio-temporal queries. Although some previous works
proposed distributed indexes on HBase, these works consider
only the spatial dimension. More critically, most of them
concern only how to design a schema for spatial data and
do not tackle the problem at the native level of HBase.
One exception, MD-HBase [2], adds an index structure to
the meta table; however, it does not provide an index to
efficiently retrieve the inner data of HBase regions.
Our solution, STEHIX (Spatio-TEmporal Hbase IndeX), is
built on a two-level lookup mechanism that is based on
the retrieval mechanism of HBase. First, we use a Hilbert
curve to linearize geo-locations and store the converted
one-dimensional data in the meta table. Then, for each region,
we build a region index that indexes the StoreFiles in that
HBase region. In this paper, we focus on range queries and
kNN queries in such an environment.
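As an illustration of the linearization step, the following minimal sketch maps a geo-location to a grid cell and then to its position on a Hilbert curve, using the standard iterative cell-to-distance conversion. The function names, the grid order, and the world-wide bounding box are assumptions made for this example only; they are not taken from the STEHIX implementation.

```python
def geo_to_cell(lon, lat, order, bounds=(-180.0, -90.0, 180.0, 90.0)):
    """Map a (lon, lat) point to an integer cell on a 2^order x 2^order grid.

    `bounds` = (min_lon, min_lat, max_lon, max_lat) is an assumed extent.
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    n = 1 << order
    x = min(n - 1, int((lon - min_lon) / (max_lon - min_lon) * n))
    y = min(n - 1, int((lat - min_lat) / (max_lat - min_lat) * n))
    return x, y


def hilbert_index(order, x, y):
    """Return the position of cell (x, y) along a Hilbert curve of the given order."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


if __name__ == "__main__":
    order = 4  # hypothetical resolution: 16 x 16 cells
    cell = geo_to_cell(112.93, 28.23, order)  # a point near Changsha
    print(cell, hilbert_index(order, *cell))
```

Because consecutive Hilbert values always belong to neighboring cells, a contiguous range of encoded values covers a spatially compact area, which is the property that makes the encoded value usable as a one-dimensional key.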
B. Contributions
We address how to efficiently answer range and k nearest
neighbor (kNN) queries on spatio-temporal data in HBase.
Our solution, STEHIX (Spatio-TEmporal Hbase IndeX), fully
takes the inner structure of HBase into consideration.
Previous works focus on building indexes based on
traditional structures such as the R-tree and B-tree, whereas
our method constructs the index based on HBase itself; thus, our
index structure is more suitable for HBase retrieval. In
addition, STEHIX considers not only the spatial dimension but
also the temporal one, which is more in line with user demand.
We use a Hilbert curve to partition space at the initial
resolution; the encoded values are used in the meta
table to index HBase regions. We then use a quad-tree to
partition Hilbert cells at a finer resolution. Based on this,
we design a region index structure for each region, which
contains the finer encoded values for indexing the spatial
dimension and time segments for indexing the temporal dimension.
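To make the finer resolution concrete, the sketch below computes a quad-tree path code for a point inside a single Hilbert cell. The digit numbering of the quadrants, the depth parameter, and the function name are assumptions chosen for illustration; the encoding actually used by STEHIX may differ.

```python
def quad_code(px, py, cell, depth):
    """Return the quad-tree path of point (px, py) inside `cell`.

    `cell` = (x0, y0, x1, y1) is the extent of one Hilbert cell.
    Each level appends one digit: 0 = lower-left, 1 = lower-right,
    2 = upper-left, 3 = upper-right (an assumed numbering).
    """
    x0, y0, x1, y1 = cell
    digits = []
    for _ in range(depth):
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        qx = 1 if px >= mx else 0
        qy = 1 if py >= my else 0
        digits.append(str(2 * qy + qx))
        # Descend into the chosen quadrant.
        x0, x1 = (mx, x1) if qx else (x0, mx)
        y0, y1 = (my, y1) if qy else (y0, my)
    return "".join(digits)


if __name__ == "__main__":
    print(quad_code(0.6, 0.9, (0.0, 0.0, 1.0, 1.0), 2))
```

Points that share a code prefix fall in the same sub-cell, so a longer code corresponds to a finer spatial partition within the Hilbert cell.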
Later, we show that such a two-level index structure, meta
table + region index, is more suitable for HBase to process