616 X.Y. Chen, C. Zhang, B. Ge, W.D. Xiao
2 Related Works
To overcome the drawbacks of traditional RDBMS, as an attractive alternative for large-
scale data processing, Cloud storage system currently adopts a hash-like approach to retrieve
data that only support simple keyword-based queries, but lacks various forms of information
search. For data processing operations, several cloud data managements (CDMs), such as HBase,
are developed. HBase, as NoSQL databases, is capable to handle large scale storage and high
insertion rate, however, it does not offer much support for rich index functions. Many works
focus on this point and propose various approaches.
Nishimura et al. [5] address multidimensional queries for PaaS by proposing MD-HBase. It
uses k-d-trees and quad-trees to partition space and adopts Z-curve to convert multidimensional
data to a single dimension, and supports multi-dimensional range and nearest neighbor queries,
which leverages a multi-dimensional index structure layered over HBase. However, MD-HBase
builds index in the meta table, which does not index inner structure of regions, so that scan
operations are carried out to find results, which reduces its efficiency.
Hsu et al. [6] propose a novel Key formulation scheme based on R
+
-tree, called KR
+
-tree,
and based on it, spatial query algorithm of kNN query and range query are designed. Moreover,
the proposed key formulation schemes are implemented on HBase and Cassandra. With the
experiment on real spatial data, it demonstrates that KR
+
-tree outperforms MD-HBase. KR
+
-
tree is able to balance the number of false-positive and the number of sub-queries so that it
improves the efficiency of range query and kNN query a lot. This work designs the index according
to the features found in experiments on HBase and Cassandra. However, it still does not consider
the inner structure of HBase.
Zhou et al. [7] propose an efficient distributed multi-dimensional index (EDMI), which con-
tains two layers: the global layer divides the space into many subspaces adopting k-d-tree, and
in the local layer, each subspace is associated to a Z-order prefix R-tree (ZPR-tree). ZPR-tree
can avoid the overlap of MBRs and obtain better query performance than other Packed R-trees
and R
∗
-tree. This paper experimentally evaluates EDMI based on HBase for point, range and
kNN query, which verifies its superiority. Compared with MD-HBase, EDMI uses ZPR-tree in
the bottom layer, while MD-HBase employs scan operation, so that EDMI provides a better
performance.
Han et al. [8] propose HGrid data model for HBase. HGrid data model is based on a hybrid
index structure, combining a quad-tree and a regular grid as primary and secondary indices,
supports efficient performance for range and kNN queries. This paper also formulates a set of
guidelines on how to organize data for geo-spatial applications in HBase. This model does not
outperform all its competitors in terms of query response time. However, it requires less space
than the corresponding quad-tree and regular-grid indices.
HBaseSpatial, a scalable spatial data storage based on HBase, proposed by Zhang et al. [9].
Compared with MongoDB and MySQL , experimental results show it can effectively enhance the
query efficiency of big spatial data and provide a good solution for storage. But this model does
not compare with other distributed index method.
All the previous works we have mentioned above only consider the spatial query. For moving
objects, a certain type of geo-spatial applications, requires high update rate and efficient real-
time query on multi-attributes such as time-period and arbitrary spatial dimension. Du et al. [10]
present hybrid index structure based on HBase, using R-tree for indexing space and applying
Hilbert curve for traversing approaching space. It supports efficient multi-dimensional range
queries and kNN queries, especially it is adept at skewing data compared with MD-HBase and
KR
+
-tree. As this work focus on moving objects, it is different for our goal, and it also does not
take the inner structure of HBase into account.