SPATIO-TEMPORAL KEYWORDS QUERIES IN HBASE 83
overlay network CAN, and only clustering information of local index is issued to the
entire overlay network through the CAN interface. The experimental results show
that VF-CAN reduces the index storage space and improves query performance
effectively.
For multi-dimensional data applications, Nishimura et al. [9] proposes MD-HB-
ase, a scalable multi-dimensional data store supporting efficient multi-dimensional
range and nearest neighbor queries. MD-HBase layers a multi-dimensional index
structure over a range partitioned key-value store. Using a design based on lin-
earization, its implementation layers standard index structures like K-D trees and
Quad trees.
As we have mentioned above, many applications not only require finding objects
closest to a specified location, but also that containing a set of keywords. Felipe
et al. [6] present an efficient method to answer top-k spatial keyword queries. It
adopts a structure called IR
2
-Tree (Information Retrieval R-Tree) which combines
an R-Tree with superimposed text signatures. Algorithms are proposed to construct
and maintain an IR
2
-Tree, and answer top-k spatial keyword queries. With exper-
imentally comparison with current methods, superior performance and excellent
scalability of the algorithms are shown.
Cong et al. [5] propose a novel indexing framework for location top-k text re-
trieval. It integrates the inverted file for text retrieval and the R-tree for spatial
proximity query. Several hybrid indexing approaches are explored within the frame-
work. Algorithms utilize the proposed indexes to compute the top-k query by taking
into account both text relevancy and location proximity to prune the search space.
In practical applications, it is need to consider the nature of the movement of
objects. Traditional indexes have good query performance but can not handle this
data processing problem in that they are not efficient for update which is crucial
for an index for moving objects, as they change their position frequently. Jensen et
al. [7] represent moving-object locations as vectors that are time stamped based on
their update time. Then B
+
-tree based data structure is adopted to index moving
objects according to their time stamp and otherwise preserves spatial proximity.
This method supports range and nearest neighbor queries, as well as continuous
queries. B
x
-tree which is based on the B
+
-tree, outperforms the TPR-tree in range
queries and kNN queries concerning the current or near-future positions of the
objects.
3. Problem definition. In this section, we first present the definitions for spatio-
temporal keyword data and queries, and then describe the HBase logical table. For
simplicity, only two-dimensional space is considered in this paper, however, our
method can be directly extended into higher dimensional space.
Spatio-temporal data. A record r of spatio-temporal data can be denoted as
hx, y, t, W , Ai, where (x, y) means the geo-location of the record, t means the valid
time when the data is produced, W is a set of keywords, W ={w
1
, w
2
, . . . , w
n
}, A
represents other attributes, such as user-id, object’s shape, etc.
Spatio-temporal keyword query. Given a set U
st
of records of spatio-
temporal data, a spatio-temporal keyword (STK) query Q=hR
q
, t
s
, t
e
, W
q
i, where
R
q
=(x
l
, y
l
, x
u
, y
u
) means a query range with lower-left coordinate (x
l
, y
l
) and
upper-right coordinate (x
u
, y
u
), W
q
={w
q
1
, w
q
2
, . . . , w
q
m
} is a set of query key-
words, aims to find a subset S
st
={r
s
| r
s
∈ U
st
} satisfying that:
r
s
.(x, y) ∈ R
q
r
s
.t ∈ [t
s
, t
e
]