Which is Better for kNN Query Processing in the Cloud:
Sequential or Parallel
∗
Chong Zhang
1
, Xiaoying Chen
2
, Bin Ge
3
, Weidong Xiao
4
∗
Science and Technology on Information Systems Engineering Laboratory
National University of Defense Technology, Changsha 410073, China
+
Collaborative Innovation Center of Geospatial Technology, China
{
1
leocheung8286,
2
chenxiaoying1991}@yahoo.com
3
gebin1978@gmail.com,
4
wilsonshaw@vip.sina.com
ABSTRACT
With the development of various Cloud system, providing
powerful kNN query capability to DaaS (Database as a Ser-
vice) is an essential requirement for many applications. In
this paper, we are interested in two opposite approaches for
processing kNN query in Cloud system, parallel processing
and sequential processing, and we want to explore the an-
swer of which one performs better. For addressing such a
question, we devise a new distributed indexing structure VI-
HCO, which is characterized by fast locating Cloud nodes
capability. Then parallel and sequential processing methods
are designed upon the structure. For parallel one, we take
differential cells between two consecutive range queries into
consideration, and for sequential one, we elaborately design
an accurate message delivery algorithm. We verify our ideas
through experiments, which is conducted on both synthetic
and real dataset, and the results show that VIHCO outper-
forms a previous work RT-CAN, and the sequential method
is more efficient under small k query condition and small
system size, while parallel one suits for large k and large
scale of computing nodes.
Categories and Subject Descriptors
H.2.4 [Database Management]: Systems - query processing
General Terms
Algorithms, Measurement, Performance
Keywords
kNN query, Cloud, histogram, parallel, sequential
∗
This work is supported by NSF of China grant 61303062
and 71331008.
c
2016, Copyright is with the authors. Published in the Workshop Pro-
ceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bor-
deaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this
paper is permitted under the terms of the Creative Commons license CC-
by-nc-nd 4.0
1. INTRODUCTION
With the development of Cloud computing, various lay-
ers of computing resources are used in terms of pay-as-you-
go, such as IaaS (Infrastructure as a Service), PaaS (Plat-
form as a Service) and SaaS (Software as a Service). Nowa-
days, Database as a Service (DaaS)[2][13] is a hot topic for
database community in Cloud computing era. For DaaS
users, it is not necessary to focus on the location of database
instance, nor the physical storage mechanism of schema or
tables, not even data partition fashion or query processing,
in one word, the inner of database is transparent to the
users. They just define the structure of table, and insertion,
query or other operations seem similar to use a centralized
local database. However, it is possible for one table, data are
spread over many computing nodes, and querying processing
needs the collaboration of these nodes. And as data volume
increases, the database should be adaptive to the new scale
and new query requirement, i.e., it should be elastic.
In this paper, we focus on kNN query in DaaS, which is
an essential function for spatial database. Given a point
in the space, kNN query aims to find k nearest objects to
the query point. This topic is addressed well in some previ-
ous works[13][9][14], however, there are two opposite ideas
to solve the problem in the state-of-the-art, namely, parallel
processing and sequential processing. Parallel method ex-
ploits the parallelism of Cloud nodes, and make them work
simultaneously, while sequential one uses the vicinity rela-
tionship between query point and Cloud nodes to accurately
deliver query messages. Nevertheless, which is better for
DaaS is not studied before. Hence, in this paper, we extend
our work in [14] to acquire the answer.
For comparing the two approaches, we use a previous work
RT-CAN as a baseline, and propose a new structure, called
VIHCO (VIcinity-based Hilbert Cloud Overlay), to index
spatial data in Cloud system and to process kNN query.
The feature of VIHCO is not only leveraged on fast look up
routing table (finger table), but also highlighted on vicinity
neighbors to quickly locate the nearby Cloud nodes. Based
on such structure, we present the designs of parallel and
sequential processing algorithms. Experiments on both syn-
thetic and real dataset show that VIHCO outperforms RT-
CAN, in efficiency and scalability, and the sequential method
is more proper under small k query condition and small sys-
tem size, while parallel one suits for large k and large scale
of computing nodes.