Speed up distance-based similarity query using
multiple threads
Fuli Lei
College of Computer Science and Software Engineering
Shenzhen University, China
fuli_ray@yeah.net
Wenbo Wu
Department of Statistics
University of Georgia, USA
wuwenbo@uga.edu
Qiaozhi Li, He Zhang, Ping Li,Qiuming Luo, Rui Mao
College of Computer Science and Software Engineering
Shenzhen University, China
{2010150108, 2120230503, 2130230416, lqm, mao}@szu.edu.cn
Rui Mao is the corresponding author
Abstract—Metric-space indexing, also known as distance-
based indexing, is a universal indexing to support similarity
queries. It only requires that the similarity of data be defined
by a metric distance function. To achieve the great
universalness, metric-space indexing does not take use of the
domain information of data, and is thus outperformed by many
domain-specific methods. In this paper, to speed up metric-
space similarity query, we first assign one thread for each
query in the multi-query case to increase the throughput.
Then, for a single query, we assign one thread for each search
path from the root of the index tree to decrease the responding
time. Last but not least, we implement an in-memory buffer
to break the bottleneck of the disk access to the index file.
Experimental results show that our efforts result in good speed
up and parallel efficiency.
Keywords—metric space; similarity searching; multi-thread
I.
I
NTRODUCTION
In many database applications, similarity searching, an
operation like finding out all similar objects to a given
object from the database is gaining its popularity today. In
the application we firstly should give out a query object, a
distance value and a database, then find out all data, whose
distance to the query object is less than or equal to the
distance value given before. Many existing applications like
text matching, finding out similar pictures, DNA/RNA or
protein sequences from databases are all belong to similarity
searching domain, more and more researchers pay their
effort to similarity searching research to find out a general
high performance method which can be used to all kinds of
similarity searching applications. There have been some
different research achievements of similarity searching on
different type of data. In one-dimensional data space, index
structures like binary-search tree, B-tree[1], RB-tree[2] are
commonly used, while in multi-dimensional or high-
dimensional spaces, K-D tree[3], R-tree[4] and some other
dimension reduction methods like space filling curve are
used to speed up the similarity search. All of the above
similarity searching technologies has their own advantages
and disadvantages and each of them is developing on its
own way. Metric space indexing is distance based
indexing[5-7], which use relative distance among data
objects to support similarity searching and it can be used in
all data types introduced above, so metric space searching is
also called as general similarity searching. In face of dataset
in which data cannot be represented as multi-dimensional
data form with coordinates, metric space similarity searching
technology is the only way to be used. Index structures of
metric space similarity searching include GH-tree, CR-tree,
and VP-tree. The index structure in the system to be
paralleled in our work is a kind of VP-tree[8].
Metric space indexing is a widely used general method
to similarity searching application. Objects are abstracted to
points of metric space, and then pruning all the irrelevant
points through the triangle inequality of the distance
function without have to calculate the distance between all
objects accommodating in space and the query object. Index
structures in metric space include HP-tree, CR-tree and VP-
tree. In metric space, similarity between any two objects is
represented with a distance value calculated using distance
functions satisfying metric axioms. Metric space could be
represented as a pair(X, d), where X is a non-empty data
objects set, d is a metric distance function, which satisfies
the following property:
( 1 ) d
(
x, y
)
= d(y, x) symmetry