Processing Top-N Queries based on p-Norm Distances
Liang Zhu
1,a
, Feifei Liu
1,b
, Wu Chen
1,c
, Qin Ma
2,a
1
Key Lab of Machine Learning and Computational Intelligence, School of Mathematics and
Computer Science, Hebei University, Baoding, Hebei 071002, China
2
Department of Foreign Language Teaching and Research, Hebei University, Baoding, Hebei
071002, China
a
{zhu, maqin}@hbu.edu.cn;
b
liufeifei9476@126.com;
c
chenwu@cmc.hbu.cn
Keywords: Top-N query, p-norm distance, ranking function
Abstract. Top-N queries are employed in a wide range of applications to obtain a ranked list of data
objects that have the highest aggregate scores over certain attributes. The threshold algorithm (TA)
is an important method in many scenarios. However, TA is effective only when the ranking function
is monotone and the query point is fixed. In the paper, we propose an approach that alleviates the
limitations of TA-like methods for processing top-N queries. Based on p-norm distances as ranking
functions, our methods utilize the fundamental principle of Functional Analysis so that the
candidate tuples of top-N query with a p-norm distance can be obtained by the Maximum distance.
We conduct extensive experiments to prove the effectiveness and efficiency of our method for both
low-dimensional (2, 3 and 4) and high-dimensional (25,50 and 104) data.
Introduction
The efficient processing of top-N queries is important in many applications that involve massive
amounts of data. Top-N queries have been studied and obtained outstanding achievements since late
1990s. The threshold algorithm (TA) [1] is one of representative and crucial algorithms. TA has
following three characteristics: (1) the query point is fixed, (2) TA scans the sorted index lists
unidirectionally, and (3) the ranking function is monotone. A function f(x) is monotone if f(x) ≤ f(y)
whenever x
i
≤ y
i
for every i [1]. In many cases, however, the conditions of TA are not satisfied.
Example 1 illustrates a situation.
Example 1. Consider a database system of used books with schema Usedbooks(id#, title, author,
year, price), the top-50 query Q with (year = 2000, price = $50) and the ranking function is the
Manhattan distance [2] between a query point and a tuple t. □
In Example 1, if min(year) < 2000 < max(year) and min(price) < 50 < max(price), the ranking
function, the Manhattan distance, is not monotone. Thus, TA is not applicable [3].
Because the monotonicity has very good properties, most proposed techniques take into account
monotone ranking functions and the monotonicity of ranking functions plays a central role in
processing top-N queries; for instance, the threshold algorithm (TA) [1] and its family (say, [4, 5, 6,
etc.]). Using nonmonotone ranking functions in top-N queries is a challenge since they cannot
benefit from special properties of monotone functions that facilitate early termination [7]. For this
challenge, we develop an approach using the principle of Functional Analysis to transform a generic
p-norm distance to the Maximum distance [2]. The query model in [2, 8, 9] is most related to the
model in this paper. As a particular case of our query model, the methods in [2, 8, 9] are different
from our algorithms, and in fact they are not the members of TA family, but of the Filter-Restart
category as described in [7].
Applied Mechanics and Materials Vols. 490-491 (2014) pp 1293-1297
© (2014) Trans Tech Publications, Switzerland
doi:10.4028/www.scientific.net/AMM.490-491.1293
All rights reserved. No part of contents of this paper may be reproduced or transmitted in any form or by any means without the written permission of TTP,
www.ttp.net. (ID: 60.4.163.23-16/01/14,05:52:35)