Processing Relational Top-N Queries with Text and Numeric Attributes
Liang Zhu
1,a
, Bin Liu
1,b
, Guang Liu
2,c
, Quanlong Lei
1,d
1
Key Lab of Machine Learning and Computational Intelligence, School of Mathematics and
Computer Science, Hebei University, Baoding, Hebei 071002, China
2
College of Art, Hebei University, Baoding, Hebei 071002, China
a
zhu@hbu.edu.cn;
b
lliubbin163@163.com;
c
lg672@tom.com;
d
lqlfeng@126.com
Keywords: Relational Database, top-N Query, Semantic Distance, Numeric Distance, WordNet.
Abstract. Relational top-N queries with both text attributes and numeric attributes are useful in many
applications, by using the ranking functions based on both semantic distances for text attributes and
numeric distances for numeric attributes. In this paper, we propose an approach for processing such
type of top-N queries in relational databases. The basic idea of the approach is to create an index based
on WordNet to expand the tuple words semantically for text attributes and on the related information
of numeric attributes, meanwhile the size of the index increases linearly with the size of the database.
The results of extensive experiments show that our method is efficient and effective.
Introduction
A relational top-N (or ranking) query is to find a sorted set of N tuples that are the best but not
necessarily all answers to the query. Most of the researches involve numeric attributes with numeric
ranking functions [1, 2]. However, there are many applications where top-N queries are evaluated by
using both text attributes and numeric attributes, as demonstrated in the following example.
Example 1. Assume that a database BOOK of used books with schema: Books(isbn, title, author,
year, publisher, price). A user wants to find a book with title on “criminal law”, price about “$100”,
and year around “2005”, where title is a text attribute with semantics, and price and year are two
numeric attributes. Obviously, a book on “symphony” with price = “$100” and year = “2005” is not
the desired result for the user. However, another book on “penal code” with price = “$103” and year
= “2006” may be the need of the user.
For the type of relational top-N queries as shown in Example 1, we design a ranking function that
combines the semantic distances and numeric distances by employing statistics and training methods,
and then we create an index to process the queries in terms of semantic and numeric matching in
database search. Moreover, this work is a continuation of the work in [3] and [4]. [3] studied the
processing of relational ranking queries only with text attributes, without numeric attributes. The size
of the index in [4] does not increase linearly with the size of the database, and it then may not be
suitable for the database with the big size, or three or more numeric attributes. The method in this
paper will alleviate the limitations of the approach in [4].
Query Model
Assume that R(tid, A) is a relation/table with identifier tid, where A is a text attribute with semantics,
and S(idx, B
1
, B
2
, …, B
m
, …, FKid) is another relation with identifier idx, where B
1
, B
2
, …, B
m
are m
numeric attributes, and FKid is the foreign key referencing R.tid. Let R
0
= R S with S.FKid = R.tid.
Let t be a tuple in R
0
. Then t[A] = (tw
1
, tw
2
, …, tw
n
) is the word-string with n words on the text
attribute A, and t[B
j
] = b
j
is the numeric value on the attribute B
j
(1 ≤ j ≤ m). For simplicity, we denote
t = (t
A
, b
1
, b
2
, …, b
m
) where t
A
= (tw
1
, tw
2
, …, tw
n
), and call tw
i
a tuple word and b
j
a tuple value (1 ≤ i
≤ n, 1 ≤ j ≤ m). As described in [3], for each tuple word tw, we can obtain the set K(tw) of all kinship
words of tw by WordNet [3, 5], i.e., K(tw) includes the five kinds of words in WordNet: (1) word tw
Applied Mechanics and Materials Vols. 490-491 (2014) pp 1326-1329
© (2014) Trans Tech Publications, Switzerland
doi:10.4028/www.scientific.net/AMM.490-491.1326
All rights reserved. No part of contents of this paper may be reproduced or transmitted in any form or by any means without the written permission of TTP,
www.ttp.net. (ID: 60.4.163.23-16/01/14,05:54:28)