LotusX: A Position-Aware XML Graphical Search
System with Auto-Completion
Chunbin Lin¹, Jiaheng Lu¹, Tok Wang Ling², Bogdan Cautis³
¹ Key Laboratory of Data Engineering and Knowledge Engineering, MOE
Renmin University of China
{chunbinlin, jiahenglu}@ruc.edu.cn
² School of Computing, National University of Singapore
lingtw@comp.nus.edu.sg
³ Télécom ParisTech
cautis@telecom-paristech.fr
Abstract— Existing query languages for XML (e.g., XQuery) require
professional programming skills to formulate queries, and learning
such complex query languages is a tedious and time-consuming process
that is especially challenging for novice users. In addition, when
issuing an XML query, users are required to be familiar with the
content (both the structural and the textual information) of the
hierarchical XML document, which is difficult for ordinary users.
Designing user-friendly interfaces that reduce the burden of query
formulation is therefore fundamental to broadening the XML user
community.
We present a twig-based XML graphical search system, called LotusX,
that provides a graphical interface to simplify the query processing
without requiring users to learn query languages or data schemas, or
to know the content of the XML document. The basic idea is that
LotusX offers “position-aware” and “auto-completion” features that
help users create tree-modeled queries (twig pattern queries) by
suggesting reasonable candidates on the fly. In addition, LotusX
supports complex twig queries (including order-sensitive queries).
Furthermore, a new ranking strategy and a query rewriting solution
are implemented to rank the results and to rewrite queries
automatically, respectively. An online demo of the LotusX system is
available at http://datasearch.ruc.edu.cn:8080/LotusX
I. Introduction
XML plays an important role in information exchange today. As a
result, a wide spectrum of users, including those with minimal or no
computer programming skills, need to query hierarchical XML.
Designing effective and efficient systems that simplify the query
processing over XML documents has therefore attracted considerable
research interest. Well-known XML query languages (e.g., XQuery) are
available for processing XML queries. However, these languages are
far too complicated for unskilled users, who might be aware only of
the basics of the XML data model or might even lack knowledge of the
content (i.e., structural and textual information) of the XML
documents.

<bib>
{ for $b in doc("bib.xml")/bib/book
  where $b//publisher = "Thomas S. Huang"
    and ($b/year > 1999 or $b/year < 2010)
    and ($b/price > 30 and $b/price < 50)
  return <book> { $b/title } </book> }
</bib>
(a) XQuery expression

book
  title
  publisher  “Thomas S. Huang”
  year       “>1999 or <2010”
  price      “<50 and >30”
(b) Twig pattern query

Fig. 1. The XQuery and twig pattern expression of the query.
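To make the conditions of Fig. 1(a) concrete, the following Python sketch evaluates the same three predicates with the standard library module xml.etree.ElementTree. It is only an illustration: the sample document, its values, and the function name titles_matching_fig1 are assumptions introduced here, not part of LotusX or of the original bib.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names follow Fig. 1, all values are invented.
SAMPLE_BIB = """
<bib>
  <book>
    <title>Sample Title A</title>
    <publisher>Thomas S. Huang</publisher>
    <year>2005</year>
    <price>45</price>
  </book>
  <book>
    <title>Sample Title B</title>
    <publisher>Thomas S. Huang</publisher>
    <year>2005</year>
    <price>12</price>
  </book>
  <book>
    <title>Sample Title C</title>
    <publisher>Someone Else</publisher>
    <year>2003</year>
    <price>40</price>
  </book>
</bib>
"""


def titles_matching_fig1(xml_text):
    """Return titles of books that satisfy the three conditions of Fig. 1(a)."""
    root = ET.fromstring(xml_text)           # root is the <bib> element
    titles = []
    for book in root.findall("book"):        # /bib/book
        # $b//publisher: any descendant <publisher> element
        publishers = [(p.text or "").strip() for p in book.iter("publisher")]
        year_text = book.findtext("year")    # $b/year (direct child)
        price_text = book.findtext("price")  # $b/price (direct child)
        if year_text is None or price_text is None:
            continue                         # skip books that lack a field
        year, price = float(year_text), float(price_text)
        if ("Thomas S. Huang" in publishers
                and (year > 1999 or year < 2010)
                and 30 < price < 50):
            titles.append(book.findtext("title"))
    return titles


print(titles_matching_fig1(SAMPLE_BIB))
```

Even in this small form, the code has to name every element correctly and distinguish child access from descendant access, which is the kind of knowledge the text argues unskilled users lack.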
For example, assume that a user wants to issue the following query:
“List the titles of books written by Thomas S. Huang and published
before 1999 or later than 2010, with a price between 30 and 50
dollars”. This query can be formulated as the XQuery expression in
Figure 1(a). Unfortunately, formulating such a query often demands
considerable cognitive effort from end users and requires
“programming” skills at least comparable to those needed for SQL,
which makes the task both time-consuming and error-prone. To deal
with this problem, XML graphical languages (e.g., XQBE [3],
GLASS [6]) have been developed so that users who do not know the
professional query languages can still express queries. They let
users create queries in a simple graphical language and then map the
queries directly to XQuery in the background. However, (i) users are
required to learn the syntax of the graphical languages, and
(ii) users need to know the structural information (i.e., the
parent-child (P-C) and ancestor-descendant (A-D) relationships) and
the textual information (i.e., node names and values) of the XML
documents, since the content of each query node must be entered by
the user rather than supplied by the system. For example, when
issuing the query in Figure 1, the user needs to know that the name
of the publisher is “Thomas” rather than “Thomason” (i.e., textual
information) and that price is a child of book (i.e., structural
information).
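The role of P-C and A-D edges in a twig pattern can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LotusX matching algorithm: the class TwigNode, the functions matches and find_titles, the sample document, and the choice of an A-D edge for publisher (suggested by `$b//publisher` in Figure 1(a)) are all introduced here for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TwigNode:
    """One query node; `axis` is the edge from its parent query node."""
    tag: str
    axis: str = "child"  # "child" = P-C edge, "descendant" = A-D edge
    predicate: Optional[Callable[[str], bool]] = None  # test on the node's text
    children: List["TwigNode"] = field(default_factory=list)


def matches(elem, node):
    """True if `elem` matches `node` and every query child finds a match."""
    if elem.tag != node.tag:
        return False
    if node.predicate is not None and not node.predicate((elem.text or "").strip()):
        return False
    for child in node.children:
        if child.axis == "child":
            candidates = list(elem)                               # direct children
        else:
            candidates = [d for d in elem.iter() if d is not elem]  # all descendants
        if not any(matches(c, child) for c in candidates):
            return False
    return True


def find_titles(xml_text, twig):
    """Titles of all elements in the document that match the twig root."""
    root = ET.fromstring(xml_text)
    return [e.findtext("title") for e in root.iter(twig.tag) if matches(e, twig)]


def fig1_twig(publisher_axis):
    """Twig of Fig. 1(b); the edge type to publisher is a parameter."""
    return TwigNode("book", children=[
        TwigNode("title"),
        TwigNode("publisher", axis=publisher_axis,
                 predicate=lambda t: t == "Thomas S. Huang"),
        TwigNode("year", predicate=lambda t: float(t) > 1999 or float(t) < 2010),
        TwigNode("price", predicate=lambda t: 30 < float(t) < 50),
    ])


# Hypothetical document: in book A the publisher is nested one level deeper.
SAMPLE = """
<bib>
  <book><title>A</title><info><publisher>Thomas S. Huang</publisher></info>
        <year>2005</year><price>45</price></book>
  <book><title>B</title><publisher>Thomas S. Huang</publisher>
        <year>2005</year><price>80</price></book>
</bib>
"""

print(find_titles(SAMPLE, fig1_twig("descendant")))  # publisher via A-D edge
print(find_titles(SAMPLE, fig1_twig("child")))       # publisher via P-C edge
```

With the A-D edge, book A matches; with a P-C edge it does not, because its publisher is not a direct child of book. A user who must draw the edge by hand therefore needs exactly the structural knowledge described above.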
In order to simplify the query processing, (i) XML keyword search
systems have been proposed (e.g., XReal [2]); they return the
subtrees containing all the keywords. However, keywords can express
only simple textual information; they cannot describe structural
information or complex content conditions. For example, these systems
cannot answer the query in Figure 1, since keywords cannot describe
the structure (e.g., year is a child of book) or the content
conditions (e.g., “year>1999 or year<2010”). (ii) Visual search
systems have been implemented (e.g., Xing [4]). They present the
structural and textual information of the document in visual
interfaces, which allows the users to