810 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 41, NO. 4, JULY 2011
Correspondence
Falcons Concept Search: A Practical Search Engine for
Web Ontologies
Yuzhong Qu and Gong Cheng
Abstract—Web ontologies provide shared concepts for describing domain
entities and thus enable semantic interoperability between applications.
To facilitate concept sharing and ontology reuse, we developed
Falcons Concept Search, a novel keyword-based ontology search engine.
In this paper, we illustrate how the proposed mode of interaction helps
users quickly find ontologies that satisfy their needs, and we present
several supporting techniques, including a new method of constructing
virtual documents of concepts for keyword search, a popularity-based
scheme for ranking concepts and ontologies, and a way to generate
query-relevant structured snippets. We also report the results of a
usability evaluation as well as user feedback.
Index Terms—Indexing, ontology ranking, ontology search, snippet gen-
eration, virtual document.
I. INTRODUCTION
The Semantic Web aims to facilitate data integration across
Web applications. Semantic Web data are formatted according to the
Resource Description Framework (RDF), a triple/graph-based way
to represent information. Furthermore, Web ontologies described in
the RDF Vocabulary Description Language (RDFS) and the Web Ontology
Language (OWL) provide shared concepts, i.e., classes and
properties, for describing domain entities and thus enable semantic
interoperability between different applications. Semantic interoperability
depends on reusing or extending existing ontologies when developing
new applications. Therefore, ontology search becomes a fundamental
service for application developers.
In recent years, several ontology search engines have been developed,
some of which are still accessible [1]–[3]. Similar to traditional
Web search engines, these systems accept keyword queries and return
matched concepts and/or ontologies. However, for the returned
results, they usually provide either only basic metadata (e.g., a
human-readable name of each concept) or the entire related RDF description,
neither of which helps users efficiently determine whether a
returned concept or ontology satisfies their needs.
Manuscript received November 13, 2008; revised April 29, 2009; accepted
January 22, 2010. Date of publication May 2, 2011; date of current version
June 21, 2011. This work was supported in part by the National Natural Science
Foundation of China under Grants 60773106 and 60973024. This paper was
recommended by Editor W. Pedrycz.
Y. Qu is with the State Key Laboratory for Novel Software Technology,
Nanjing University, Nanjing 210093, China (e-mail: yzqu@nju.edu.cn).
G. Cheng is with the School of Computer Science and Engineering,
Southeast University, Nanjing 210096, China (e-mail: gcheng@seu.edu.cn).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMCA.2011.2132705
We developed Falcons Concept Search
(http://ws.nju.edu.cn/falcons/conceptsearch/), a novel keyword-based
ontology search engine, as part of the Falcons system. It retrieves
concepts whose textual description is matched with the terms in the
keyword query and ranks the results according to both query relevance
and popularity of concepts. The popularity is measured based on a
large data set collected from the real Semantic Web. Each concept
returned is associated with a query-relevant structured snippet,
indicating how the concept is matched with the keyword query and also
briefly clarifying its meaning. Meanwhile, the system recommends
several query-relevant popular ontologies, which can be used by users
to restrict the results to the ones in a specific ontology. Within such a
mode of interaction, users can quickly compare ontologies and deter-
mine whether these ontologies satisfy their needs by checking query-
relevant concepts as well as their contexts, i.e., structured snippets.
The system also provides the detailed RDF description of each concept
and a summary of each ontology on demand. A demonstration of the
system is given in the following.
A. System Demonstration
Suppose that a user wants to describe some students studying at
some university. The user submits the keyword query “student university”
to the system and obtains the result page shown in Fig. 1.
The bottom area presents the concepts returned. For each concept,
the first line gives its name (label or local name) and type. The user
can click on the name to browse its detailed RDF description. Below
that, a structured snippet, consisting of the part of the RDF description
of the concept that is matched with the terms in the keyword query,
is presented to help the user quickly determine its relevance. The RDF
description in a snippet is followed by its provenance and is marked
with stars if it comes from the RDF document retrieved by dereferencing
the Uniform Resource Identifier (URI) of the concept. The URI is also
presented below the snippet, followed by a number indicating in how many
RDF documents this concept is mentioned; the number is linked to a list
of these documents for further browsing.
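The "label or local name" rule for the first line of each result can be sketched as follows. This is a minimal illustration, not the system's code: the function names `local_name` and `display_name`, the preference for a label over the local name, and the `example.org` URIs are all assumptions made for the example.

```python
from typing import Optional


def local_name(uri: str) -> str:
    """Return the fragment of a URI after its last '#' or '/'."""
    trimmed = uri.rstrip("#/")  # ignore a trailing separator, if any
    cut = max(trimmed.rfind("#"), trimmed.rfind("/"))
    return trimmed[cut + 1:]


def display_name(uri: str, label: Optional[str] = None) -> str:
    """Assumed rule: use the label when one exists, else the local name."""
    if label and label.strip():
        return label.strip()
    return local_name(uri)


# Hypothetical concept URIs, used only for illustration.
print(display_name("http://example.org/onto#Student"))
print(display_name("http://example.org/onto/University", label="University (org)"))
```

Falling back to the local name keeps a result readable even when an ontology provides no label for a concept.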
The top area of this page recommends nine ontologies. The user
can select one of them, e.g., Semantic Web for Research Communities
(SWRC), to restrict the search to that ontology. Then, as shown in
Fig. 2, the concepts returned are filtered to include only those in the
SWRC ontology. The user immediately finds that the SWRC ontology
contains a “Student” class, a “University” class, and a “student”
property, which are structurally related to each other in the ontology,
as shown in their snippets. Consequently, the user decides to reuse
this ontology; if it were unsuitable, the user could select other
ontologies and compare them.
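Restricting the results to one recommended ontology amounts to a filter over the ranked concept list. The sketch below illustrates that step under stated assumptions: the `ConceptHit` record, its fields, and the `example.org` URIs are hypothetical, and the actual system presumably applies the restriction inside its index rather than over an in-memory list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConceptHit:
    """One returned concept (hypothetical record layout)."""
    uri: str
    kind: str       # "class" or "property"
    ontology: str   # URI of the ontology the concept belongs to


def restrict_to_ontology(hits: List[ConceptHit], ontology_uri: str) -> List[ConceptHit]:
    """Keep only hits from the chosen ontology, preserving ranking order."""
    return [hit for hit in hits if hit.ontology == ontology_uri]


SWRC = "http://example.org/swrc"    # placeholder, not the real SWRC namespace
OTHER = "http://example.org/other"  # placeholder for any other ontology

ranked_hits = [
    ConceptHit(SWRC + "#Student", "class", SWRC),
    ConceptHit(OTHER + "#Student", "class", OTHER),
    ConceptHit(SWRC + "#University", "class", SWRC),
    ConceptHit(SWRC + "#student", "property", SWRC),
]

for hit in restrict_to_ontology(ranked_hits, SWRC):
    print(hit.kind, hit.uri)
```

Because the filter preserves the incoming order, the restricted list keeps whatever ranking the unrestricted search produced.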
II. SYSTEM ARCHITECTURE
Fig. 3 presents the architecture of the system.
The multithreaded crawler dereferences URIs with content negotiation
(accepting only application/rdf+xml) and downloads RDF documents,
which are then parsed by Jena (jena.sourceforge.net).
The URIs newly discovered in these documents are submitted to the
URI repository for further crawling. Initially, the URI repository is
fed with seed URIs obtained from several online ontology repositories
such as pingthesemanticweb.com and schemaweb.info, as
well as by querying the Swoogle search engine and the Google search engine
(for “filetype:rdf” and “filetype:owl”) with keyword queries randomly
generated according to the category names of the Open Directory
Project (dmoz.org). At the time of writing, 21.6 million well-formed
RDF/XML documents have been downloaded and processed,
containing 2.9 billion RDF triples. In the data set, 2 868 214
classes and 264 315 properties have been identified, coming from
12 467 ontologies.
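The crawl loop described above can be sketched in a few lines. This is a simplified, single-threaded illustration rather than the system's implementation, which is multithreaded and parses documents with Jena. The names `UriRepository`, `build_request`, and `crawl`, and the `fetch_and_extract` callback standing in for download plus parsing, are assumptions introduced for the example.

```python
import urllib.request
from collections import deque
from typing import Callable, Deque, Iterable, Optional, Set

RDF_XML = "application/rdf+xml"


def build_request(uri: str) -> urllib.request.Request:
    """Prepare an HTTP request that accepts only RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


class UriRepository:
    """FIFO queue of URIs to crawl that ignores URIs it has already seen."""

    def __init__(self, seeds: Iterable[str]) -> None:
        self._seen: Set[str] = set()
        self._queue: Deque[str] = deque()
        for seed in seeds:
            self.submit(seed)

    def submit(self, uri: str) -> bool:
        """Queue a URI; return False if it was submitted before."""
        if uri in self._seen:
            return False
        self._seen.add(uri)
        self._queue.append(uri)
        return True

    def next(self) -> Optional[str]:
        """Return the next URI to crawl, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None


def crawl(repo: UriRepository,
          fetch_and_extract: Callable[[str], Iterable[str]],
          limit: int) -> int:
    """Process up to `limit` URIs, feeding discovered URIs back to the repository."""
    processed = 0
    while processed < limit:
        uri = repo.next()
        if uri is None:
            break
        for found in fetch_and_extract(uri):
            repo.submit(found)
        processed += 1
    return processed
```

In a real run, `fetch_and_extract` would open `build_request(uri)`, parse the returned RDF/XML, and yield every URI mentioned in its triples; passing a stub instead makes the loop testable without network access.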