QueryMed: An Intuitive SPARQL Query Builder for
Biomedical RDF Data
Oshani Seneviratne
Massachusetts Institute of Technology
Cambridge, MA
USA
oshani@csail.mit.edu
Rachel Sealfon
Massachusetts Institute of Technology
Cambridge, MA
USA
rsealfon@csail.mit.edu
ABSTRACT
We have developed an open-source SPARQL query builder
and result set visualizer for biomedical data, QueryMed,
that allows end users to easily construct and run transla-
tional medicine queries across multiple data sources.
QueryMed is flexible enough to allow queries relevant to
a wide range of biomedical topics, runs queries across mul-
tiple SPARQL endpoints, and is designed to be accessible
to users who do not know the structure of the underlying
ontologies used in describing the datasets, or the SPARQL
query language to query the data. The system allows users
to select the data sources that they wish to use, drawing
on their specialized domain knowledge to decide the most
appropriate data sources to query. Users can add additional
data sources if they are interested in querying endpoints that
are not in the default list. After retrieval of the initial result
set, query results can be filtered to improve their relevance.
The system also allows the user to exploit the underlying
structure of the RDF data to improve query results.
Categories and Subject Descriptors
J.3 [Life and Medical Sciences]: Computer Applications;
H.3.3 [Information Search and Retrieval]: Information
Systems
Keywords
Biomedical Ontologies, SPARQL, Query Federation, Query
Building, Semantic Web, User Interfaces
1. INTRODUCTION
The quantity of publicly available data in the biomedical
domain has dramatically increased over recent years. Publicly
available biomedical resources include data on drug discovery
[?, ?], clinical trials, diseases, disease genes, and phenotypes.
With the linked open data movement, the semantic
web community has been very proactive in converting these
rich information resources to RDF [?]. In fact, the biomedical
domain is among the early successes of the semantic web
due to the rapidity with which the community has made its
data available in RDF triple stores [18].
To allow end users to exploit the abundance of useful
biomedical data that is currently available in RDF, there is a
need for easy-to-use systems that do not require the end user
Copyright is held by the author/owner(s).
WWW2010, April 26-30, 2010, Raleigh, North Carolina.
to have knowledge of the underlying structure of the data,
and that also allow users to run federated queries on multiple
SPARQL endpoints. There is also a need for efficient hybrid
interfaces that allow both browsing and querying [?], since
many currently available systems are linked data browsers
such as the Tabulator [?], which allow a user to navigate the
data in an exploratory manner but lack support for filtering
and querying the data.
Answering many medically and biologically relevant ques-
tions requires searching, filtering, and combining informa-
tion from multiple endpoints. For example, a physician may
know her patient’s personal information, symptoms, current
medications, and genotype. She may wish to determine the
patient’s treatment plan and identify clinical trials for which
the patient is eligible. Although the physician has a single
question (“based on the information I have about this patient,
what is the best treatment plan and set of clinical trials
available?”), there is no single data source that the physician
can use to answer this question. The information that
the physician needs must be gathered from numerous data
sources such as PubMed, DailyMed, DrugBank, LinkedCT,
Diseasome, and GO [7, 1, 3, 6, 2, 4]. Her question must be
broken up into discrete pieces that can be executed individually
at one data source at a time.
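As a minimal sketch of this decomposition, and not a description of QueryMed's actual implementation, the following Python fragment builds one generic SPARQL keyword query and runs it against each data source in turn, tagging every result row with the source it came from. The endpoint URLs, the function names, and the stub `fake_runner` (which stands in for a real HTTP request) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Endpoint:
    name: str
    url: str  # hypothetical placeholder URL, not a real service address


# Assumed endpoint list: the names come from the text, the URLs are made up.
ENDPOINTS = [
    Endpoint("DrugBank", "http://example.org/drugbank/sparql"),
    Endpoint("LinkedCT", "http://example.org/linkedct/sparql"),
]

Row = Dict[str, str]
Runner = Callable[[Endpoint, str], List[Row]]


def build_keyword_query(keyword: str, limit: int = 10) -> str:
    """Build a generic SPARQL query matching literals that contain keyword."""
    # Escape backslashes and double quotes so the keyword stays inside the
    # SPARQL string literal; regex metacharacters are not escaped here.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        "  ?s ?p ?o .\n"
        f'  FILTER regex(str(?o), "{escaped}", "i")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def run_federated(endpoints: List[Endpoint], query: str,
                  run_query: Runner) -> List[Row]:
    """Run the same query at one endpoint at a time and merge the rows."""
    merged: List[Row] = []
    for endpoint in endpoints:
        for row in run_query(endpoint, query):
            # Record which data source produced each row.
            merged.append({"source": endpoint.name, **row})
    return merged


def fake_runner(endpoint: Endpoint, query: str) -> List[Row]:
    """Stand-in for an HTTP call: returns one canned row per endpoint."""
    return [{"s": f"{endpoint.url}#item1", "o": "example literal"}]
```

Calling `run_federated(ENDPOINTS, build_keyword_query("aspirin"), fake_runner)` returns one merged list of rows. In a real setting the stub would be replaced by a function that sends the query to each endpoint over the SPARQL protocol.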
Since the physician must search many databases in order
to find an answer to her single question, she requires
a system that can automatically run queries over multiple
data sources. Also, the physician may not know SPARQL
query syntax, the location of the SPARQL endpoints, or the
structure of the relevant ontologies. She is likely to want an
intuitive way to query and to display the query result. De-
veloping intuitive ways to query multiple data sources and
display results is both an important and a challenging prob-
lem. Our system, QueryMed, allows users with no knowl-
edge of the SPARQL query language or the structure of the
underlying ontologies to easily run queries across multiple
SPARQL endpoints.
This paper is organized as follows: Section 2 provides
background information on the semantic web and its relevance
for the biomedical domain. Section 3 describes our
system. Section 4 discusses related work and illustrates how
QueryMed differs from previous systems. Finally, Section
5 outlines future work and summarizes the contributions of
our system.
2. BACKGROUND
The semantic web can be viewed as a global database system
for the information available on the world wide web.