
26 1541-1672/15/$31.00 © 2015 IEEE IEEE INTELLIGENT SYSTEMS
Published by the IEEE Computer Society
Question Answering
over Knowledge Bases
Kang Liu, Jun Zhao, Shizhu He, and Yuanzhe Zhang, Institute of Automation,
Chinese Academy of Sciences
Previous research on question answering over knowledge bases has focused
on a constrained domain, but with the increase in existing knowledge bases,
understanding and translating it is challenging.
appropriate answers. To fulfill this aim, academic and industry researchers
have put more effort into knowledge bases (KBs), where information is
organized in a network structure and semantic relations can be effectively
reflected. Semantic items in the text, including entities, classes, and their
semantic relations, can be extracted from the raw data, and answers
corresponding to users' questions can then be obtained through direct
matching in the KB.
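To make "direct matching in the KB" concrete, the following is a minimal sketch in Python, not a method described in the article. It assumes the KB is a small in-memory set of (subject, property, object) triples and that a question has already been reduced to triple patterns with `?`-prefixed variables. The `ex:` names, the toy facts, and the helper functions `unify` and `match` are all hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# A toy KB. These facts are invented for illustration, not taken from a real KB.
KB: Set[Triple] = {
    ("ex:AppA", "rdf:type", "ex:Software"),
    ("ex:AppA", "ex:developer", "ex:OrgX"),
    ("ex:AppB", "rdf:type", "ex:Software"),
    ("ex:AppB", "ex:developer", "ex:OrgY"),
    ("ex:OrgX", "ex:foundationPlace", "ex:California"),
    ("ex:OrgY", "ex:foundationPlace", "ex:Texas"),
}


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match(patterns: List[Triple], kb: Set[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding that satisfies all patterns in the KB."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in kb:
        extended = unify(first, fact, binding)
        if extended is not None:
            yield from match(rest, kb, extended)


# Hypothetical patterns: software whose developer was founded in California.
PATTERNS: List[Triple] = [
    ("?uri", "rdf:type", "ex:Software"),
    ("?uri", "ex:developer", "?x1"),
    ("?x1", "ex:foundationPlace", "ex:California"),
]

answers = sorted({b["?uri"] for b in match(PATTERNS, KB)})
print(answers)
```

Real KBs are far too large for this exhaustive scan; the sketch only illustrates that, once a question is expressed as triple patterns, answering reduces to finding consistent variable bindings.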
Several KBs have been constructed and published, such as DBpedia [2],
Freebase [3], and YAGO [4]. These KBs usually have complex structures and are
highly heterogeneous; accessing them is a big obstacle for the task of
question answering over KBs.
Although structured query languages (such as SPARQL) have been designed and
provided for accessing this structured data, only a few experts and
developers know how to use them. In contrast, common users usually pose
questions in natural language. Therefore, determining how to translate
natural language questions into structured language-based queries is the
core goal of question answering over KBs, which has attracted a lot of
attention lately [5–10]. For example, for the question "Which software has
been developed by organizations founded in California?", the aim is to
automatically convert this utterance into a SPARQL query made up of
subject-property-object (SPO) triple patterns:
SELECT DISTINCT ?uri
WHERE {
  ?uri rdf:type dbo:Software .
  ?uri dbo:developer ?x1 .
  ?x1 rdf:type dbo:Company .
  ?x1 dbo:foundationPlace dbr:California .
}
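As an illustration of the final step only, the query above can be assembled mechanically once the SPO triple patterns are known. The Python sketch below is an assumption-laden example, not the authors' system: the function `build_select_query` is hypothetical, and the PREFIX declarations are the standard RDF and DBpedia namespaces, which the article's query leaves implicit.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

# Standard RDF and DBpedia namespaces (implicit in the article's query).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}


def build_select_query(target: str, patterns: Iterable[Triple]) -> str:
    """Render SPO triple patterns as a SELECT DISTINCT query string."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()]
    lines.append(f"SELECT DISTINCT {target}")
    lines.append("WHERE {")
    for subject, prop, obj in patterns:
        lines.append(f"  {subject} {prop} {obj} .")
    lines.append("}")
    return "\n".join(lines)


# Triple patterns for "Which software has been developed by
# organizations founded in California?"
PATTERNS = [
    ("?uri", "rdf:type", "dbo:Software"),
    ("?uri", "dbo:developer", "?x1"),
    ("?x1", "rdf:type", "dbo:Company"),
    ("?x1", "dbo:foundationPlace", "dbr:California"),
]

query = build_select_query("?uri", PATTERNS)
print(query)
```

The hard part of the task is producing `PATTERNS` from the natural language question; rendering the query string from them is straightforward by comparison.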
The key to such translation is understanding the meaning of the question.
The dominant methods usually first convert a natural language question into
a complete, formal meaning representation (FMR), such as a logical form.
The structured query is then generated from the FMR. However, achieving this
aim isn't trivial. Four questions should be addressed:
• How do we represent the meaning of questions grounded to a specific KB?
This meaning representation should reflect the corresponding concepts in the
KB and organize them according to their semantic relations in the question.
The representation
Deep Web search is on the cusp of a profound change, from simple document
retrieval to natural language question answering (QA) [1]. Ultimately,
search needs to precisely understand the meanings of users' natural language
questions, extract useful facts from all information on the Web, and select
Natural Language Processing