A Joint Model for Question Answering over Multiple Knowledge Bases
Yuanzhe Zhang, Shizhu He, Kang Liu, Jun Zhao
National Laboratory of Pattern Recognition (NLPR)
Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
{yzzhang, shizhu.he, kliu, jzhao}@nlpr.ia.ac.cn
Abstract
As the number of knowledge bases (KBs) grows rapidly, the
problem of question answering (QA) over multiple KBs has
drawn more attention. The most significant distinction between
multiple KB-QA and single KB-QA is that the former must
consider the alignments between KBs. The pipeline strategy
first constructs the alignments independently and then uses
the obtained alignments to construct queries. However,
alignment construction is not a trivial task, and the noise it
introduces is passed on to query construction. By contrast,
we observe that alignment construction and query construction
are interacting steps, and that considering them jointly is
beneficial. To this end, we present a novel joint model based
on integer linear programming (ILP) that unites these two
procedures in a uniform framework. The experimental results
demonstrate that the proposed approach outperforms
state-of-the-art systems and improves the performance of
both alignment construction and query construction.
Introduction
With the continued growth of knowledge bases (KBs) on the
web, accessing these valuable resources becomes increasingly
important (Unger, Freitas, and Cimiano 2014). Knowledge-base-based
question answering (KB-QA) addresses this problem by allowing
natural language to be used as the query language. It has
therefore received growing attention in recent years.
The key problem in KB-QA is to convert natural language
questions into structured queries, such as SPARQL queries.
Many studies focus on this problem, and most of them address
single KB-QA (Frank et al. 2007; Zettlemoyer and Collins
2005; 2007; 2009; Kwiatkowski et al. 2011; 2013). They often
assume that the answers can be acquired from a single KB.
However, a single KB can hardly cover all questions. Plenty
of KBs exist on the web, and they often focus on different
domains. It is not rare for a natural language question to
involve several aspects, each covered by a different KB. Such
a question has to be answered using multiple KBs. We name
this task multiple KB-QA, which
has seldom been investigated, except by (Lopez et al. 2012;
Shekarpour et al. 2014; Fader, Zettlemoyer, and Etzioni 2014).
[Figure 1 content: three KBs linked by owl:sameAs edges between
their Anne_Hathaway entities. Music: mue:I_Dreamed_a_Dream,
mur:performer, mue:Anne_Hathaway. General: gee:Anne_Hathaway,
ger:birthPlace, gee:New_York. Movie: moe:Anne_Hathaway,
mor:starring, moe:Valentines_Day(2010).]
Figure 1: Three KBs should be used to answer the question
“Which songs are performed by a person who was born in
New York and played a role in Valentine’s Day?”
Copyright © 2016, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
This is a challenging task. For example, consider the
following question:
Which songs are performed by a person who was born in
New York and played a role in Valentine’s Day?¹
As illustrated in Figure 1, the answer to “songs performed
by” is in a music-domain KB, the answer to “born in New
York” is in a general-domain KB, and answering “played a
role in Valentine’s Day” requires a movie-domain KB. The
final structured query is generated by combining the
different fragments as follows:
SELECT ?v1 WHERE {
⟨?v1, mur:performer, ?v2⟩²
⟨?v2, owl:sameAs, ?v3⟩
⟨?v3, mor:starring, moe:Valentines_Day(2010)⟩
⟨?v3, owl:sameAs, ?v4⟩
⟨?v4, ger:birthPlace, gee:New_York⟩ }
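To make the role of the owl:sameAs patterns concrete, the following is a minimal Python sketch that evaluates the query above over a toy set of triples transcribed from Figure 1. It is an illustration only, not the system described in this paper: the names TRIPLES, QUERY, match, evaluate and decode_prefix are hypothetical, the matcher is a naive join rather than a SPARQL engine, and the direction and endpoints of the two owl:sameAs links are assumed so that they line up with the query.

```python
# Toy triples transcribed from Figure 1. The owl:sameAs directions are an
# assumption made for this sketch so that the query patterns can match.
TRIPLES = {
    ("mue:I_Dreamed_a_Dream", "mur:performer", "mue:Anne_Hathaway"),
    ("gee:Anne_Hathaway", "ger:birthPlace", "gee:New_York"),
    ("moe:Anne_Hathaway", "mor:starring", "moe:Valentines_Day(2010)"),
    ("mue:Anne_Hathaway", "owl:sameAs", "moe:Anne_Hathaway"),
    ("moe:Anne_Hathaway", "owl:sameAs", "gee:Anne_Hathaway"),
}

# The five triple patterns of the structured query; terms starting with "?"
# are variables.
QUERY = [
    ("?v1", "mur:performer", "?v2"),
    ("?v2", "owl:sameAs", "?v3"),
    ("?v3", "mor:starring", "moe:Valentines_Day(2010)"),
    ("?v3", "owl:sameAs", "?v4"),
    ("?v4", "ger:birthPlace", "gee:New_York"),
]

# Prefix convention from the paper's footnote: two letters for the source KB,
# one letter for the resource type.
KB = {"mo": "movie", "mu": "music", "ge": "general"}
KIND = {"e": "entity", "c": "class", "r": "relation"}


def decode_prefix(resource):
    """Return (source KB, type) for a prefixed name, or None if the prefix
    does not follow the convention (e.g. owl:sameAs)."""
    prefix = resource.split(":", 1)[0]
    if len(prefix) == 3 and prefix[:2] in KB and prefix[2] in KIND:
        return KB[prefix[:2]], KIND[prefix[2]]
    return None


def match(pattern, triple, binding):
    """Unify one pattern with one triple under an existing variable binding.
    Returns the extended binding, or None if they are incompatible."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:  # variable already bound elsewhere
                return None
        elif p != t:  # constant term must match exactly
            return None
    return new


def evaluate(patterns, triples):
    """Naive join: extend every partial binding with every matching triple."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


answers = sorted({b["?v1"] for b in evaluate(QUERY, TRIPLES)})
kbs_used = {
    decoded[0]
    for pattern in QUERY
    for term in pattern
    if not term.startswith("?") and (decoded := decode_prefix(term))
}
print(answers)
print(sorted(kbs_used))
```

In this sketch, removing either owl:sameAs triple from TRIPLES leaves the query with no answers, because the music, movie and general fragments can then no longer be joined on the same person.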
From this example, we can see that the most significant
difference between multiple KB-QA and single KB-QA is that
the former needs to consider the interconnections between
heterogeneous KBs, such as ⟨?v2, owl:sameAs,
¹ This is a real case in a Chinese QA scenario; no single
Chinese KB can answer it alone.
² This triple pattern means that ?v2 is the performer of ?v1. The
first two letters of the prefix represent the source KB (mo: movie,
mu: music, and ge: general), and the last letter represents the type
(e: entity, c: class, and r: relation). E.g., mur means that the
resource is from the music KB and is a relation.