Figure 2, left panel (HotpotQA):
Paragraph 1: Australia at the 2012 Winter Youth Olympics
Australia competed at the 2012 Winter Youth Olympics in Innsbruck. The chef de
mission of the team will be former Olympic champion Alisa Camplin, the first time
a woman is the chef de mission of any Australian Olympic team. The Australian
team will consist of 13 athletes in 8 sports.
Paragraph 2: Alisa Camplin
Alisa Peta Camplin OAM (born 10 November 1974) is an Australian aerial skier
who won gold at the 2002 Winter Olympics, the second ever winter Olympic gold
medal for Australia. At the 2006 Winter Olympics, Camplin finished third to receive
a bronze medal. She is the first Australian skier to win medals at consecutive
Winter Olympics, making her one of Australia's best skiers.
Distractor Paragraphs 3 - 10 ...
Q: The first woman to be the chef de mission of an Australian Olympic
team won gold medal in which winter Olympics?
A: 2002 Winter Olympics

Figure 2, right panel (WikiHop):
The Hanging Gardens, in Mumbai, also known as Pherozeshah Mehta Gardens, are
terraced gardens … They provide sunset views over the Arabian Sea.
Mumbai (also known as Bombay, the official name until 1995) is the capital city
of the Indian state of Maharashtra. It is the most populous city in India …
The Arabian Sea is a region of the northern Indian Ocean bounded on the north by
Pakistan and Iran, on the west by northeastern Somalia and the Arabian Peninsula,
and on the east by India …
Q: (Hanging gardens of Mumbai, country, ?)
Options: {Iran, India, Pakistan, Somalia, …}
A: India
Figure 2: Comparison between HotpotQA (left) and WikiHop (right). In HotpotQA, the questions are proposed
by crowd workers and the blue words in paragraphs are labeled supporting facts corresponding to the question. In
WikiHop, the questions and answers are formed with relations and entities in the underlying KB respectively, thus
the questions are inherently restricted by the KB schema. The colored words and phrases are entities in the KB.
et al., 2018) and ComplexWebQuestions (Talmor
and Berant, 2018). In this paper, we focus on
TBQA, since TBQA tests a system’s end-to-end
capability of extracting relevant facts from raw
language and reasoning about them.
Depending on the complexity of the underlying
reasoning, QA problems can be categorized into
single-hop and multi-hop ones. Single-hop QA
requires only one fact extracted from the underlying
information, whether structured or unstructured,
e.g. “which city is the capital of California”.
The SQuAD dataset belongs to this type (Rajpurkar
et al., 2016). In contrast, multi-hop QA requires
identifying multiple related facts and reasoning
about them, e.g. “what is the capital city of the
largest state in the U.S.”. Example tasks and
benchmarks of this kind include WikiHop,
ComplexWebQuestions, and HotpotQA. Many IR
techniques can be applied to answer single-hop
questions (Rajpurkar et al., 2016). However, these
IR techniques are hard to apply to multi-hop QA,
since a single fact can only partially match a
multi-hop question.
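To make the single-hop/multi-hop distinction concrete, here is a minimal sketch over a hypothetical toy fact store; FACTS, lookup, and answer_chain are illustrative names, not part of any system discussed here, and "largest" is assumed to mean largest by land area. The single-hop question is answered by one lookup, while the multi-hop question must first resolve a bridge entity (Alaska) and then chain a second lookup on it. The sketch works on clean triples; in text-based QA each fact would first have to be extracted from raw paragraphs.

```python
from typing import List, Optional

# Hypothetical toy fact store of (subject, relation, object) triples.
# Assumption: "largest" means largest by land area.
FACTS = [
    ("California", "capital", "Sacramento"),
    ("Alaska", "capital", "Juneau"),
    ("U.S.", "largest_state_by_area", "Alaska"),
]

def lookup(subject: str, relation: str) -> Optional[str]:
    """Return the object of the first fact matching (subject, relation), if any."""
    for s, r, o in FACTS:
        if s == subject and r == relation:
            return o
    return None

def answer_chain(start: str, relations: List[str]) -> Optional[str]:
    """Follow relations one by one; each step consumes exactly one fact (one hop)."""
    entity: Optional[str] = start
    for rel in relations:
        entity = lookup(entity, rel)
        if entity is None:  # chain broken: no fact supports this hop
            return None
    return entity

# Single-hop: "which city is the capital of California" needs one fact.
print(answer_chain("California", ["capital"]))  # prints Sacramento
# Multi-hop: "what is the capital city of the largest state in the U.S."
# first resolves the bridge entity, then looks up its capital.
print(answer_chain("U.S.", ["largest_state_by_area", "capital"]))  # prints Juneau
```

The chain also shows why one retrieved fact is not enough: neither triple on its own contains both the question's constraint and its answer.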
Note that the existing multi-hop QA datasets
WikiHop and ComplexWebQuestions are constructed
from existing KBs and are constrained by the
schemas of the KBs they use. For example, the
answers in WikiHop are limited to KB entities,
whereas the answers in HotpotQA are free-form text
(see Figure 2 for an example). In this work, we
focus on multi-hop text-based QA, so we evaluate
only on HotpotQA.
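The contrast between KB-constrained and free-text answers can be illustrated with two record types loosely modeled on the Figure 2 examples. This is a hypothetical sketch: the class and field names (WikiHopSample, HotpotQASample, supporting_titles, answer_in_candidates) are illustrative and do not reproduce either dataset's actual file format. The WikiHop candidate list is truncated as in the figure, and the supporting titles are the two non-distractor paragraphs shown there.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class WikiHopSample:
    """KB-derived sample: the query is a (subject, relation, ?) triple."""
    subject: str
    relation: str
    candidates: List[str]  # answer options, all KB entities
    answer: str

    def answer_in_candidates(self) -> bool:
        # The KB schema restricts the answer to one of the candidate entities.
        return self.answer in self.candidates

@dataclass
class HotpotQASample:
    """Crowd-sourced sample: natural-language question, free-form text answer."""
    question: str
    answer: str  # free text, not chosen from a candidate list
    supporting_titles: List[str] = field(default_factory=list)  # hypothetical field

# The two examples from Figure 2 (WikiHop candidates truncated as in the figure).
wikihop = WikiHopSample(
    subject="Hanging gardens of Mumbai",
    relation="country",
    candidates=["Iran", "India", "Pakistan", "Somalia"],
    answer="India",
)
hotpot = HotpotQASample(
    question=("The first woman to be the chef de mission of an Australian "
              "Olympic team won gold medal in which winter Olympics?"),
    answer="2002 Winter Olympics",
    supporting_titles=["Australia at the 2012 Winter Youth Olympics", "Alisa Camplin"],
)

print(wikihop.answer_in_candidates())  # prints True
```

Only the WikiHop record carries a closed answer set; the HotpotQA record has nothing to check its answer against except the text itself, which is what makes it a test of text-based reasoning rather than of KB-schema lookup.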
Multi-hop Reasoning for QA Popular GNN
frameworks, e.g. graph convolutional network
(Kipf and Welling, 2017), graph attention network
(Veličković et al., 2018), and graph recurrent
network (Song et al., 2018b), have been studied
previously and have shown promising results in QA
tasks requiring reasoning (Dhingra et al., 2018;
De Cao et al., 2018; Song et al., 2018a).
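As an illustration of the first of these frameworks, the sketch below implements the layer-wise propagation rule of Kipf and Welling (2017), H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where D is the degree matrix of A + I. It uses plain Python on a hypothetical three-node path graph; the helper names and the toy graph are illustrative assumptions. It shows why such models suit multi-hop reasoning: one layer moves information by one hop, so reaching a node two hops away takes two stacked layers.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: entry (i, j) is divided by sqrt(d_i * d_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]  # element-wise ReLU

# Path graph 0 - 1 - 2; only node 0 starts with a non-zero feature.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
h0 = [[1.0], [0.0], [0.0]]
w = [[1.0]]  # identity weight, to isolate the propagation effect
h1 = gcn_layer(adj, h0, w)  # node 2 is still 0.0: it is two hops from node 0
h2 = gcn_layer(adj, h1, w)  # after a second layer, node 2 receives about 1/6
print(h1, h2)
```

A trained model would learn W and use wider feature vectors; the identity weight here only makes the hop-by-hop spread of information easy to read off.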
Coref-GRN (Dhingra et al., 2018) extracts and
aggregates entity information from different
references scattered across paragraphs. Coref-GRN
uses co-reference resolution to detect different
mentions of the same entity. These mentions are
combined with a graph recurrent neural network
(GRN) (Song et al., 2018b) to produce aggregated
entity representations. MHQA-GRN (Song et al.,
2018a) follows Coref-GRN and refines the graph
construction procedure with more types of
connections (sliding-window, same-entity, and
co-reference), which yields further improvements.
Entity-GCN (De Cao et al., 2018) distinguishes
different relations in the graph through a
relational graph convolutional network (GCN)
(Kipf and Welling, 2017). Coref-GRN, MHQA-GRN
and Entity-GCN explore the graph construction
problem in answering real-world questions.
However, how to reason effectively over the
constructed graphs has yet to be investigated,
and this is the main problem studied in this work.
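The three connection types described for MHQA-GRN, together with per-relation aggregation in the spirit of Entity-GCN, can be sketched as follows. This is a simplified sketch under stated assumptions: the Mention fields, the token-window size, the scalar node features, and the per-relation mean aggregation are hypothetical choices for illustration, not the published implementations (Entity-GCN, for instance, learns a weight matrix per relation rather than a scalar). Mention ids are assumed to equal indices into the feature list, and the token offsets in the example are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Edges = Dict[str, Set[Tuple[int, int]]]

@dataclass(frozen=True)
class Mention:
    idx: int                             # node id, also its index into features
    text: str                            # surface string of the entity mention
    paragraph: int                       # paragraph the mention occurs in
    position: int                        # token offset within that paragraph
    coref_cluster: Optional[int] = None  # cluster id from a co-reference resolver

def build_edges(mentions: List[Mention], window: int = 10) -> Edges:
    """Group undirected edges (i < j) by relation type."""
    edges: Edges = defaultdict(set)
    for a, b in combinations(mentions, 2):
        pair = (min(a.idx, b.idx), max(a.idx, b.idx))
        # Sliding-window: nearby mentions in the same paragraph.
        if a.paragraph == b.paragraph and abs(a.position - b.position) <= window:
            edges["sliding_window"].add(pair)
        # Same-entity: identical surface strings (case-insensitive here).
        if a.text.lower() == b.text.lower():
            edges["same_entity"].add(pair)
        # Co-reference: mentions placed in one cluster by the resolver.
        if a.coref_cluster is not None and a.coref_cluster == b.coref_cluster:
            edges["coref"].add(pair)
    return edges

def relational_step(h: List[float], edges: Edges,
                    w_self: float, w_rel: Dict[str, float]) -> List[float]:
    """h_i' = ReLU(w_self * h_i + sum_r w_r * mean of h_j over N_r(i))."""
    out = []
    for i in range(len(h)):
        total = w_self * h[i]
        for rel, pairs in edges.items():
            # Neighbors of node i under relation rel.
            nbrs = [j for p in pairs if i in p for j in p if j != i]
            if nbrs:
                total += w_rel[rel] * sum(h[j] for j in nbrs) / len(nbrs)
        out.append(max(0.0, total))
    return out

# Hypothetical mentions from the HotpotQA example in Figure 2.
mentions = [
    Mention(0, "Alisa Camplin", paragraph=0, position=20),
    Mention(1, "Alisa Camplin", paragraph=1, position=0, coref_cluster=7),
    Mention(2, "She", paragraph=1, position=55, coref_cluster=7),
    Mention(3, "2002 Winter Olympics", paragraph=1, position=8),
]
edges = build_edges(mentions)
weights = {"sliding_window": 0.5, "same_entity": 1.0, "coref": 0.8}
h = [1.0, 0.0, 0.0, 0.0]
for _ in range(2):  # two hops of message passing
    h = relational_step(h, edges, w_self=1.0, w_rel=weights)
print(dict(edges), h)
```

After two steps the pronoun "She" (node 2) has received information from the first paragraph's mention of Alisa Camplin only through the chain same-entity then co-reference, which is the kind of cross-paragraph path these graphs are built to expose.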
Another group of sequential models handles
multi-hop reasoning following Memory Networks
(Sukhbaatar et al., 2015). Such models construct
representations for queries and memory cells for
contexts, and then let them interact over multiple
hops. Munkhdalai and Yu (2017)