Passage Scoring for Question Answering via Bayesian Inference on Lexical
Relations
Deepa Paranjpe, Ganesh Ramakrishnan, Sumana Srinivasan
{adeepa,hare}@cse.iitb.ac.in, sumana@it.iitb.ac.in
Dept. of Computer Science and Engg.,
Indian Institute of Technology, Mumbai, India
Abstract
Many researchers have used lexical networks
and ontologies to mitigate synonymy and polysemy
problems in Question Answering (QA) systems,
coupling them with taggers, query classifiers, and answer
extractors in complex and ad hoc ways. We seek
to make QA systems reproducible with shared and
modest human effort, carefully separating knowledge
from algorithms. To this end, we propose
an aesthetically “clean” Bayesian inference scheme
for exploiting lexical relations in passage scoring
for QA. Three factors contribute to the efficacy
of Bayesian inference on lexical relations:
soft word sense disambiguation; parameter smoothing,
which ameliorates the data sparsity problem; and
estimation of a joint probability over words, which
overcomes the deficiency of naive-Bayes-like
approaches.
1 Introduction
This paper describes an approach to probabilistic
inference using lexical relations, such as those expressed by
a WordNet, an ontology, or a combination of the two, with
applications to passage scoring for open-domain
question answering (QA).
The use of lexical resources in Information Re-
trieval (IR) is not new; for almost a decade, the
IR community has considered the use of natural
language processing techniques (Lewis and Jones,
1996) to circumvent synonymy, polysemy, and other
barriers to purely string-matching search engines. In
particular, a number of researchers have attempted
to use the English WordNet to “bridge the gap” be-
tween query and response. Interestingly, the results
have mostly been inconclusive or negative (Fellbaum,
1998a). A number of explanations have been
offered for this lack of success, some of which are:
the presence of unnecessary links and the absence of
necessary links in the WordNet (Fellbaum, 1998b);
the hurdle of Word Sense Disambiguation (WSD)
(Sanderson, 1994); and
ad hoc choices in the distance and scoring functions
(Abe et al., 1996).
2 Proposed approach
2.1 An inferencing approach to QA
Given a question and a passage that contains the
answer, how do we correlate the two? Take, for
example, the following question:
What type of animal is Winnie the Pooh?
and the answer passage:
A Canadian town that claims to be the birthplace
of Winnie the Pooh wants to erect a giant statue of
the famous bear; but Walt Disney Studios will not
permit it.
It is clear that there is a linkage between the
question word animal and the answer word bear. That
the word bear occurred in the answer, in the context
of Winnie, means that there was a hidden “cause”
for the occurrence of bear, and that was the concept
of animal.
In general, there could be multiple words in the
question and answer that are connected by many hid-
den causes. The causes themselves may have hid-
den causes associated with them. These causal re-
lationships are represented in ontologies and Word-
Nets. The familiar English WordNet, in particular,
encodes relations between words and concepts. For
instance, WordNet gives the hypernymy relation
between the concepts animal and bear.
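The linkage described above can be illustrated with a minimal sketch. It assumes a tiny hand-written hypernym fragment (TOY_HYPERNYMS, which is not real WordNet data and omits intermediate concepts) and two hypothetical helpers, ancestors and linked_pairs. It only finds which question words are connected to passage words through hypernym links; it is not the Bayesian scoring scheme proposed in this paper.

```python
# Toy, hand-written hypernym fragment (an assumption for illustration only;
# NOT real WordNet data, and intermediate concepts are omitted).
TOY_HYPERNYMS = {
    "bear": ["carnivore"],
    "carnivore": ["mammal"],
    "mammal": ["animal"],
    "statue": ["sculpture"],
    "sculpture": ["artifact"],
}


def ancestors(concept, hypernyms):
    """Return every concept reachable from `concept` by following hypernym links."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in hypernyms.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def linked_pairs(question_words, passage_words, hypernyms):
    """List (q, p) pairs where q equals p or q is a hypernym ancestor of p."""
    pairs = []
    for q in question_words:
        for p in passage_words:
            if q == p or q in ancestors(p, hypernyms):
                pairs.append((q, p))
    return pairs


question = ["type", "animal", "winnie", "pooh"]
passage = ["town", "birthplace", "winnie", "pooh", "statue", "bear"]
print(linked_pairs(question, passage, TOY_HYPERNYMS))
# [('animal', 'bear'), ('winnie', 'winnie'), ('pooh', 'pooh')]
```

Exact string matching alone would miss the pair (animal, bear); following the hypernym links recovers it, which is the kind of hidden connection the inference scheme is meant to exploit.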
2.2 WordNet
WordNet (Fellbaum, 1998b) is an online lexical ref-
erence system in which English nouns, verbs, ad-
jectives and adverbs are organized into synonym
sets or synsets, each representing one underly-
ing lexical concept. Noun synsets are related to
each other through hypernymy (generalization), hy-
ponymy (specialization), holonymy (whole of) and
meronymy (part of) relations. Of these, (hypernymy,
hyponymy) and (meronymy, holonymy) are
complementary pairs.
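One way to read "complementary" is that each relation in a pair is the inverse of the other, so storing one direction is enough to derive the other. The sketch below assumes that reading; the dictionaries TOY_HYPERNYM and TOY_HOLONYM and the helper invert are hypothetical and do not reflect how WordNet itself stores its relations.

```python
from collections import defaultdict


def invert(relation):
    """Build the inverse relation: if b is in relation[a], then a is in inverse[b]."""
    inverse = defaultdict(set)
    for source, targets in relation.items():
        for target in targets:
            inverse[target].add(source)
    return dict(inverse)


# Toy data (assumptions for illustration, not real WordNet entries).
# Hypernymy maps a concept to its generalizations.
TOY_HYPERNYM = {"bear": {"carnivore"}, "dog": {"carnivore"}}
# Holonymy maps a part to the wholes it belongs to.
TOY_HOLONYM = {"paw": {"bear"}}

hyponym = invert(TOY_HYPERNYM)  # generalization -> its specializations
meronym = invert(TOY_HOLONYM)   # whole -> its parts

print(hyponym)
print(meronym)
```

Inverting a relation twice returns the original mapping, which is what makes the two members of each pair interchangeable as a storage choice.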
The verb and adjective synsets are very sparsely
connected with each other. No relation is available