Resolving Synonymy and Polysemy in QA with Bayesian Inference: A Passage Scoring Method


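This paper examines passage scoring for Question Answering (QA) systems using Bayesian inference. Many researchers have tried to address synonymy and polysemy in QA by building lexical networks and ontologies, usually coupled with taggers, query classifiers, and answer extractors, but in ways that are complex and ad hoc. To make QA systems reproducible with shared and modest human effort, the authors propose an aesthetically "clean" Bayesian inference scheme that explicitly separates knowledge from algorithms, so that building and refining the knowledge base becomes more transparent and manageable. The key factors are:

1. **Soft Word Sense Disambiguation**: rather than committing to a single sense of a polysemous word, the system weighs the candidate senses using contextual cues, which improves answer matching.
2. **Parameter Smoothing**: smoothing assigns reasonable probability estimates to rare events and so ameliorates the data sparsity problem, keeping the model robust when data is limited.
3. **Joint Probability Estimation**: the method estimates the joint probability over words and thereby accounts for dependencies between them, overcoming the deficiency of naive-Bayes-like approaches.
4. **A "clean" Bayesian inference framework**: this design principle emphasizes clarity and interpretability, making knowledge representation and inference easier to understand and adjust, and easier for other researchers to reproduce and extend.

In short, the paper proposes a Bayesian passage scoring strategy that combines lexical relations with statistical modeling to improve QA performance in the face of synonymy and polysemy, while emphasizing reproducibility and the separation of knowledge from algorithms.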

Passage Scoring for Question Answering via Bayesian Inference on Lexical Relations

Deepa Paranjpe, Ganesh Ramakrishnan, Sumana Srinivasan



{adeepa,hare}@cse.iitb.ac.in, sumana@it.iitb.ac.in

Dept. of Computer Science and Engg.,

Indian Institute of Technology, Mumbai, India

Abstract

Many researchers have used lexical networks and ontologies to mitigate synonymy and polysemy problems in Question Answering (QA) systems, coupled with taggers, query classifiers, and answer extractors in complex and ad hoc ways. We seek to make QA systems reproducible with shared and modest human effort, carefully separating knowledge from algorithms. To this end, we propose an aesthetically “clean” Bayesian inference scheme for exploiting lexical relations for passage scoring for QA. The factors that contribute to the efficacy of Bayesian inferencing on lexical relations are soft word sense disambiguation, parameter smoothing, which ameliorates the data sparsity problem, and estimation of joint probability over words, which overcomes the deficiency of naive-Bayes-like approaches.

1 Introduction

This paper describes an approach to probabilistic inference using lexical relations, such as those expressed by a WordNet, an ontology, or a combination, with applications to passage scoring for open-domain question answering (QA).

The use of lexical resources in Information Retrieval (IR) is not new; for almost a decade, the IR community has considered the use of natural language processing techniques (Lewis and Jones, 1996) to circumvent synonymy, polysemy, and other barriers to purely string-matching search engines. In particular, a number of researchers have attempted to use the English WordNet to “bridge the gap” between query and response. Interestingly, the results have mostly been inconclusive or negative (Fellbaum, 1998a). A number of explanations have been offered for this lack of success, some of which are:

- the presence of unnecessary links and the absence of necessary links in the WordNet (Fellbaum, 1998b),
- the hurdle of Word Sense Disambiguation (WSD) (Sanderson, 1994),
- ad-hocness in the distance and scoring functions (Abe et al., 1996).

2 Proposed approach

2.1 An inferencing approach to QA

Given a question and a passage that contains the answer, how do we correlate the two? Take, for example, the following question:

What type of animal is Winnie the Pooh?

and the answer passage:

A Canadian town that claims to be the birthplace of Winnie the Pooh wants to erect a giant statue of the famous bear; but Walt Disney Studios will not permit it.

It is clear that there is a linkage between the question word animal and the answer word bear. That the word bear occurred in the answer, in the context of Winnie, means that there was a hidden “cause” for the occurrence of bear, and that was the concept animal.
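The "hidden cause" reading can be made concrete with Bayes' rule on a toy two-node model. This is a minimal sketch and not the authors' network: the function name `posterior_cause` and all probability values are assumptions chosen only for illustration.

```python
# Toy model: a hidden concept C ("animal") may cause an observed word W ("bear").
# All numbers below are made-up illustration values, not estimates from the paper.

def posterior_cause(prior_c: float, p_w_given_c: float, p_w_given_not_c: float) -> float:
    """Return P(C=1 | W=1) by Bayes' rule."""
    # Total probability of observing the word, with or without the hidden cause.
    evidence = p_w_given_c * prior_c + p_w_given_not_c * (1.0 - prior_c)
    return p_w_given_c * prior_c / evidence

prior_animal = 0.1             # assumed P(concept "animal" is active)
p_bear_given_animal = 0.2      # assumed P(word "bear" | animal active)
p_bear_given_not_animal = 0.01 # assumed P(word "bear" | animal inactive)

post = posterior_cause(prior_animal, p_bear_given_animal, p_bear_given_not_animal)
print(f"P(animal | bear observed) = {post:.3f}")
```

With these assumed values, observing bear raises the belief in the concept animal well above its prior, which is the kind of evidence a question containing the word animal can then match against.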



In general, there could be multiple words in the question and answer that are connected by many hidden causes. The causes themselves may have hidden causes associated with them. These causal relationships are represented in ontologies and WordNets. The familiar English WordNet, in particular, encodes relations between words and concepts. For instance, WordNet gives the hypernymy relation between the concepts animal and bear.
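To illustrate how a chain of hypernymy links can connect an answer word to a question word, here is a small sketch over a hand-written hierarchy. The `HYPERNYM` map and the `hypernym_path` helper are hypothetical; the real WordNet is much larger, has more intermediate levels, and is not queried here.

```python
from typing import Dict, List, Optional

# Hypothetical miniature hypernym map (specific concept -> more general concept).
HYPERNYM: Dict[str, str] = {
    "bear": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
    "animal": "organism",
}

def hypernym_path(concept: str, target: str) -> Optional[List[str]]:
    """Follow hypernym links upward from `concept`; return the path if `target` is reached."""
    path = [concept]
    seen = {concept}
    while path[-1] != target:
        parent = HYPERNYM.get(path[-1])
        if parent is None or parent in seen:
            return None  # ran off the top of the hierarchy (or hit a cycle)
        path.append(parent)
        seen.add(parent)
    return path

print(hypernym_path("bear", "animal"))  # upward path exists
print(hypernym_path("animal", "bear"))  # no upward path in this direction
```

The relation is directional: bear reaches animal by generalization, but not the other way round.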



2.2 WordNet

WordNet (Fellbaum, 1998b) is an online lexical reference system in which English nouns, verbs, adjectives and adverbs are organized into synonym sets or synsets, each representing one underlying lexical concept. Noun synsets are related to each other through hypernymy (generalization), hyponymy (specialization), holonymy (whole of) and meronymy (part of) relations. Of these, (hypernymy, hyponymy) and (meronymy, holonymy) are complementary pairs.
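"Complementary" here means that each relation in a pair is the inverse of the other, so one can be derived from the other. The sketch below shows this on a few hand-written pairs; the data and the `invert` helper are hypothetical and do not come from WordNet itself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (specific, general) pairs: the first item has the second as hypernym.
IS_A: List[Tuple[str, str]] = [("bear", "mammal"), ("dog", "mammal"), ("mammal", "animal")]
# Hypothetical (part, whole) pairs: the first item is a part of the second.
PART_OF: List[Tuple[str, str]] = [("paw", "bear"), ("wheel", "car")]

def invert(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each right-hand item to the sorted list of left-hand items related to it."""
    inverse: Dict[str, List[str]] = defaultdict(list)
    for left, right in pairs:
        inverse[right].append(left)
    return {key: sorted(values) for key, values in inverse.items()}

hyponyms = invert(IS_A)   # general concept -> its more specific concepts
parts = invert(PART_OF)   # whole -> its parts

print(hyponyms["mammal"])
print(parts["bear"])
```

Storing only one direction of each pair and deriving the other keeps the two views consistent by construction.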

The verb and adjective synsets are very sparsely connected with each other. No relation is available
