Template Learning: A New Breakthrough for Question Answering over Billion-Scale Knowledge Bases

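Updated 2024-09-10 · PDF, 720 KB

"This paper proposes a question answering system built on a knowledge graph. The system uses template learning to understand and answer questions phrased in many different ways, which markedly improves both accuracy and coverage. By learning over a billion-scale knowledge base and a million-scale QA corpus, the authors obtain 27 million templates covering 2782 intents, supporting binary factoid questions as well as complex multi-step questions. They also expand the predicates in the RDF knowledge base, which increases the coverage of the knowledge base by 57 times. On the QALD benchmark, their system outperforms other state-of-the-art work in both effectiveness and efficiency."

Question answering over knowledge graphs has become an important research direction in natural language processing. Its goal is to let users ask questions in natural language and obtain precise answers from a large-scale knowledge base. Traditional QA systems have clear limitations: rule-based methods can only handle a limited set of predefined questions, while keyword-based or synonym-based methods struggle to understand the many varied ways a question can be phrased.

The paper introduces a new question representation, the template. A template is a pattern that captures the structure of a question. For a particular kind of question (for example, questions about the population of a city), the system can learn templates such as "What is the population of $city?" or "How many people are there in $city?". This lets the system understand a large number of natural language phrasings and map them to the same intent, which improves both accuracy and flexibility.

To build the system, the researchers first learn 27 million templates from a large-scale QA corpus. These templates correspond to 2782 distinct intents and cover a wide range of common question types. With them, the system can handle binary factoid questions, that is, questions that ask about a specific property of an entity, and it can also handle more complex multi-step questions, which are answered by chaining several binary factoid questions.

In addition, to further improve the coverage of the knowledge base, the researchers expand the predicates in the RDF knowledge base. Questions that could not be answered before become answerable, and the coverage of the knowledge base increases by 57 times, which substantially strengthens the system's ability to answer.

On the QALD benchmark, this template-based QA system outperforms the existing state-of-the-art methods in both accuracy and efficiency, which demonstrates its potential in practical applications. The results suggest that a QA system combining knowledge graphs with template learning can provide more intelligent and more comprehensive question answering, and may play an important role in future human-computer interaction and information retrieval.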

different templates for 2782 predicates. This large number guarantees the wide coverage of template-based QA.

The procedure for learning the predicate of a template is as follows. First, for each QA pair in Yahoo! Answers, we extract the entity in the question and the corresponding value. Then, we find the predicate from the knowledge base by looking up the direct predicate connecting the entity and the value. Our basic idea is that if most instances of a template share the same predicate, we map the template to this predicate. For example, suppose questions derived from the template how many people are there in $city? always map to the predicate population, no matter what specific $city it is. We can then conclude that, with a certain probability, the template maps to population. Learning templates that map to a complex knowledge base structure employs a similar process. The only difference is that we find "expanded predicates" that correspond to a path consisting of multiple edges leading from an entity to a certain value (e.g., marriage → person → name).
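The learning procedure described above can be sketched roughly as follows. This is a minimal illustration that estimates the template-to-predicate mapping by simple relative frequency; the toy triples, the helper names `direct_predicates` and `learn_template_predicates`, and the input format are all assumptions made for the example, not the paper's actual estimator.

```python
from collections import Counter, defaultdict

# Toy knowledge base of (subject, predicate, object) triples.
# All values are illustrative, not taken from a real knowledge base.
KB = {
    ("Honolulu", "population", "390K"),
    ("Honolulu", "area", "177.2 km2"),
    ("Tokyo", "population", "14M"),
    ("Tokyo", "country", "Japan"),
}


def direct_predicates(entity, value):
    """Return the predicates that directly connect entity to value in KB."""
    return [p for (s, p, o) in KB if s == entity and o == value]


def learn_template_predicates(observations):
    """Estimate a predicate distribution per template by relative frequency.

    observations: iterable of (template, entity, value) tuples, assumed to
    have been extracted from QA pairs beforehand.
    """
    counts = defaultdict(Counter)
    for template, entity, value in observations:
        for p in direct_predicates(entity, value):
            counts[template][p] += 1
    return {
        t: {p: n / sum(c.values()) for p, n in c.items()}
        for t, c in counts.items()
    }


observations = [
    ("how many people are there in $city?", "Honolulu", "390K"),
    ("how many people are there in $city?", "Tokyo", "14M"),
]
dist = learn_template_predicates(observations)
best = max(dist["how many people are there in $city?"].items(), key=lambda kv: kv[1])
```

In this toy run every instance of the template is connected to its value by population, so that predicate receives all of the probability mass.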

1.4 Paper Organization

The rest of the paper is organized as follows. In Sec 2, we give an overview of KBQA. The major contribution of this paper is learning templates from QA corpora, and all technical parts are closely related to it. Sec 3 shows online question answering with templates. Sec 4 elaborates on predicate inference for templates, which is the key step in using templates. Sec 5 extends our solution to answer complex questions. Sec 6 extends the ability of templates to infer complex predicates. We present experimental studies in Sec 7, discuss related work in Sec 8, and conclude in Sec 9.

2. SYSTEM OVERVIEW

In this section, we introduce some background knowledge and give an overview of KBQA. In Table 2, we list the notations used in this paper.

Table 2: Notations

Notation | Description | Notation | Description
$q$ | question | $s$ | subject
$a$ | answer | $p$ | predicate
$QA$ | QA corpus | $o$ | object
$e$ | entity | $K$ | knowledge base
$v$ | value | $c$ | category
$t$ | template | $p^{+}$ | expanded predicate
$V(e, p)$ | $\{v \mid (e, p, v) \in K\}$ | $s_1 \subset s_2$ | $s_1$ is a substring of $s_2$
$t(q, e, c)$ | template of $q$ by conceptualizing $e$ to $c$ | $\theta^{(s)}$ | estimation of $\theta$ at iteration $s$

Binary factoid QA. We focus on binary factoid questions (BFQs), that is, questions asking about a specific property of an entity. For example, all questions except  in Table 1 are BFQs.

RDF knowledge base. Given a question, we find its answer in an RDF knowledge base. An RDF knowledge base $K$ is a set of triples of the form $(s, p, o)$, where $s$, $p$, and $o$ denote subject, predicate, and object respectively. Figure 1 shows a toy RDF knowledge base as an edge-labeled directed graph. Each $(s, p, o)$ is represented by a directed edge from $s$ to $o$ labeled with predicate $p$. For example, the edge from a to 1961 with label dob represents the RDF triple (a, dob, 1961), which encodes Barack Obama's date of birth.
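As a small illustration of the triple model, the sketch below stores a toy knowledge base as a Python set and implements the value lookup V(e, p) from Table 2. Only the triple (a, dob, 1961) comes from the text; the other triples, the node id d, and the helper `out_edges` are assumptions added for the example.

```python
# Toy RDF knowledge base: a set of (s, p, o) triples.
# Only ("a", "dob", "1961") appears in the text; the rest is made up.
K = {
    ("a", "dob", "1961"),
    ("a", "name", "Barack Obama"),
    ("d", "population", "390K"),
}


def V(e, p, kb=K):
    """V(e, p) = {v | (e, p, v) in K}: all values of predicate p for entity e."""
    return {o for (s, pred, o) in kb if s == e and pred == p}


def out_edges(e, kb=K):
    """Labeled edges leaving node e in the directed-graph view of the triples."""
    return sorted((pred, o) for (s, pred, o) in kb if s == e)


birth = V("a", "dob")
edges = out_edges("a")
```

Returning a set from `V` reflects that a predicate may have several values for the same entity, or none at all.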

Table 3: Sample QA Pairs from a QA Corpus

Id | Question | Answer
$(q_1, a_1)$ | When was Barack Obama born? | The politician was born in 1961.
$(q_2, a_2)$ | When was Barack Obama born? | He was born in 1961.
$(q_3, a_3)$ | How many people are there in Honolulu? | It's 390K.

QA corpora. We learn question templates from Yahoo! Answers, which consists of 41 million QA pairs. The QA corpus is denoted by $QA = \{(q_1, a_1), (q_2, a_2), \ldots, (q_n, a_n)\}$, where $q_i$ is a question and $a_i$ is the reply to $q_i$. Each reply $a_i$ consists of several sentences, and the exact factoid answer is contained in the reply. Table 3 shows a sample from a QA corpus.

Templates. We derive a template $t$ from a question $q$ by replacing each entity $e$ with one of $e$'s categories $c$. We denote this template as $t = t(q, e, c)$. A question may contain multiple entities, and an entity may belong to multiple categories. We obtain the concept distribution of $e$ through context-aware conceptualization [32]. For example, question $q_1$ in Table 3 contains entity a in Figure 1. Since a belongs to two categories, $Person and $Politician, we can derive two templates from the question: When was $Person born? and When was $Politician born?.
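The template derivation just described can be illustrated with a short sketch. The lookup table `CATEGORIES` and the function `derive_templates` are hypothetical names introduced here; a plain dictionary stands in for context-aware conceptualization, which in the paper yields a distribution over categories rather than a fixed list.

```python
# Hypothetical entity-to-category lookup. The paper obtains a concept
# distribution via context-aware conceptualization; this is a stand-in.
CATEGORIES = {
    "Barack Obama": ["$Person", "$Politician"],
    "Honolulu": ["$City"],
}


def derive_templates(question, entity):
    """Return t(q, e, c) for every category c of entity e occurring in q."""
    if entity not in question:
        return []
    return [question.replace(entity, c) for c in CATEGORIES.get(entity, [])]


templates = derive_templates("When was Barack Obama born?", "Barack Obama")
```

One question yields one template per category of the entity, so questions with ambiguous entities naturally produce several candidate templates.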

Figure 3: System Overview

System Architecture. Figure 3 shows the pipeline of our QA system, which consists of two major procedures:

• Online procedure: When a question comes in, we first parse and decompose it into a series of binary factoid questions. The decomposition process is described in Sec 5. For each binary factoid question, we use a probabilistic inference approach to find its value, as shown in Sec 3. The inference is based on the predicate distribution of the given templates, i.e., $P(p \mid t)$. This distribution is learned offline.

• Offline procedure: The goal of the offline procedure is to learn the mapping from templates to predicates. This mapping is represented by $P(p \mid t)$, which is estimated in Sec 4. We also expand predicates in the knowledge base in Sec 6, so that we can learn more complex predicate forms (e.g., marriage → person → name in Figure 1).
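To make the interplay of the two procedures concrete, here is a simplified sketch of answering one binary factoid question from a learned P(p|t) table, including an expanded predicate that is followed as a multi-edge path. The node ids, the probability values, the example template about a spouse, and the helpers `follow` and `answer` are all invented for illustration. The sketch also picks the single most probable predicate, which is a simplification of the probabilistic inference the paper describes.

```python
# Toy triples loosely modeled on the marriage -> person -> name example.
# Node ids and all probabilities below are invented for illustration.
K = {
    ("a", "marriage", "b"),
    ("b", "person", "c"),
    ("c", "name", "Michelle Obama"),
    ("a", "dob", "1961"),
}

# Assumed output of the offline procedure: P(p|t). Predicates are stored as
# tuples, so a direct predicate is simply a path of length one.
P_PREDICATE_GIVEN_TEMPLATE = {
    "Who is the wife of $Person?": {
        ("marriage", "person", "name"): 0.9,
        ("dob",): 0.1,
    },
    "When was $Person born?": {("dob",): 1.0},
}


def follow(entity, path, kb=K):
    """Follow a predicate path edge by edge and return the set of end values."""
    frontier = {entity}
    for predicate in path:
        frontier = {o for (s, p, o) in kb if s in frontier and p == predicate}
    return frontier


def answer(template, entity):
    """Pick the most probable predicate path for the template and look it up."""
    dist = P_PREDICATE_GIVEN_TEMPLATE[template]
    best_path = max(dist, key=dist.get)
    return follow(entity, best_path)


spouse = answer("Who is the wife of $Person?", "a")
born = answer("When was $Person born?", "a")
```

Treating direct and expanded predicates uniformly as paths keeps the online lookup identical for both cases; only the offline table decides how long the path is.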

3. OUR APPROACH: KBQA

In this section, we first formalize our problem in a probabilistic framework in Sec 3.1. We present the details for most probability estimations in Sec 3.2, leaving only the estimation of $P(p \mid t)$ to Sec 4. We elaborate on the online procedure in Sec 3.3.

3.1 Problem Model

KBQA learns question answering by using a QA corpus and a knowledge base. Due to issues such as uncertainty (e.g., some questions' intents are vague), incompleteness (e.g., the knowledge base is almost always incomplete), and noise (e.g., answers in the QA
