Improved Search and Auto-Completion for Relationship Queries on Extended Knowledge Graphs


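Relationship queries over extended knowledge graphs are an important topic in information technology. The paper "Relationship Queries on Extended Knowledge Graphs" addresses relationship queries in entity search: queries whose answers are tuples of related entities and whose evidence is often spread across multiple documents. Conventional text search engines support such queries poorly because they are designed mainly for single-entity search, whereas a relationship query often requires extracting cues from several documents and joining them. Extended knowledge graphs combine structured relational facts with textual web contents, which gives relationship queries a richer body of data to run against. The difficulty is that a user's query may not match the knowledge graph, or the relevant relations may be only sparsely populated, and in both cases recall suffers. To address this, the authors present the TriniT search engine, which is designed for querying and ranking over extended knowledge graphs.

TriniT's query language is based on SPO (subject-predicate-object) triple patterns, extended so that each of the three arguments may be a textual phrase, which makes the language more expressive. Users can therefore phrase queries more naturally, and automatic query relaxation compensates for lexical or semantic mismatches between the query and the data. The model aims to interpret and process queries more accurately and so to return more relevant and more complete answers. As for implementation, TriniT may draw on machine learning and natural language processing techniques such as semantic parsing, information retrieval, and knowledge graph reasoning to map a user's query onto the appropriate nodes and relations of the graph. It may also use methods such as textual entailment and link prediction to improve coverage and precision.

In short, the paper explores how the properties of extended knowledge graphs can improve relationship query processing. Its query language and automatic relaxation mechanism target the mismatch between queries and the knowledge graph, with the goal of more efficient and higher-quality retrieval. The work is relevant in both theory and practice to information retrieval, knowledge graph applications, and question answering.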

Relationship Queries on Extended Knowledge Graphs

Mohamed Yahya, Denilson Barbosa, Klaus Berberich, Qiuyue Wang, Gerhard Weikum

Max-Planck Institute for Informatics

University of Alberta

Renmin University of China

{myahya, kberberi, weikum}@mpi-inf.mpg.de

denilson@ualberta.ca qiuyuew@ruc.edu.cn

ABSTRACT

Entity search over text corpora is not geared for relationship queries where answers are tuples of related entities and where a query often requires joining cues from multiple documents. With large knowledge graphs, structured querying on their relational facts is an alternative, but often suffers from poor recall because of mismatches between user queries and the knowledge graph or because of weakly populated relations.

This paper presents the TriniT search engine for querying and ranking on extended knowledge graphs that combine relational facts with textual web contents. Our query language is designed on the paradigm of SPO triple patterns, but is more expressive, supporting textual phrases for each of the SPO arguments. We present a model for automatic query relaxation to compensate for mismatches between the data and a user’s query. Query answers – tuples of entities – are ranked by a statistical language model. We present experiments with different benchmarks, including complex relationship queries, over a combination of the Yago knowledge graph and the entity-annotated ClueWeb’09 corpus.

Keywords

Relationship Queries; Extended Knowledge Graphs; Query Relaxation

1. INTRODUCTION

1.1 Motivation and Problem

Searching for entities and associated properties has received much attention for both web contents and enterprise documents. Examples are queries for “musicians who contributed to movie soundtracks” or “companies that acquired Internet startups”. IR-centric approaches typically associate entities with statistical language models in order to match and rank entity mentions and surrounding phrases in text corpora [3]. Semantic-Web-style approaches, on the other hand, rather tap structured knowledge graphs (KGs) such as Freebase or Linked Open Data (LOD) collections such as combinations of DBpedia, Yago, and MusicBrainz, and use SPARQL queries to retrieve relevant RDF triples [16].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

WSDM’16, February 22–25, 2016, San Francisco, CA, USA.
© 2015 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-3716-8/16/02...$15.00
DOI: http://dx.doi.org/10.1145/2835776.2835795

Neither of these paradigms provides good support for relationship queries that connect multiple entities in a specific way and return tuples of connected entities. Consider, for example, the task of finding songs that appear in movies and returning a list of ⟨song, movie⟩ pairs. This cannot be fully expressed by IR-centric entity search, which is bound to return spurious results where a movie is merely mentioned in the textual proximity of a song (e.g., “My Way” and the film “The Man with the Golden Arm” in a Frank Sinatra biography). On the other hand, the SPARQL language supports expressive and precise queries over RDF graphs of entity-relationship triples, but its results are limited by the properties (i.e., relation types, binary predicates) and facts (i.e., relation instances, predicate arguments) that the underlying KG or LOD collection contains. So none of the established paradigms can adequately cope with relationship queries.

Examples: In principle, the above example could be formulated by the following SPARQL query:

SELECT ?s ?m WHERE {
  ?s type song . ?m type movie . ?s musicInFilm ?m }

where ?s and ?m are variables and the second line contains three triple patterns over the subject-predicate-object (SPO) triples of the underlying KG. The query should return bindings for song-movie pairs that are in the desired relationship. However, this works only if the KG does indeed offer the predicate musicInFilm and that predicate is sufficiently well populated. If the KG instead contained predicates filmHasSoundtrack of type movie × album and albumContainsSong of type album × song, the user would need to formulate a very different query and non-expert users would typically fail to get this right. Even if the user succeeded in posing the best query formulation, the answers would be limited by the facts of the KG, while the web or social media could potentially hold many additional answers.
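The schema mismatch described above can be made concrete with a small Python sketch. It is an illustration only, not TriniT or a SPARQL engine: the helper names `match` and `evaluate`, the naive nested-loop join, and the placeholder entities (`song_A`, `movie_X`, `album_1`) are all assumptions introduced here. The toy KG contains only the two-hop predicates filmHasSoundtrack and albumContainsSong, so the direct musicInFilm formulation finds nothing while the reformulated query does.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must match the triple exactly.
            return None
    return result


def evaluate(patterns: List[Triple], kg: Iterable[Triple]) -> List[Binding]:
    """Join the triple patterns left to right with a naive nested-loop join."""
    facts = list(kg)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for fact in facts:
                candidate = match(pattern, fact, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


# Hypothetical toy KG with placeholder entities; it has no musicInFilm facts.
kg = [
    ("song_A", "type", "song"),
    ("movie_X", "type", "movie"),
    ("album_1", "type", "album"),
    ("movie_X", "filmHasSoundtrack", "album_1"),
    ("album_1", "albumContainsSong", "song_A"),
]

# Direct formulation from the text: relies on a predicate the KG lacks.
direct = [("?s", "type", "song"), ("?m", "type", "movie"),
          ("?s", "musicInFilm", "?m")]

# Reformulation over the predicates the KG actually offers.
two_hop = [("?s", "type", "song"), ("?m", "type", "movie"),
           ("?m", "filmHasSoundtrack", "?a"), ("?a", "albumContainsSong", "?s")]

direct_answers = evaluate(direct, kg)
two_hop_answers = evaluate(two_hop, kg)
print(direct_answers)
print([(b["?s"], b["?m"]) for b in two_hop_answers])
```

Both queries ask for the same ⟨song, movie⟩ pairs, yet only the second matches this KG's vocabulary, which is the kind of gap a user would otherwise have to bridge by hand.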

Similar cases arise in domains like business or sports, for example, when searching for “South-American football players and European championships that they have won” (with answers such as Lionel Messi and the UEFA Champions League). Here one challenge that users would struggle with is to properly formulate the two-hop join query with triple patterns ?p type player . ?p playedFor ?t . ?t won ?c . ?t type footballClub as KGs associate championships with teams, not players. IR-style entity search would be more convenient for users, but loses precision and would be mis-
