树模板驱动的RDF数据关键词查询策略

需积分: 9 194 浏览量更新于2024-08-12 收藏 527KB PDF 举报

"基于树模板的RDF数据关键词查询方法" 在当前互联网时代，语义数据的快速增长使得对这些数据的访问需求不断攀升。关键词查询作为一种直观的查询模式，尤其适用于不熟悉数据结构和RDF查询语言的普通用户。这篇研究论文提出了一种基于树模板的RDF数据关键词查询方法，该方法无需依赖数据模式即可返回排名的答案。论文作者来自东南大学计算机科学与工程学院，通过此方法，即使没有具体的数据模式，也能有效地执行关键词查询。核心概念包括K步树（K-step tree）和树模板（Tree Template）。K步树是通过广度优先遍历RDF数据图得到的，它能够捕获数据的主要结构。而树模板则从K步树中提取出模式信息，帮助理解和处理查询。该方法将在线关键词查询问题分解为两个步骤：(1) 寻找包含所有查询关键词的K步树；(2) 从K步树中生成最小子树作为候选结果。这样的两步策略优化了查询效率，确保了在无模式信息的情况下也能生成相关性高的查询结果。关键词查询首先遍历整个RDF数据图，寻找包含所有查询关键词的节点集合，形成K步树。然后，通过对这些K步树进行分析，提取出包含所有查询关键词的最小结构——子树，这些子树作为最终结果的候选。通过对这些候选子树进行排名，可以得到与查询关键词最相关的部分。在实际应用中，这种方法对于提高非专业用户的查询体验具有重要意义。通过使用树模板，即使没有详细的元数据或数据模式，系统也能理解用户的查询意图，并提供相关的结果。这种方法对于语义Web和大数据环境下的信息检索具有很高的实用价值。 "基于树模板的RDF数据关键词查询方法"是一种创新的查询策略，它简化了RDF数据的查询过程，降低了用户的技术门槛，提高了查询效率和结果的相关性。这对于促进语义数据的普及和使用具有积极的推动作用。

Keyword Query Approach over RDF Data Based on Tree Template

Qiang Sima

School of Computer Science and Engineering,

Southeast University

Nanjing, P.R.China 210096

220131534@seu.edu.cn

Huiying Li

School of Computer Science and Engineering,

Southeast University

Nanjing, P.R.China 210096

huiyingli@seu.edu.cn

Abstract—With the large increment of semantic data available

in the web, the demand for access to Semantic data is

increasing. Keyword query is regarded as an intuitive

paradigm, especially for the users who are not familiar with

the data and the RDF query language. In this paper, an

approach based on tree template is proposed, which can return

ranked answers to keyword query without the help of data

schema. With the proposed approach, K-step tree and tree

template are presented to support efficient keyword query. K-

step tree can be derived by breadth-first traversal of RDF data

graph, while tree template extracts the schema of K-step tree.

The online process conducted keyword query problem into two

steps: (1) find K-step trees containing all query keywords; (2)

generate minimal sub-trees as candidate results from K-step

trees. Search space of sub-graph is greatly reduced from the

original entire RDF data graph to the K-step tree. The

experimental results show that the approach is feasible and

effective.

Keywords-keyword query; tree template; K-step tree; RDF

data

INTRODUCTION

In recent years, with some projects such as DBPedia and

Linking Open Data in full swing, the online semantic data

has dramatically increased. Faced with such a large scale of

RDF data, the problem that how to help users query data

conveniently is one of the most concerned problems in

current semantic web search. Although W3C organization

has proposed formal queries for RDF data, such as

SPARQL[1], it is too complicated for ordinary users to write

even a relatively simple query without detailed knowledge of

data schema and query language.

The keyword query has proven to be a popular

information search method on the Web because of the fact

that the user does not need to know any query language or

underlying structure of data. Many approaches (e.g. [2], [3],

[4], [5], [6]) implement information retrieval (IR) strategies

on top of traditional database systems. However, none of

them can be directly applied to keyword query on RDF data

because its underlying data model is a graph rather than

relational data. Recently, there are some research work (e.g.

[7], [8]) dealing with keyword query to semantic search.

However, these works mainly focus on translating keyword

queries and cannot deal with schema-less data.

In this paper, we focus on implementing efficient ranked

keyword query on RDF data. On a large data graph, many

substructures may contain the query keywords, we restrict

answers to those connected substructures that are "minimal".

The main contributions of this paper are as follows.

1. We propose a keyword query model on RDF data. We

partition a large data graph into many K-step trees. The

index stores the content nodes as well as edge labels of K-

step trees, which makes the keyword query problem being

converted into a string matching problem. An answer is a

minimal sub-tree of K-step trees containing all query

keywords.

2. We present a keyword query algorithm based on tree

template. We explore the relationship between tree template

and content nodes of K-step trees, which makes us generate

minimal sub-trees as candidates quickly when K-step trees

are determined. We sort candidates and return top-k

candidates as final results to users.

A KEYWORD QUERY MODEL ON RDF DATA

For keyword query methods which support graph-

structure data currently, some are suitable for tree, some are

for graph. Two main steps are involved in these methods.

Firstly, the methods retrieve those parts of the data graph that

match users keywords with a subsequent identification of

any connections across graph segments returned by the query

process. Secondly, the methods rank the combined graph

segments, and top-k results are presented from the candidate

result-set. While locating nodes matching the keywords is

relatively efficient using index information, determining the

connections between graph segments is a complex and time-

consuming task that must be solved on-the-fly at query

processing phase.

In order to avoid time-consuming searching at query

processing phase, paper[9] proposed clustering technique

that reduces the search space by avoiding the exploration of

overlapping solutions, analyzing the paths in the graph and

extracting their schemas as templates. Paths with the same

template are grouped together in a cluster, each cluster being

represented by a template. Solutions, answers to the input

search, are built by composing paths starting with the most

relevant path in the most relevant cluster. Such strategy tends

to produce solutions exhaustive but not optimally specific,

leading to the solution generated at each step may not be the

optimum solution. On this basis, we proposed tree template

query model, using tree template to store the schema of

original data graph so that final candidate results will follow

the original structure, and eventually they can meet the users'

下载后可阅读完整内容，剩余4页未读，立即下载

weixin_38738977

粉丝: 6
资源: 971

树模板驱动的RDF数据关键词查询策略

最新资源