XML Query Processing: Theories and Algorithms for Extended Tree Pattern Matching


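"Extended XML Tree Pattern Matching: Theories and Algorithms" addresses query optimization in XML data processing, focusing on XML tree pattern matching. The paper introduces extended XML tree patterns. These may contain parent-child (PC) and ancestor-descendant (AD) relationships, negation functions, wildcards, and order restrictions. The authors build a theoretical framework called "matching cross" that explains why holistic algorithms are optimal. On this basis, they design a set of new algorithms that process such extended XML tree pattern queries efficiently. Experiments on real-world and synthetic data sets demonstrate the effectiveness and efficiency of the proposed theories and algorithms.

In XML data processing, query performance is critical, especially as XML data becomes widely used. Traditional tree pattern matching methods focus mainly on PC and AD relationships. Practical XML query languages such as XPath and XQuery, however, offer richer features, including negation functions (e.g., not()), order-based axes (e.g., following-sibling), and wildcards (e.g., *). These features make queries more expressive but also make them harder to process.

The paper's core contribution is an in-depth study of extended XML tree patterns, which widen tree pattern matching to cover more complex query structures. The matching cross framework gives a theoretical explanation of the optimality of holistic algorithms. It identifies the key factors that control the size of intermediate results, and so shows how to design more efficient matching algorithms. Building on this theory, the authors develop a set of novel algorithms for extended XML tree patterns containing PC and AD relationships, negation, wildcards, and order restrictions. The algorithms aim to reduce computational overhead during query execution. Experimental results show that they perform well on both real-world and synthetic data sets, validating the theory and the practicality of the algorithms.

The paper thus contributes both theoretical foundations and practical algorithms that improve the query efficiency of XML databases. It is a useful reference for designers of XML query languages, database developers, and developers of applications that process large volumes of XML data.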

holistic XML tree pattern matching algorithms. The experimental results show that our algorithm correctly processes extended XML tree patterns. It also achieves performance speedups on the tested queries and data sets, even within their restricted focus. The improvement is mainly due to the reduced size of intermediate results.

1.2 Outline

The rest of the paper is organized as follows:

- Section 2 gives preliminaries on the research problem and the processing model.
- Section 3 presents a set of theories about the matching cross.
- Section 4 presents an extended XML tree pattern matching algorithm called TreeMatch.
- Section 5 presents thorough experimental studies comparing the novel algorithms with prior methods.
- Section 6 reviews previous work related to XML tree pattern matching.
- Section 7 concludes the paper.

2 PRELIMINARIES

2.1 Modeling of XML Data and Extended Tree Pattern Query

An XML database D is usually modeled as a rooted, node-labeled tree. In this paper, we use D to denote both the database and its tree model, without further distinction. Element tags and attributes are mapped to nodes in the tree, and edges represent direct nesting relationships. Our primary focus is on element nodes. Extending our methods to other node types, including attributes and character data, is not difficult. For convenience, we use the term "node" for a query node and the term "element" for a data element in D.
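As a minimal sketch of this data model, the snippet below uses Python's standard `xml.etree.ElementTree` as the rooted, node-labeled tree. Each element is a node labeled by its tag, and each direct nesting step is an edge. The helper names `build_parent_map`, `is_parent`, and `is_ancestor` are hypothetical, not from the paper. Attributes and character data are ignored, in line with the focus on element nodes.

```python
import xml.etree.ElementTree as ET

def build_parent_map(root: ET.Element) -> dict:
    """Map each element to its parent; the root has no entry."""
    return {child: parent for parent in root.iter() for child in parent}

def is_parent(parent_map: dict, u: ET.Element, v: ET.Element) -> bool:
    """Parent-child (PC) relationship: u directly contains v."""
    return parent_map.get(v) is u

def is_ancestor(parent_map: dict, u: ET.Element, v: ET.Element) -> bool:
    """Ancestor-descendant (AD) relationship: u contains v at any depth."""
    p = parent_map.get(v)
    while p is not None:          # climb toward the root
        if p is u:
            return True
        p = parent_map.get(p)
    return False

doc = ET.fromstring("<A><B><C/></B><D/></A>")
pm = build_parent_map(doc)
a, b, c = doc, doc.find("B"), doc.find("B/C")
print(is_parent(pm, a, b), is_parent(pm, a, c), is_ancestor(pm, a, c))  # True False True
```

AD is simply the transitive closure of the PC edges, which is why the loop only ever follows parent pointers.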

An extended tree query Q describes a complex traversal of the XML document and retrieves relevant tree-structured portions of it. The nodes in Q include element tags, attributes, and character data. We use "*" to denote the wildcard, which can match any single tree element. There are four kinds of query edges, formed by combining (positive, negative) with (parent-child, ancestor-descendant). For example, in Fig. 2b, (A, B) is a positive parent-child edge and (A, C) is a negative parent-child edge. We use the symbol "¬" to denote a negative edge. There are two kinds of query nodes: ordered and unordered. A "<" in a box marks an ordered node; otherwise, the node is unordered. For example, the nodes labeled A in Figs. 2c and 2d are ordered nodes. In each extended tree query pattern, one or more nodes are designated as selected return nodes, denoted with an underline. For example, in Fig. 2a, C is the selected return node.
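These constructs map directly onto XPath. PC and AD edges correspond to the child and descendant axes, negative edges to `not()`, and "*" to the XPath wildcard. The sketch below represents a pattern as a tree of hypothetical `QNode` records and renders it as a nested XPath filter expression. The class and the `to_xpath` function are our own illustration, not the paper's notation. Order constraints are not rendered, since XPath would need sibling axes for them. Return-node underlines are recorded but unused. The example reproduces only the two edges the text cites from Fig. 2b; the rest of that figure is not shown here.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QNode:
    tag: str                     # tag name, or "*" for the wildcard
    axis: str = "PC"             # edge from the parent: "PC" or "AD"
    negative: bool = False       # True for a negative edge
    ordered: bool = False        # True for a "<" (ordered) node; not rendered below
    is_return: bool = False      # True for an underlined return node; not rendered below
    children: List["QNode"] = field(default_factory=list)

def to_xpath(node: QNode) -> str:
    """Render a pattern as a nested XPath filter expression."""
    out = node.tag
    for child in node.children:
        step = to_xpath(child)
        if child.axis == "AD":
            step = ".//" + step          # descendant axis
        out += f"[not({step})]" if child.negative else f"[{step}]"
    return out

# Edges cited from Fig. 2b: (A, B) positive PC, (A, C) negative PC.
q = QNode("A", children=[QNode("B"), QNode("C", negative=True)])
print(to_xpath(q))   # A[B][not(C)]
```

Nesting predicates, as in `A[B[D]]`, keeps each child's constraints scoped to that child's match, mirroring the tree shape of the pattern.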

Given an extended tree query Q with n selected return nodes and an XML database D, a match of Q in D is a mapping from nodes in Q to elements in D such that:

1. query node types (i.e., tag names) are satisfied by the corresponding database elements, and the wildcard "*" can match any single database element;
2. the positive edge relationships between query nodes (positive parent-child and positive ancestor-descendant edges) are satisfied by the corresponding database elements;
3. the negative edge relationships (negative parent-child and negative ancestor-descendant edges) are satisfied, that is, no corresponding database element pairs exist; and
4. the order relationship among the children of each ordered node is satisfied by the corresponding database elements.

The answers to a query can be represented as a set of database elements, or tuples of elements when there are several selected return nodes. Each answer identifies a distinct match of the selected return nodes on D. For example, Fig. 5 shows a mapping between an extended XML tree pattern and a document tree.
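The four match conditions and the answer representation can be made concrete with a deliberately naive backtracking matcher. This is a sketch under our own assumptions, not TreeMatch or any algorithm from the paper. It enumerates every mapping, so intermediate results can grow combinatorially, which is exactly the cost that holistic algorithms aim to avoid.

The sketch assumes the usual `not()` reading of a negative edge (p, c): no element related to p's image by that edge's axis matches the subpattern rooted at c. It also enforces order only among an ordered node's positive children, in strictly increasing document order. `QNode`, `embeddings`, and `answers` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, Iterator, List, Set, Tuple

@dataclass(eq=False)                 # identity hashing: nodes can be dict keys
class QNode:
    tag: str                         # tag name, or "*" for the wildcard
    axis: str = "PC"                 # edge from the parent: "PC" or "AD"
    negative: bool = False           # True for a negative edge
    ordered: bool = False            # True for a "<" (ordered) node
    is_return: bool = False          # True for an underlined return node
    children: List["QNode"] = field(default_factory=list)

Match = Dict[QNode, ET.Element]

def candidates(elem: ET.Element, axis: str) -> List[ET.Element]:
    """Children (PC) or proper descendants (AD) of elem, in document order."""
    if axis == "PC":
        return list(elem)
    return [e for e in elem.iter() if e is not elem]

def embeddings(q: QNode, elem: ET.Element, pos: Dict[ET.Element, int]) -> List[Match]:
    """All mappings of q's positive subpattern that send q to elem."""
    if q.tag != "*" and q.tag != elem.tag:                  # condition 1
        return []
    for c in q.children:                                     # condition 3 (assumed reading)
        if c.negative and any(embeddings(c, e, pos) for e in candidates(elem, c.axis)):
            return []
    positives = [c for c in q.children if not c.negative]
    options = []
    for c in positives:                                      # condition 2
        opts = [m for e in candidates(elem, c.axis) for m in embeddings(c, e, pos)]
        if not opts:
            return []
        options.append(opts)
    results = []
    for combo in product(*options):
        if q.ordered:                                        # condition 4 (assumed reading)
            order = [pos[m[c]] for c, m in zip(positives, combo)]
            if any(x >= y for x, y in zip(order, order[1:])):
                continue
        merged: Match = {q: elem}
        for m in combo:
            merged.update(m)
        results.append(merged)
    return results

def positive_nodes(q: QNode) -> Iterator[QNode]:
    """Pre-order walk that skips negative subtrees (they are never bound)."""
    yield q
    for c in q.children:
        if not c.negative:
            yield from positive_nodes(c)

def answers(q: QNode, root: ET.Element) -> Set[Tuple[ET.Element, ...]]:
    """Distinct tuples of elements bound to the selected return nodes."""
    pos = {e: i for i, e in enumerate(root.iter())}         # pre-order = document order
    returns = [n for n in positive_nodes(q) if n.is_return]
    found = set()
    for e in root.iter():                                    # the pattern root may bind anywhere
        for m in embeddings(q, e, pos):
            found.add(tuple(m[n] for n in returns))
    return found

doc = ET.fromstring("<A><B/><C/><A><C/><B/></A></A>")
ordered_q = QNode("A", ordered=True, children=[QNode("B"), QNode("C", is_return=True)])
print(answers(ordered_q, doc) == {(doc[1],)})   # True: only the outer A has B before C
```

Dropping `ordered=True` from the same query also returns the inner C, because the inner A then matches regardless of sibling order.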

2.2 Labeling Schemes

Most XML query processing algorithms rely on a labeling scheme, such as the region encoding scheme [27], the prefix scheme [13], ORDPATH [19], or the extended Dewey scheme [16]. In this paper, we use the extended Dewey labeling scheme proposed in [16]. It assigns each node in an XML document a sequence of integers that captures the structural information of the document.
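For contrast with the prefix-based schemes, here is one common formulation of region encoding. A single depth-first pass gives each element a (start, end, level) triple. Interval containment then decides AD, and an extra level check decides PC. This is a sketch under our assumptions; the exact numbering in [27] may differ. `region_labels`, `is_ancestor`, and `is_parent` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Region = Tuple[int, int, int]  # (start, end, level)

def region_labels(root: ET.Element) -> Dict[ET.Element, Region]:
    """Number elements in one depth-first pass: start on entry, end on exit."""
    labels: Dict[ET.Element, Region] = {}
    counter = 0

    def visit(e: ET.Element, level: int) -> None:
        nonlocal counter
        counter += 1
        start = counter
        for child in e:
            visit(child, level + 1)
        counter += 1
        labels[e] = (start, counter, level)

    visit(root, 1)
    return labels

def is_ancestor(u: Region, v: Region) -> bool:
    """AD test: v's interval lies strictly inside u's."""
    return u[0] < v[0] and v[1] < u[1]

def is_parent(u: Region, v: Region) -> bool:
    """PC test: AD containment plus a level difference of exactly one."""
    return is_ancestor(u, v) and v[2] == u[2] + 1

doc = ET.fromstring("<A><B><C/></B><D/></A>")
lab = region_labels(doc)
print(lab[doc], lab[doc.find("B/C")])   # (1, 8, 1) (3, 4, 3)
```

A region label answers structural tests between two elements. Unlike an extended Dewey label, however, it says nothing about the tags of an element's ancestors.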

The extended Dewey labeling scheme is a variant of the prefix labeling scheme. In the prefix labeling scheme, the root is labeled with an empty string, and a nonroot element $u$ is labeled $\mathrm{label}(u) = \mathrm{label}(v).n$, where $u$ is the $n$th child of $v$. In the extended Dewey labeling scheme, each label also provides complete information about its ancestors' names and labels. For example, given an element $e$ with label "1.2.3," a prefix labeling scheme tells us that $\mathrm{parent}(e) = \text{``1.2''}$ and $\mathrm{grandparent}(e) = \text{``1''}$. The extended Dewey labeling scheme can also tell us the tag names of these elements, say, $\mathrm{tag}(e) = \text{``A''}$, $\mathrm{tag}(\mathrm{parent}(e)) = \text{``B''}$, and $\mathrm{tag}(\mathrm{grandparent}(e)) = \text{``C''}$. To achieve this, [16] uses the modulo function to encode element tag information into prefix labels. It then uses a finite state transducer (FST) to decode the tag information from a single extended Dewey label. The details

LU ET AL.: EXTENDED XML TREE PATTERN MATCHING: THEORIES AND ALGORITHMS 3

Fig. 4. Illustration of the relationship between BMC and UMC. The shaded portions show the optimal query classes.

Fig. 5. Mapping relationship between an extended tree pattern and a document tree.
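The extended Dewey idea from Section 2.2 can be sketched under stated assumptions. This is our reading of [16], not its reference implementation. The possible child tags of each tag are assumed known from a schema; the `CHILD_TAGS` table below is hypothetical. Let n be the number of possible child tags of the parent's tag, and k the child's tag index. Each label component is then the smallest integer above the previous sibling's component that is congruent to k modulo n. Decoding walks the table one component at a time, which plays the role of the FST's transition function. Unlike the plain prefix scheme, sibling components need not be consecutive, but the prefix property still holds, so ancestors are found by truncation. `encode`, `decode`, and `ancestors` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Label = Tuple[int, ...]

# Hypothetical schema: possible child tags of each tag, in a fixed order.
CHILD_TAGS: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["C", "D"],
}

def encode(root: ET.Element) -> Dict[ET.Element, Label]:
    """Assign extended Dewey labels; the root gets the empty label."""
    labels: Dict[ET.Element, Label] = {root: ()}
    stack = [root]
    while stack:
        parent = stack.pop()
        tags = CHILD_TAGS.get(parent.tag, [])
        prev = -1                                  # component of the previous sibling
        for child in parent:
            k = tags.index(child.tag)              # tag index; ValueError if not in schema
            n = len(tags)
            x = prev + 1 + (k - (prev + 1)) % n    # smallest x > prev with x % n == k
            labels[child] = labels[parent] + (x,)
            prev = x
            stack.append(child)
    return labels

def decode(label: Label, root_tag: str) -> List[str]:
    """Recover the tag path from the root to the labeled element."""
    path = [root_tag]
    for x in label:
        tags = CHILD_TAGS[path[-1]]                # current state = current tag
        path.append(tags[x % len(tags)])           # x mod n selects the child tag
    return path

def ancestors(label: Label) -> List[Label]:
    """Prefix property: each proper prefix is an ancestor's label."""
    return [label[:i] for i in range(len(label))]

doc = ET.fromstring("<A><C/><B><D/><C/></B><C/></A>")
labels = encode(doc)
d = doc[1][0]
print(".".join(map(str, labels[d])), decode(labels[d], "A"))   # 2.1 ['A', 'B', 'D']
```

The single label "2.1" is enough to recover both the ancestor label "2" and the tag path A, B, D without touching the document, which is the property the paper relies on.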



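Pattern matching algorithms

Principles and applications of pattern matching algorithms

XML-based tree-structure encoding and structural similarity matching methods

OTJFast and OTJFaster: efficient algorithms for ordered XML twig pattern matching

XML-based tree matching: improving the efficiency and precision of component retrieval

Improving XML query efficiency: tree pattern indexing and matching algorithms based on the extended Dewey labeling scheme

XML structural matching: bitmap filtering acceleration techniques

Optimizing XML twig pattern matching: improving efficiency with POTwigStack

A bottom-up algorithm for XML twig queries

diff: the Saulx diff algorithm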
