Accelerating XML Query Processing: Bottom-Up Mining of Query Patterns

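"This paper proposes speeding up XML query processing by mining XML query patterns bottom-up. The method, called VBUXMiner, first merges all queries into a summary structure named the compressed global tree guide (CGTG) and then traverses the CGTG bottom-up to generate frequent query patterns. These frequent patterns are used by a caching mechanism to optimize XML query performance. Experimental results show that VBUXMiner improves query performance more than traditional methods."

XML query processing is usually computationally intensive because of the complexity of XML data and the nature of XML queries. To address this, the study caches the results of frequent queries. At the core of the strategy is VBUXMiner, an efficient bottom-up mining algorithm.

VBUXMiner works in two main steps. First, all XML queries issued by users are merged into a compressed global tree guide (CGTG), a summary structure that captures what the queries have in common, so that later steps operate on the merged summary and their cost is reduced. Second, VBUXMiner traverses the CGTG bottom-up to identify frequently occurring query patterns, which represent the ways users commonly access the data.

Once the frequent query patterns are known, the system stores their results in a cache. When an identical or similar query arrives, the result is returned from the cache instead of parsing the XML document and executing the query again, which lowers latency.

The reported experiments show that VBUXMiner improves query efficiency more than traditional methods. The approach is therefore aimed at environments that combine large volumes of XML data with frequently repeated queries, such as Web services and online applications that must respond quickly to user queries.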

Bei et al. / J Zhejiang Univ Sci A 2008 9(6):744-757

from bottom to top based on a compact global tree guide. All infrequent nodes are pruned to accelerate tree enumeration. The support of a candidate tree is computed without scanning the database as it is calculated directly from the global tree guide.

Caching results of XML queries has been considered a useful strategy to improve performance of XML query processing. XCache (Chen et al., 2002) is a holistic XQuery-based semantic caching system. Mining approaches for finding frequent queries are also incorporated into caching (Yang et al., 2003a; Chen et al., 2005). Yang et al. (2003a) employ FastXMiner to discover frequent XML query patterns and demonstrate how the frequent patterns can be used to improve caching performance. Chen et al. (2005) take into account temporal features of queries for frequent queries discovery and design an appropriate cache replacement strategy by finding both positive and negative association rules. Hong and Kang (2005) integrate heterogeneous data sources on the Web and cache results of queries through XML views of data sources to accelerate query processing.
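
The idea of letting mined frequent patterns decide what is cached can be sketched as follows. This is only an illustration under simplifying assumptions, not the caching scheme of any of the cited systems: the class `FrequentPatternCache`, the `evaluate` callback, and the exact-string match between a query and a frequent pattern are all hypothetical.

```python
class FrequentPatternCache:
    """Cache results only for queries that match a mined frequent pattern."""

    def __init__(self, frequent_patterns, evaluate):
        self.frequent = set(frequent_patterns)  # assumed to come from a miner
        self.evaluate = evaluate                # hypothetical query evaluator
        self.store = {}
        self.hits = 0
        self.misses = 0

    def query(self, q):
        if q in self.store:
            self.hits += 1
            return self.store[q]
        self.misses += 1
        result = self.evaluate(q)
        # Admission policy: infrequent queries never occupy cache space.
        if q in self.frequent:
            self.store[q] = result
        return result


calls = []  # records every query that reaches the (fake) evaluator


def evaluate(query):
    calls.append(query)
    return f"result of {query}"


cache = FrequentPatternCache({"/order/items/book"}, evaluate)
for q in ["/order/items/book", "/order/customer",
          "/order/items/book", "/order/customer"]:
    cache.query(q)
```

In this run the frequent query is evaluated once and then served from the cache, while the infrequent one is evaluated on every request. A real system would match queries against patterns by tree containment rather than string equality.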

PRELIMINARY CONCEPTS

Frequent rooted query pattern tree

Definition 1 (Query pattern tree, QPT) An XML query can be modeled as a query pattern tree QPT=<R, N, E>, where R is the root node, N is the node set, and E is the edge set. Each node n has a label whose value is in {“*”, “//”}∪labelSet, where labelSet is the label set of all elements and attributes. For each edge e=(n_1, n_2), node n_1 is the parent of n_2.
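
A minimal sketch of how a QPT from Definition 1 might be held in memory is given below. The names `QPTNode`, `is_valid_label`, and `edge_set`, as well as the example query, are illustrative assumptions rather than anything prescribed by the paper.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

SPECIAL_LABELS = {"*", "//"}  # the two non-element labels allowed by Definition 1


@dataclass
class QPTNode:
    label: str
    children: List["QPTNode"] = field(default_factory=list)

    def add(self, label: str) -> "QPTNode":
        """Attach a new child with the given label and return it."""
        child = QPTNode(label)
        self.children.append(child)
        return child


def is_valid_label(label: str, label_set: Set[str]) -> bool:
    """A label must be "*", "//", or an element/attribute name in labelSet."""
    return label in SPECIAL_LABELS or label in label_set


def edge_set(root: QPTNode) -> List[Tuple[str, str]]:
    """List the edges E as (parent label, child label) pairs in pre-order."""
    edges = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            edges.append((node.label, child.label))
        stack.extend(reversed(node.children))
    return edges


# Hypothetical query: order/items/book with a "title" child and a wildcard child.
root = QPTNode("order")
book = root.add("items").add("book")
book.add("title")
book.add("*")
```

Here `root` plays the role of R, the reachable nodes form N, and `edge_set(root)` enumerates E with the parent listed first in each pair.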

Definition 2 (Query pattern subtree, QPS) Given two query pattern trees T and S, S is considered to be a query pattern subtree of T iff there exists a one-to-one mapping φ: V_S→V_T satisfying the following conditions: (1) φ preserves the labels, i.e., L(v)=L(φ(v)) ∀v∈V_S; (2) φ preserves the parent relation, i.e., (u,v)∈E_S iff (φ(u), φ(v))∈E_T.

Definition 3 (Rooted query pattern subtree, RQPS) Given two query pattern trees T and S, we say that S is a rooted query pattern subtree of T iff S is a query pattern subtree of T and the trees S and T have the same root label.
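
Definitions 2 and 3 can be checked mechanically on small trees. The sketch below is one possible test, assuming that trees are written as `(label, [children])` tuples, that the root of S is mapped onto the root of T, and that "*" and "//" are compared as ordinary labels; the function names are hypothetical, and the backtracking search is chosen for clarity, not speed.

```python
def is_rooted_subtree(s, t):
    """True if S embeds in T with the root of S mapped onto the root of T."""
    s_label, s_children = s
    t_label, t_children = t
    if s_label != t_label:          # condition (1): labels are preserved
        return False
    return _match_children(s_children, t_children, frozenset())


def _match_children(s_children, t_children, used):
    """Backtracking search for a one-to-one assignment of S-children to T-children."""
    if not s_children:
        return True
    first, rest = s_children[0], s_children[1:]
    for i, candidate in enumerate(t_children):
        # Each child of T may be the image of at most one child of S,
        # which keeps the mapping one-to-one and the parent relation intact.
        if i not in used and is_rooted_subtree(first, candidate):
            if _match_children(rest, t_children, used | {i}):
                return True
    return False


T = ("order", [("items", [("book", [("title", []), ("price", [])])]),
               ("customer", [])])
S_ok = ("order", [("items", [("book", [("title", [])])])])
S_bad_edge = ("order", [("items", [("title", [])])])   # "title" is not a child of "items"
S_bad_root = ("items", [("book", [])])                  # root label differs from T's
```

With these inputs `S_ok` is accepted, while `S_bad_edge` fails the parent-relation condition and `S_bad_root` fails the root-label condition of Definition 3.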

Definition 4 (Query database tree, QDT) An XML query database, which is a collection of XML queries, can be represented as QDT=<T, R, Q, Φ>, where T is a tree whose root is R; Q is the set of query pattern trees {q_1, q_2, …, q_n}; R is the virtual root node of the tree with a special label not belonging to labelSet; Φ: V→Q is a query mapping function from all children of the root R to the trees Q, where V represents the set of all children of the root R. For the complete subtree rooted at the ith child v_i∈V, we have Φ(v_i)=q_i.

Definition 5 (Frequent rooted query pattern tree, FRQPT) Let D denote all the query pattern trees of the issued queries and d_T be an indicator variable with d_T(S)=1 if the query pattern tree S is a rooted query pattern subtree of T and d_T(S)=0 if tree S is not. The support of query pattern tree S in D can be defined as σ(S)=∑_{T∈D} d_T(S)/|D|, i.e., the percentage of the number of trees in D that contain tree S. A rooted query pattern tree is frequent if its support is more than, or equal to, a user-specified minimum support, defined as minsupp.

With the help of the QDT, we can transform the problem of discovering FRQPTs from the original query database into the problem of discovering FRQPTs over the QDT. Let n_T(S) denote the number of occurrences of the rooted subtree S in a tree T. Then the support of a rooted query pattern tree S can be defined as σ(S)=n_QDT(S)/|Q|. In this way, we can deal with query pattern trees with different root nodes, and discover frequent query pattern trees while only considering the rooted query pattern subtrees. After finding all the FRQPTs over the QDT, the frequent query patterns are obtained by simply removing the virtual root of each FRQPT. Fig.1 shows a query database tree composed of five XML queries. Given the minimum support 0.6, we can obtain six FRQPTs.
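
The support computation over a QDT can be illustrated with a small sketch. It assumes that sibling labels are distinct within each query, so a tree is fully described by its set of root-to-node label paths and rooted containment reduces to a subset test. The five queries below are made up for illustration and are not the ones in Fig.1; all function names are hypothetical.

```python
from fractions import Fraction

VIRTUAL_ROOT = "R"  # special label assumed not to occur in labelSet


def under_virtual_root(paths):
    """Hang a query, given as a set of label paths, below the virtual root."""
    return {(VIRTUAL_ROOT,)} | {(VIRTUAL_ROOT,) + p for p in paths}


def support(pattern, qdt):
    """sigma(S) = n_QDT(S) / |Q|, with containment tested as a subset of paths."""
    occurrences = sum(1 for query in qdt if pattern <= query)
    return Fraction(occurrences, len(qdt))


def strip_virtual_root(pattern):
    """Recover the frequent query pattern by removing the virtual root."""
    return {p[1:] for p in pattern if len(p) > 1}


queries = [
    {("order",), ("order", "items"), ("order", "items", "book")},
    {("order",), ("order", "items")},
    {("order",), ("order", "customer")},
    {("book",), ("book", "title")},
    {("order",), ("order", "items"), ("order", "items", "book")},
]
qdt = [under_virtual_root(q) for q in queries]

pattern = under_virtual_root({("order",), ("order", "items")})
sigma = support(pattern, qdt)
is_frequent = sigma >= Fraction(3, 5)   # minsupp = 0.6, kept exact with Fraction
```

The pattern R/order/items occurs in three of the five queries, so its support equals the threshold and it counts as frequent. The fourth query has a different root ("book"), which is exactly the case the virtual root is meant to handle.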

Compressed global tree guide

For each user-issued QPT, we assign a unique ID, denoted as QPT.ID, which will be used for the construction of a global tree guide in the mining process.

Definition 6 (Global tree guide, GTG) We merge all issued queries over the query database tree to create a global tree guide, where the ID list of each node represents the queries containing the path from the root to the current node. Fig.2 shows a GTG constructed using 15 query pattern trees. The QPT list for node “Java” indicates that there are six queries that contain the path “R/order/items/book/title/Java”.
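
The merge described in Definition 6 can be sketched as a trie-like structure whose nodes carry ID lists. The code below is an assumption-laden illustration: queries are `(label, [children])` tuples, sibling nodes with equal labels are merged into one guide node, and the three queries are invented rather than taken from Fig.2. The `prune` helper shows how infrequent nodes could be dropped by ID-list length; it is not the paper's compression step.

```python
class GTGNode:
    """One node of the global tree guide."""

    def __init__(self, label):
        self.label = label
        self.ids = []        # QPT.IDs of queries containing the path root -> this node
        self.children = {}   # child label -> GTGNode


def merge_query(parent, qpt_id, tree):
    """Merge one query subtree, given as (label, [children]), below `parent`."""
    label, children = tree
    node = parent.children.get(label)
    if node is None:
        node = parent.children[label] = GTGNode(label)
    if qpt_id not in node.ids:
        node.ids.append(qpt_id)
    for child in children:
        merge_query(node, qpt_id, child)


def build_gtg(queries):
    """Build a guide from {QPT.ID: tree}; "R" is the virtual root."""
    root = GTGNode("R")
    for qpt_id, tree in queries.items():
        root.ids.append(qpt_id)
        merge_query(root, qpt_id, tree)
    return root


def ids_at(root, path):
    """Return the ID list stored at a path such as "R/order/items"."""
    node = root
    for label in path.split("/")[1:]:
        node = node.children[label]
    return node.ids


def prune(node, min_count):
    """Drop every subtree whose ID list has fewer than min_count entries."""
    node.children = {label: child for label, child in node.children.items()
                     if len(child.ids) >= min_count}
    for child in node.children.values():
        prune(child, min_count)


queries = {
    1: ("order", [("items", [("book", [("title", [("Java", [])])])])]),
    2: ("order", [("items", [("book", [("title", [("Java", [])]), ("price", [])])])]),
    3: ("order", [("items", [("cd", [])])]),
}
gtg = build_gtg(queries)
java_ids = ids_at(gtg, "R/order/items/book/title/Java")
```

In this toy guide the node "Java" carries the IDs of queries 1 and 2, mirroring how the six-query list in Fig.2 is read. Because a node's ID list is always contained in its parent's list, cutting a node with too few IDs can never discard a frequent descendant.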
