XRel: A Path-Based Approach to Storage and Retrieval of XML Documents Using Relational Databases


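"XRel is a path-based approach to storing and retrieving XML documents using relational databases. An XML document is decomposed into nodes on the basis of its structure, and each node is stored in a relational table according to its node type, together with path information from the root to that node. Because the relational schema is fixed, XML documents can be stored without knowing their DTD (document type definition) or element types, and indexes supported by the database management system, such as the B+-tree and the R-tree, can be used. For processing XML queries, XRel provides an algorithm that translates a core subset of XPath expressions into SQL queries. The relational database therefore does not need to be extended to store XML documents; retrieval based on XPath expressions is implemented as a preprocessor for the database query language."

XML (Extensible Markup Language) is widely used for data exchange and for storing structured data. Relational databases, however, are designed for structured tabular data rather than for semi-structured XML, which makes handling XML documents in them difficult. XRel addresses this problem by combining the flexibility of XML with the efficiency of relational databases.

The core idea of XRel is to decompose an XML document into nodes identified by their paths. The decomposition preserves the hierarchy of the document, because each node is associated with its position in the document. Each node is stored in a relational table chosen by its node type, and the path information relates the nodes to one another. As a result, XML documents can be stored even when no DTD or element-type metadata is available, which reduces the complexity of storing and querying them.

To support queries, XRel introduces an algorithm that translates XPath, the path expression language used to select parts of an XML document, into SQL. A preprocessor parses the XPath expression and generates the corresponding SQL statements, which the database management system then executes. XPath queries can therefore be processed without extending the relational database.

The design also takes advantage of the index structures of the relational database, such as B+-trees and R-trees. These indexes speed up the lookup of specific nodes, or of nodes that satisfy given conditions, so that retrieval remains efficient over large volumes of XML data.

In summary, XRel maps XML documents onto the relational model while retaining their structural information, and in this way supports both storage and retrieval of XML data. It simplifies storage and supports XPath queries without significant changes to existing relational database systems, which makes it a practical bridge between XML and relational databases for systems that handle large amounts of XML data.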
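The decomposition described in the summary can be sketched in a few lines of Python with SQLite. This is a minimal sketch under stated assumptions, not the schema from the paper: the table names (`path`, `element`, `attribute`, `text`), the column names, and the use of a simple visit counter `pos` as the position are all illustrative choices made here, and text that follows a child element is ignored for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Fixed schema: the same four tables are used for every document,
# whatever its DTD. All names here are assumptions for illustration.
SCHEMA = """
CREATE TABLE path      (path_id INTEGER PRIMARY KEY, pathexp TEXT UNIQUE);
CREATE TABLE element   (doc_id INTEGER, path_id INTEGER, pos INTEGER);
CREATE TABLE attribute (doc_id INTEGER, path_id INTEGER, pos INTEGER, value TEXT);
CREATE TABLE text      (doc_id INTEGER, path_id INTEGER, pos INTEGER, value TEXT);
"""

def path_id(conn, pathexp):
    """Return the id of a root-to-node path, registering it on first use."""
    conn.execute("INSERT OR IGNORE INTO path (pathexp) VALUES (?)", (pathexp,))
    row = conn.execute("SELECT path_id FROM path WHERE pathexp = ?", (pathexp,))
    return row.fetchone()[0]

def store(conn, doc_id, xml_text):
    """Decompose one document into rows, one table per node type."""
    counter = 0  # visit order, standing in for the position in the document

    def visit(elem, parent_path):
        nonlocal counter
        here = f"{parent_path}/{elem.tag}"
        counter += 1
        conn.execute("INSERT INTO element VALUES (?, ?, ?)",
                     (doc_id, path_id(conn, here), counter))
        for name, value in elem.attrib.items():
            counter += 1
            conn.execute("INSERT INTO attribute VALUES (?, ?, ?, ?)",
                         (doc_id, path_id(conn, f"{here}/@{name}"), counter, value))
        if elem.text and elem.text.strip():
            counter += 1
            conn.execute("INSERT INTO text VALUES (?, ?, ?, ?)",
                         (doc_id, path_id(conn, here), counter, elem.text.strip()))
        for child in elem:
            visit(child, here)

    visit(ET.fromstring(xml_text), "")

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store(conn, 1, "<book year='2001'><title>XRel</title>"
               "<author>A</author><author>B</author></book>")
```

Note that the two `author` elements share a single row in `path`, so the number of distinct paths grows with the variety of the document structure rather than with the number of nodes.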

6 · M. Yoshikawa et al.

technique, elements having an in-degree greater than 2 are also inlined if they are reachable without passing “*”. Incidentally, order information among elements that is discarded in the first step can be represented by adding positional information in the relational schema.

2.1.2 Storing Structured Documents without Information about DTD. There have been several studies that used fixed relational schemas to store structured documents. For example, [Horowits and Williamson 1986] proposed to store structured documents (ordered trees) by decomposing them into relational tables. Also, in a study by [Zhang 1995], a method to manage SGML documents using object-oriented database systems was proposed. In that work, all text nodes were maintained by a class NODE. In addition, [Florescu and Kossmann 1999b; Florescu and Kossmann 1999a] proposed several relational schemas and analyzed their performance. The method proposed in this paper differs from these previous methods in that it maintains, in relational tables, information about the path from the root to each node and about each node's position in the document. In addition, our proposal does not impose any prerequisites on the XML documents to be stored, whereas [Florescu and Kossmann 1999b; Florescu and Kossmann 1999a] assume that each element has an ID attribute.
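To illustrate why keeping both a path and a position for each node is useful, the sketch below records every element as a (path, start, end) row and then answers an ancestor question by comparing positions alone, without any ID attribute. The interval numbering is an assumption made for this example (a counter that ticks when an element opens and again when it closes); the excerpt above does not specify how positions are encoded, and `NodeRow`, `rows_with_regions`, and `contains` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

class NodeRow(NamedTuple):
    path: str    # path from the root to the node
    start: int   # tick at which the element opens
    end: int     # tick at which the element closes

def rows_with_regions(xml_text):
    """Return one row per element, in document order."""
    rows = []
    clock = 0

    def visit(elem, parent_path):
        nonlocal clock
        path = f"{parent_path}/{elem.tag}"
        clock += 1
        start = clock
        slot = len(rows)
        rows.append(None)          # reserve the slot to keep document order
        for child in elem:
            visit(child, path)
        clock += 1
        rows[slot] = NodeRow(path, start, clock)

    visit(ET.fromstring(xml_text), "")
    return rows

def contains(a, b):
    """True if node a is a proper ancestor of node b."""
    return a.start < b.start and b.end < a.end

rows = rows_with_regions("<a><b><c/></b><b/></a>")
first_b, c, second_b = rows[1], rows[2], rows[3]
print(contains(first_b, c), contains(second_b, c))
```

The two `b` elements have the same path `/a/b`, so the path alone cannot tell which one is the parent of `c`; the positions can.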

2.2 Other Approaches

Regarding index files for structured documents, several studies such as PAT [Salminen and Tompa 1994], Burkowski [Burkowski 1992], Clarke et al. [Clarke et al. 1995a; Clarke et al. 1995b] and Navarro et al. [Navarro and Baeza-Yates 1997] have appeared. [Sacks-Davis et al. 1998] categorized such indexes into position-based and path-based indexes. In position-based indexes, queries are processed using word element and position. On the other hand, paths in the tree structure are used in path-based indexes. In this paper, we do not use special indexes for structured documents. However, our storage method is closely related to the concept of those indexes.

Finally, the topic of abstract data types is related to both storage and query retrieval. In [Blake et al. 1995], the authors described an approach in which an XML document is regarded as just a sequence of characters, operations on tree structures are replaced with operations on character strings, and an abstract data type having such operations is defined in a database. Our approach differs from those of previous research in that we simply use an off-the-shelf database system; that is, we do not need any special full-text search system or indexing structure, and we translate XPath queries into SQL.
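The idea of translating XPath into SQL on an unmodified database can be sketched as follows. This is a deliberately small illustration, not the translation algorithm of the paper: it handles only absolute paths built from `/` and `//` steps, and it assumes that stored paths put a `#/` separator in front of every step (for example `#/book#/title`) so that a `LIKE` pattern cannot match across name boundaries. The tables `path` and `element`, their columns, and the function names are hypothetical.

```python
import re
import sqlite3

# Names are limited to letters, digits, '.' and '-' here, because '_' and '%'
# are LIKE wildcards; a fuller version would escape them instead.
_SIMPLE_XPATH = re.compile(r"(?://?[A-Za-z][A-Za-z0-9.-]*)+")

def xpath_to_like(xpath):
    """Map '/a//b' to the LIKE pattern '#/a#%/b'."""
    if not _SIMPLE_XPATH.fullmatch(xpath):
        raise ValueError(f"unsupported XPath in this sketch: {xpath!r}")
    # '/' is a child step; '//' may skip any number of intermediate steps.
    return re.sub(r"//?", lambda m: "#%/" if m.group() == "//" else "#/", xpath)

def xpath_to_sql(xpath):
    """Return (sql, parameters) selecting the elements the XPath addresses."""
    sql = ("SELECT e.doc_id, e.pos FROM element AS e "
           "JOIN path AS p ON p.path_id = e.path_id "
           "WHERE p.pathexp LIKE ? ORDER BY e.doc_id, e.pos")
    return sql, (xpath_to_like(xpath),)

# Tiny hand-built data set to run the generated SQL against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE path    (path_id INTEGER PRIMARY KEY, pathexp TEXT);
CREATE TABLE element (doc_id INTEGER, path_id INTEGER, pos INTEGER);
INSERT INTO path VALUES (1, '#/book'), (2, '#/book#/title'),
                        (3, '#/book#/chapter#/title'), (4, '#/bookstore#/title');
INSERT INTO element VALUES (1, 1, 1), (1, 2, 2), (1, 3, 4), (2, 4, 2);
""")
print(conn.execute(*xpath_to_sql("/book//title")).fetchall())
```

With the separator in place, `/book//title` selects the titles under `book` at any depth but not the one under `bookstore`. One caveat of this sketch is that SQLite's `LIKE` is case-insensitive for ASCII by default, whereas XML names are case-sensitive.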

3. AN OVERVIEW OF XML DOCUMENTS

An XML document consists of three parts: an XML declaration, a DTD (Document Type Definition), and an XML instance. An XML declaration and a DTD are not mandatory for an XML document. An XML declaration specifies the version and the encoding of XML being used. A DTD is a schema that constrains the structure of XML instances, and corresponds to an extended context-free grammar. An XML instance is a tagged document. We omit concrete descriptions of an XML declaration and a DTD.
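As a small illustration of the three parts, the Python sketch below parses one document that has a declaration, an internal DTD, and an instance, and another that consists of the instance alone. The `note` document is invented for this example. The standard-library parser used here does not validate against the DTD, so both inputs yield the same element tree.

```python
import xml.etree.ElementTree as ET

# Part 1: XML declaration; part 2: DTD; part 3: XML instance.
FULL_DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>reader</to><body>hello</body></note>
"""

# The declaration and the DTD are optional, so the instance alone
# is also a well-formed XML document.
INSTANCE_ONLY = b"<note><to>reader</to><body>hello</body></note>"

full_root = ET.fromstring(FULL_DOCUMENT)
bare_root = ET.fromstring(INSTANCE_ONLY)

# Serializing both trees gives identical instances.
same_instance = ET.tostring(full_root) == ET.tostring(bare_root)
print(full_root.tag, [child.tag for child in full_root], same_instance)
```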

An XML instance is a hierarchy of elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by empty-element tags. Character

Footnote: Although the term ‘XML instance’ does not appear in the XML Recommendation [World Wide Web Consortium 1998], we use this term to represent XML document data excluding an XML declaration and a DTD.
