Querying XML Documents: Conversion to Relational Databases and Its Applications

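"This academic paper examines how to query XML documents in a relational database: XML documents are converted into relational tuples, the XML data is processed with conventional SQL, and the results are converted back into XML. The approach aims to overcome the limitations of XML querying while building on mature relational database technology."

On today's Internet, XML (Extensible Markup Language) has become a leading standard for storing and exchanging structured data. It offers a flexible data representation that suits complex data structures. As the volume of XML data grows, however, querying and managing that data efficiently becomes a challenge. The paper was written by a research team from the computer science department of the University of Wisconsin-Madison, who propose a relatively conservative but practical solution: use existing relational database engines to process XML documents.

The paper notes that although many new semi-structured data models and query languages have been proposed for XML, these new approaches may bring additional learning costs and system complexity. The team therefore studied a way to combine XML documents with relational databases, focusing on XML documents that conform to a Document Type Definition (DTD). They developed algorithms and built a prototype system that performs the following operations:

1. **XML-to-relational conversion**: decompose XML documents into tuples in a relational database, so that the database's processing capabilities can be applied to the data.
2. **Query translation**: translate the user's semi-structured queries over XML documents into SQL queries, so that they can be executed on the relational database.
3. **Result conversion**: convert the query results from relational form back into XML, so that users receive the data in the familiar XML format.

The advantage of this approach is that it lets users work with familiar SQL syntax rather than learn a new query language, while retaining the flexibility of XML. In addition, because relational databases have a solid foundation in data consistency and transaction processing, the approach can also provide data management and concurrency control guarantees. The paper presents a qualitative evaluation, which may include performance tests, query efficiency analysis, and comparisons with existing XML query techniques; the summary provided does not give the details. The work offers a new route for handling XML data, combining traditional database technology with the widespread use of XML to meet the needs of the big-data era.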

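The three operations listed in the summary can be illustrated with a small end-to-end sketch. This is not the paper's algorithm: the single flat `book` table, the hand-written SQL, and the `result` wrapper element are all assumptions chosen to keep the example short, and SQLite stands in for a relational engine.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample document modelled on the book example (layout assumed for illustration).
DOC = """<book>
  <booktitle>The Selfish Gene</booktitle>
  <author id="dawkins">
    <name><firstname>Richard</firstname><lastname>Dawkins</lastname></name>
  </author>
</book>"""


def shred(xml_text, conn):
    """Step 1: flatten one <book> document into a single relational tuple."""
    book = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS book ("
        "booktitle TEXT, author_id TEXT, firstname TEXT, lastname TEXT)"
    )
    conn.execute(
        "INSERT INTO book VALUES (?, ?, ?, ?)",
        (
            book.findtext("booktitle"),
            book.find("author").get("id"),
            book.findtext("author/name/firstname"),
            book.findtext("author/name/lastname"),
        ),
    )


def query_lastnames(conn, title):
    """Step 2: the semi-structured query, translated by hand into SQL."""
    rows = conn.execute(
        "SELECT lastname FROM book WHERE booktitle = ?", (title,)
    )
    return [row[0] for row in rows]


def to_xml(lastnames):
    """Step 3: wrap each relational result value back into XML."""
    root = ET.Element("result")  # hypothetical wrapper element
    for value in lastnames:
        ET.SubElement(root, "lastname").text = value
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
shred(DOC, conn)
result_xml = to_xml(query_lastnames(conn, "The Selfish Gene"))
print(result_xml)  # <result><lastname>Dawkins</lastname></result>
```

A real system would derive the table layout from the DTD and generate the SQL automatically; here both are fixed by hand to show only the data flow.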
2.1 Extensible Markup Language

Extensible Markup Language (XML) is a hierarchical data format for information exchange in the World Wide Web. An XML document consists of nested element structures, starting with a root element. Element data can be in the form of attributes or sub-elements. Figure 1 shows an XML document that contains information about a book. In this example, there is a book element that has two sub-elements, booktitle and author. The author element has an id attribute with value “dawkins” and is further nested to provide name and address information. Further information on XML can be found in [3,8].
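As a small illustration of the distinction between sub-elements and attributes, the following sketch parses a document shaped like the book example using Python's standard `xml.etree.ElementTree` module. The variable names are arbitrary and the document text is a compact version written for this example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<book>"
    "<booktitle>The Selfish Gene</booktitle>"
    '<author id="dawkins">'
    "<name><lastname>Dawkins</lastname></name>"
    "<address><city>Timbuktu</city></address>"
    "</author>"
    "</book>"
)

root_tag = doc.tag                              # the root element
sub_elements = [child.tag for child in doc]     # data held as sub-elements
author_id = doc.find("author").get("id")        # data held as an attribute
city = doc.findtext("author/address/city")      # deeper nesting via a path

print(root_tag, sub_elements, author_id, city)
```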

Figure 1

Figure 2

2.2 DTDs and other XML Schemas

Document Type Definitions (DTDs) [2] describe the structure of XML documents and are like a schema for XML documents. A DTD specifies the structure of an XML element by specifying the names of its sub-elements and attributes. Sub-element structure is specified using the operators * (set with zero or more elements), + (set with one or more elements), ? (optional), and | (or). All values are assumed to be string values, unless the type is ANY, in which case the value can be an arbitrary XML fragment. There is a special attribute, id, which can occur once for each element. The id attribute uniquely identifies an element within a document and can be referenced through an IDREF field in another element. IDREFs are untyped. Finally, there is no concept of a root of a DTD: an XML document conforming to a DTD can be rooted at any element specified in the DTD. Figure 2 shows a DTD specification, while Figure 1 gives an XML document that conforms to this DTD.
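The sub-element operators behave much like regular-expression operators over the sequence of child element names. The sketch below makes that reading concrete by translating a content model such as `(firstname?, lastname)` into a Python regular expression. The helper names `content_model_to_regex` and `conforms` are hypothetical, and the sketch handles only element-content models built from names, `,`, `|`, `*`, `+`, `?`, and parentheses (not `#PCDATA`, `EMPTY`, or `ANY`).

```python
import re


def content_model_to_regex(model):
    """Translate a DTD element-content model into a compiled regex.

    Each element name becomes a group that consumes "name " (the name plus
    one trailing space), so a list of child tags can be matched as one string.
    """
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.\-]*|[()|*+?,]", model):
        if tok == ",":
            continue                  # sequence is plain concatenation
        elif tok == "(":
            parts.append("(?:")       # non-capturing group
        elif tok in ")|*+?":
            parts.append(tok)         # same meaning in DTDs and regexes
        else:
            parts.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(parts))


def conforms(child_tags, model):
    """Return True if the ordered child tags satisfy the content model."""
    text = "".join(tag + " " for tag in child_tags)
    return content_model_to_regex(model).fullmatch(text) is not None


print(conforms(["firstname", "lastname"], "(firstname?, lastname)"))  # True
print(conforms(["firstname"], "(firstname?, lastname)"))              # False
```

Because element names cannot contain spaces, the trailing space acts as an unambiguous separator, so `author` can never match part of `contactauthor`.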

Document Content Descriptions (DCDs) [4] and XML Schemas [16] are extensions to DTDs. For our purposes, the main difference between these and DTDs is that they allow typing of values and set size specification. If DCDs and XML Schemas become standard, the additional information would aid in our translation process; for example, we could create tables with integer attributes where appropriate instead of using just strings. The types in the current DCD proposal are compatible with types supported by current relational systems. More complex types will require object-relational extensions.
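To see why typed values would help, consider the sketch below, which stores the same values once as strings and once as integers in SQLite. The `pages` column and both table names are hypothetical (no such element appears in the example DTD); the point is only that string comparison and integer comparison order the values differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# With an untyped DTD every value would be stored as a string.
conn.execute("CREATE TABLE book_untyped (pages TEXT)")
# With type information the same value could be stored as an integer.
conn.execute("CREATE TABLE book_typed (pages INTEGER)")

for raw in ["9", "100"]:
    conn.execute("INSERT INTO book_untyped VALUES (?)", (raw,))
    conn.execute("INSERT INTO book_typed VALUES (?)", (int(raw),))

untyped_order = [
    row[0] for row in conn.execute("SELECT pages FROM book_untyped ORDER BY pages")
]
typed_order = [
    row[0] for row in conn.execute("SELECT pages FROM book_typed ORDER BY pages")
]

print(untyped_order)  # ['100', '9']  (lexicographic order)
print(typed_order)    # [9, 100]      (numeric order)
```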

2.3 XML Query Languages

Figure 3

Figure 4

There are many semi-structured query languages that can be used to query XML documents, including XML-QL [9], Lorel [1], UnQL [5] and XQL (from Microsoft). All these query languages have a notion of path expressions for navigating the nested structure of XML. XML-QL uses a nested XML-like structure to specify the part of a document to be selected and the structure of the result XML document.

Figure 4 shows an XML-QL query to determine the last name of an author of a book having title “The Selfish Gene”, specified over a set of XML documents conforming to the DTD in Figure 2. The last names thus selected will be nested within a lastname tag, as specified in the construct clause of the query. Lorel is more like SQL and its representation of the same query is shown in Figure 3. In this paper, we use a combination of XML-QL and Lorel (modified appropriately for our purposes).
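For readers unfamiliar with path expressions, the following sketch evaluates the same query directly over parsed documents with Python's `xml.etree.ElementTree`. It only illustrates what the query means and is not how XML-QL or Lorel are implemented; the function name `last_names` and the second sample document are made up for this example.

```python
import xml.etree.ElementTree as ET

DOCS = [
    """<book>
         <booktitle> The Selfish Gene </booktitle>
         <author id="dawkins">
           <name><firstname> Richard </firstname><lastname> Dawkins </lastname></name>
           <address><city> Timbuktu </city></address>
         </author>
       </book>""",
    # Hypothetical second document that should not match the title filter.
    """<book>
         <booktitle> Another Title </booktitle>
         <author id="a2">
           <name><lastname> Someone </lastname></name>
           <address/>
         </author>
       </book>""",
]


def last_names(docs, title):
    """Select author last names of books with the given title.

    Each selected value is wrapped in a <lastname> element, mirroring the
    CONSTRUCT clause of the XML-QL query.
    """
    results = []
    for text in docs:
        book = ET.fromstring(text)
        if (book.findtext("booktitle") or "").strip() != title:
            continue
        # Path expression: book -> author -> name -> lastname
        for node in book.findall("author/name/lastname"):
            out = ET.Element("lastname")
            out.text = node.text.strip()
            results.append(ET.tostring(out, encoding="unicode"))
    return results


print(last_names(DOCS, "The Selfish Gene"))  # ['<lastname>Dawkins</lastname>']
```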

XML-QL query (Figure 4):

WHERE <book>
        <booktitle> The Selfish Gene </booktitle>
        <author>
          <lastname> $l </lastname>
        </>
      </> IN a.xml, b.xml
CONSTRUCT <lastname> $l </lastname>

Lorel query (Figure 3):

SELECT X.author.lastname
FROM book X
WHERE X.booktitle = "The Selfish Gene"

DTD specification (Figure 2):

<!ELEMENT book (booktitle, author)>
<!ELEMENT article (title, author*, contactauthor)>
<!ELEMENT contactauthor EMPTY>
<!ATTLIST contactauthor authorID IDREF #IMPLIED>
<!ELEMENT monograph (title, author, editor)>
<!ELEMENT editor (monograph*)>
<!ATTLIST editor name CDATA #REQUIRED>
<!ELEMENT author (name, address)>
<!ATTLIST author id ID #REQUIRED>
<!ELEMENT name (firstname?, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT address ANY>

XML document (Figure 1):

<book>
  <booktitle> The Selfish Gene </booktitle>
  <author id="dawkins">
    <name>
      <firstname> Richard </firstname>
      <lastname> Dawkins </lastname>
    </name>
    <address>
      <city> Timbuktu </city>
    </address>
  </author>
</book>
