implicitly or explicitly during assignments of either string
or binary SQL values to XML columns, variables and
parameters.
XML values are stored in an internal format as large
binary objects (“XML blob”) in order to support the XML
data model characteristics more faithfully such as
document order and recursive structures.
The following statement creates a table DOCS with an
integer, primary key column PK and an XML column
XDOC:
CREATE TABLE DOCS (
PK INT PRIMARY KEY, XDOC XML)
2.2 XML Schema Support
SQL Server 2005 provides XML schema collections as a
mechanism for managing W3C XML schema documents
[21] as metadata. XML data type can be associated with
an XML schema collection to have XML schema
constraints enforced on XML instances. Such XML data
types are called “typed XML”. Non-XML schema bound
XML data type is referred to as “untyped XML”.
Both typed and untyped XML are supported within a
single framework, the XML data model is preserved, and
query processing enforces XQuery semantics. The
underlying relational infrastructure is used extensively for
this purpose.
2.3 Querying XML Data
XML instances can be retrieved using the SQL SELECT
statement. Four built-in methods on the XML data type,
namely query(), value(), exist() and nodes(), are available
for fine-grained querying. A fifth built-in method modify()
allows fine-grained modification of XML instances but is
not discussed further in this paper.
The query methods on XML data type accept the
XQuery language [15][16][22], which is an emerging
W3C recommendation (currently in Last Call), and
includes the navigational language XPath 2.0 [20].
Together with a large set of functions, XQuery provides
rich support for manipulating XML data. The supported
features of the XQuery language are shown below:
• XQuery clauses “for”, “where”, “return” and
“order by”.
• XPath axes child, descendant, parent, attribute,
self and descendant-or-self.
• Functions – numeric, string, Boolean, nodes,
context, sequences, aggregate, constructor, data
accessor, and SQL Server extension functions to
access SQL variable and column data within
XQuery.
• Numeric operators (+, -, *, div, mod).
• Value comparison operators (eq, ne, lt, gt, le,
ge).
• General comparison operators (=, !=, <, >, <=,
>=).
The following is an example of a query in which
section titles are retrieved from books and wrapped in
new <topic> elements:
SELECT PK, XDOC.query('
for $s in /BOOK/SECTION
return <topic>
{data($s/TITLE)}
</topic>')
FROM DOCS
The query execution is tuple-oriented – the SELECT
list is evaluated on each row of the DOCS table, the
query() method is processed on the XDOC column in
each row, and the result is a two-column rowset where the
column types are integer (for PK) and untyped XML (for
the XML result). The query methods are evaluated on
single XML instances, so that XQuery evaluation over
multiple XML documents is currently not supported by
the syntax but is allowed by the architecture. Scalar value-
based joins over XML instances are possible.
2.4 Indexing XML Data
Query execution processes each XML instance at runtime;
this becomes expensive whenever the XML blob is large
in size, the query is evaluated on a large number of rows
in a table, or a single SQL query executes multiple
XQuery expressions requiring the XML blob to be parsed
multiple times. Consequently, a mechanism for indexing
XML columns is supported in SQL Server 2005 to speed
up queries.
A primary XML index [12] on an XML column creates
a B
+
tree index on the data model content of the XML
nodes, and adds a column Path_ID for the reversed,
encoded path from each XML node to the root of the
XML tree.
The structural properties of the XML instance, such as
relative order of nodes and document hierarchy, are
captured in the OrdPath column for each node [11]. The
primary XML index is clustered on the OrdPath value of
each XML instance in the XML column. The other
noteworthy columns are the name, type and the value of a
node.
XML indexes provide efficient evaluation of queries
on XML data, and reassembly of the XML result from the
B
+
tree. These use the relational infrastructure while
preserving document order and document structure.
OrdPath encodes the parent-child relationship of XML
nodes by extending the parent’s OrdPath with a labelling
component for the child. This allows efficient
determination of parent-child and ancestor-descendant
relationships. Furthermore, the subtree of any XML node
N can be retrieved from the primary XML index using a
1177