Hu et al. / J Zhejiang Univ Sci A 2008 9(6):758-769
ARCHITECTURE AND RATIONALE OF AX2RM SYSTEM
Architecture overview
Fig.1 illustrates the architecture of the AX2RM system. The inputs of the system are the XML schema (DTD/XML Schema), the XML workload (path expressions/XPath/XQuery and their frequencies), and the XML data. The system generates an optimal relational schema and a physical schema to store the XML data, together with an optimal translation of the XML workload. The AX2RM system contains an integrated and iterative X2R mapping procedure, which is divided into four correlated sub-procedures: logical database design (LD), data scale estimation (SE), workload translation (WT), and physical database design (PD).
In the LD procedure, the XML schema is mapped into an equivalent relational form. Since there are many possible mappings, the XML workload and XML data are taken into account to determine the optimal one. Here, the "optimal mapping" is the one with the lowest cost when the translated SQL workload is executed on the resulting relational database schema. The optimal mapping is therefore determined by the SQL workload and the physical schema, which are the outcomes of the WT and PD procedures, respectively.
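The definition above can be written compactly as a nested minimization. The notation is introduced here only for illustration and is not the paper's own formalization: $S$ is the XML schema, $\mathcal{M}(S)$ the set of candidate mappings, $R_M$ the relational schema produced by mapping $M$, $\hat{D}_M$ the data scale estimated by SE, $W$ the XML workload with query frequencies $f_q$, $T_M(q)$ the set of equivalent SQL translations of query $q$ under $M$, and $\mathcal{P}(R_M)$ the set of candidate physical designs for $R_M$. Assuming the workload cost is a frequency-weighted sum of per-query costs,

$$
M^{*} = \arg\min_{M \in \mathcal{M}(S)} \; \min_{P \in \mathcal{P}(R_M)} \; \sum_{q \in W} f_q \, \min_{t \in T_M(q)} \mathrm{cost}\left(t;\, R_M, P, \hat{D}_M\right)
$$

The innermost minimum plays the role of WT, the minimum over $P$ that of PD, and the outer $\arg\min$ that of LD, while $\hat{D}_M$ is supplied by SE. In this sketch the chosen translation is allowed to depend on the physical design $P$.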
For a specific logical design, the relational schema is used both to estimate the relational database scale in the SE procedure and to translate the XML workload into an SQL workload in the WT procedure. The SE procedure analyzes the input XML data and generates pseudo statistics for each relation and each attribute. For the same relational mapping schema and XML query, there are many equivalent but different query translations (Krishnamurthy et al., 2003); the WT procedure tries to find the best one.
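A toy sketch of how SE statistics can feed the WT choice follows. It assumes, hypothetically, that every element type is stored in its own relation with one tuple per element, and it uses a made-up cost model (total tuples scanned). The function names `estimate_cardinalities`, `translation_cost`, and `best_translation` are illustrative and do not come from AX2RM.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List


def estimate_cardinalities(xml_text: str) -> Counter:
    """Toy SE step: count elements per tag.

    Assumes (hypothetically) that each element type is mapped to its own
    relation holding one tuple per element occurrence.
    """
    root = ET.fromstring(xml_text)
    # Element.iter() walks the root element and all of its descendants.
    return Counter(elem.tag for elem in root.iter())


def translation_cost(relations: List[str], card: Counter) -> int:
    """Toy cost model (an assumption): total number of tuples scanned."""
    return sum(card[r] for r in relations)


def best_translation(candidates: Dict[str, List[str]], card: Counter) -> str:
    """Toy WT step: among equivalent SQL translations of one XML query,
    each described by the relations it reads, return the cheapest one."""
    return min(candidates, key=lambda name: translation_cost(candidates[name], card))


card = estimate_cardinalities(
    "<lib><book><title>A</title></book><book><title>B</title></book></lib>")
# Two hypothetical translations of /lib/book/title: join along the path,
# or scan the title relation directly (equivalent here because every
# title element sits under /lib/book).
plans = {"join_path": ["lib", "book", "title"], "direct": ["title"]}
print(best_translation(plans, card))  # prints: direct
```

A real WT procedure would cost actual SQL plans with the RDBMS optimizer. The point here is only that the same query admits several translations and that the estimated statistics decide among them.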
After the relational schema, data scale, and workloads have been collected, the problem of finding the optimal X2R mapping reduces to the classical automatic physical database design problem in the RDBMS domain. The PD procedure finds the optimal physical database design (indexes, materialized views, partitions, compression, etc.) that maximizes performance, i.e., minimizes the cost of the SQL workload.
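To make the PD step concrete, here is a minimal sketch of greedy index selection under a budget, a common heuristic in physical design tools. It is shown only as an illustration and is not necessarily the technique AX2RM uses. The `Query` fields and the per-query scan/index costs are hypothetical inputs that a real system would obtain from the optimizer and the SE statistics.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Query(NamedTuple):
    freq: int           # occurrences of the query in the SQL workload
    column: str         # column the query filters on (hypothetical)
    scan_cost: float    # estimated cost without a usable index
    index_cost: float   # estimated cost with an index on `column`


def workload_cost(queries: List[Query], indexes: Set[str]) -> float:
    """Frequency-weighted cost of the workload under a set of indexes."""
    return sum(q.freq * (q.index_cost if q.column in indexes else q.scan_cost)
               for q in queries)


def greedy_indexes(queries: List[Query], candidates: Iterable[str],
                   budget: int) -> Tuple[Set[str], float]:
    """Repeatedly add the candidate index with the largest cost reduction,
    at most `budget` times, stopping early when no candidate helps."""
    pool = set(candidates)
    chosen: Set[str] = set()
    current = workload_cost(queries, chosen)
    for _ in range(budget):
        best, best_cost = None, current
        for c in sorted(pool - chosen):  # sorted so ties break deterministically
            cost = workload_cost(queries, chosen | {c})
            if cost < best_cost:
                best, best_cost = c, cost
        if best is None:
            break
        chosen.add(best)
        current = best_cost
    return chosen, current


workload = [Query(10, "book.year", 100, 5),
            Query(3, "title.value", 50, 2),
            Query(1, "book.year", 100, 5)]
print(greedy_indexes(workload, {"book.year", "title.value"}, budget=1))
# prints: ({'book.year'}, 205)
```

Greedy selection is not guaranteed to be optimal when indexes interact. It is used here because it keeps the example short while still showing PD as a cost-minimization over candidate structures.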
All these procedures except SE are optimization procedures, and each depends on the procedures that follow it, in that the objective function of a procedure is evaluated by running the optimization of the subsequent ones. For instance, the objective function of the LD procedure is the minimum cost of the workload under a specific relational schema, where this minimum cost is determined by WT and PD.
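The nesting of objective functions can be sketched as three functions, each calling the next inside its own objective. This is a toy model under stated assumptions. Costs are read from a hypothetical precomputed table `table[mapping][query]`, which holds a list of equivalent translations, each a dict `{physical_design: cost}`. The translation choice is allowed to depend on the physical design. The names `wt_cost`, `pd_cost`, and `ld_search` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

Translations = List[Dict[str, float]]          # equivalent SQL translations
CostTable = Dict[str, Dict[str, Translations]]  # mapping -> query -> translations


def wt_cost(translations: Translations, design: str) -> float:
    """WT: cost of the cheapest equivalent translation under one design."""
    return min(t[design] for t in translations)


def pd_cost(query_translations: Dict[str, Translations], freqs: Dict[str, int],
            designs: Sequence[str]) -> Tuple[str, float]:
    """PD: best physical design for one relational schema; its objective
    invokes WT for every query in the workload."""
    def workload(design: str) -> float:
        return sum(freqs[q] * wt_cost(ts, design)
                   for q, ts in query_translations.items())
    best = min(designs, key=workload)
    return best, workload(best)


def ld_search(table: CostTable, freqs: Dict[str, int],
              designs: Sequence[str]) -> Tuple[str, Tuple[str, float]]:
    """LD: choose the mapping whose minimum workload cost (via PD and WT)
    is lowest."""
    results = {m: pd_cost(qt, freqs, designs) for m, qt in table.items()}
    best = min(results, key=lambda m: results[m][1])
    return best, results[best]


# Made-up costs for two hypothetical mappings m1 and m2.
designs = ["none", "idx"]
freqs = {"q1": 5, "q2": 1}
table = {
    "m1": {"q1": [{"none": 40, "idx": 30}],
           "q2": [{"none": 10, "idx": 8}]},
    "m2": {"q1": [{"none": 20, "idx": 12}, {"none": 25, "idx": 6}],
           "q2": [{"none": 30, "idx": 30}]},
}
print(ld_search(table, freqs, designs))  # prints: ('m2', ('idx', 60))
```

In a real system the innermost costs would come from the RDBMS optimizer on the SE statistics rather than from a fixed table. Exhaustive enumeration, as here, is only feasible for tiny candidate sets.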
The AX2RM architecture is a flexible framework into which current X2R mapping techniques can be incorporated seamlessly. We will exploit the merits of different techniques to generate the best X2R mapping iteratively and adaptively.
Rationales for XML to relational mapping
For further discussion, we formalize the concepts of X2R mapping in this subsection. Since describing all kinds of schema-mapping methods and
XML query representation based on a single infra-
[Fig.1 diagram: inputs are the XML schema (DTD/XSD), the XML workload (XPath/XQuery), and the XML data; procedures LD, SE, WT, and PD; products are the relational schema (SQL DDL), the relational data scale, the SQL workload (SQL DML), and the physical schema]
Fig.1 Architecture of the adaptive XML to relational mapping system. There are four correlated procedures in the system: the logical database design procedure, the data scale estimation procedure, the workload translation procedure, and the physical database design procedure