6 International Journal of Data Warehousing & Mining, 5(3), 1-27, July-September 2009
Copyright © 2009, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global
is prohibited.
UML and Attribute Mappings
One of the main assumptions of the approach of
Trujillo & Luján-Mora (2003) was that the user
must not be overwhelmed by the multitude of
attribute mappings between sources, ETL
activities and target warehouse tables. Still, as
already mentioned, such detail is important for
the back stage of the data warehouse. If the
inter-attribute mappings are not captured in
detail, important transformations, checks and
contingency actions are missing from the
documentation of the process and may be
overlooked when the code is constructed or generated.
Despite the effort it requires, this documentation
is useful in the early stages of the project, when
the designer becomes familiar with the internal
structure and contents of the sources (including
data quality problems, cryptic codes, conventions
adopted by the administrators and programmers
of the sources, and so on).
Moreover, this documentation is also useful
during later stages of the project, when the data
schemata and the ETL tasks evolve and sensitive
parts of the ETL workflow, both in the data and
in the activities, need to be highlighted and
protected.
Notably, no standard formalism, such as the ER
model or UML, treats attributes as first-class
citizens; as a result, attributes cannot participate
in relationships. Therefore, Luján-Mora,
Vassiliadis & Trujillo (2004) stress the need for a
mechanism that captures the relationships of
attributes in a way that (a) is as standard as
possible and (b) allows different levels of
zooming, so that the designer is not overloaded
with the large number of attribute relationships
present in a data warehouse setting.
To this end, the authors capture these
relationships in a UML data mapping diagram.
UML serves as the standard notation, and its
extensibility mechanism is exploited to provide
designers with a standard model. Data mapping
diagrams treat relations as classes (as the UML
relational profile does). Attributes are
represented by proxy classes, which are connected
to the relation classes via stereotyped “Contain”
relationships. Attributes can be related to each
other via stereotyped “Map” relationships.
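To make the structure of a data mapping diagram concrete, the Python sketch below models its three ingredients as plain data structures: relations as classes, attributes as proxy objects attached to their relation through a “Contain” link, and attribute-to-attribute “Map” links. It is a minimal illustration under our own assumptions, not the UML profile of Luján-Mora et al. (2004); the class names, methods and the S_CUSTOMER/DW_CUSTOMER tables are hypothetical. Because each attribute is an object in its own right, it can take part in relationships, which standard ER or UML attributes cannot.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified rendering of a data mapping diagram (not the UML profile itself).

@dataclass(frozen=True)
class AttributeProxy:
    """Proxy class standing for one attribute; hashable, so it can be linked."""
    relation: str
    name: str

@dataclass
class RelationClass:
    """A relation treated as a class; 'contains' holds its "Contain" links."""
    name: str
    contains: list[AttributeProxy] = field(default_factory=list)

    def add_attribute(self, name: str) -> AttributeProxy:
        proxy = AttributeProxy(self.name, name)
        self.contains.append(proxy)  # "Contain": relation -> attribute proxy
        return proxy

@dataclass
class DataMappingDiagram:
    relations: dict[str, RelationClass] = field(default_factory=dict)
    maps: list[tuple[AttributeProxy, AttributeProxy]] = field(default_factory=list)

    def relation(self, name: str) -> RelationClass:
        return self.relations.setdefault(name, RelationClass(name))

    def map(self, source: AttributeProxy, target: AttributeProxy) -> None:
        self.maps.append((source, target))  # "Map": attribute -> attribute

    def sources_of(self, target: AttributeProxy) -> list[AttributeProxy]:
        """Trace which source attributes feed a given target attribute."""
        return [s for s, t in self.maps if t == target]

# Usage with hypothetical tables
d = DataMappingDiagram()
src = d.relation("S_CUSTOMER").add_attribute("NAME")
tgt = d.relation("DW_CUSTOMER").add_attribute("CUST_NAME")
d.map(src, tgt)
print(d.sources_of(tgt))
```

Keeping “Contain” and “Map” as separate collections mirrors the two stereotypes: the first records ownership, the second records data lineage that can be traced backwards from any warehouse attribute.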
A particular point of emphasis made by
Luján-Mora et al. (2004) is the requirement
for multiple, complementary diagrams at
different levels of detail. The authors propose four
layers of data mappings, specifically:
(a) the database level, where the involved
databases are represented as UML packages, (b) the
dataflow level, where the relationships among
source and target relations are captured, each
in a single UML package, (c) the table level,
where the dataflow diagram is zoomed in and
each individual transformation is captured as
a package, and (d) the attribute level, which
zooms into a table-level data mapping diagram,
capturing all the attributes and the individual
attribute-level mappings.
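The four levels form a containment hierarchy in which each level refines the one above it. The sketch below, with hypothetical class, database and table names, nests databases, dataflows, transformations and attribute mappings, and returns only the elements visible at a requested zoom level. It illustrates the zooming idea under these assumptions rather than reproducing the authors' UML packages.

```python
from dataclasses import dataclass, field

# Hypothetical containment hierarchy mirroring the four zoom levels.

@dataclass
class AttributeMapping:        # (d) attribute level
    source: str                # "relation.attribute"
    target: str

@dataclass
class Transformation:          # (c) table level: one package per transformation
    name: str
    mappings: list[AttributeMapping] = field(default_factory=list)

@dataclass
class Dataflow:                # (b) dataflow level: source and target relations
    name: str
    sources: list[str]
    targets: list[str]
    transformations: list[Transformation] = field(default_factory=list)

@dataclass
class Warehouse:               # (a) database level: the involved databases
    databases: list[str]
    dataflows: list[Dataflow] = field(default_factory=list)

def view(model: Warehouse, level: str) -> list[str]:
    """Return only the elements visible at the requested zoom level."""
    transformations = [t for df in model.dataflows for t in df.transformations]
    if level == "database":
        return list(model.databases)
    if level == "dataflow":
        return [f"{', '.join(df.sources)} -> {', '.join(df.targets)}" for df in model.dataflows]
    if level == "table":
        return [t.name for t in transformations]
    if level == "attribute":
        return [f"{m.source} -> {m.target}" for t in transformations for m in t.mappings]
    raise ValueError(f"unknown zoom level: {level}")

model = Warehouse(
    databases=["SourceDB", "DW"],
    dataflows=[Dataflow("LoadCustomers", ["S_CUSTOMER"], ["DW_CUSTOMER"], [
        Transformation("CleanNames", [AttributeMapping("S_CUSTOMER.NAME", "DW_CUSTOMER.CUST_NAME")]),
    ])],
)
for level in ("database", "dataflow", "table", "attribute"):
    print(level, view(model, level))
```

Only the last level exposes individual attribute mappings, so a designer working at the database or dataflow level never sees them.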
State-of-the-Art at the Logical Level
Beyond the conceptual modeling that produces a
first design of the ETL process, once the process
has been implemented its metadata must be
organized and documented. The organization of
the metadata for the ETL process constitutes its
logical-level description, much like the system
catalog acts as the logical-level description of
a relational database.
Davidson & Kosky (1999) present WOL,
a Horn-clause language for specifying
transformations between complex types as rules.
An interesting idea behind this approach is that
the transformation of an element can be
decomposed into a set of rules for its constituent
elements, which avoids the difficulty of writing
complex definitions.
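The decomposition idea can be illustrated without WOL's own syntax. In the Python sketch below, each rule derives one component of the target element and fires only when its body (a condition on the source) holds, loosely echoing a Horn clause; a nested complex component is handled by delegating to its own, smaller rule set. The Person/Customer/Address schema and the names `Rule` and `apply_rules` are hypothetical, and the sketch omits the variable unification and logical semantics of a real Horn-clause language.

```python
from typing import Any, Callable

# A rule is (target field, body, head): the head value is produced only if the body holds.
# This loosely mirrors a Horn clause; it is not WOL syntax.
Rule = tuple[str, Callable[[dict], bool], Callable[[dict], Any]]

def apply_rules(source: dict, rules: list[Rule]) -> dict:
    """Build a target element by firing each component rule whose body holds."""
    return {name: head(source) for name, body, head in rules if body(source)}

# Rules for the nested Address type (hypothetical schema).
address_rules: list[Rule] = [
    ("city", lambda s: "city" in s, lambda s: s["city"].title()),
    ("zip", lambda s: "postcode" in s, lambda s: s["postcode"]),
]

# Rules for a Person -> Customer transformation; the complex "address"
# component is delegated to its own rule set instead of one monolithic definition.
customer_rules: list[Rule] = [
    ("name", lambda s: True, lambda s: f"{s['first']} {s['last']}"),
    ("address", lambda s: "addr" in s, lambda s: apply_rules(s["addr"], address_rules)),
]

person = {"first": "Ada", "last": "Lovelace", "addr": {"city": "london", "postcode": "W1"}}
print(apply_rules(person, customer_rules))
```

Adding a new target component then means adding one small rule, not rewriting a single definition for the whole complex type.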
As already mentioned, the first attempts
towards a systematic description of the
metadata of the ETL process go back to the works
by Stöhr et al. (1999) and Vassiliadis, Quix,
Vassiliou & Jarke (2001). This research has
been complemented by the approach of Vas-
siliadis, Simitsis & Skiadopoulos (DMDW