LivePAGE - A multimedia database system to support
World-Wide Web development
Donald D. Cowan
Department of Computer Science
University of Waterloo
Waterloo ON N2L 3G1
E-mail dcowan@csg.uwaterloo.ca
Daniel M. German
Department of Computer Science
University of Waterloo
Waterloo ON N2L 3G1
E-mail dmg@csg.uwaterloo.ca
Eric Mackie
Inforium Technologies Inc.
158 University Ave. W.
Waterloo ON N2L 3E9
E-mail eric@inforium.com
Abstract
The rampant growth of the World-Wide Web
(WWW) is largely a consequence of its simplicity.
A typical person can quickly learn HTML and
start creating WWW pages in an afternoon. As
WWW sites become larger and more complex,
this inherent simplicity causes multiple problems,
as many of the current tools and techniques are
stretched to address issues they were not designed
to handle. These problems are compounded by
the rapid proliferation of solutions, which are often
fairly ad hoc in nature. In this paper we present
a layered model which, through its tools and
techniques, provides a more disciplined approach to
constructing and maintaining a WWW site. We
then describe an implementation of this model
which is based on two more mature technologies:
SGML and relational database systems.
Keywords
Architecture, World-Wide Web, multimedia,
SGML, Web site development, document
database, hypermedia.
1 Introduction
The accessibility of simple-to-use but powerful
interfaces or browsers, and the apparent simplicity of
HTML, have prompted the development of literally
millions of hypermedia World-Wide Web (WWW)
application sites. In many of these applications
both the size and number of the source documents
are sufficiently small that each application can be
developed and maintained by a single person
without the assistance of any methodology or tools
(except for a text editor). In contrast, the developer
of a large WWW site is constantly struggling to
master the complexity involved in the design,
development and maintenance of such a site with scores
of pages and links.

Proceedings of the Second Australian Document
Computing Symposium, Melbourne, Australia,
April 5, 1997.

The WWW is a large collection of interrelated
resources, linked through WWW pages tagged
using HTML (HyperText Markup Language [3]).
HTML pages can point to other resources on the
WWW, such as images, video, sound and text.
Each resource on the WWW has at least one
address, known as a URL (Uniform Resource
Locator, see [4]). URLs are strongly tied to the
file system of the machine on which they reside,
and often rely on common differences between file
systems such as case-sensitivity and the length
of file names. This reliance makes portability of
documents between machines difficult.
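As a small illustration of the portability problem, the following sketch lists links that resolve only when the file system ignores case. The function name and the sample paths are hypothetical and are not taken from LivePAGE; the sketch assumes links and file names are plain relative paths.

```python
def fragile_links(links, files):
    """Return links that match a file only when letter case is ignored.

    Such links work on a case-insensitive file system but break when the
    documents are moved to a case-sensitive one.
    """
    exact = set(files)
    folded = {name.lower() for name in files}
    return [link for link in links
            if link not in exact and link.lower() in folded]


# Hypothetical site: one link differs from its target only in case.
links = ["Images/Logo.gif", "index.html"]
files = ["images/logo.gif", "index.html"]
print(fragile_links(links, files))
```

A check of this kind could be run before a set of documents is copied to a machine with a different file system.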
The WWW is quite relaxed about the type of
its components. Although HTML is defined by
an SGML DTD,¹ an HTML document rarely
complies with this definition.² Consequently, a WWW
server normally relies solely on the extension to a
file's name to determine its type.
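The following sketch illustrates typing by file name alone, and how that differs from inspecting the content. The extension table and both function names are assumptions made for this example; actual servers use their own, larger configuration tables.

```python
import os

# Hypothetical extension table, for illustration only.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".gif": "image/gif",
    ".txt": "text/plain",
}


def type_from_name(filename, default="application/octet-stream"):
    """Guess a media type from the file name's extension only."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_TYPES.get(ext.lower(), default)


def has_doctype(text):
    """Return True if the text begins with a <!DOCTYPE declaration."""
    return text.lstrip().upper().startswith("<!DOCTYPE")


# The name alone decides the type, whatever the file contains.
page = "<h1>No DOCTYPE here</h1>"
print(type_from_name("page.html"), has_doctype(page))
```

Here the file is served as HTML because of its name, even though its content carries no DOCTYPE declaration.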
Because of the focus on the markup language
HTML, most of the tools produced to date are
oriented toward editing HTML files. Very little
research has been directed toward authoring
systems that manipulate WWW sites as a collection
of nodes and links, and that view HTML more as
a presentation language than as a storage format.
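To make the distinction concrete, the sketch below stores a site as nodes and links and generates HTML only for presentation. It is an illustration under our own assumptions, not the model or implementation described in this paper; the names Node, dangling_links and render_html are hypothetical.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Node:
    """A unit of content together with the names of the nodes it links to."""
    name: str
    content: str = ""
    links: list = field(default_factory=list)


def dangling_links(site):
    """Return (source, target) pairs whose target is not a node in the site."""
    return [(node.name, target)
            for node in site.values()
            for target in node.links
            if target not in site]


def render_html(node):
    """Generate an HTML page from a node; the HTML is output, not storage."""
    anchors = "".join(
        f'<a href="{html.escape(t)}.html">{html.escape(t)}</a>\n'
        for t in node.links)
    return (f"<html><body>\n<p>{html.escape(node.content)}</p>\n"
            f"{anchors}</body></html>")


site = {
    "home": Node("home", "Welcome", ["about", "missing"]),
    "about": Node("about", "About us", ["home"]),
}
print(dangling_links(site))
```

Because links are stored as data rather than embedded in HTML text, a check such as dangling_links can be made without parsing any page.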
¹ See [10] for a description of SGML and DTDs.
² Bray found that only 4.9% of the 3 million HTML
documents he analyzed have a <!DOCTYPE declaration
(which is required in an SGML document). It is unknown
what percentage of these were actually compliant with any
of the HTML DTDs [6].

The WWW is thriving mainly because of its
simplicity. A typical person can quickly learn