Perl and XML
HTML is indeed successful, but it has limitations. It's a very small language, and not very descriptive. It is closer
to troff in function than to DocBook and other SGML applications. It has tags like <i> and <b> that change the
font style without saying why. Because HTML is so limited and at least partly presentational, it doesn't represent
an overwhelming success for SGML, at least not in spirit. Instead of bringing the power of generic coding to the
people, it brought another one-trick pony, in which you could display your content in a particular venue and
couldn't do much else with it.
Thus, the standards folk decided to try again and see if they couldn't arrive at a compromise between the
descriptive power of SGML and the simplicity of HTML. They came up with the Extensible Markup Language
(XML). The "X" stands for "extensible," pointing out the first obvious difference from HTML, which is that
some people think that "X" is a cooler-sounding letter than "E" when used in an acronym. The second and more
relevant difference is that your documents don't have to be stuck in the anemic tag set of HTML. You can extend
the tag namespace to be as descriptive as you want - as descriptive, even, as SGML. Voilà! The bridge is built.
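To make the difference concrete, here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree module. The element names booktitle and emphasis are invented for illustration and do not come from any standard vocabulary. With HTML-style markup, both phrases are simply <i>; with descriptive tags, a program can tell them apart.

```python
import xml.etree.ElementTree as ET

# HTML-style markup: both phrases are italic, but nothing says why.
html_like = "<p>I read <i>Moby-Dick</i> and <i>loved</i> it.</p>"

# Descriptive markup: the (hypothetical) tag names record the purpose.
xml_like = ("<para>I read <booktitle>Moby-Dick</booktitle> "
            "and <emphasis>loved</emphasis> it.</para>")

def texts(doc, tag):
    """Return the text of every element named `tag` in the document string."""
    return [el.text for el in ET.fromstring(doc).iter(tag)]

italic_phrases = texts(html_like, "i")       # titles and emphasis mixed together
book_titles = texts(xml_like, "booktitle")   # only the book titles

print(italic_phrases)
print(book_titles)
```

A query for <i> cannot separate the title from the stressed word, because the markup never recorded the distinction. The descriptive version answers the same question directly.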
By all accounts, XML is a smashing success. It has lived up to the hype and keeps on growing: XML-RPC,
XHTML, SVG, and DocBook XML are some of its products. It comes with several accessories, including XSL
for formatting, XSLT for transforming, XPath for searching, and XLink for linking. Much of the standards work
is under the auspices of the World Wide Web Consortium (W3C), an organization whose members include
Microsoft, Sun, IBM, and many academic and public institutions.
The W3C's mandate is to research and foster new technology for the Internet. That's a rather broad statement, but
if you visit their site at http://www.w3.org/ you'll see that they cover a lot of bases. The W3C doesn't create,
police, or license standards. Rather, they make recommendations that developers are encouraged, but not
required, to follow.[4]
However, the system remains open enough to allow healthy dissent, such as the recent and interesting case of
XML Schema, a W3C recommendation that has generated controversy and competition. We'll examine this particular
story further in Chapter 3. The W3C's process is strong enough to be taken seriously, but loose enough not to scare people away.
The recommendations are always available to the public.
Every developer should have working knowledge of XML, since it's the universal packing material for data, and
so many programs are all about crunching data. The rest of this chapter gives a quick introduction to XML for
developers.
2.2 Markup, Elements, and Structure
A markup language provides a way to embed instructions inside data to help a computer program process the
data. Most markup schemes, such as troff, TeX, and HTML, have instructions that are optimized for one
purpose, such as formatting the document to be printed or to be displayed on a computer screen. These
languages rely on a presentational description of data, which controls typeface, font size, color, or other
media-specific properties. Although such markup can result in nicely formatted documents, it can be like a prison for
your data, consigning it to one format forever; you won't be able to extract your data for other purposes without
significant work.
That's where XML comes in. It's a generic markup language that describes data according to its structure and
purpose, rather than with specific formatting instructions. The actual presentation information is stored
somewhere else, such as in a stylesheet. What's left is a functional description of the parts of your document,
which is suitable for many different kinds of processing. With proper use of XML, your document will be ready
for an unlimited variety of applications and purposes.
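As a rough illustration of keeping presentation separate from content, the sketch below (Python, standard library only) renders one document two ways and then extracts its data with no formatting at all. The recipe vocabulary and the two style tables are hypothetical stand-ins for a real stylesheet language such as XSLT. The point is only that the document itself never changes.

```python
import xml.etree.ElementTree as ET

# A document marked up by structure and purpose (hypothetical vocabulary).
DOC = """<recipe>
  <name>Pancakes</name>
  <ingredient>flour</ingredient>
  <ingredient>milk</ingredient>
</recipe>"""

# Presentation lives outside the document: one table per output medium.
# Each entry maps an element name to a format string for its text.
HTML_STYLE = {"name": "<h1>{}</h1>", "ingredient": "<li>{}</li>"}
TEXT_STYLE = {"name": "{}:", "ingredient": "  * {}"}

def render(doc, style):
    """Format each child of the root element using the given style table."""
    root = ET.fromstring(doc)
    return [style[child.tag].format(child.text) for child in root]

html_lines = render(DOC, HTML_STYLE)
text_lines = render(DOC, TEXT_STYLE)

# A third use with no formatting at all: pull out just the data.
ingredients = [el.text for el in ET.fromstring(DOC).findall("ingredient")]

print("\n".join(html_lines))
print("\n".join(text_lines))
print(ingredients)
```

Supporting another output medium means writing another style table, not touching the document. That is the practical payoff of describing structure instead of formatting.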
[4] When a trusted body like the W3C makes a recommendation, it often has the effect of a law; many developers
begin to follow the recommendation upon its release, and developers who hope to write software that is compatible
with everyone else's (which is the whole point behind standards like XML) had better follow the recommendation as
well.