JDOM and XML Parsing, Part 1
JDOM makes XML manipulation in Java easier than ever.
Chances are, you’ve probably used one of a
number of Java libraries to manipulate
XML data structures in the past. So what’s
the point of JDOM (Java Document Object
Model), and why do developers need it?
JDOM is an open source library for
Java-optimized XML data manipulations. Although it’s
similar to the World Wide Web Consortium’s (W3C)
DOM, it’s an alternative document object model that
was not built on DOM or modeled after DOM. The
main difference is that while DOM was created to be
language-neutral and initially used for JavaScript
manipulation of HTML pages, JDOM was created to be
Java-specific and thereby take advantage of Java’s
features, including method overloading, collections,
reflection, and familiar programming idioms. For Java
programmers, JDOM tends to feel more natural and
“right.” It’s similar to how the Java-optimized Remote
Method Invocation (RMI) library feels more natural than the
language-neutral Common Object Request Broker
Architecture (CORBA).
You can find JDOM at www.jdom.org under an open
source Apache-style (commercial-friendly) license. It’s
collaboratively designed and developed and has mailing
lists with more than 3,000 subscribers. The library has
also been accepted by Sun’s Java Community Process
(JCP) as a Java Specification Request (JSR-102) and is
on track to become a formal Java specification.
The articles in this series will provide a technical
introduction to JDOM. This article provides information
about important classes. The next article will give you a
feel for how to use JDOM inside your own Java programs.
THE JDOM PACKAGE STRUCTURE
The JDOM library consists of six packages. First, the
org.jdom package holds the classes representing an
XML document and its components: Attribute, CDATA,
Comment, DocType, Document, Element, EntityRef,
Namespace, ProcessingInstruction, and Text. If you’re
familiar with XML, the class names should be
self-explanatory.
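
As a rough illustration of how these classes fit together, the sketch below builds a small document in memory and reads it back through an ordinary java.util.List. It assumes a JDOM 1.x jar on the classpath, and method names may differ in the earlier beta releases. The element and attribute names ("library", "book", "isbn") and the class name CoreClassesSketch are hypothetical.

```java
import java.util.List;

import org.jdom.Attribute;
import org.jdom.Comment;
import org.jdom.Document;
import org.jdom.Element;

// Sketch only: assumes a JDOM 1.x jar on the classpath.
// All element and attribute names below are made up for illustration.
class CoreClassesSketch {

    /** Builds a tiny document using the core org.jdom classes. */
    static Document buildSample() {
        Element root = new Element("library");
        root.addContent(new Comment("illustrative data only"));

        Element book = new Element("book");
        book.setAttribute(new Attribute("isbn", "0000000000"));
        // setText returns the element itself, so calls can be chained.
        book.addContent(new Element("title").setText("Example Title"));
        root.addContent(book);

        return new Document(root);
    }

    public static void main(String[] args) {
        Document doc = buildSample();
        // getChildren returns a plain java.util.List of Element objects.
        List children = doc.getRootElement().getChildren("book");
        for (Object o : children) {
            Element book = (Element) o;
            System.out.println(book.getAttributeValue("isbn")
                    + ": " + book.getChildText("title"));
        }
    }
}
```

Because the children come back as a standard List, the usual collection idioms (iteration, size checks, removal through the list) apply without any JDOM-specific node-list type.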
Next is the org.jdom.input package, which holds
classes that build XML documents. The most
important class is SAXBuilder. SAXBuilder builds a
document by listening to incoming SAX events and
constructing a corresponding document. When you want
to build from a file or other stream, you use SAXBuilder.
It uses a SAX parser to read the stream and then builds
the document according to the SAX parser callbacks. The
good part of this design is that as SAX parsers get faster,
SAXBuilder gets faster. The other main input class is
DOMBuilder. DOMBuilder builds from a DOM tree. This
class comes in handy when you have a preexisting DOM
tree and want a JDOM version instead.
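
The idea of building a tree by listening to SAX callbacks can be sketched with nothing but the JDK's own SAX support. The code below is not JDOM's implementation and is far simpler than SAXBuilder: Node, TreeHandler, and SaxTreeSketch are hypothetical names, and the sketch ignores attributes, namespaces, comments, and everything else a real builder must handle.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: shows the event-listening approach, not JDOM's actual code.
class SaxTreeSketch {

    /** Hypothetical stand-in for a document node; not a JDOM class. */
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Node(String name) {
            this.name = name;
        }
    }

    /** Receives SAX callbacks and assembles Node objects as events arrive. */
    static class TreeHandler extends DefaultHandler {
        // Elements whose end tag has not been seen yet, innermost first.
        private final Deque<Node> open = new ArrayDeque<>();
        Node root;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            Node node = new Node(qName);
            if (open.isEmpty()) {
                root = node;
            } else {
                open.peek().children.add(node);
            }
            open.push(node);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!open.isEmpty()) {
                open.peek().text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.pop();
        }
    }

    /** Parses the XML string with whatever SAX parser the JDK provides. */
    static Node parse(String xml) throws Exception {
        TreeHandler handler = new TreeHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.root;
    }

    public static void main(String[] args) throws Exception {
        Node root = parse("<greeting><to>world</to></greeting>");
        Node first = root.children.get(0);
        System.out.println(root.name + " -> " + first.name + " = " + first.text);
    }
}
```

The handler never touches the parser's internals, which is why a faster SAX parser speeds up this kind of builder without any change to the builder itself.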
There’s no limit to the potential builders. For example,
now that Xerces has the Xerces Native Interface (XNI) to
operate at a lower level than SAX, it may make sense to
write an XNIBuilder to support some parser knowledge
not exposed via SAX. One popular builder that has been
contributed to the project is the ResultSetBuilder. It
takes a JDBC result set and creates an XML document
representation of the SQL result, with various
configurations regarding what should be an element and
what should be an attribute.
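
To make the element-versus-attribute choice concrete, here is a hand-rolled sketch of the same idea. It is not the contributed ResultSetBuilder and does not use its API: to stay self-contained it takes rows as a list of maps instead of a JDBC result set, and the names RowsToXmlSketch, "result", and "entry" are hypothetical. It assumes a JDOM 1.x jar on the classpath.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.jdom.Document;
import org.jdom.Element;

// Sketch only: a stand-in for the idea behind a result-set builder.
// Rows are plain maps here, not a java.sql.ResultSet.
class RowsToXmlSketch {

    /**
     * Turns each row into an "entry" element. Columns listed in
     * asAttributes become attributes; all others become child elements.
     * Column names must be legal XML names, or JDOM rejects them.
     */
    static Document build(List<Map<String, String>> rows,
                          Set<String> asAttributes) {
        Element root = new Element("result");
        for (Map<String, String> row : rows) {
            Element entry = new Element("entry");
            for (Map.Entry<String, String> col : row.entrySet()) {
                if (asAttributes.contains(col.getKey())) {
                    entry.setAttribute(col.getKey(), col.getValue());
                } else {
                    entry.addContent(
                            new Element(col.getKey()).setText(col.getValue()));
                }
            }
            root.addContent(entry);
        }
        return new Document(root);
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "7");
        row.put("name", "Ada");

        Document doc = build(Collections.singletonList(row),
                Collections.singleton("id"));
        Element entry = doc.getRootElement().getChild("entry");
        System.out.println("id attribute: " + entry.getAttributeValue("id"));
        System.out.println("name element: " + entry.getChildText("name"));
    }
}
```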
The org.jdom.output package holds the classes that
output XML documents. The most important class is
XMLOutputter. It converts documents to a stream of
bytes for output to files, streams, and sockets. The
XMLOutputter has many special configuration options
supporting raw output, pretty output, or compressed
output, among others. It’s a fairly complicated class.
That’s probably why this capability still doesn’t exist
in DOM Level 2.
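
A minimal sketch of the pretty and compact styles follows. It assumes a JDOM 1.x release, where the formatting options are grouped in a Format class; the earlier betas configured the outputter through other methods, so treat the exact calls as version-dependent. The class name OutputSketch and the sample element names are hypothetical.

```java
import java.io.IOException;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

// Sketch only: assumes a JDOM 1.x jar on the classpath.
class OutputSketch {

    /** Builds a small sample document to serialize. */
    static Document sample() {
        Element root = new Element("note");
        root.addContent(new Element("to").setText("reader"));
        return new Document(root);
    }

    /** Indented, human-readable output. */
    static String pretty(Document doc) {
        return new XMLOutputter(Format.getPrettyFormat()).outputString(doc);
    }

    /** Whitespace-normalized output with no indentation. */
    static String compact(Document doc) {
        return new XMLOutputter(Format.getCompactFormat()).outputString(doc);
    }

    public static void main(String[] args) throws IOException {
        Document doc = sample();
        System.out.println(pretty(doc));
        System.out.println(compact(doc));

        // The same outputter can also write straight to a stream.
        new XMLOutputter(Format.getRawFormat()).output(doc, System.out);
    }
}
```

The raw format writes the document content as it stands, which suits documents whose whitespace is significant.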
Other outputters include the SAXOutputter, which
generates SAX events based on the document content.
Although seemingly arcane, this class proves extremely
useful in XSLT transforms, because SAX events can be a
more efficient way than bytes to transfer document data
to an engine. There’s also a DOMOutputter, which builds a
DOM tree representation of the document. An
68 SEPTEMBER/OCTOBER 2002 OTN.ORACLE.COM/ORACLEMAGAZINE