http://www.megginson.com/SAX. Although SAX has been ported to several other
languages, we will focus on the Java features. SAX is only responsible for scanning through XML
data top to bottom and sending event notifications as elements, text, and other items are
encountered; it is up to the recipient of these events to process the data. Because SAX parsers do
not store the entire document in memory, they have the potential to be very fast even for huge
files.
[2] One does not generally need to download SAX directly because it is supported by and included
with all of the popular XML parsers.
Currently, there are two versions of SAX: 1.0 and 2.0. Many changes were made in version 2.0,
and the SAX examples in this book use this version. Most SAX parsers should support the older
1.0 classes and interfaces; however, you will receive deprecation warnings from the Java
compiler if you use these older features.
Java SAX parsers are implemented using a series of interfaces. The most important interface is
org.xml.sax.ContentHandler, which has methods such as startDocument( ),
startElement( ), characters( ), endElement( ), and endDocument( ). During the
parsing process, startDocument( ) is called once, then startElement( ) and
endElement( ) are each called once for every element in the XML data. For the following XML:
<first>George</first>
the startElement( ) method will be called, followed by characters( ), followed by
endElement( ). The characters( ) method provides the text "George" in this example.
This basic process continues until the end of the document, at which time endDocument( ) is
called.
Depending on the SAX implementation, the characters( )
method may break up contiguous character data into several
chunks of data. In this case, the characters( ) method will
be called several times until the character data is entirely
parsed.
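The event sequence described above can be observed with a small handler. This is a minimal sketch, not code from the SAX distribution. The EventTrace class and its trace( ) helper are hypothetical names, and obtaining the parser through JAXP's SAXParserFactory is an assumption of this example. It extends org.xml.sax.helpers.DefaultHandler only to keep the listing short. Because characters( ) may arrive in several chunks, the text is appended to a buffer and reported when the element ends.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each SAX event in the order it arrives.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Simplification: any text seen before a child element is discarded.
        text.setLength(0);
        events.add("startElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once for one run of text, so append instead of assigning.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text.length() > 0) {
            events.add("text " + text);
            text.setLength(0);
        }
        events.add("endElement " + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Parses an XML string and returns the recorded events.
    static List<String> trace(String xml) {
        try {
            EventTrace handler = new EventTrace();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Calling EventTrace.trace("<first>George</first>") should yield five entries: startDocument, startElement first, text George, endElement first, and endDocument, regardless of how many times the parser chose to call characters( ).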
Since ContentHandler is an interface, it is up to your application code to implement this
interface and do something useful when the parser invokes its methods. SAX does provide a
class called DefaultHandler (in the org.xml.sax.helpers package) that implements the
ContentHandler interface. To use DefaultHandler, create a subclass and override the methods
that interest you. The other methods can safely be ignored, since they are just empty methods.
If you are familiar with AWT programming, you may recognize this idiom from event adapter
classes such as java.awt.event.WindowAdapter.
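As an illustration of the adapter idiom, the following sketch overrides a single method and inherits the empty implementations of everything else. The ElementCounter class and its countElements( ) helper are hypothetical, and the use of JAXP's SAXParserFactory to locate a parser is an assumption of this example rather than a requirement of SAX.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that cares about exactly one kind of event.
class ElementCounter extends DefaultHandler {
    int count;

    // The only override; every other callback keeps DefaultHandler's no-op behavior.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++;
    }

    // Parses an XML string and returns the number of elements it contains.
    static int countElements(String xml) {
        try {
            ElementCounter counter = new ElementCounter();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), counter);
            return counter.count;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

For example, ElementCounter.countElements("<name><first>George</first></name>") counts two elements, name and first.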
Getting back to XSLT, you may be wondering where SAX fits into the picture. It turns out that
XSLT processors typically have the ability to gather input from a series of SAX events as an
alternative to static XML files. Somewhat nonintuitively, it also turns out that you can generate
your own series of SAX events rather easily -- without using a SAX parser. Since a SAX parser
just calls a series of methods on the ContentHandler interface, you can write your own
pseudo-parser that does the same thing. We will explore this in Chapter 5 when we talk about
using SAX and an XSLT processor to apply transformations to non-XML data, such as results
from a database query or the contents of a comma-separated values (CSV) file.
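To make the pseudo-parser idea concrete, here is a rough sketch that turns CSV text into SAX events by calling the ContentHandler methods directly. It is not the Chapter 5 implementation. The CsvEventSource class, its methods, and the element names csvFile, line, and value are all hypothetical, and the naive splitting on commas ignores quoted fields.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pseudo-parser: no XML is read, yet the handler sees ordinary SAX events.
class CsvEventSource {
    private static final Attributes NO_ATTS = new AttributesImpl();

    // Fires one "line" element per CSV row and one "value" element per field.
    static void fire(String csv, ContentHandler handler) throws SAXException {
        handler.startDocument();
        handler.startElement("", "csvFile", "csvFile", NO_ATTS);
        for (String line : csv.split("\n")) {
            if (line.isEmpty()) {
                continue; // skip blank rows
            }
            handler.startElement("", "line", "line", NO_ATTS);
            for (String value : line.split(",")) { // naive: no quoted-field support
                handler.startElement("", "value", "value", NO_ATTS);
                char[] chars = value.toCharArray();
                handler.characters(chars, 0, chars.length);
                handler.endElement("", "value", "value");
            }
            handler.endElement("", "line", "line");
        }
        handler.endElement("", "csvFile", "csvFile");
        handler.endDocument();
    }

    // Usage example: a throwaway handler that rebuilds markup from the events.
    // It does not escape special characters, so it is not a real serializer.
    static String toXml(String csv) {
        StringBuilder out = new StringBuilder();
        DefaultHandler printer = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append('<').append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append("</").append(qName).append('>');
            }
        };
        try {
            fire(csv, printer);
        } catch (SAXException e) {
            throw new IllegalStateException("event generation failed", e);
        }
        return out.toString();
    }
}
```

Any ContentHandler could take the place of the printer here. The point of the sketch is only that the receiver cannot tell whether the events came from a real parser or from application code.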
1.2.4.2 DOM