Simple API for XML
SAX (Simple API for XML) is an event-based sequential access parser API developed by the XML-DEV mailing list for XML documents.[1] SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially.
Definition
Unlike DOM, there is no formal specification for SAX. The Java implementation of SAX is considered to be normative.[2] SAX is used for state-independent processing of XML documents, in contrast to StAX, which processes documents state-dependently.[3]
Benefits
SAX parsers have certain benefits over DOM-style parsers. A SAX parser typically needs much less memory to function than a DOM parser. DOM parsers must have the entire tree in memory before any processing can begin, so the amount of memory used by a DOM parser depends entirely on the size of the input data. The memory footprint of a SAX parser, by contrast, is based only on the maximum depth of the XML tree and the maximum data stored in XML attributes on a single XML element. Both of these are always smaller than the size of the parsed tree itself.
Because of the event-driven nature of SAX, processing documents with a SAX parser can often be faster than with a DOM-style parser. Memory allocation takes time, so the larger memory footprint of the DOM is also a performance issue.
Due to the nature of DOM, streamed reading from disk is impossible. Processing XML documents larger than main memory is also impossible with DOM parsers, but can be done with SAX parsers. However, DOM parsers may make use of disk space as memory to sidestep this limitation.[citation needed]
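As a rough illustration of the memory argument, the following Python sketch uses the standard-library xml.sax module to scan a document while keeping only three integers of state. The class name DepthTracker, the function scan and the sample document are illustrative assumptions, not part of SAX itself.

```python
import io
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Counts elements and tracks nesting depth; stores no document content."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current nesting depth
        self.max_depth = 0      # deepest nesting seen so far
        self.element_count = 0  # total number of elements seen

    def startElement(self, name, attrs):
        self.depth += 1
        self.element_count += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        self.depth -= 1


def scan(stream):
    """Parse a file-like object sequentially and return (count, max depth)."""
    handler = DepthTracker()
    xml.sax.parse(stream, handler)
    return handler.element_count, handler.max_depth


# Hypothetical sample input; any file object opened in binary mode would do.
sample = b"<a><b><c/></b><b/></a>"
count, max_depth = scan(io.BytesIO(sample))
print(count, max_depth)
```

The handler's state does not grow with the size of the input, which is why the same approach can be applied to a file object for a document that would not fit in memory as a tree.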
Drawbacks
The event-driven model of SAX is useful for XML parsing, but it does have certain drawbacks.
Certain kinds of XML validation require access to the document in full. For example, a DTD IDREF attribute requires that there be an element in the document that uses the given string as a DTD ID attribute. To validate this in a SAX parser, one would need to keep track of every previously encountered ID attribute and every previously encountered IDREF attribute, to see if any matches are made. Furthermore, if an IDREF does not match an ID, the user only discovers this after the document has been parsed; if this linkage was important to building functioning output, then time has been wasted in processing the entire document only to throw it away.
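The bookkeeping described above might look like the following Python sketch, built on the standard-library xml.sax module. It is a simplification under stated assumptions: attributes literally named "id" and "ref" stand in for DTD-declared ID and IDREF attributes, and the names IdRefChecker and find_dangling_refs are hypothetical. A real validator would take the attribute types from the DTD.

```python
import xml.sax


class IdRefChecker(xml.sax.ContentHandler):
    """Remembers every ID and IDREF value seen so they can be compared later."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.idrefs = set()

    def startElement(self, name, attrs):
        # Assumption: "id" plays the ID role and "ref" the IDREF role.
        if "id" in attrs:
            self.ids.add(attrs["id"])
        if "ref" in attrs:
            self.idrefs.add(attrs["ref"])


def find_dangling_refs(xml_bytes):
    """Return the set of references that match no ID in the document."""
    checker = IdRefChecker()
    xml.sax.parseString(xml_bytes, checker)
    # The comparison is only possible once the whole document has been read,
    # because an ID may appear after the reference that points to it.
    return checker.idrefs - checker.ids


doc = b'<doc><item id="a"/><link ref="a"/><link ref="b"/></doc>'
print(find_dangling_refs(doc))
```

Note that the two sets grow with the number of IDs and references in the document, so this kind of check gives up part of the memory advantage of SAX.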
Additionally, some kinds of XML processing simply require having access to the entire document. XSLT and XPath, for example, need to be able to access any node at any time in the parsed XML tree. While a SAX parser could be used to construct such a tree, the DOM already does so by design.
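To show how a tree can be constructed from SAX events, here is a minimal Python sketch using the standard-library xml.sax module. The node layout (a dict with "name", "attrs", "text" and "children" keys) and the names TreeBuilder and build_tree are assumptions made for this example; it is not a DOM implementation, and it does not record where text sits relative to child elements.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a nested dict tree from SAX events (hypothetical node layout)."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # currently open elements, innermost last

    def startElement(self, name, attrs):
        node = {"name": name, "attrs": dict(attrs.items()),
                "text": "", "children": []}
        if self._stack:
            self._stack[-1]["children"].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # Text may arrive in several pieces, so append rather than assign.
        if self._stack:
            self._stack[-1]["text"] += content


def build_tree(xml_bytes):
    builder = TreeBuilder()
    xml.sax.parseString(xml_bytes, builder)
    return builder.root


tree = build_tree(b'<r a="1"><x>hi</x><y/></r>')
print(tree["name"], [child["name"] for child in tree["children"]])
```

Once the tree exists, any node can be reached at any time, at the cost of holding the whole document in memory again.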
XML processing with SAX
A parser that implements SAX (i.e., a SAX Parser) functions as a stream parser, with an event-driven API.[1] The user defines a number of callback methods that will be called when events occur during parsing. The SAX events include:
- XML Text nodes
- XML Element nodes
- XML Processing Instructions
- XML Comments
An event is fired when each of these XML features is encountered; for elements, a second event is fired when the end of the element is encountered. XML attributes are provided as part of the data passed to element events.
SAX parsing is unidirectional; previously parsed data cannot be re-read without starting the parsing operation again.
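The callback model can be sketched in Python with the standard-library xml.sax module, whose ContentHandler class provides overridable methods such as startElement, endElement, characters and processingInstruction. The example below overrides only startElement to show that attributes arrive as part of the element event. The names AttributeCollector and collect_attributes and the sample document are illustrative assumptions.

```python
import xml.sax


class AttributeCollector(xml.sax.ContentHandler):
    """Records (element, attribute, value) triples from element start events."""

    def __init__(self):
        super().__init__()
        self.found = []

    def startElement(self, name, attrs):
        # Attributes are delivered together with the element start event.
        # Sorting makes the order within one element deterministic.
        for attr_name, value in sorted(attrs.items()):
            self.found.append((name, attr_name, value))


def collect_attributes(xml_bytes):
    handler = AttributeCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.found


result = collect_attributes(b'<r a="1"><c b="2" d="3"/></r>')
print(result)
```

Because parsing is unidirectional, anything the handler does not store is gone once its event has passed; calling collect_attributes a second time parses the input again from the start.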
Example
Given the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<RootElement param="value">
    <FirstElement>
        Some Text
    </FirstElement>
    <?some_pi some_attr="some_value"?>
    <SecondElement param2="something">
        Pre-Text <Inline>Inlined text</Inline> Post-text.
    </SecondElement>
</RootElement>
This XML document, when passed through a SAX parser, will generate a sequence of events like the following:
- XML Element start, named RootElement, with an attribute param equal to "value"
- XML Element start, named FirstElement
- XML Text node, with data equal to "Some Text" (note: text processing, with regard to spaces, can be changed)
- XML Element end, named FirstElement
- Processing Instruction event, with the target some_pi and data some_attr="some_value"
- XML Element start, named SecondElement, with an attribute param2 equal to "something"
- XML Text node, with data equal to "Pre-Text"
- XML Element start, named Inline
- XML Text node, with data equal to "Inlined text"
- XML Element end, named Inline
- XML Text node, with data equal to "Post-text."
- XML Element end, named SecondElement
- XML Element end, named RootElement
Note that the first line of the sample above is the XML Declaration and not a processing instruction; as such it will not be reported as a processing instruction event.
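The event sequence can be observed with a short Python sketch based on the standard-library xml.sax module. The handler name EventLogger and the tuple format of the logged events are assumptions made for this example. Whitespace-only text chunks are skipped, and the remaining text is logged exactly as delivered, so the number and boundaries of text events may differ between parsers.

```python
import xml.sax

# The sample document from this section, one element per line.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<RootElement param="value">
    <FirstElement>
        Some Text
    </FirstElement>
    <?some_pi some_attr="some_value"?>
    <SecondElement param2="something">
        Pre-Text <Inline>Inlined text</Inline> Post-text.
    </SecondElement>
</RootElement>
"""


class EventLogger(xml.sax.ContentHandler):
    """Appends one tuple per SAX event to self.events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip chunks that consist only of indentation and line breaks.
        if content.strip():
            self.events.append(("text", content))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


logger = EventLogger()
xml.sax.parseString(DOCUMENT, logger)
for event in logger.events:
    print(event)
```

No callback is invoked for the XML declaration on the first line, which matches the note above.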
The result above may vary: SAX deliberately allows a given section of text to be reported as multiple sequential text events. Thus in the example above, a SAX parser may generate a different series of events, part of which might include:
- XML Element start, named FirstElement
- XML Text node, with data equal to "Some "
- XML Text node, with data equal to "Text"
- XML Element end, named FirstElement
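A common way to cope with split text events is to accumulate the chunks and join them when the element ends. The Python sketch below, using the standard-library xml.sax module, feeds a document to the parser in two pieces to make splitting likely. Whether the text is actually split, and where, depends on the parser, so the code makes no assumption about the number of chunks. The name TextCollector is hypothetical, and the per-element buffer only suits flat documents like this one.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Joins however many text chunks the parser delivers for an element."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # raw pieces, exactly as delivered
        self.texts = {}    # element name -> complete text
        self._buffer = []

    def startElement(self, name, attrs):
        self._buffer = []

    def characters(self, content):
        self.chunks.append(content)
        self._buffer.append(content)

    def endElement(self, name):
        # Only at the end tag is the element's text known to be complete.
        self.texts[name] = "".join(self._buffer)
        self._buffer = []


collector = TextCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(collector)

# Feed the document incrementally; the cut falls in the middle of the text.
for piece in (b"<FirstElement>Some ", b"Text</FirstElement>"):
    parser.feed(piece)
parser.close()

print(collector.chunks)
print(collector.texts["FirstElement"])
```

Code that instead treats each text event as the complete content of an element can silently lose data on parsers or inputs that split the text.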
References
- [1] "SAX". Webopedia. http://www.webopedia.com/TERM/S/SAX.html. Retrieved 2011-05-02. "Short for Simple API for XML, an event-based API that, as an alternative to DOM, allows someone to access the contents of an XML document. SAX was originally a Java-only API. The current version supports several programming language environments other then Java. SAX was developed by the members of the XML-DEV mailing list."
- [2] http://www.saxproject.org/
- [3] "Simple API for XML". Oracle. http://download.oracle.com/javaee/1.4/tutorial/doc/JAXPSAX.html. Retrieved 2011-05-02. "Note: In a nutshell, SAX is oriented towards state independent processing, where the handling of an element does not depend on the elements that came before. StAX, on the other hand, is oriented towards state dependent processing. For a more detailed comparison, see SAX and StAX in Basic Standards and When to Use SAX."
Further reading
- David Brownell: SAX2, O'Reilly, ISBN 0-596-00237-8
- W. Scott Means, Michael A. Bodie: The Book of SAX, No Starch Press, ISBN 1-886411-77-8
See also
- Document Object Model
- Expat (XML)
- Java API for XML Processing
- LibXML
- List of XML markup languages
- List of XML schemas
- MSXML
- StAX
- Streaming XML
- VTD-XML
- Xerces
- XSL Transformations