Sunday, 14 November 2010

StAX and its advantages over SAX and DOM


XML has become a versatile, platform-independent format for describing and exchanging data, and services that use XML can be accessed from cellular phones, PDAs, and desktops. To use XML meaningfully, an application has to parse it and extract the relevant data. The long-established ways to do this are the Simple API for XML (SAX) and the Document Object Model (DOM), but more recently parsers based on pull-parsing techniques have become a popular choice among developers. Most developers are familiar with two approaches for processing XML:
1. Simple API for XML (SAX)
2. Document Object Model (DOM)
SAX is a standard low-level push API for XML whose main benefit is efficiency. When a document is parsed with SAX, the parser generates events and passes them to the application through callbacks on handlers that implement the SAX handler interfaces. The events are very low level, e.g. startCDATA() and endCDATA(). The developer must account for all of these events and is also responsible for maintaining any document state needed during the parse. Once started, a SAX parse also runs through the entire document unless the handler aborts it, typically by throwing an exception.
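To make the callback style concrete, here is a minimal sketch of a SAX client that collects element names. The class name SaxElementNames and the helper elementNames are made up for this illustration; the JAXP and SAX calls are the standard ones. Notice that the loop that walks the document lives inside the parser, and the application only receives calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of SAX itself.
final class SaxElementNames {

    // Returns the qualified names of all elements in document order.
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();

        // The handler is called by the parser; the application never asks for the next item.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                names.add(qName);
            }
        };

        // parse() returns only after the whole document has been pushed through the handler.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<order><item/><item/></order>"));
    }
}
```

Any state the application needs between callbacks (here, the list of names) has to live outside the callback methods, which is the bookkeeping burden described above.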
DOM is a platform-independent and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. It is a high-level parsing API whose main benefit is ease of use. DOM presents the application with an in-memory tree structure and provides random access to it. That simplicity comes at a performance cost: the entire document must be parsed and built into objects before any part of it can be processed or any action taken.
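For comparison, a small DOM sketch looks like this. The class DomLastItem, the method lastItemText and the element name "item" are assumptions chosen for the example. The whole tree is built first, and only then can the code jump straight to the node it wants.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of DOM itself.
final class DomLastItem {

    // Returns the text of the last <item> element, or null if there is none.
    static String lastItemText(String xml) throws Exception {
        // parse() reads the complete document and builds the in-memory tree.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Random access: go directly to the last matching node.
        NodeList items = doc.getElementsByTagName("item");
        if (items.getLength() == 0) {
            return null;
        }
        return items.item(items.getLength() - 1).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lastItemText("<order><item>pen</item><item>ink</item></order>"));
    }
}
```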

StAX, the Streaming API for XML, was created to address limitations in the two most widely used parsing APIs, SAX and DOM. It is a bidirectional API, covering both reading and writing XML, and is formally specified by JSR 173. StAX is often referred to as “pull parsing”, and pull APIs are generally considered a better fit for streaming XML. The developer uses a simple iterator-based API to “pull” the next XML construct from the document. With StAX it is possible to skip ahead to areas of interest in the document, read only subsections of it, and stop or suspend processing at any time.
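The "skip ahead and stop" idea can be sketched as follows. The class StaxFirstText and the method firstText are names invented for this example, and the sketch assumes the target element contains only text (getElementText() rejects child elements).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of StAX itself.
final class StaxFirstText {

    // Returns the text of the first element with the given local name, or null.
    static String firstText(String xml, String localName) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Pull the next item; everything that is not of interest is simply skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    // Found it: return immediately and never ask for the rest of the document.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstText("<order><id>42</id><note>ship later</note></order>", "id"));
    }
}
```

Because the application owns the loop, stopping early is an ordinary return statement rather than an exception thrown out of a callback.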



StAX offers two APIs, each with its own design goal:
 1. The cursor API, which aims to let users read and write XML as efficiently as possible while remaining easy to use. It has two interfaces: XMLStreamReader and XMLStreamWriter.
 2. The event iterator API, which aims to be easy to extend and to allow easy pipelining. It has two main interfaces: XMLEventReader and XMLEventWriter.
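As a rough illustration of the two styles, the sketch below writes a tiny document with the cursor API (XMLStreamWriter) and reads element names back with the event iterator API (XMLEventReader). The class StaxTwoApis, its two methods and the element name "greeting" are assumptions made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class, not part of StAX itself.
final class StaxTwoApis {

    // Cursor API: the writer emits markup one call at a time, with no event objects.
    static String writeGreeting(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("greeting");
        writer.writeCharacters(text);   // special characters are escaped by the writer
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    // Event iterator API: every item is an XMLEvent object that can be stored or passed on.
    static List<String> startTags(String xml) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        XMLEventReader events =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        try {
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
        } finally {
            events.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = writeGreeting("hello");
        System.out.println(xml);
        System.out.println(startTags(xml));
    }
}
```

The event objects cost a little more than the cursor's direct getters, but they are what makes it easy to hand items from one processing stage to the next.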


XMLStreamReader is the key interface in StAX. This interface represents a cursor that's moved across an XML document from beginning to end. At any given time, this cursor points at one thing: a text node, a start-tag, a comment, the beginning of the document, etc. The cursor always moves forward, never backward and normally only moves one item at a time.
A typical StAX program begins by using the XMLInputFactory class to load an implementation-dependent instance of XMLStreamReader:

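import java.io.InputStream;
import java.net.URL;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
URL u = new URL("http://www.cafeconleche.org/");
InputStream in = u.openStream();
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);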

Now it's time to actually read the document. The next() method in XMLStreamReader advances the cursor to the next item. Various getter methods then extract data from the current item. Some of the most important of these getters include:


public QName getName()
public String getLocalName()
public String getNamespaceURI()
public String getText()
public String getElementText()
public int getEventType()
public Location getLocation()
public int getAttributeCount()
public QName getAttributeName(int index)
public String getAttributeValue(String namespaceURI, String localName)
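Here is a short sketch of a few of these getters working together to read the attributes of an element. The class StaxAttributes and the method attributesOf are hypothetical names for this example. It uses the index-based getAttributeValue(int) overload alongside getAttributeCount() and getAttributeName(int).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of StAX itself.
final class StaxAttributes {

    // Returns the attributes of the first element with the given local name.
    static Map<String, String> attributesOf(String xml, String elementName)
            throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(elementName)) {
                    // Attribute getters are valid only while the cursor is on a start-tag.
                    Map<String, String> attributes = new LinkedHashMap<>();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        attributes.put(reader.getAttributeName(i).getLocalPart(),
                                reader.getAttributeValue(i));
                    }
                    return attributes;
                }
            }
            return Map.of();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(attributesOf("<library><book id=\"7\" lang=\"en\"/></library>", "book"));
    }
}
```

When only one attribute is needed, getAttributeValue(null, "id") looks it up by name directly, with null meaning that the namespace is not checked.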
 
For example, here's a simple bit of code that iterates through an XML document and prints out the names of the different elements it encounters:
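// Pull items one at a time until the end of the document is reached.
while (true) {
    int event = parser.next();
    if (event == XMLStreamConstants.END_DOCUMENT) {
        // Nothing left to read: release the parser's resources and stop.
        parser.close();
        break;
    }
    if (event == XMLStreamConstants.START_ELEMENT) {
        // The cursor is on a start-tag, so the element name is available.
        System.out.println(parser.getLocalName());
    }
}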
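A slightly extended sketch of the same loop shows how document state can be kept in an ordinary local variable. The class StaxOutline and the method outline are invented for this example, and String.repeat assumes Java 11 or later. It indents each element name by its nesting depth.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of StAX itself.
final class StaxOutline {

    // Returns one line per element, indented two spaces per nesting level.
    static String outline(String xml) throws XMLStreamException {
        StringBuilder out = new StringBuilder();
        int depth = 0; // parse state lives in a plain local variable
        XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (parser.hasNext()) {
                switch (parser.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        out.append("  ".repeat(depth)).append(parser.getLocalName()).append('\n');
                        depth++;
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        depth--;
                        break;
                    default:
                        break; // text, comments and other items are ignored here
                }
            }
        } finally {
            parser.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.print(outline("<a><b><c/></b><d/></a>"));
    }
}
```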
Advantages of StAX over SAX
1.   SAX is a push parser: it pushes low-level events at the application. With StAX, the application is in control and drives the parser instead of the parser driving the application.
2.   With StAX, the client controls the application thread and can call methods on the parser when needed. In contrast, SAX controls the application thread, and the client can only accept invocations from the parser.
3.   StAX libraries tend to be small, and the client code that interacts with them is usually simpler, even for more complex documents.
4.   Pull clients can read multiple documents at one time with a single thread. For example, when one document includes or imports another document, the application can process the imported document while processing the original document.
5.   A StAX pull parser can filter XML documents so that elements the client does not need are ignored, and it can support XML views of non-XML data.
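Point 4 can be sketched with two readers that are advanced in lockstep by one thread. The class StaxLockstep and both method names are assumptions made up for this example. It checks whether two documents contain the same sequence of element names.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of StAX itself.
final class StaxLockstep {

    // True if both documents have the same element names in the same order.
    static boolean sameElementSequence(String first, String second) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader left = factory.createXMLStreamReader(new StringReader(first));
        XMLStreamReader right = factory.createXMLStreamReader(new StringReader(second));
        try {
            while (true) {
                // Alternate between the two parsers; neither one owns the thread.
                String leftName = nextElementName(left);
                String rightName = nextElementName(right);
                if (!Objects.equals(leftName, rightName)) {
                    return false;
                }
                if (leftName == null) {
                    return true; // both documents ended at the same point
                }
            }
        } finally {
            left.close();
            right.close();
        }
    }

    // Advances to the next start-tag and returns its name, or null at end of document.
    static String nextElementName(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                return reader.getLocalName();
            }
        }
        return null;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(sameElementSequence("<a><b/></a>", "<a>text<b x='1'></b></a>"));
    }
}
```

With SAX, each parse() call would run to completion before returning, so interleaving two documents like this would typically need a second thread or buffering of one document's events.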

Advantages of StAX over DOM
1.   DOM must parse the entire document and build it as objects in memory before any part of it can be processed. With StAX, the application can start working as soon as the parser reaches the first construct of interest.
2.   A DOM tree holds the whole document in memory at once, whereas a StAX cursor exposes only the current item, so memory use stays low even for very large documents.
3.   StAX can skip ahead to the areas of interest, read only subsections of the document, and stop or suspend processing at any time. DOM pays the cost of building the full tree even when only a small part is needed.
4.   The trade-off is that DOM provides random access to the tree, while a StAX cursor only moves forward, so StAX is best suited to streaming and one-pass processing.
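To contrast with DOM, which as described earlier builds the whole document as objects before anything can be processed, here is a sketch that counts elements straight off a character stream. The class StaxCount and the method countElements are hypothetical names. The only thing kept in memory between items is a single counter.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of StAX itself.
final class StaxCount {

    // Counts elements with the given local name without building a tree.
    static int countElements(Reader source, String localName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        int count = 0;
        try {
            while (reader.hasNext()) {
                // Each item is inspected and then dropped; nothing accumulates.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        Reader source = new StringReader("<log><entry/><entry/><other/><entry/></log>");
        System.out.println(countElements(source, "entry"));
    }
}
```

The same method would work on a file or network stream of any size, since the Reader is consumed incrementally.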


    




