FastParser

http://www.japisoft.com

Contact : http://www.japisoft.com/feedback.html

v1.6.2



Features :
* Benchmarks on Linux/JDK 1.4.1 with 100 iterations on a 16 KB XML file in "normal" parsing mode

FastParser is a non-validating XML parser. It is based on a finite state automaton.
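To illustrate what a parser driven by a finite state automaton looks like, here is a deliberately tiny sketch. It is not FastParser code: the TinyScanner class, its two states and its event strings are assumptions made for this example, and it ignores attributes, comments, CDATA sections and entities.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy state-machine scanner (hypothetical, not FastParser code).
 * It reads the input once, character by character, and switches
 * between two states: inside text and inside a tag.
 */
class TinyScanner {

    private enum State { TEXT, TAG }

    /** Returns events such as "open:a", "text:hi", "close:a". */
    static List<String> scan( String xml ) {
        List<String> events = new ArrayList<String>();
        StringBuilder buffer = new StringBuilder();
        State state = State.TEXT;
        for ( int i = 0; i < xml.length(); i++ ) {
            char c = xml.charAt( i );
            switch ( state ) {
                case TEXT:
                    if ( c == '<' ) {
                        // Leaving text: emit it unless it is only white space
                        String text = buffer.toString().trim();
                        if ( text.length() > 0 ) {
                            events.add( "text:" + text );
                        }
                        buffer.setLength( 0 );
                        state = State.TAG;
                    } else {
                        buffer.append( c );
                    }
                    break;
                case TAG:
                    if ( c == '>' ) {
                        // Leaving a tag: a leading '/' marks a closing tag
                        String tag = buffer.toString().trim();
                        if ( tag.startsWith( "/" ) ) {
                            events.add( "close:" + tag.substring( 1 ) );
                        } else {
                            events.add( "open:" + tag );
                        }
                        buffer.setLength( 0 );
                        state = State.TEXT;
                    } else {
                        buffer.append( c );
                    }
                    break;
            }
        }
        return events;
    }

    public static void main( String[] args ) {
        System.out.println( scan( "<a><b>hi</b></a>" ) );
    }
}
```

Each character is examined exactly once and the only memory is the current state plus a small buffer, which is the property that makes this family of parsers fast.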

Benchmark parsing results with JDK 1.4 :

File : test2.xml (16089 bytes)
  FastParser     : First: 71 ms / Last: 3 ms / Average: 4 ms
  Xerces 2       : First: 397 ms / Last: 6 ms / Average: 13 ms
  JDK 1.4 parser : First: 13 ms / Last: 12 ms / Average: 7 ms
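The First / Last / Average figures can be obtained with a simple timing loop. The sketch below is an assumption about the measurement method, not the original benchmark code: it times the JDK default SAX parser on a small in-memory document, and the ParseTimer class name and the sample XML are made up for the example.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class ParseTimer {

    /** Parses the data 'iterations' times (at least once); returns { first, last, average } in ms. */
    static long[] measure( byte[] data, int iterations ) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            long first = 0, last = 0, total = 0;
            for ( int i = 0; i < iterations; i++ ) {
                SAXParser parser = factory.newSAXParser();
                long start = System.nanoTime();
                // DefaultHandler ignores every event: only the parsing itself is timed
                parser.parse( new ByteArrayInputStream( data ), new DefaultHandler() );
                long elapsed = ( System.nanoTime() - start ) / 1000000L;
                if ( i == 0 ) first = elapsed;
                last = elapsed;
                total += elapsed;
            }
            return new long[] { first, last, total / iterations };
        } catch ( Exception e ) {
            throw new IllegalStateException( "Parsing failed", e );
        }
    }

    public static void main( String[] args ) {
        byte[] xml = "<a><b id=\"1\">text</b></a>".getBytes();
        long[] r = measure( xml, 100 );
        System.out.println( "First:" + r[ 0 ] + " ms / Last:" + r[ 1 ] + " ms / Average:" + r[ 2 ] + " ms" );
    }
}
```

The first iteration is usually the slowest because it includes class loading and JIT warm-up, which is why it is reported separately from the average.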

FastParser is provided as a single fp.jar package. Depending on your usage, you may need only this package or several of the packages below.

Package    Role
fp.jar     Core parser / SAX1, SAX2, JAXP, DOM, Swing node
sax2.jar   SAX API (mainly interfaces)
dom.jar    DOM API (mainly interfaces)

FastParser is shareware. It is free to try for 30 days; after that you must register the full version at : http://www.japisoft.com/buy.html

I. Simple Parsing
II. Swing usage
III. Tree walker usage and SimpleNode facilities
IV. SAX 1 usage
V. SAX 2 usage
VI. JAXP usage
VII. XSLT usage
VIII. DOM usage
IX. Parsing mode and optimization
X. JDOM usage

Note : All the following samples are fully available in the samples directory.

I. Simple parsing usage

Required package : fp.jar

import java.io.*;

import java.util.*;

import com.japisoft.fastparser.*;
import com.japisoft.fastparser.node.*;
import com.japisoft.fastparser.walker.*;

/**
 * Simple sample of parsing
 */
public class Demo {

    public static void main( String[] args ) throws Throwable {
        Parser p = new Parser();
        p.setInputStream( new FileInputStream( args[ 0 ] ) );
        System.out.println( "Parsing " + args[ 0 ] );
        p.parse();
        SimpleNode root = (SimpleNode)p.getDocument().getRoot();
        System.out.println( "Parsing root result = " + root );
    }
}

This is a demonstration of the FastParser core package (fp.jar), without SAX or DOM. SimpleNode offers many facilities
(cloning, navigation, attributes, mutation...). Note that this node class can easily be replaced thanks to the NodeFactory class.

XPath subset is available in two ways :

II. Parsing with Nodes for the Swing JTree

Required package : fp.jar

Here is a sample that replaces the default nodes with nodes usable as Swing JTree TreeNode.


import java.awt.*;
import java.awt.event.*;
import java.io.*;
import javax.swing.*;
import javax.swing.tree.*;

import com.japisoft.fastparser.*;
import com.japisoft.fastparser.tools.*;
import com.japisoft.fastparser.document.*;

/**
 * Demonstration of parsing with a JTree
 */
public class Demo extends JFrame implements ActionListener {

    private JTree tree;

    public Demo() {
        super();
        JButton b = new JButton( "Click to Select an XML file (use /xml-data)" );
        b.addActionListener( this );
        getContentPane().add( BorderLayout.NORTH, b );
        getContentPane().add( BorderLayout.CENTER,new JScrollPane( tree = new JTree() ) );
        setSize( 300, 400 );
        setTitle( "Swing demo" );
    }

    public void actionPerformed( ActionEvent e ) {
        // XML Selection
        JFileChooser chooser = new JFileChooser();
        int returnVal = chooser.showOpenDialog( this );
        if ( returnVal == JFileChooser.APPROVE_OPTION ) {
            File f = chooser.getSelectedFile();
            try {
                Parser p = new Parser();
                p.setNodeFactory( new SwingNodeFactory() );
                p.setInputStream( new FileInputStream( f ) );
                p.parse();
                Document d = p.getDocument();
                tree.setModel( new DefaultTreeModel( (TreeNode)d.getRoot() ) );
            } catch( Throwable th ) {
                th.printStackTrace();
                // showMessageDialog expects ( parent, message, title, messageType )
                JOptionPane.showMessageDialog( null, th.getMessage(), "Error", JOptionPane.ERROR_MESSAGE );
            }
        }
    }

    public static void main( String[] a ) {
        Demo d = new Demo();
        d.setVisible( true );
    }
}

In this sample, we give the parser a custom NodeFactory (SwingNodeFactory) that generates nodes implementing the javax.swing.tree.TreeNode interface.
Such nodes are built from SimpleNode.
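To show what implementing javax.swing.tree.TreeNode involves, here is a minimal sketch of a node class that a JTree model can consume directly. It is not the class produced by SwingNodeFactory: XmlTreeNode and its add method are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import javax.swing.tree.DefaultTreeModel;
import javax.swing.tree.TreeNode;

/** Hypothetical XML node that can be displayed directly by a JTree. */
class XmlTreeNode implements TreeNode {

    private final String name;
    private XmlTreeNode parent;
    private final List<XmlTreeNode> children = new ArrayList<XmlTreeNode>();

    XmlTreeNode( String name ) {
        this.name = name;
    }

    /** Adds a child and returns it, so that calls can be chained. */
    XmlTreeNode add( XmlTreeNode child ) {
        child.parent = this;
        children.add( child );
        return child;
    }

    public TreeNode getChildAt( int index ) { return children.get( index ); }
    public int getChildCount() { return children.size(); }
    public TreeNode getParent() { return parent; }
    public int getIndex( TreeNode node ) { return children.indexOf( node ); }
    public boolean getAllowsChildren() { return true; }
    public boolean isLeaf() { return children.isEmpty(); }
    public Enumeration<XmlTreeNode> children() { return Collections.enumeration( children ); }

    /** JTree renders each node with toString(). */
    public String toString() { return name; }

    public static void main( String[] args ) {
        XmlTreeNode root = new XmlTreeNode( "a" );
        root.add( new XmlTreeNode( "b" ) ).add( new XmlTreeNode( "c" ) );
        // Any TreeNode can be used as the root of a DefaultTreeModel
        DefaultTreeModel model = new DefaultTreeModel( root );
        System.out.println( "Children of root : " + model.getChildCount( model.getRoot() ) );
    }
}
```

Because the parsed nodes are themselves TreeNode instances, no conversion step is needed between the parsing result and the tree model.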

III. Navigate easily with Walkers and SimpleNode methods

Required package : fp.jar

The walker is a toolkit for navigating between tags. It supports a deep and a non-deep search mode.


import java.io.*;

import java.util.*;

import com.japisoft.fastparser.*;
import com.japisoft.fastparser.node.*;
import com.japisoft.fastparser.walker.*;

/**
 * Sample of the <code>TreeWalker</code> for navigating in the parsed tree
 */
public class Demo {

    public static void main( String[] args ) throws Throwable {
        Parser p = new Parser();
        p.setInputStream( new FileInputStream( args[ 0 ] ) );
        p.parse();
        SimpleNode root = (SimpleNode)p.getDocument().getRoot();
        TreeWalker t = new TreeWalker( root );
        // 'enum' is a reserved word since Java 5, hence the name 'nodes'
        Enumeration nodes = t.getTagNodeByName( "loc", true );
        System.out.println( "Show loc tag" );
        while ( nodes.hasMoreElements() ) {
            System.out.println( "Match node :" + nodes.nextElement() );
        }
    }
}

This sample parses a document and prints all the "loc" tag nodes.

TreeWalker and SimpleNode also support criteria for finding subnodes. Criteria can be combined to build
complex search instructions. For example, 'new OrCriteria( new NodeNameCriteria( "b" ), new NodeNameCriteria( "c" ) )'
will find any subnode named 'b' or 'c'.

import java.io.*;
import java.util.*;
import com.japisoft.fastparser.*;
import com.japisoft.fastparser.node.*;
import com.japisoft.fastparser.walker.*;

/**
 * Tests for multiple criteria node search
 * @author "Alexandre Brillant"
 * @version 1.0
 */
public class Tests {

    public Tests() {
        super();
    }

    public static void main( String[] args ) throws Throwable {
        Parser p = new Parser();
        p.setInputStream( new FileInputStream( args[ 0 ] ) );
        p.parse();
        SimpleNode sn = ( SimpleNode )p.getDocument().getRoot();
        // The second argument is the default value used when the attribute is missing
        System.out.println( "V1:" + sn.getAttribute( "v1", "test Bad" ) );
        System.out.println( "V2:" + sn.getAttribute( "v2", "test Good" ) );
        System.out.println( "Search b:" );
        // 'enum' is a reserved word since Java 5, hence the name 'nodes'
        Enumeration nodes;
        nodes = sn.getNodeByName( "b", true );
        while ( nodes.hasMoreElements() )
            System.out.println( "-" + nodes.nextElement() );
        System.out.println( "Search id:" );
        nodes = sn.getNodeByCriteria( new AttributeCriteria( "id" ), true );
        while ( nodes.hasMoreElements() )
            System.out.println( "-" + nodes.nextElement() );
        System.out.println( "Search text BB:" );
        nodes = sn.getNodeByCriteria( new TextCriteria( "BB" ), true );
        while ( nodes.hasMoreElements() )
            System.out.println( "-" + nodes.nextElement() );
        System.out.println( "Search b ou c" );
        nodes = sn.getNodeByCriteria(
            new OrCriteria( new NodeNameCriteria( "b" ), new NodeNameCriteria( "c" ) ), true );
        while ( nodes.hasMoreElements() )
            System.out.println( "-" + nodes.nextElement() );
    }
}

XML sample for tests :

<?xml version="1.0"?>

<a v1="ok2">
  <b id="1" id2="2">
    <c id="2">
    </c>
    <text1>AABBCC</text1>
    <b a="ok">AAABB</b>
  </b>
</a>


Result :

V1:ok2
V2:test Good

Search b:
-b
-b

Search id:
-b
-c

Search text BB:
-AABBCC
-AAABB

Search b ou c
-b
-c
-b
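The way criteria combine can be pictured with the composite pattern. The sketch below is not the FastParser implementation: the Criteria interface and the NameIs and Or classes are hypothetical stand-ins for NodeNameCriteria and OrCriteria, and they work on plain tag names rather than on SimpleNode instances.

```java
import java.util.ArrayList;
import java.util.List;

class CriteriaSketch {

    /** A search condition on a tag name (hypothetical interface). */
    interface Criteria {
        boolean match( String tagName );
    }

    /** Matches a tag whose name equals the given one. */
    static class NameIs implements Criteria {
        private final String name;
        NameIs( String name ) { this.name = name; }
        public boolean match( String tagName ) { return name.equals( tagName ); }
    }

    /** Matches when at least one of the two wrapped criteria matches. */
    static class Or implements Criteria {
        private final Criteria left;
        private final Criteria right;
        Or( Criteria left, Criteria right ) {
            this.left = left;
            this.right = right;
        }
        public boolean match( String tagName ) {
            return left.match( tagName ) || right.match( tagName );
        }
    }

    /** Keeps, in document order, the tag names accepted by the criteria. */
    static List<String> filter( String[] tagNames, Criteria criteria ) {
        List<String> result = new ArrayList<String>();
        for ( String tagName : tagNames ) {
            if ( criteria.match( tagName ) ) {
                result.add( tagName );
            }
        }
        return result;
    }

    public static void main( String[] args ) {
        // Tag names of the XML sample above, in document order
        String[] tags = { "a", "b", "c", "text1", "b" };
        System.out.println( filter( tags, new Or( new NameIs( "b" ), new NameIs( "c" ) ) ) );
    }
}
```

Since Or is itself a Criteria, it can be nested inside another Or, which is how arbitrarily complex searches can be assembled from simple ones.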

IV. SAX 1 usage

Required packages : fp.jar, sax2.jar
 
The SaxParser class is for SAX level 1, the Sax2Parser class is for SAX level 2.


import org.xml.sax.*;
import org.xml.sax.helpers.*;
import com.japisoft.fastparser.sax.*;

import java.io.*;

/**
 * Sample of SAX usage inside the Xerces API. This is
 * a case of <code>FastParser</code> integration without changing
 * your DOM API usage.
 *
 * This class shows all SAX 1 events during parsing.
 */
public class Demo implements DocumentHandler {

    public void setDocumentLocator (Locator locator) {
    }

    public void startDocument()
        throws SAXException {
        System.out.println( "- start document" );
    }

    public void endDocument()
        throws SAXException {
        System.out.println( "- end document" );
    }

    public void startElement(String name, AttributeList atts)
        throws SAXException {
        System.out.println( "* start tag " + name + " / " + printAttributes( atts ) );
    }

    public void endElement(String name)
        throws SAXException {
        System.out.println( "* end tag " + name );
    }

    public void characters(char ch[], int start, int length)
        throws SAXException {
        // Only the range [start, start + length) of the buffer belongs to this event
        System.out.println( "+ text [" + new String( ch, start, length ) + "]" );
    }

    public void ignorableWhitespace(char ch[], int start,int length)
        throws SAXException {
    }

    public void processingInstruction(String target, String data)
        throws SAXException {
        System.out.println( "! instruction " + target + " " + data );
    }

    private String printAttributes( AttributeList atts ) {
        StringBuffer s = new StringBuffer();
        if ( atts != null ) {
            for ( int i = 0; i < atts.getLength(); i++ ) {
                s.append( atts.getName( i ) + "=" + atts.getValue( i ) ).append( " /" );
            }
        }
        return s.toString();
    }

    public static void main( String[] args ) throws Throwable {
        System.out.println( "SAX usage sample" );
        SaxParser p = new SaxParser();
        p.setDocumentHandler( new Demo() );
        p.parse( new InputSource( new FileInputStream( args[ 0 ] ) ) );
    }
}

This sample prints all SAX 1 events (DocumentHandler interface).

Note : This parser is not suited to namespaces; use the Sax2Parser instead.

V. SAX 2 usage

Required packages : fp.jar, sax2.jar
 
The SaxParser class is for SAX level 1, the Sax2Parser class is for SAX level 2.

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import com.japisoft.fastparser.sax.*;
import java.io.*;

/**
 * Sample of SAX usage. This is a case of FastParser integration without changing
 * your DOM API usage. This class shows all SAX 2 events during parsing.
 */
public class Demo implements ContentHandler {

    public void setDocumentLocator (Locator locator) {
    }

    public void startDocument()
        throws SAXException {
        System.out.println( "- start document" );
    }

    public void endDocument()
        throws SAXException {
        System.out.println( "- end document" );
    }

    public void startPrefixMapping (String prefix, String uri)
        throws SAXException {
        System.out.println( "- startPrefixMapping " + prefix + " / uri=" + uri );
    }

    public void endPrefixMapping (String prefix)
        throws SAXException {
        System.out.println( "- endPrefixMapping " + prefix );
    }

    public void startElement( String uri, String localName, String qname, Attributes atts)
        throws SAXException {
        System.out.println( "* start tag uri=" + uri + " localname=" + localName + " qname=" + qname + " " + atts );
    }

    public void endElement(String uri, String localName, String qname)
        throws SAXException {
        System.out.println( "* end tag uri=" + uri + " localname=" + localName + " qname=" + qname );
    }

    public void characters(char ch[], int start, int length)
        throws SAXException {
        // Only the range [start, start + length) of the buffer belongs to this event
        System.out.println( "+ text [" + new String( ch, start, length ) + "]" );
    }

    public void ignorableWhitespace(char ch[], int start, int length)
        throws SAXException {
    }

    public void processingInstruction(String target, String data)
        throws SAXException {
        System.out.println( "! instruction " + target + " " + data );
    }

    public void skippedEntity( String name ) throws SAXException {
    }

    public static void main( String[] args ) throws Throwable {
        System.out.println( "SAX 2 usage sample" );
        Sax2Parser p = new Sax2Parser();
        p.setContentHandler( new Demo() );
        p.parse( new InputSource( new FileInputStream( args[ 0 ] ) ) );
        System.exit( 0 );
    }
}

This sample prints all SAX 2 events (ContentHandler interface).
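Since ContentHandler is a standard interface, a handler can be tried out with any SAX 2 parser. The sketch below is an illustration that runs on the parser bundled with the JDK rather than on FastParser; the EventCollector class and its collect method are names invented for this example. It also shows the usual way to read the text of a characters event, using only the start and length range of the buffer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Collects SAX 2 events as strings (DefaultHandler provides empty ContentHandler methods). */
class EventCollector extends DefaultHandler {

    final List<String> events = new ArrayList<String>();

    public void startElement( String uri, String localName, String qName, Attributes atts ) {
        events.add( "start " + localName + " uri=" + uri );
    }

    public void endElement( String uri, String localName, String qName ) {
        events.add( "end " + localName );
    }

    public void characters( char[] ch, int start, int length ) {
        // Only the slice [start, start + length) belongs to this event
        String text = new String( ch, start, length ).trim();
        if ( text.length() > 0 ) {
            events.add( "text " + text );
        }
    }

    /** Parses the XML string with the JAXP default parser and returns the events. */
    static List<String> collect( String xml ) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace support is off by default in JAXP
            factory.setNamespaceAware( true );
            XMLReader reader = factory.newSAXParser().getXMLReader();
            EventCollector handler = new EventCollector();
            reader.setContentHandler( handler );
            reader.parse( new InputSource( new StringReader( xml ) ) );
            return handler.events;
        } catch ( Exception e ) {
            throw new IllegalStateException( "Parsing failed", e );
        }
    }

    public static void main( String[] args ) {
        for ( String event : collect( "<p:a xmlns:p=\"urn:demo\"><b>hello</b></p:a>" ) ) {
            System.out.println( event );
        }
    }
}
```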

Note : FastParser buffers the whole XML document before parsing. This is efficient for short or medium files but not for large content.
In the latter case, you may disable this feature by invoking bufferingMode(false).

VI. JAXP usage

Required packages : fp.jar, sax2.jar

The JDK integrates a facility (JAXP) for plugging in a parser.

...
System.setProperty( "javax.xml.parsers.SAXParserFactory", "com.japisoft.fastparser.tools.FPSAXParserFactory" );
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
System.out.println( "Parsing " + args[ 0 ] );
XMLReader reader = sp.getXMLReader();
reader.setContentHandler( new Demo() );
reader.parse( new InputSource( args[ 0 ] ) );
...

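You just have to set the 'javax.xml.parsers.SAXParserFactory' system property, which can be done in two ways :
- in the code, with System.setProperty as shown above
- on the command line, with the standard JVM option -Djavax.xml.parsers.SAXParserFactory=com.japisoft.fastparser.tools.FPSAXParserFactory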
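A defensive variant of the property setting is sketched below, assuming you want the program to keep working with the JDK default parser when fp.jar is missing from the classpath. The FactorySelector class and its select method are hypothetical helpers written for this example; only the property name and the FPSAXParserFactory class name come from the sample above.

```java
import javax.xml.parsers.SAXParserFactory;

class FactorySelector {

    static final String PROPERTY = "javax.xml.parsers.SAXParserFactory";

    /**
     * Selects the given factory class if it can be loaded, otherwise leaves
     * the current JAXP setting untouched. Returns the class name actually in use.
     */
    static String select( String factoryClassName ) {
        try {
            Class.forName( factoryClassName );
            System.setProperty( PROPERTY, factoryClassName );
        } catch ( ClassNotFoundException e ) {
            // The jar is not on the classpath: keep the parser already configured
        }
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main( String[] args ) {
        System.out.println( "Factory in use : "
            + select( "com.japisoft.fastparser.tools.FPSAXParserFactory" ) );
    }
}
```

Printing the resolved factory class is also a quick way to check which parser a JAXP application really uses.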

VII. XSLT usage

XSLT is an example of FastParser JAXP support. Once FastParser is plugged in, the XSL transformer uses it for its XML parsing, which can speed up the transformations.

import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import java.io.*;

/**
* Sample of usage with the JDK1.4 / XSL Transformer
*/
public class Demo {

    public static void main( String[] args ) throws Throwable {
        if ( args.length != 3 ) {
            System.out.println( "usage : [XSL file] [XML file] [Output file]" );
            System.exit( 1 );
        }

        String xslFile = args[ 0 ];
        String xmlFile = args[ 1 ];
        String outFile = args[ 2 ];

        // Replace the default SAXParserFactory with the FastParser one for XSL parsing
        System.setProperty( "javax.xml.parsers.SAXParserFactory", "com.japisoft.fastparser.tools.FPSAXParserFactory" );

        TransformerFactory tFactory = TransformerFactory.newInstance();

        Transformer transformer = tFactory.newTransformer( new StreamSource( xslFile ) );

        transformer.transform( new StreamSource( xmlFile ),
                               new StreamResult( new FileOutputStream( outFile ) ) );
    }
}
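To try a stylesheet without creating files, the same javax.xml.transform calls can work on strings. This is only a sketch: the InMemoryTransform class and the tiny stylesheet are made up for the example, and it runs with whichever SAX parser JAXP currently resolves (the JDK default one unless the system property is set as in the sample above).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InMemoryTransform {

    /** Applies the stylesheet to the document, both given as strings. */
    static String transform( String xsl, String xml ) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer( new StreamSource( new StringReader( xsl ) ) );
            StringWriter out = new StringWriter();
            transformer.transform( new StreamSource( new StringReader( xml ) ),
                                   new StreamResult( out ) );
            return out.toString();
        } catch ( Exception e ) {
            throw new IllegalStateException( "Transformation failed", e );
        }
    }

    public static void main( String[] args ) {
        // Text output avoids the XML declaration in the result
        String xsl =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"/\">Hello <xsl:value-of select=\"a/b\"/></xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println( transform( xsl, "<a><b>world</b></a>" ) );
    }
}
```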

VIII. DOM Usage

Required packages : fp.jar, dom.jar

The DOM API support relies on the FastParser NodeFactory to produce DOM-compatible nodes.


import com.japisoft.fastparser.*;
import com.japisoft.fastparser.document.Document;
import com.japisoft.fastparser.dom.*;


import java.io.*;

import org.w3c.dom.*;

/**
 * Here is a sample of DOM usage. This sample parses a document and creates a DOM tree,
 * then walks through it and writes the tree structure.
 *
 * @author (c) 2002-2003 JAPISOFT
 * @version 1.0
 */
public class Demo {

    public Demo() {
        super();
    }

    static void walk( Node n ) {
        if ( n instanceof Text ) {
            System.out.println( "TEXT" );
        } else if ( n instanceof Comment ) {
            System.out.println( "COMMENT" );
        } else if ( n instanceof Element ) {
            System.out.println( ( (Element)n ).getTagName() );
            NodeList l = ( (Element)n ).getChildNodes();
            for ( int i = 0; i < l.getLength(); i++ ) {
                walk( l.item( i ) );
            }
        }
    }

    public static void main( String[] args ) throws Throwable {
        Parser p  = new Parser();
        p.setNodeFactory( new DomNodeFactory() );
        // Parse the first argument file
        System.out.println( "Parse " + args[ 0 ] );
        p.setInputStream( new FileInputStream( args[ 0 ] ) );
        p.parse();
        Document d = p.getDocument();
        // Extract the DOM root
        Element e = ( Element )d.getRoot();
        System.out.println( "SHOW FOUND TAGS" );
        System.out.println( "ROOT = "+ e );
        walk( e );
    }

}

This sample walks through the DOM nodes produced by the parser.
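Because the walk only relies on org.w3c.dom interfaces, it works on any DOM tree. The variant below is a sketch that builds the tree with the JDK DocumentBuilder instead of FastParser, tests node types with getNodeType() rather than instanceof, and indents each tag by its depth. The DomOutline class and its method names are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomOutline {

    /** Appends one line per element, indented by two spaces per depth level. */
    static void walk( Node n, int depth, StringBuilder out ) {
        if ( n.getNodeType() == Node.ELEMENT_NODE ) {
            for ( int i = 0; i < depth; i++ ) {
                out.append( "  " );
            }
            out.append( ( (Element)n ).getTagName() ).append( '\n' );
            NodeList children = n.getChildNodes();
            for ( int i = 0; i < children.getLength(); i++ ) {
                walk( children.item( i ), depth + 1, out );
            }
        }
    }

    /** Parses the XML string and returns the indented list of its tags. */
    static String outline( String xml ) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse( new ByteArrayInputStream( xml.getBytes( "UTF-8" ) ) )
                .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk( root, 0, out );
            return out.toString();
        } catch ( Exception e ) {
            throw new IllegalStateException( "Parsing failed", e );
        }
    }

    public static void main( String[] args ) {
        System.out.print( outline( "<a><b><c/></b><text1>AABBCC</text1></a>" ) );
    }
}
```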

IX. Parsing mode and optimization

FastParser offers several parsing modes.
These modes are selected with the setParserMode method of the Parser class.

Here are the options that can also affect the parsing speed :

Parser property       Property role
cdataEnabled          by default true
preserveComment       by default false
mandatoryProlog       by default true
parsingMode           by default MEDIUM_PARSING_MODE
preserveWhiteSpace    by default false
enabledNameSpace      by default true
bufferingMode         by default true

X. JDOM usage

A sample of JDOM usage is available in the samples/JDom directory, which also contains JDOM Beta 9.

import org.xml.sax.*;
import com.japisoft.fastparser.sax.*;

import org.jdom.output.*;
import org.jdom.input.*;
import org.jdom.*;

import java.io.*;

/**
* Sample of JDOM usage printing the argument XML file
* @author (c) 2002-2003 JAPISOFT
* @version 1.0
* @since 1.0 */
public class Demo {

    public static void main( String[] args ) throws Throwable {
        System.out.println( "JDOM usage sample" );
        // Prepare Parsing with SAX2
        Sax2Parser p = new Sax2Parser();
        SAXHandler h = new SAXHandler();
        p.setContentHandler( h );
        p.setErrorHandler( h );
        // Parse it
        p.parse( new InputSource( new FileInputStream( args[ 0 ] ) ) );
        // JDom document
        Document d = h.getDocument();

        // Print it
        XMLOutputter out = new XMLOutputter();
        System.out.println( "----------------" );
        System.out.println( out.outputString( d ) );
    }
}



(c) 2002-2003 Alexandre Brillant / JAPISOFT