Walkthrough: How to use the TQt SAX2 classes

For a general discussion of the XML topics in TQt please refer to the document XML Module. To learn more about SAX2 see the document describing the TQt SAX2 implementation.

Before reading on you should at least be familiar with the Introduction to SAX2.

A tiny parser

In this section we will present a small example reader that outputs the names of all elements in an XML document on the command line. The element names are indented corresponding to their nesting level.

As mentioned in Introduction to SAX2 we have to implement the functions of the handler classes that we are interested in. In our case these are only three: TQXmlContentHandler::startDocument(), TQXmlContentHandler::startElement() and TQXmlContentHandler::endElement().

For this purpose we use a subclass of the TQXmlDefaultHandler (remember that the special handler classes are all abstract and the default handler class provides an implementation that does not change the parsing behavior):

/****************************************************************************
** $Id: qt/structureparser.h   3.3.8   edited Jan 11 14:37 $
**
** Copyright (C) 1992-2007 Trolltech ASA.  All rights reserved.
**
** This file is part of an example program for TQt.  This example
** program may be used, distributed and modified without limitation.
**
*****************************************************************************/

#ifndef STRUCTUREPARSER_H
#define STRUCTUREPARSER_H

#include <ntqxml.h>

class TQString;

class StructureParser : public TQXmlDefaultHandler
{
public:
    bool startDocument();
    bool startElement( const TQString&, const TQString&, const TQString& ,
                       const TQXmlAttributes& );
    bool endElement( const TQString&, const TQString&, const TQString& );

private:
    TQString indent;
};

#endif

Apart from the private helper variable indent that we will use to get indentation right, there is nothing special about our new StructureParser class.

Even the implementation is straightforward:

    #include "structureparser.h"

    #include <stdio.h>
    #include <ntqstring.h>

First we reimplement TQXmlContentHandler::startDocument().

    bool StructureParser::startDocument()
    {
        indent = "";
        return TRUE;
    }

At the beginning of the document we simply set indent to an empty string because we want to print the root element without any indentation. We also return TRUE so that the parser continues without reporting an error.

Because we want to be informed when the parser comes across the start tag of an element so that we can print its name, we have to reimplement TQXmlContentHandler::startElement().

    bool StructureParser::startElement( const TQString&, const TQString&,
                                        const TQString& qName,
                                        const TQXmlAttributes& )
    {
        printf( "%s%s\n", (const char*)indent, (const char*)qName );
        indent += "    ";
        return TRUE;
    }

This is what the implementation does: the name of the element, preceded by the current indentation, is printed followed by a line break. Strictly speaking, qName contains the qualified element name, that is, the local name together with any namespace prefix; the local name on its own is passed as the second argument.

If another element follows before the current element's end tag it should be indented. Therefore we add four spaces to the indent string.

Finally we return TRUE in order to let the parser continue without errors.

The last piece of functionality we need to add is the parser's behaviour when an end tag occurs. This means reimplementing TQXmlContentHandler::endElement().

    bool StructureParser::endElement( const TQString&, const TQString&, const TQString& )
    {
        indent.remove( (uint)0, 4 );
        return TRUE;
    }

Here we shorten the indent string by the four spaces that were added in startElement().
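The interplay of the three callbacks can be tried out without TQt. The following is a minimal sketch under stated assumptions: IndentTracker and replaySample() are hypothetical names that do not appear in the example program, std::string stands in for TQString, and the text is collected in a string instead of being printed with printf().

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for StructureParser: it records its output in a
// std::string instead of calling printf(), and uses no TQt types.
class IndentTracker
{
public:
    // Mirrors startDocument(): start with no indentation and no output.
    bool startDocument()
    {
        indent.clear();
        output.clear();
        return true;
    }

    // Mirrors startElement(): emit the name, then indent one level deeper.
    bool startElement( const std::string& qName )
    {
        output += indent + qName + "\n";
        indent += "    ";
        return true;
    }

    // Mirrors endElement(): drop the four spaces added by startElement().
    bool endElement()
    {
        assert( indent.size() >= 4 ); // every end tag needs a start tag
        indent.erase( 0, 4 );
        return true;
    }

    std::string output;

private:
    std::string indent;
};

// Replays the callbacks a SAX2 reader would be expected to issue for
// <a><b/><c><d/></c></a> and returns the recorded text.
std::string replaySample()
{
    IndentTracker t;
    t.startDocument();
    t.startElement( "a" );
    t.startElement( "b" ); t.endElement();
    t.startElement( "c" );
    t.startElement( "d" ); t.endElement();
    t.endElement();
    t.endElement();
    return t.output;
}
```

Calling replaySample() yields one line per element, with b and c indented by four spaces and d by eight.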

With this we're done with our parser and can start writing the main() program.

    #include "structureparser.h"
    #include <stdio.h>
    #include <ntqfile.h>
    #include <ntqxml.h>
    #include <ntqwindowdefs.h>

    int main( int argc, char **argv )
    {
        if ( argc < 2 ) {
            fprintf( stderr, "Usage: %s <xmlfile> [<xmlfile> ...]\n", argv[0] );
            return 1;
        }

This check ensures that at least one file to examine was given on the command line.

        StructureParser handler;

The next step is to create an instance of the StructureParser.

        TQXmlSimpleReader reader;
        reader.setContentHandler( &handler );

After that we set up the reader. As our StructureParser class deals with TQXmlContentHandler functionality only, we simply register it as the reader's content handler.

        for ( int i=1; i < argc; i++ ) {

We process each file given on the command line in turn.

            TQFile xmlFile( argv[i] );
            TQXmlInputSource source( &xmlFile );

Then we create a TQXmlInputSource for the XML file to be parsed.

            reader.parse( source );

Now we pass the input source to the reader and start parsing. parse() returns FALSE if parsing fails; this small example ignores the return value.

        }
        return 0;
    }
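To make the division of labour between reader and handler more tangible, here is a toy model of it in plain C++. It is only a sketch: ToyHandler, Recorder and toyParse() are hypothetical names, the "reader" understands nothing but bare start, end and empty-element tags without attributes, and it is in no way a substitute for TQXmlSimpleReader.

```cpp
#include <cctype>
#include <string>

// Hypothetical handler interface modelled on the three callbacks used above.
struct ToyHandler
{
    virtual ~ToyHandler() {}
    virtual bool startDocument() { return true; }
    virtual bool startElement( const std::string& ) { return true; }
    virtual bool endElement( const std::string& ) { return true; }
};

// Records element names with four spaces of indentation per nesting level.
struct Recorder : ToyHandler
{
    std::string indent, output;

    bool startDocument() { indent.clear(); output.clear(); return true; }
    bool startElement( const std::string& name )
    {
        output += indent + name + "\n";
        indent += "    ";
        return true;
    }
    bool endElement( const std::string& )
    {
        if ( indent.size() < 4 )
            return false;               // end tag without a start tag
        indent.erase( 0, 4 );
        return true;
    }
};

// Toy reader: handles only <name>, </name> and <name/>, skips everything
// between tags, and stops as soon as a callback returns false or a tag
// does not begin with a name. It is NOT an XML parser.
bool toyParse( const std::string& xml, ToyHandler& handler )
{
    if ( !handler.startDocument() )
        return false;
    std::string::size_type pos = 0;
    while ( ( pos = xml.find( '<', pos ) ) != std::string::npos ) {
        std::string::size_type close = xml.find( '>', pos );
        if ( close == std::string::npos )
            return false;               // unterminated tag
        std::string tag = xml.substr( pos + 1, close - pos - 1 );
        pos = close + 1;

        bool isEnd = !tag.empty() && tag[0] == '/';
        if ( isEnd )
            tag.erase( 0, 1 );
        bool isEmpty = !isEnd && !tag.empty() && tag[tag.size() - 1] == '/';
        if ( isEmpty )
            tag.erase( tag.size() - 1 );

        if ( tag.empty() || std::isspace( static_cast<unsigned char>( tag[0] ) ) )
            return false;               // e.g. "< a>": no name after '<'
        if ( !isEnd && !handler.startElement( tag ) )
            return false;
        if ( ( isEnd || isEmpty ) && !handler.endElement( tag ) )
            return false;
    }
    return true;
}
```

Passing a Recorder to toyParse() fills Recorder::output with the indented element names. The toy reader only reports success or failure through its return value, which is the gap an error handler is meant to fill.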

Running the program on the following XML file...

<animals>
<mammals>
  <monkeys> <gorilla/> <orangutan/> </monkeys>
</mammals>
<birds> <pigeon/> <penguin/> </birds>
</animals>

... produces the following output:

animals
    mammals
        monkeys
            gorilla
            orangutan
    birds
        pigeon
        penguin

The program will, however, fail to produce the correct result if the input is not well-formed, for example if you insert a space between a < and the element name in your test XML file. Because no error handler is installed, the user is given no explanation of what went wrong. To avoid this you should always install an error handler with TQXmlReader::setErrorHandler(), which allows you to report parsing errors to the user.
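What such an error handler might do can be sketched in plain C++. The types below are hypothetical stand-ins, not TQt classes. In TQt the corresponding hook should be TQXmlErrorHandler::fatalError(), which receives a TQXmlParseException whose message(), lineNumber() and columnNumber() supply the three values modelled here. The message wording, and the choice to return false to signal that parsing should not continue, are assumptions of this sketch.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the information a parse exception carries.
struct ParseProblem
{
    std::string message;
    int line;
    int column;
};

// Hypothetical error handler: formats each fatal error for the user and
// returns false to signal that parsing should not continue.
class ReportingErrorHandler
{
public:
    bool fatalError( const ParseProblem& problem )
    {
        std::ostringstream text;
        text << "Fatal parsing error: " << problem.message
             << " in line " << problem.line
             << ", column " << problem.column << "\n";
        report += text.str();
        return false;
    }

    std::string report; // a real program would print this to stderr
};
```

In the TQt program the equivalent class could derive from TQXmlDefaultHandler, so that a single object can be registered with both setContentHandler() and setErrorHandler().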