LT PyXML: A Fast Validating XML Parser Embedded in Python

Henry Thompson, University of Edinburgh


ABSTRACT

By the time DevCon happens, the HCRC Language Technology Group will have put out a major set of new releases: our XML API (LT XML) and our XML editor (XED), both free for non-commercial use, and, for the first time, the bridge between the two (LT PyXML). This presentation will describe how our LT XML C API is embedded in Python, and illustrate its use with at least the following three applications:

1) XED
An XML-smart text editor, which maintains well-formedness at all times, supports fast keyboard-only XML document authoring, and with the forthcoming release, makes DTD-compliant authoring fast and easy;
2) XML Schema workbench
A simple Python tool using LT PyXML to graph the archetype lattice implicit in any schema written to the XML Schema Working Draft of 6 May, and to output a normalised XML DTD whose coverage is as close as possible to that of the schema;
3) XML DTD normaliser
An even simpler Python tool using LT PyXML to normalise XML DTDs for comparison with the output of (2).
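As a rough illustration of what normalising DTDs for comparison can involve, here is a minimal sketch in Python. It does not use LT PyXML, whose API is not described here; it relies on the standard library's expat binding instead, and the function names normalise_dtd and render are hypothetical. Only element declarations are handled: each content model is re-serialised without whitespace, alternatives in a choice group and names in mixed content are sorted (their order does not change what the declaration allows), sequences keep their order, and declarations are listed by element name.

```python
# Hypothetical sketch using the stdlib expat binding, NOT the LT PyXML API.
import xml.parsers.expat
from xml.parsers.expat import model

# Occurrence indicators for each expat quantifier constant.
QUANT = {model.XML_CQUANT_NONE: "", model.XML_CQUANT_OPT: "?",
         model.XML_CQUANT_REP: "*", model.XML_CQUANT_PLUS: "+"}


def render(cm) -> str:
    """Serialise an expat content-model tuple (type, quant, name, children) canonically."""
    ctype, quant, name, children = cm
    q = QUANT[quant]
    if ctype == model.XML_CTYPE_EMPTY:
        return "EMPTY"
    if ctype == model.XML_CTYPE_ANY:
        return "ANY"
    if ctype == model.XML_CTYPE_NAME:
        return name + q
    if ctype == model.XML_CTYPE_MIXED:
        # Order of names in mixed content is irrelevant, so sort them.
        names = sorted(render(c) for c in children)
        return "(" + "|".join(["#PCDATA"] + names) + ")" + q
    parts = [render(c) for c in children]
    if ctype == model.XML_CTYPE_CHOICE:
        parts.sort()  # alternatives are order-independent
        sep = "|"
    else:  # XML_CTYPE_SEQ: order is significant, keep it
        sep = ","
    return "(" + sep.join(parts) + ")" + q


def normalise_dtd(dtd: str) -> list[str]:
    """Return the element declarations of `dtd` in canonical form, sorted by name."""
    decls = {}
    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = lambda name, cm: decls.__setitem__(name, render(cm))
    # Wrap the declarations as an internal subset so expat reports them.
    parser.Parse("<!DOCTYPE x [" + dtd + "]><x/>", True)
    return [f"<!ELEMENT {n} {m}>" for n, m in sorted(decls.items())]


a = "<!ELEMENT doc (title,(para|list)*)> <!ELEMENT title (#PCDATA)>"
b = "<!ELEMENT title (#PCDATA)>\n<!ELEMENT doc (title, (list | para)*)>"
print(normalise_dtd(a) == normalise_dtd(b))  # True
```

Two DTDs that differ only in declaration order, whitespace, or the order of alternatives normalise to the same list, so a plain equality or line-by-line diff exposes the real differences.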

I'll finish with a comparison between the LT XML API and the DOM, particularly with regard to access to the DTD and to streaming (i.e. not whole-tree) access to large documents.
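The distinction between whole-tree and streaming access can be sketched with Python's standard library; this is not LT PyXML or its API, and the function names count_tree and count_streaming are illustrative only. xml.dom.minidom builds the complete DOM tree before anything can be queried, whereas xml.etree.ElementTree.iterparse delivers each element as soon as its end tag has been parsed, so it can be processed and then emptied.

```python
# Illustrative sketch with stdlib parsers, NOT the LT PyXML API.
import io
import xml.dom.minidom
import xml.etree.ElementTree as ET


def count_tree(data: bytes, tag: str) -> int:
    """Whole-tree (DOM) access: the entire document is built in memory first."""
    doc = xml.dom.minidom.parseString(data)
    return len(doc.getElementsByTagName(tag))


def count_streaming(data: bytes, tag: str) -> int:
    """Streaming access: handle each element when its end tag is seen."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # drop this element's text, attributes and children
    return count


sample = b"<corpus>" + b"<s>word</s>" * 1000 + b"</corpus>"
print(count_tree(sample, "s"), count_streaming(sample, "s"))  # 1000 1000
```

Both functions give the same answer; the difference is that the DOM version must hold the whole document in memory at once, while the streaming version only ever holds the element currently being processed, plus the emptied element shells still attached to the root.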