XML, XSL and TeX: Room for Cooperation

Michel Goossens and Sebastian Rahtz

ABSTRACT

This presentation will describe how the immensely stable, sophisticated and free TeX typesetting system can be used for high-quality batch formatting of XML documents. We will briefly cover the `traditional' ways of converting XML to TeX, but the main part of the talk will be about using LaTeX to process XSL formatting objects. We will also comment on the issues surrounding conversion of legacy LaTeX documents to XML.

The XML/XSL/LaTeX formatting system works by applying a full XSL style sheet to an XML document, and then running TeX on the resulting XML file of formatting objects. The TeX setup is in two parts. Firstly, there is a generic package that configures TeX to recognise XML markup characters such as < and >, and to process attributes. Secondly, there is a package that implements the ``fo:'' XSL elements and handles their characteristics. Parsing UTF-8 input also requires special handling.
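
To illustrate why UTF-8 input needs special handling, the sketch below decodes a byte stream into Unicode code points by inspecting each lead byte, which is roughly the work a byte-oriented TeX engine has to do before it can map a character to a glyph. It is written in Python purely for illustration and is not the authors' TeX code; the function name `decode_utf8` is hypothetical, and the sketch assumes well-formed input apart from the basic checks shown (it does not reject overlong encodings or surrogates).

```python
def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of Unicode code points, byte by byte."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The high bits of the lead byte say how many continuation bytes follow.
        if lead < 0x80:                 # 0xxxxxxx: plain ASCII
            value, extra = lead, 0
        elif lead >> 5 == 0b110:        # 110xxxxx: two-byte sequence
            value, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:       # 1110xxxx: three-byte sequence
            value, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:      # 11110xxx: four-byte sequence
            value, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte (10xxxxxx) contributes six more bits.
        for k in range(1, extra + 1):
            cont = data[i + k]
            if cont >> 6 != 0b10:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            value = (value << 6) | (cont & 0x3F)
        code_points.append(value)
        i += extra + 1
    return code_points


if __name__ == "__main__":
    # "é" is two bytes and the euro sign is three bytes in UTF-8.
    print([hex(cp) for cp in decode_utf8("é€".encode("utf-8"))])
```

The point of the sketch is that a single character may arrive as one to four bytes, so an engine that reads input one byte at a time must reassemble the sequence before treating it as a character.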

Although this work only handles a subset of XSL formatting objects so far, we will show that it is robust and reasonably fast; we will discuss the features of TeX (including direct PDF generation) that make it a good candidate for the task.

Personal Notes: Sebastian Rahtz (sebastian.rahtz@oucs.ox.ac.uk) and Michel Goossens (m.goossens@cern.ch) are technical documentation specialists at, respectively, Oxford University Computing Services and the European Particle Physics Laboratory, CERN. They have been heavily involved in TeX for over a decade, and have published widely on LaTeX. They are particularly concerned with the use of SGML/XML for technical and scientific publication.