July 27th, 2008, Birmingham, UK c/o MKM 2008
We describe a general schema for automated journal production based entirely on a LaTeX input format. We will try to show how the very basic ideas that initiated the whole effort turned into an efficient system because of the ability of LaTeX markup to parametrise, simultaneously and without compromise, both high typographical quality for the PDF output and accurate XML metadata with (presentation) MathML formulas. This was made possible by the availability of two entirely independent LaTeX-source processors, each with a specific focus but full TeX macro language support: pdfLaTeX by Han The Thanh, and Tralics by José Grimm.
We will further report on our explorations toward a structured full-text XML format that could serve as better metadata (for structure-tuned, math-aware searching or ranking) than the flat text extracted from the PDF that we currently use.
The ultimate goal of our experiments would be to produce an XML form of the full text that could serve all purposes simultaneously (from metadata to end-user consumption, in a hopefully accessible manner). While software exists that can convert any unsupported construct to a bitmap image, converting everything in a current article to pure XML seems completely out of reach. Nevertheless, we support further investigation in two directions: implementing SVG output from Tralics, possibly for a limited set of graphical languages as a start (Metapost, PGF,...); and enriching the graphical PDF output with parallel alternative representations, keyed to the pages displayed, so that a reader could fall back on an accessible version of a formula or copy the LaTeX source code associated with it.
Comments/questions/inquiries: to be sent to:
dml2008 at easychair dot org.