Fri, 07 Oct 2005
The joy of XML
I am reworking a mechanism for exporting data from IS MU to an external database. Because the data form a tree-like structure, I have decided to use XML as the data-interchange format. I knew the basic facts about XML, but this is the first time I have gotten my feet wet with it. So far it has been a pretty unpleasant experience.
First, it is necessary to describe the structure of the data. The XML world offers about half a dozen different, incompatible ways to do it, for example DTD, XML Schema, or RELAX NG. Each of those technologies shows pretty well the main problem with declarative definitions of anything: declarative definitions are either not strong enough to express what you want (DTD, server-side includes in HTML, etc.), or they are bound to contain something close to a programming language (PHP, for instance, has evolved almost into a programming language, and XML Schema allows you to write regular expressions or to specify the minimum and maximum value of an integer). Why are they trying to be both declarative and expressive enough? The result simply has to be ugly, yet still not expressive enough for some cases.
I have decided to go with XML Schema. So the next question is: what is a valid XML Schema? Interestingly enough, XML Schema can itself be defined as an XML Schema. So what happens if you try to check whether the XMLSchema.xsd file is a valid XML Schema? I tried the XML::Validator::Schema Perl module and a web-based validator whose name I cannot remember now. Both had problems with the validity of the XML Schema definition.
My other complaint is that XML is far too verbose. Why should the tag name be repeated at the end of the element? And why do they so often use a namespace prefix in the schema definition, even though it is the sole namespace in the whole document? This makes the whole XML file both harder to write and harder to check for syntactic and logical errors. And why is the namespace usually named by a URL, even though it is an arbitrary string and the referenced URL does not even have to exist? It looks like a URL, but it is not one. Another point against readability. And after all that, they do not use the URL directly; instead it is immediately mapped to an abbreviation, like xsi: or something like that.
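To make the namespace point concrete, here is a minimal sketch in Python (not Perl), assuming only the standard xml.etree.ElementTree module. The namespace name and the element names are invented for illustration. The parser never fetches the "URL"; it treats it as an opaque string, and the prefix is just a local abbreviation for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name: it looks like a URL, but nothing is ever fetched
# and the page does not need to exist.
NS = "http://example.invalid/no-such-page"

# The same data written twice: once with an explicit prefix on every tag,
# once with a default namespace and no prefixes at all.
doc_a = f'<x:export xmlns:x="{NS}"><x:item>1</x:item></x:export>'
doc_b = f'<export xmlns="{NS}"><item>1</item></export>'

root_a = ET.fromstring(doc_a)
root_b = ET.fromstring(doc_b)

# ElementTree expands each tag to "{namespace-name}local-name", so the prefix
# (or its absence) makes no difference to the parsed result.
print(root_a.tag)
print(root_a.tag == root_b.tag)
```

Both documents parse to the same element names, which suggests that the prefixes are pure notation: extra typing for the writer and extra noise for the reader.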
Another problem is the XML-handling libraries. There are many, such as libxml2 or expat (not to mention Java-based solutions). Each of them has its own Perl front-end, usually with an API mapped 1:1 from C instead of an API designed for ease of use. For example, it requires a lot of not-so-pretty code to allow your element handler to throw an exception that is then reported together with the line number in the source XML file.
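For comparison, this is roughly what the "exception with a line number" pattern looks like on top of expat, sketched in Python rather than Perl and assuming only the standard xml.parsers.expat binding. The names HandlerError, parse, check_item and the sample data are all hypothetical. It is not the code used in the export itself.

```python
import xml.parsers.expat


class HandlerError(Exception):
    """Wraps an exception raised by a handler and records the XML line number."""

    def __init__(self, line, cause):
        super().__init__(f"line {line}: {cause}")
        self.line = line
        self.cause = cause


def parse(xml_text, on_start):
    """Parse xml_text, calling on_start(name, attrs) for every start tag."""
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        try:
            on_start(name, attrs)
        except Exception as exc:
            # The position has to be captured here, while the parser is still
            # sitting on the start tag that triggered the handler.
            raise HandlerError(parser.CurrentLineNumber, exc) from exc

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)


def check_item(name, attrs):
    """Hypothetical validation rule: every <item> needs a numeric id."""
    if name == "item" and not attrs.get("id", "").isdigit():
        raise ValueError(f"bad id {attrs.get('id')!r}")


SAMPLE = "<export>\n  <item id='1'/>\n  <item id='x'/>\n</export>"


def first_error(xml_text):
    """Return the first HandlerError raised while parsing, or None."""
    try:
        parse(xml_text, check_item)
    except HandlerError as err:
        return err
    return None
```

Calling first_error(SAMPLE) returns an error that carries both the original ValueError and the line of the offending tag. Even in this small sketch, every handler needs the wrapper, because the parser itself knows the line number but does not attach it to exceptions coming out of user code.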
That said, XML also has a few good features: the concept of well-formedness, the character set specified inside the file itself, XSLT transforms, the fact that XML Schema and XSLT definitions are themselves XML files, etc. However, I can imagine a less verbose and less bloated technology that would serve the same purposes as XML.
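Two of those good features are easy to demonstrate. The following is a small sketch in Python, assuming only the standard xml.etree.ElementTree module; the helper name well_formed_error and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def well_formed_error(data):
    """Return None if data is well-formed XML, else the (line, column) of the error."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return err.position
    return None


# The encoding is declared inside the document, so the parser can decode the
# raw bytes without being told the character set separately.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Café</name>'.encode("latin-1")

print(well_formed_error("<a><b></b></a>"))  # properly nested
print(well_formed_error("<a><b></a>"))      # <b> is never closed
print(ET.fromstring(latin1_doc).text)
```

Well-formedness can be checked without any schema at all, and the byte string is decoded correctly purely on the basis of its own XML declaration.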