XML - past, present and future
Chewing the fat with MS's Jean Paoli
Last week I had the pleasure of meeting up with Jean Paoli of Microsoft. In November, Jean was presented with the XML Cup 2004 to recognise his lifelong work in XML and its precursor SGML. The meeting gave me an opportunity to hear about the fascinating history of XML and understand some of its importance to Microsoft and the industry.
Jean Paoli was one of the leading members of the original XML working party and had been working with SGML since 1985. SGML was a mark-up language designed mainly to allow manufacturers to pass complex design documents around. It worked very well at that task but never found its way into the mainstream of computing. Its biggest problem was its size: the specification ran to about a thousand pages and only one parser implemented the complete standard. The other problem was that it was document-centric rather than data-centric.
When Jean joined Microsoft in April 1996, officially to help develop IE4, he had a good chance to put into practice ideas that had floated around the SGML community for several years. Jean helped set up the first W3C committee for XML, and by the end of the year 80 per cent of the standard was complete. Jean found that his knowledge of the power of SGML and mark-up languages in general, combined with the Microsoft engineers’ passion for, and understanding of, simplicity and ease of use, enabled him to define XML. The XML specification was less than five per cent of the size of SGML's, yet XML was in many ways more powerful.
Defining XML was Jean’s night job; during the day he helped develop Internet Explorer 4.0. The two came together when IE4 was launched at the end of 1997 with XML support included. This was the time of the IE-Netscape wars, and that argument rather overshadowed the really important new part of IE: its XML support. IE4 included an implementation of CDF (the precursor of RSS), which was the first use of XML. The importance of CDF was that it showed the power of XML to transport data from one environment to another in such a way that producers and consumers did not need any direct knowledge of each other's environments.
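To make that last point concrete, here is a minimal sketch of a consumer reading a CDF-style channel using nothing but a generic XML parser. The sample document, its URLs and the helper name read_channel are all invented for illustration; the element names (CHANNEL, ITEM, TITLE, ABSTRACT) follow the CDF vocabulary, but this is not a complete or validated CDF file.

```python
import xml.etree.ElementTree as ET

# Hypothetical channel file, as a producer might publish it.
# The URLs and story text are made up for this example.
CDF_SAMPLE = """<?xml version="1.0"?>
<CHANNEL HREF="http://example.com/news/">
  <TITLE>Example News</TITLE>
  <ITEM HREF="http://example.com/news/one.html">
    <TITLE>First story</TITLE>
    <ABSTRACT>Summary of the first story.</ABSTRACT>
  </ITEM>
  <ITEM HREF="http://example.com/news/two.html">
    <TITLE>Second story</TITLE>
    <ABSTRACT>Summary of the second story.</ABSTRACT>
  </ITEM>
</CHANNEL>
"""


def read_channel(xml_text):
    """Return the channel title and its items as plain Python data."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("ITEM"):
        items.append({
            "href": item.get("HREF"),
            "title": item.findtext("TITLE"),
            "abstract": item.findtext("ABSTRACT"),
        })
    return {"title": root.findtext("TITLE"), "items": items}


if __name__ == "__main__":
    channel = read_channel(CDF_SAMPLE)
    print(channel["title"])
    for entry in channel["items"]:
        print(entry["href"], "-", entry["title"])
```

The consumer knows only the agreed element names; it needs no knowledge of the operating system, language or software that produced the file.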
The amazing thing about this story is the speed at which it happened: less than two years from a standards committee being set up to a product reaching the market is unusual. This happened because the requirement was well understood and because Bill Gates recognised its importance and gave it his backing.
XML is now embedded in most of Microsoft’s products and is central to its strategy. And, as they say... the rest is history.
I asked Jean about WordML. When it was first announced, it seemed very Office-centric to me, and I felt that it should have been a more generalised document mark-up language. Jean explained that the raison d’être of WordML is archiving Word documents. There is a real problem with documents that have to be kept for a long time (think of birth certificates) if they are stored in the internal Word format: in 30 years' time, let alone 100, they will probably be unreadable because the software will have moved on. So there is a need to store these documents in a vendor- and software-neutral format, and that is what WordML is designed to do. The schema definition is open source so that anyone can write a parser at any time to read and format the documents. To do this, WordML has to support all the functionality and the quirkiness of Word, and hence the WordML schema is by definition Word-centric.
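As a rough illustration of how little is needed to read such an archive, the sketch below pulls the paragraph text out of a tiny WordML-style document with a standard XML parser. The sample document and the function name extract_paragraphs are invented; the namespace URI and the element names (w:body, w:p, w:r, w:t) are assumed from the published Word 2003 schema and should be checked against it before being relied on.

```python
import xml.etree.ElementTree as ET

# Assumed WordML (Word 2003) namespace; verify against the published schema.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hypothetical, heavily simplified document: two paragraphs of plain text.
WORDML_SAMPLE = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Certificate of Birth</w:t></w:r></w:p>
    <w:p><w:r><w:t>Name: </w:t></w:r><w:r><w:t>Jane Example</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>
"""


def extract_paragraphs(xml_text):
    """Return the text of each paragraph, joining the runs inside it."""
    root = ET.fromstring(xml_text)
    ns = {"w": W_NS}
    paragraphs = []
    for para in root.iterfind(".//w:p", ns):
        # A paragraph is made of runs (w:r), each holding text (w:t).
        pieces = [t.text or "" for t in para.iterfind(".//w:t", ns)]
        paragraphs.append("".join(pieces))
    return paragraphs


if __name__ == "__main__":
    for line in extract_paragraphs(WORDML_SAMPLE):
        print(line)
```

A real archive reader would also have to deal with styles, tables and the rest of Word's quirks, which is exactly why the schema has to be Word-centric.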
On the other hand, what is more generally important is Office’s support for any XML schema. This is an area that has quietly grown up, and the first tech conference on the subject, held last week, attracted more than 500 delegates.