Abriendo Puertas con XML

When Win Office 2003 shipped, there was a great deal of debate as to the “openness” of Microsoft’s use of XML. The debate resurfaced with the announcement of the new Office 12 XML-based file formats, and it’s been brought further to the fore by Massachusetts’ recent decision regarding the adoption of “open” file formats.

On one side of the debate, you have those who would argue that, as long as it’s XML, it’s open. The other side of the debate would argue that mere use of XML isn’t sufficient, that the schemas need to be established or endorsed by an independent standards body. There are subtle shades on both sides of the debate, including schema publishing and licensing, but even with a published schema with a royalty-free license, there are those who would argue that a format isn’t open unless the schema itself has been approved by a standards body.

One of the most articulate proponents of the “standards body” side of the debate has been Joe Wilcox over at www.microsoftmonitor.com. Joe's reasoning can be found here and here.

In reading Joe's remarks, however, it's difficult to find a coherent position. At one point, he bases his notion of "open" on the acceptance of a standard by an independent standards body. At another point, he defines "open" based on the extent to which independent software vendors have supported the format with a certain degree of fidelity. Thus, OASIS's OpenDocument XML format is "open," but so is Adobe's PDF.

As a side note, back on September 1, Joe was scratching his head about Massachusetts' inclusion of PDF in their definition of "open." Apparently Joe forgot that he'd done exactly the same thing back in June. To be fair, Joe's reasoning was subtly different from that invoked by the Commonwealth of Massachusetts, but neither line of reasoning is all that coherent in the exclusion of Microsoft's use of XML in Office from the "open" rubric.

Joe's second post from last June comes closest to articulating a coherent stance on the subject. In that post, he likens OASIS's OpenDocument format to a simpler, more widely understood idiom than the one adopted by Microsoft Office, albeit within the same XML language. Joe writes:

But, even though the two people agreed on a common language, suppose one starts using geeky engineering jargon the other can't understand. Tough to communicate, right? So the one gives the other a big, fat book of definitions for the jargon--kind of like Microsoft publishing its schemas--so that they can talk. But the other person would have to learn the jargon first. Sure all the jargon is in the book, but wouldn't it just be better to communicate (e.g. be more "open") by speaking the basic language previously agreed on?

I think Joe's reasoning would be sound if Microsoft's addition of "geeky engineering jargon" were merely gratuitous. His reasoning breaks down, however, when we note that the "geeky engineering jargon" in Office's use of XML is necessary to adequately describe the features that are available in Office.

We can see this by altering Joe's analogy. Suppose we aren't talking about "geeky engineering jargon." Suppose, rather, we're talking about the jargon used within an academic field. The jargon in any academic field arises when various academics coin new terms to express various ideas within the field. Economists, for example, talk about IS-LM curves. Lay people haven't a clue what that's about, but, among Economists, a great deal of information can be conveyed very succinctly by using the jargon of IS-LM curves.

One important point of the academic jargon analogy is that anybody can extend the vocabulary. No one sits around waiting for some standards body to approve each new term before they're allowed to coin it in some academic paper. Academic jargon is open not only because anyone who is willing to engage in a study of the field is able to understand the lexicon. It's also open because anyone who works in that academic field is able to extend the lexicon.

Moreover, extension of the lexicon is based entirely on voluntary adoption of that lexicon within the field. An academic can coin a new term, but that term won't get adopted into regular usage unless other academics find enough value in the ideas expressed by the new terminology.

And, yes, there is a point where the analogy breaks down. Academic jargon doesn't get the same copyright protection that XML schemas get, and no academic field is bifurcated into those who produce new studies and those who only read the new studies the way the software field is bifurcated into vendors and users.

But, I think the difference between Joe's analogy and mine is still instructive in terms of the underlying values that each analogy expresses. Joe's analogy values user choice of equally adept vendors. My analogy values the ability of vendors to extend software to resolve new user problems. I would contend that both values are worth preserving for the benefit of people who use software.

Users do benefit from software commoditization in their ability to choose different vendors and in the ability of offerings from different vendors to interoperate. But this benefit comes at a sacrifice of product differentiation. Users benefit from product differentiation as vendors strive to solve user problems in new, and more effective, ways.

The ideal solution would be able to accommodate both aspects of "openness." In the world of software, it might not be possible to come up with a solution that balances both values, but I have difficulty imagining one that does a better job of balancing both than the approach we've adopted with Office's use of XML. The schemas are published with a royalty-free license. Anybody is free to use those schemas.

Moreover, the way XML support is implemented in Office, people can extend those schemas. Word 2003 supports custom schemas, and the number of solutions providers who are incorporating Office 2003 into solutions that make use of a number of XML standards relevant to particular vertical industries is growing at an impressive rate.

Lastly, XML, together with the companion XSLT standard, provides a ready tool for translating one idiom into another. Through the use of XSLT stylesheets, for example, it's possible to have Office support OASIS's file format out of the box, albeit with a certain loss of information on the save side.
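
To make the "loss of information" point concrete, here's a rough sketch of what translating a richer idiom into a simpler one looks like. It's written in Python rather than XSLT, and every element and attribute name in it (doc, para, style, revision-author, text, h, p) is made up for illustration; none of them comes from the actual Office or OASIS schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical "rich" vocabulary: these names are invented for illustration
# and are not taken from any real Office or OASIS schema.
RICH_DOC = """
<doc>
  <para style="Heading1" revision-author="rick">Opening doors</para>
  <para style="Body">Plain text survives translation.</para>
</doc>
"""

# Mapping from the rich idiom to a simpler one (also hypothetical).
STYLE_TO_TAG = {"Heading1": "h", "Body": "p"}


def translate(rich_xml):
    """Rewrite the rich idiom into the simple one, recording what was lost."""
    source = ET.fromstring(rich_xml)
    target = ET.Element("text")
    dropped = []
    for para in source.iter("para"):
        # Unknown styles fall back to a plain paragraph.
        tag = STYLE_TO_TAG.get(para.get("style"), "p")
        ET.SubElement(target, tag).text = para.text
        # The simple idiom has no slot for any attribute other than style.
        dropped.extend(name for name in para.attrib if name != "style")
    return target, dropped


simple, lost = translate(RICH_DOC)
print(ET.tostring(simple, encoding="unicode"))
print("dropped:", lost)
```

The text and the heading/body distinction make it across, but the revision-author attribute has nowhere to go in the simpler vocabulary, so it gets dropped. That's the kind of loss a real stylesheet would have to live with whenever the target format can't express a feature of the source.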

"Abriendo puertas" is Spanish for "opening doors." In an ideal world, we would be "opening doors" for vendors and customers alike, so that both can use common formats and extend them. That is at least what we're trying to do with the new XML formats. The future will tell us how well we've succeeded.

I just hope that the future gets decided by the people who actually have to use the software rather than by government fiat or by pundits who have difficulty arriving at a coherent definition of the word "open".

 

Rick

Currently playing in iTunes: Hablemos El Mismo Idioma by Gloria Estefan

Comments

  • Anonymous
    September 29, 2005
    I think the big issue I'm hearing is concern about using the file formats in open source applications. That is, things like Open Office and other applications being able to make use of these file formats. An additional note on this is making sure these are the default file formats for application use.

    Now, with that said, the bigger issue for me is getting XML support in things like Access or Outlook. I've got a lot of applications that will be able to write to an Excel spreadsheet easily now (although it wasn't too hard with .csv files beforehand), but it's nearly impossible to do automatic calendaring entries or database queries on Exchange/Access systems right now. I'm using Java for my development environment, and it'd be WONDERFUL to have a Java library that would let me connect to Exchange/Access (using a NATIVE Java library, not a JDBC-ODBC bridge type system, as these applications could run on any Java application server, not necessarily a Windows app server).

    SO, a definite step in the right direction - it'd just be nice to have a few other file formats in XML.

  • Anonymous
    September 29, 2005
    Rick, I read this with interest. My definition of ‘open’ when it comes to document formats involves the following two conditions: (1) It is publicly documented, and (2) it does not restrict interoperability. The latter means not that it is simply royalty free, but that it is unconditionally royalty free, i.e., it does not require any kind of licence whatsoever. In this regard I consider PDF, RTF, and more or less also the Word 8 formats ‘open’ (although the latter is poorly documented), but I consider neither the office XML nor the OpenDocument ‘open’, for both formats are encumbered by patents with restrictive licensing terms.

    Complexity of the design and its specificity to a particular application should not contribute to the determination of ‘openness’. One of the key factors in file format design is that it allows my application to work efficiently with it, and if my application works differently from yours, then my preferred file design is likely to be different. I think this whole idea of a ‘Swiss army file format’ to end all file formats, which is behind the OpenDocument, is deeply misguided. I care about interoperability, but interoperability is not uniformity.

    The central issue is the ownership of the data; the data that is stored in files such as word processing documents does not belong to the application designers, and the owner of that data must, therefore, be free to access it in any way she wishes, not just on the terms of the application designer. Consequently, I think a distinction needs to be made between reading data from a file and writing it to the file. I can see how the author of a particularly clever file format might want to prevent others from benefiting from her work by storing their users' data in her clever way, but I see no way in which you could make a case that, as the designer of that format, you can restrict others in any way from parsing it to extract the data in it. In any case, European law goes as far as to allow reverse engineering for interoperability purposes, and I think that will be enough for European geeks to design open source importers for the Office XML file formats.

  • Anonymous
    October 04, 2005
    Tomas: OpenDocument is not the subject of any restrictive license that I'm aware of, and to make that doubly clear I've explained Sun's new 'Covenant' in my blog http://blogs.sun.com/roller/page/webmink?entry=raising_the_bar_on_patents
