Implementing document-format specifications

A few folks have pointed out that implementing every detail of the Office Open XML specification would be very difficult. And that's certainly true -- implementing 100% of a document-format specification is a daunting task.

A good example of the complexity of this task can be found in the Intel-sponsored ODF test suite developed by the University of Central Florida. In the Summary section, you'll find links to over 300 specific issues regarding partial or missing implementation of ODF in OpenOffice and KOffice, with screen shots and descriptions of the issues.

In most situations, of course, a developer isn't trying to implement 100% of a spec. For example, Mindjet's integration of MindManager and Word 2007 through the use of Office Open XML only uses a tiny portion of the Office Open XML spec and went from concept to completion in just a few weeks.

Last night I saw another great example: a simple Open XML spreadsheet editor, developed by a college student here in Delhi. It allows the user to open an Open XML spreadsheet, edit values in a grid control or add new rows, and save the result as a valid Open XML spreadsheet. And although it's written in C#, it doesn't use the .NET 3.0 System.IO.Packaging API; instead, it opens the document as a simple ZIP archive. (I'll write up that application in more detail later when I have a little time, and we'll be covering it on the OpenXmlDeveloper site as well.)
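
To give a rough idea of what the "simple ZIP archive" approach involves, here is a minimal sketch in Python (not the student's C# code, which I haven't shown here). It assumes the worksheet lives in a part named xl/worksheets/sheet1.xml and that the target cell holds a plain numeric value; the function name set_numeric_cell and its parameters are made up for illustration. A real editor would also have to deal with shared strings, styles, and the other parts of the package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace; registering it keeps it as the default
# namespace when the sheet is serialized again.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
ET.register_namespace("", NS)


def set_numeric_cell(package: bytes, cell_ref: str, value,
                     sheet_part: str = "xl/worksheets/sheet1.xml") -> bytes:
    """Return a copy of the ZIP package with one numeric cell changed.

    Hypothetical helper: the default sheet_part is an assumption, and only
    cells that store their value directly in a <v> element are handled.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == sheet_part:
                root = ET.fromstring(data)
                # Cells are <c r="A1"> elements; the value sits in a <v> child.
                for cell in root.iter(f"{{{NS}}}c"):
                    if cell.get("r") == cell_ref:
                        v = cell.find(f"{{{NS}}}v")
                        if v is None:
                            v = ET.SubElement(cell, f"{{{NS}}}v")
                        v.text = str(value)
                data = ET.tostring(root, encoding="utf-8",
                                   xml_declaration=True)
            # Every other part is copied through unchanged.
            dst.writestr(item.filename, data)
    return out.getvalue()
```

Calling set_numeric_cell(open("book.xlsx", "rb").read(), "A1", 42) and writing the returned bytes to a new file would be the typical usage (book.xlsx is a placeholder name). The point is that nothing beyond a ZIP library and an XML parser is needed for this kind of narrow scenario.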

The thoroughness of the Office Open XML specification gives developers all of the information they need to get the job done, and that's a good thing. And there is functionality in the Open XML spec that no other document format provides, such as compatibility with billions of existing Office documents and a variety of ways to support custom-schema interoperability in documents. All of that functionality adds complexity, but most of the details are optional, so implementers don't need to read or understand them. As the creator of the spreadsheet editor mentioned above told me, "I haven't read 6000 pages in my entire life!" Kids these days. :-)

For those who criticize the size of the spec, an interesting rhetorical question -- which I've not seen addressed anywhere -- is "precisely which sections of the spec would you recommend be omitted?" That would probably lead to an interesting discussion of document-format priorities in general -- to state the obvious, a spec can't offer functionality that isn't specified.

4/28/2008: updated link to ODF test suite.

Comments

  • Anonymous
    February 01, 2007
    Actually, could Microsoft start an effort to create an open or shared source implementation of Office Open XML? There's John Tunnicliffe's great effort over on CodePlex on SpreadsheetML, but it would be interesting if there was a funded effort going on with clear deliverable goals -- similar to the P&P teams.

  • Anonymous
    February 02, 2007
    Doug, perhaps you can realize (or it's already the case, since you are a smart person) that the situation of some non-Microsoft people out there able to implement scenarios would be exactly the same should the "specs" not be available at all. In fact, the only difference between the new and old file formats is that they are ZIP based (except if they are password-protected). That alone allows quick read/write either by hand, or with code. At this point, I should add that the "innovation" came from the ODF guys first. Who themselves borrowed it from elsewhere. Microsoft simply levelled the playing field by adopting a good thing: ZIP containers. The availability of the specs has nothing to do with those scenarios you are so proud to list. Make no mistake. Thanks.

  • Anonymous
    February 04, 2007
    Hi Stephane, I understand your point about developers being able to implement the formats without the spec.  And, frankly, many of the developers who have already done work around the formats have done it by reverse-engineering the details or copying and modifying code samples on OpenXmlDeveloper and other sources.  The spec covers some of the details that would be hard to figure out on your own, but most of those details aren't relevant or necessary for most applications. The XML-in-ZIP packaging is rapidly becoming a common approach for a variety of formats, and there are ISVs who have been using that approach for years in addition to ODF (which implemented it first as you mention) and Open XML.  It's a flexible approach, since every mainstream dev environment can handle ZIP packages and read/write XML.  The more the merrier!

  • Anonymous
    February 04, 2007
    The comment has been removed

  • Anonymous
    February 05, 2007
    Stephane, I don't feel like I ignored your point.  I pointed out that developers are doing what they're doing regardless of the spec.  I'm really not sure how I could have echoed your point any more clearly, frankly.  I not only didn't ignore you, I very explicitly agreed with you. And I'm not sure what to make of the litany of complaints you've mentioned.  These comments have nothing to do with the subject of my post, which was the general difficulty of implementing 100% of a document-format specification, regardless of whether that specification is Open XML, ODF, or anything else. If you want to start a new thread on a new topic, you need to do that somewhere else.  I just don't have the time this week.  I'm packing my bags in my hotel room right now, heading for the airport in an hour, and will be on the 24-hour routine to get from Delhi back to Seattle after that.  Then I'll catch a cab straight to the office from the airport, for a long day of work related to the conference we have going on this week, then home for a few hours' sleep before heading over to the Washington State Convention Center to give presentations on Wednesday.  I just don't have time for the emotional debate you apparently want to have right now, sorry.

    Doug

  • Anonymous
    February 05, 2007
    The comment has been removed
  • Anonymous
    February 06, 2007
    Sorry for the tone there, Stephane; perhaps the long hours the last few days are wearing me down.  It may have been me that was getting emotional.  :-) Your point is well taken, and you're right that it is a subject of debate. Peace.