Developing features in order to develop features.

Someone asked me not too long ago for a list of Windows-supported languages that don't rely on white space for word breaking. I gave him a quick answer just because I happened to know, but then Yaniv Feinberg and I spent time trying to figure out how this guy might have used our APIs to derive his own answer to that question. That was an interesting exercise, but in the end I was struck by the same thing I'm always struck by whenever people ask questions like that: how inefficient it is that application developers, who may or may not know anything about word breaking or writing systems or computational linguistics, are stuck trying to intuit the answers to questions like this one.

This came really close to home recently when I was having a discussion with my husband, who used to develop the help integration for Visual Studio. This is a guy who knows more linguistics than your average developer, and he still found himself pretty overwhelmed by the fact that he somehow needed to provide word breaking support in order to display help content in multiple languages. Multiply his story by a whole bunch of linguistic services and then again by a whole bunch of application developers, and you quickly have a complicated ecosystem where developers can't focus on the real meat of the applications they're trying to develop because they're so bogged down in the peripheral features required to make their product appealing to wide audiences.
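
To make the white-space question concrete, here is a minimal Python sketch. It does not use the Windows APIs mentioned above; it relies only on Python's built-in `str.split`, and the helper name `whitespace_tokens` and the sample sentences are illustrative assumptions rather than anything from the original exchange. It shows what happens when an application assumes that white space is the only word boundary.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Naive tokenizer: treats runs of white space as the only word boundaries."""
    return text.split()


# Sample sentences (assumed for illustration only).
english = "developers need word breaking"
thai = "ฉันรักภาษาไทย"      # "I love the Thai language", written without spaces
japanese = "私は学生です"    # "I am a student", written without spaces

for label, sentence in [("English", english), ("Thai", thai), ("Japanese", japanese)]:
    tokens = whitespace_tokens(sentence)
    # English splits into separate words; the Thai and Japanese sentences
    # each come back as one undivided token.
    print(label, len(tokens), tokens)
```

The naive approach splits the English sentence into words but returns each of the other two sentences as a single token. Doing better typically takes a dictionary or a statistical model for each language, which is exactly the kind of peripheral feature an application developer shouldn't have to build alone.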

We could actually broaden this even further: it isn't just linguistic functionality that developers find themselves having to create, but a whole range of other stuff too -- every small piece for which their various target audiences turn out to require personalization. If we're asking individual application developers to reinvent the wheel all across the development space, then we as an industry have a pretty broken model, especially when it comes to creating truly globalized applications.

If you're a developer, I'm interested in hearing about cases like this where you've found yourself having to create peripheral features in order to provide a personalized experience for your customers. If you were successful, I'd like to hear why. If you weren't successful, I'd like to hear why. Because the more I talk to people, the more I'm convinced this happens all the time.

Comments

  • Anonymous
    July 29, 2006
    The comment has been removed

  • Anonymous
    July 29, 2006
    Wow, it's interesting that you interpreted the post that way. I wasn't really speaking about Microsoft's applications so much as I was speaking about the challenges of developing globalized or personalized applications more generally, for any developer on any platform.

    Actually what I meant to communicate is that any platform -- Microsoft's or anyone's -- has an interest in making it easy for developers to create globalized or otherwise personalized applications. If we're expecting developers to build the supporting infrastructure underneath the features that are really core to their applications, then we're expecting too much. That kind of software ecosystem just can't sustain itself.

    I asked the question because I'm interested in hearing from developers (1) whether this is as big a problem as I perceive it to be, and (2) whether people have particular examples that are noteworthy.

  • Anonymous
    August 04, 2006
    Very interesting post, Kieran (I’ve just discovered your blog).

    There are definitely languages which use white space for word breaking, but which also use other characters in some contexts. French is a case in point (as are Italian and Romansh, for instance). The apostrophe is a word-breaking character in strings such as l’école (the school), d’hier (of yesterday), l’enfant (the child)… A while ago, I wrote about how difficult it can be to decide on the status of such a character, because the apostrophe is not always a word-breaking character in French: aujourd’hui (today), for example, is only one token. See http://blogs.msdn.com/correcteurorthographiqueoffice/archive/2005/12/07/500807.aspx for more details about this discussion… We have to consider all these aspects for the word breakers we develop for all these languages…

    Thierry

  • Anonymous
    August 04, 2006
    These comments are launching some good discussion.

    The way I see it, there are some globalization best practices that all internationally minded developers need to be aware of (using Unicode, etc.). The stuff I was talking about was a bit different, although I think I didn't explain it quite right. Word breakers are just one example -- if we have 300 or 100 or even just 10 developers all independently creating word breaker support for Windows applications, something is horribly wrong. That's more than a best practice; that's the development of an entire feature set. We need to make it easier for developers to pick up broad international support that just works so that they can focus on their core features, the stuff they really care about.

    Unrelated: Thierry, I'm working on an interesting word breaker puzzle these days that I'll have to come bug you guys about soon.
