Localization & misspellings
So Swedish XP SP2 has been available from the Download Center since Monday. On Monday I proudly announced the URL as soon as I saw that it was live. I felt really good about this release. I've spent a lot of time on it, tried hard to get the Swedish version to look good and read well.
Five hours later, Svante said (my translation): "First thing I see after installing XP SP2, rebooting and logging on is a spelling mistake!"
Indeed, at the bottom of the Security Center:
See how it says "sekretessspolicy" - three 's' in a row. Ouch. I don't feel as cocky anymore.
How did that happen? I've spell checked all new and changed resources at least twice. I've been running on SP2 since at least March. I've tried to view all UI at runtime, and I know I've looked at this dialog box probably a hundred times. People at Microsoft in Stockholm have run SP2, filed bugs on translations in the exact same dialog without noticing - and I have fixed those bugs without noticing this problem.
I guess one simply goes word blind after a while, looking at the same strings and the same dialogs time and again...
So what do I do now? I can try and get the string fixed in SP3, but I'm not sure I'll succeed. Also, doing so only addresses the symptom. I need to fix the underlying cause - prevent spelling mistakes from getting into the product at all, or at least catch them before they get into a build.
I'm not sure exactly how to do this yet, but here are a few things I've thought of over the last few days.
First problem, my uncoordinated fingers. I'm not hopeful about fixing this. I've tried changing, but I still write "anvnädare" instead of "användare" and "urringning" instead of "utringning"...
Second problem, spell checking several hundred thousand words is error prone and tedious.
Right now I spell check like so:
1) First copy all the strings I want to spell check into Word.
2) Then search and replace to remove a lot of gunk - like change "\r\n" to "^l", change "\t" to "^t", get rid of HTML markup etc.
3) Start spell checking.
4) For any error found, fix in my localization tool.
This is tedious as there's no way I can remove all the gunk I should. Because of this Word stumbles on a lot of things that are OK, and so it's easy to overlook an actual misspelling.
For my next project I'll try a few things -
Create a script that dumps out all strings into a text file, cleans up by removing as much gunk as possible, and writes out just a list of unique words. I'll then start by spell checking only this list. This should cut down the number of words I need to spell check initially. Also, if I'm clever, I can make it remember which of the individual words were false positives and which were genuine misspellings. Next time around I can then exclude the false positives from the word list, and I can find the known misspellings without even having to fire up Word.
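A minimal sketch of what such a script might look like, in Python. Everything here is an assumption for illustration: the cleanup rules, the function names (clean, unique_words, triage), and the idea that the strings are already available as a list. The known-good and known-bad sets stand in for the remembered false positives and genuine misspellings.

```python
import re

# Hypothetical cleanup rules; a real resource dump would need more of these.
TAG_RE = re.compile(r"<[^>]+>")                    # HTML markup
ESCAPE_RE = re.compile(r"\\[rnt]")                 # literal \r, \n, \t sequences
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")  # runs of letters, incl. å, ä, ö


def clean(text):
    """Remove the most common non-linguistic 'gunk' from a resource string."""
    text = TAG_RE.sub(" ", text)
    text = ESCAPE_RE.sub(" ", text)
    # Assumption: "&" marks an accelerator key, as in "&Arkiv".
    return text.replace("&", "")


def unique_words(strings):
    """Return the set of distinct lowercase words found in all strings."""
    words = set()
    for s in strings:
        words.update(w.lower() for w in WORD_RE.findall(clean(s)))
    return words


def triage(words, known_good, known_bad):
    """Split words into known misspellings and words still to be checked."""
    flagged = sorted(w for w in words if w in known_bad)
    unchecked = sorted(words - known_good - known_bad)
    return flagged, unchecked
```

Only the unchecked list would need to go through a spell checker. The two sets could be kept in plain text files, one word per line, and grown after each pass.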
Another approach is to scan this word list for illegal character combinations. For instance, there's no Swedish word with three 's' in a row. If I had done this during SP2, I would have caught the error Svante found. The only problem with this kind of rule is that it'll give false positives, but I could probably make provisions for that. (Related to this kind of test is scanning for sentences that start with two capital letters, words that occur twice in a row and other such easy-to-make mistakes.)
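Rules like these could be expressed as regular expressions. The Python sketch below is one possible take, not a finished tool. It assumes that any letter tripled is suspicious (the text only mentions 's'), it approximates "sentences that start with two capital letters" with a word-level check, and it uses a hypothetical exceptions set as the provision for false positives.

```python
import re

LETTER = r"[^\W\d_]"  # any Unicode letter, so å, ä and ö are covered

# Each rule is a (name, pattern) pair; the names are made up for this sketch.
CHECKS = [
    # A whole word containing the same letter three times in a row.
    ("triple letter",
     re.compile(rf"{LETTER}*({LETTER})\1\1{LETTER}*", re.IGNORECASE)),
    # The same word twice in a row, separated only by whitespace.
    ("repeated word",
     re.compile(rf"\b({LETTER}+)\s+\1\b", re.IGNORECASE)),
    # A word with two capitals followed by lowercase letters.
    ("two leading capitals",
     re.compile(r"\b[A-ZÅÄÖ]{2}[a-zåäö]+\b")),
]


def check(text, exceptions=frozenset()):
    """Return (rule name, matched text) pairs, skipping known exceptions."""
    findings = []
    for name, pattern in CHECKS:
        for match in pattern.finditer(text):
            hit = match.group(0)
            if hit.lower() not in exceptions:
                findings.append((name, hit))
    return findings
```

Run on a string containing "sekretessspolicy", the first rule reports that word. Plurals of abbreviations such as "PCs" trip the third rule, which is the kind of false positive the exceptions set is meant to absorb.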
A variation on the word list script would be to create a sentence list. This would allow me to benefit from the grammar check in Word as well, and coupled with a known good/known bad list could help us improve consistency on a sentence level.
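A sentence list could be built along these lines. This Python sketch is only an illustration: the splitting rule is deliberately naive (it breaks after '.', '!' or '?' followed by whitespace, so abbreviations will confuse it), and the function names and the known-good set are assumptions.

```python
import re
from collections import Counter

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def sentence_counts(strings):
    """Count how often each distinct sentence occurs across all strings."""
    counts = Counter()
    for s in strings:
        for sentence in SENTENCE_END_RE.split(s):
            normalized = " ".join(sentence.split())  # collapse whitespace
            if normalized:
                counts[normalized] += 1
    return counts


def to_review(counts, known_good=frozenset()):
    """Return the sentences that have not been approved yet, sorted."""
    return sorted(s for s in counts if s not in known_good)
```

The counts also show which sentences recur most often, which may help decide where consistency matters most. A known-bad set could be added in the same way as the known-good one.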
Third problem is that I looked at the same dialog a hundred times without seeing the misspelling. Again, I'm not sure I can fix my eyes. I guess we need to look more into getting more people involved in running the builds before release. There are beta programs for some languages, but they typically don't give much linguistic feedback. I suppose that could be sorted though, if we managed to give builds to the right people.
Then again, it could be that I'm overstating the problem just because this one missspelling happens to be in such a visible place. I know that we've improved dramatically since NT4 and Win9x. But the only way to know how bad the situation is, is to try and find out what else I've overlooked...
I've got to think some more about this topic. I'll be back...
Comments
- Anonymous
September 08, 2004
Found another weird spelling, take a look at this:
http://www.psikorp.com/sp2/bugg_omfang.gif
"Du kan skapa en egen lista genom att ange en lista med kommateckenavgränsad lista med IP-adresser..."
Shouldn't that be something like:
"Du kan skapa en egen lista genom att ange en kommateckenavgränsad lista med IP-adresser..."
- Anonymous
September 08, 2004
<i>First problem, my uncoordinated fingers. I'm not hopeful about fixing this. I've tried changing, but I still write "anvnädare" instead of "användare" and "urringning" instead of "utringning"...</i>
That latter thing is called a Freudian slip, you know...
- Anonymous
September 09, 2004
Per-Olov, nice catch! This one's even trickier to find without proof reading. Hm. I've got a lot of work to do if I'm gonna figure out how to prevent this kinda thing from happening again...
If you see anything else bad, please let me know - I really do appreciate it.
- Anonymous
September 09, 2004
Jenny, to be honest that one isn't really mine... "Urringning" is what Office keeps on suggesting to me (and I keep on thinking it's funny).
- Anonymous
September 10, 2004
For proofreading it helps if you do not speak the language (or at least speak it poorly), because it forces much greater attention to detail - your brain does not automatically correct things for you.
- Anonymous
September 10, 2004
John, that's a good point. We've been doing some stuff like that, but not really organized. Maybe we should though, maybe I should install Norwegian MUI on my main machine...
- Anonymous
September 22, 2004
So is there a team in charge of translation and localisation, or just a single person? Presumably it would be useful to have at least two people look at these things.
- Anonymous
September 22, 2004
jill/txt » proofreading software?
- Anonymous
September 22, 2004
Jill, right now I'm the only Swedish Windows localizer. That's about to change though; I'm happy to say that my new colleague is arriving in Seattle next week!
During and before Windows XP, we were two or three localizers who checked each other's work and we also had all user interface proofread by a linguist. The UI review wasn't necessarily cost efficient though (it often descended into debates over where the comma should be), so these days we're mostly working with the language department in researching and deciding on terminology.
During SP2, I have been working closely with people at Microsoft in Stockholm to get internal "beta testing" focusing on language. This was extremely useful - I got loads of great feedback.
On top of this, obviously I spell check and run the build as much as I can.
The thing is though, we didn't catch everything... and that bothers me. Manual proof reading isn't fool proof. Going forward, we'll be focusing on two things to keep these kinds of unnecessary bugs from happening again - 1) investing more in automated checks (I started playing around with this early this week, and the results look promising so far) and 2) getting more people involved in looking at builds before release. Exactly how this will take shape is still to be decided, but I'm hoping that we can partner with individuals in Sweden (non-Microsofties).
I'll report more progress as we...um...progress...:)