Complex Scripts Overview
4/8/2010
Complex Scripts are writing systems that require special handling for the shaping and layout of language strings. Languages that require Complex Scripts include Arabic, Hebrew, the Indic languages, Thai, Farsi, and Khmer.
A Complex Script has at least one of the following attributes:
- Allows bidirectional rendering.
- Has contextual shaping.
- Has combining characters.
- Has specialized word-breaking and justification rules.
- Filters out illegal character combinations.
Bidirectional rendering refers to the script's ability to handle text that reads both left-to-right and right-to-left. For example, in the bidirectional rendering of Arabic, the default reading direction for text is right-to-left, but for some numbers, it is left-to-right. Processing a complex script must account for the difference between the logical (keystroke) order and the visual order of the glyphs.
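The difference between logical and visual order can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the Unicode Bidirectional Algorithm and not what Uniscribe does internally: it assumes a right-to-left paragraph in which only runs of digits keep their left-to-right order. The function names `visual_indices` and `visual_order` are hypothetical.

```python
import unicodedata

def visual_indices(logical):
    """Return logical character indices in left-to-right display order.

    Simplifying assumption: the paragraph is right-to-left and only runs
    of digits (bidi classes EN and AN) are laid out left-to-right.
    """
    runs = []  # each entry: (is_number, [logical indices])
    for i, ch in enumerate(logical):
        is_number = unicodedata.bidirectional(ch) in ("EN", "AN")
        if runs and runs[-1][0] == is_number:
            runs[-1][1].append(i)
        else:
            runs.append((is_number, [i]))

    order = []
    # Runs are placed from right to left, so walk them in reverse.
    for is_number, indices in reversed(runs):
        # Digits keep keystroke order; everything else is mirrored.
        order.extend(indices if is_number else reversed(indices))
    return order

def visual_order(logical):
    """Rearrange a logically ordered string into display order."""
    return "".join(logical[i] for i in visual_indices(logical))

# Keystroke order: ALEF, BEH, space, "123", space, JEEM
logical = "\u0627\u0628 123 \u062c"
print(visual_indices(logical))  # [7, 6, 3, 4, 5, 2, 1, 0]
```

The digits stay in the order they were typed, while the Arabic letters around them are mirrored. A production layout engine also has to handle embedding levels, neutral characters, and mirrored punctuation.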
In addition, processing must properly deal with caret movement and hit testing. The mapping between screen position and a character index for, say, text selection or caret display requires knowledge of the layout algorithms.
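A minimal sketch of hit testing and caret placement follows. It assumes that the layout step has already produced `order`, a non-empty list holding the logical index of the character shown in each visual column, and that every glyph is `cell_width` pixels wide. Both assumptions are for illustration only; real glyph advances vary, and the names `hit_test` and `caret_x` are hypothetical.

```python
def hit_test(x, order, cell_width):
    """Map a horizontal pixel position to a logical character index."""
    column = int(x // cell_width)
    # Clamp clicks that fall left or right of the text.
    column = max(0, min(column, len(order) - 1))
    return order[column]

def caret_x(logical_index, order, cell_width):
    """Return the left edge of the cell that displays a logical character."""
    return order.index(logical_index) * cell_width

# Display order for an 8-character right-to-left string whose logical
# characters 3..5 are digits and therefore read left-to-right.
order = [7, 6, 3, 4, 5, 2, 1, 0]

print(hit_test(25, order, 10))  # 3: the third column shows the first digit
print(caret_x(0, order, 10))    # 70: the first typed character is rightmost
```

Because neighboring screen positions can map to distant character indices, a selection that is contiguous in logical order may appear as several separate highlighted regions on screen.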
Contextual shaping occurs when a script's characters change shape depending on the characters that surround them. A similar effect occurs in English cursive writing, where a lowercase "l" changes shape depending on the character that precedes it, such as an "a" (which connects low to the "l") or an "o" (which connects high). Arabic is a script that exhibits contextual shaping.
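The selection of a contextual form can be sketched as follows. This is a toy model under stated assumptions: it covers only two Arabic letters, BEH (which joins on both sides) and ALEF (which joins only to the preceding letter), and it substitutes code points from the Unicode Arabic Presentation Forms-B block. A real shaping engine substitutes glyphs inside the font and handles the full joining rules. The `FORMS` table and function names are hypothetical.

```python
# (isolated, final, initial, medial) presentation forms.
# None marks a form that the letter does not have.
FORMS = {
    "\u0628": ("\ufe8f", "\ufe90", "\ufe91", "\ufe92"),  # BEH
    "\u0627": ("\ufe8d", "\ufe8e", None, None),          # ALEF
}

def joins_forward(ch):
    """True if the letter can connect to the letter that follows it."""
    return ch in FORMS and FORMS[ch][2] is not None

def shape(word):
    """Replace each known letter with the form its neighbors call for."""
    shaped = []
    for i, ch in enumerate(word):
        if ch not in FORMS:
            shaped.append(ch)  # unknown characters pass through unchanged
            continue
        isolated, final, initial, medial = FORMS[ch]
        joins_prev = i > 0 and joins_forward(word[i - 1])
        joins_next = (joins_forward(ch)
                      and i + 1 < len(word)
                      and word[i + 1] in FORMS)
        if joins_prev and joins_next:
            shaped.append(medial)
        elif joins_prev:
            shaped.append(final)
        elif joins_next:
            shaped.append(initial)
        else:
            shaped.append(isolated)
    return "".join(shaped)

# Three BEH letters in a row become initial, medial, and final forms.
print([hex(ord(c)) for c in shape("\u0628\u0628\u0628")])
```

The same letter therefore maps to different shapes purely because of its position, which is why shaping cannot be done one character at a time.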
Combining characters (ligatures) are characters that join into one character when placed together. One example is the "ae" combination in English, which is sometimes represented by the single character "æ". Arabic is a script that has many combining characters.
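As a rough illustration, the sketch below uses Python's standard `unicodedata` module to show two related behaviors: a ligature, where the Arabic letters LAM and ALEF are displayed as one joined form, and a combining mark, where an accent attaches to the base letter before it. Replacing code points, as `apply_ligatures` does, is only a stand-in for the glyph substitution that a font performs; the function names are hypothetical.

```python
import unicodedata

LAM_ALEF = "\u0644\u0627"     # LAM followed by ALEF, as typed
LAM_ALEF_LIGATURE = "\ufefb"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

def apply_ligatures(text):
    """Replace each LAM + ALEF pair with the single ligature form."""
    return text.replace(LAM_ALEF, LAM_ALEF_LIGATURE)

def count_combining_marks(text):
    """Count characters that attach to a preceding base character."""
    return sum(1 for ch in text if unicodedata.combining(ch) != 0)

# Two typed characters collapse into one displayed form.
print(len(LAM_ALEF), len(apply_ligatures(LAM_ALEF)))  # 2 1

# Compatibility normalization recovers the original two letters.
print(unicodedata.normalize("NFKC", LAM_ALEF_LIGATURE) == LAM_ALEF)  # True

# "e" plus COMBINING ACUTE ACCENT composes to the single character U+00E9.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")  # True
```

In both cases the number of characters stored differs from the number of shapes drawn, so a layout engine has to track the mapping between the two.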
Specialized word-breaking and justification rules apply to scripts that have complex rules for dividing words between lines or justifying text on a line. Thai is such a script.
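Thai is written without spaces between words, so line-break positions have to be inferred from the text itself. One common approach, shown here only as an assumed illustration and not as Uniscribe's documented algorithm, is greedy longest-match segmentation against a word list. The two-word dictionary `WORDS` and the function `segment` are hypothetical.

```python
HELLO = "สวัสดี"   # "hello"
POLITE = "ครับ"    # polite particle
WORDS = {HELLO, POLITE}  # toy dictionary; real word lists are far larger

def segment(text, dictionary):
    """Split text into words by always taking the longest dictionary match."""
    longest = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for length in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                break
        else:
            # Nothing matched: emit the single character as its own piece.
            candidate = text[i]
        words.append(candidate)
        i += len(candidate)
    return words

pieces = segment(HELLO + POLITE, WORDS)
print(len(pieces))  # 2
```

The boundaries between the returned pieces are the positions where a line may be broken. Greedy matching can choose the wrong split for ambiguous text, which is one reason real implementations use more elaborate rules.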
Filtering out invalid character combinations is required when a language does not allow certain character combinations. Thai is such a script.
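One simple kind of invalid combination is a Thai vowel sign or tone mark that has no consonant to attach to. The sketch below drops such orphaned marks. It is a deliberately reduced rule chosen for illustration, assuming that any nonspacing mark (Unicode category `Mn`) must follow a letter; the actual Thai input rules are more detailed, and the function name `filter_invalid` is hypothetical.

```python
import unicodedata

def filter_invalid(text):
    """Drop nonspacing marks that do not follow a base letter."""
    kept = []
    has_base = False  # True while the current cluster starts with a letter
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Mn":
            if not has_base:
                continue  # orphaned mark: filter it out
        else:
            # Only letters can carry marks; spaces and digits reset the state.
            has_base = category.startswith("L")
        kept.append(ch)
    return "".join(kept)

KO_KAI = "\u0e01"  # THAI CHARACTER KO KAI (consonant)
SARA_I = "\u0e34"  # THAI CHARACTER SARA I (vowel sign above)
MAI_EK = "\u0e48"  # THAI CHARACTER MAI EK (tone mark)

# A leading vowel sign is removed; marks after the consonant are kept.
print(filter_invalid(SARA_I + KO_KAI + SARA_I + MAI_EK)
      == KO_KAI + SARA_I + MAI_EK)  # True
```

Filtering of this kind is usually applied as text is entered, so that sequences that cannot be rendered never reach the layout stage.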
Unicode
Applies to Windows Mobile 6.5.3
The Unicode Script Processor (uspce.dll), also known as Uniscribe, is a collection of APIs that enables a text layout client to format complex scripts. Uniscribe supports the complex rules found in scripts such as Arabic, Indic, and Thai. Uniscribe also handles scripts written from right-to-left, such as Arabic or Hebrew, and supports the mixing of scripts.
Uniscribe uses multiple shaping engines that contain the layout knowledge for particular scripts. It also takes advantage of the OpenType layout shaping engine for handling font-specific script features such as glyph generation, extent measurement, and word-breaking support.
In the 6.5.3 release, Uniscribe is supported for formatting text in complex script languages.
Note: Only Arabic is fully supported.
Complex Scripts Languages Supported
Applies to Windows Mobile 6.5.3
Languages: Arabic
Platform: Windows Mobile device that uses Windows Mobile Classic
Unsupported Applications:
- The Hijri calendar is not supported in Outlook. The Outlook Calendar application shows the Gregorian English calendar when Hijri is the system locale calendar.
- The Notes application and the RichInk control do not support Complex Scripts. This includes the notes section in Appointment, Contacts, OneNote, and Notes.