6 Appendix A: Product Behavior
The information in this specification is applicable to the following Microsoft products or supplemental software. References to product versions include released service packs.
Windows NT operating system
Windows 2000 operating system
Windows XP operating system
Windows Server 2003 operating system
Windows Vista operating system
Windows Server 2008 operating system
Windows 7 operating system
Windows Server 2008 R2 operating system
Windows 8 operating system
Windows Server 2012 operating system
Windows 8.1 operating system
Windows Server 2012 R2 operating system
Windows 10 operating system
Windows Server 2016 operating system
Windows Server operating system
Windows Server 2019 operating system
Windows Server 2022 operating system
Windows 11 operating system
Windows Server 2025 operating system
Exceptions, if any, are noted in this section. If an update version, service pack or Knowledge Base (KB) number appears with a product name, the behavior changed in that update. The new behavior also applies to subsequent updates unless otherwise specified. If a product edition appears with the product version, behavior is different in that product edition.
Unless otherwise specified, any statement of optional behavior in this specification that is prescribed using the terms "SHOULD" or "SHOULD NOT" implies product behavior in accordance with the SHOULD or SHOULD NOT prescription. Unless otherwise specified, the term "MAY" implies that the product does not follow the prescription.
<1> Section 3.1.5.2.3: Only Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2 use record count for DEFAULT.
<2> Section 3.1.5.2.3: An LCID is used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
<3> Section 3.1.5.2.3.1: The files in the download map to specific Windows versions as follows:
Version | File Name |
---|---|
Windows NT 4.0 operating system, Windows 2000, Windows XP, and Windows Server 2003 | Windows NT 4.0 through Windows Server 2003 Sorting Weight Table.txt |
Windows Vista | Windows Vista Sorting Weight Table.txt |
Windows Server 2008 | Windows Server 2008 Sorting Weight Table.txt |
Windows 7 and Windows Server 2008 R2 | Windows 7 and Windows Server 2008 R2 Sorting Weight Table.txt |
Windows 8, Windows 8.1, Windows Server 2012, and Windows Server 2012 R2 | Windows 8 and Windows Server 2012 Sorting Weight Table.txt |
Windows 10, Windows Server 2016, Windows Server operating system, and Windows Server 2019 | Windows 10 Sorting Weight Table.txt |
Windows Server 2022 | Windows Server 2022 Sorting Weight Table.txt |
Windows 11 | Windows 11 Sorting Weight Table.txt |
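As an illustration only, the mapping above can be held in a small lookup table. The following Python sketch is not part of any Windows API; the dictionary name, the function name, and the shortened product keys are assumptions made for this example.

```python
# Hypothetical lookup table transcribed from the version-to-file mapping above.
# The keys are shortened product names chosen for this sketch.
_NT4_TO_2003 = "Windows NT 4.0 through Windows Server 2003 Sorting Weight Table.txt"
_WIN7 = "Windows 7 and Windows Server 2008 R2 Sorting Weight Table.txt"
_WIN8 = "Windows 8 and Windows Server 2012 Sorting Weight Table.txt"
_WIN10 = "Windows 10 Sorting Weight Table.txt"

SORTING_WEIGHT_TABLES = {
    "Windows NT 4.0": _NT4_TO_2003,
    "Windows 2000": _NT4_TO_2003,
    "Windows XP": _NT4_TO_2003,
    "Windows Server 2003": _NT4_TO_2003,
    "Windows Vista": "Windows Vista Sorting Weight Table.txt",
    "Windows Server 2008": "Windows Server 2008 Sorting Weight Table.txt",
    "Windows 7": _WIN7,
    "Windows Server 2008 R2": _WIN7,
    "Windows 8": _WIN8,
    "Windows 8.1": _WIN8,
    "Windows Server 2012": _WIN8,
    "Windows Server 2012 R2": _WIN8,
    "Windows 10": _WIN10,
    "Windows Server 2016": _WIN10,
    "Windows Server": _WIN10,
    "Windows Server 2019": _WIN10,
    "Windows Server 2022": "Windows Server 2022 Sorting Weight Table.txt",
    "Windows 11": "Windows 11 Sorting Weight Table.txt",
}


def sorting_weight_table(version: str) -> str:
    """Return the weight-table file name for a product, or raise KeyError."""
    return SORTING_WEIGHT_TABLES[version]
```

A product that the table does not list raises `KeyError` in this sketch rather than falling back to a default file.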
<4> Section 3.1.5.2.4: The following algorithm for generation of sort keys for a specific UTF-16 string is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, Windows Server 2008 R2, Windows 8, Windows 8.1, Windows Server 2012, Windows Server 2012 R2, Windows 10, Windows Server 2016, Windows Server operating system, Windows Server 2019, and Windows Server 2022.
STRUCTURE CharacterWeightType
(
    ScriptMember: 8 bit integer
    PrimaryWeight: 8 bit integer
    DiacriticWeight: 8 bit integer
    CaseWeight: 8 bit integer
)

STRUCTURE UnicodeWeightType
(
    ScriptMember: 8 bit integer
    PrimaryWeight: 8 bit integer
    ThirdByteWeight: 8 bit integer
)

STRUCTURE SpecialWeightType
(
    Position: 16 bit integer
    ScriptMember: 8 bit integer
    PrimaryWeight: 8 bit integer
)

STRUCTURE ExtraWeightType
(
    W6: 8 bit integer
    W7: 8 bit integer
)

SET constant LCID_KOREAN to 0x0412
SET constant LCID_KOREAN_UNICODE_SORT to 0x010412
SET constant LCID_HUNGARIAN to 0x040e
SET constant SORTKEY_SEPARATOR to 0x01
SET constant SORTKEY_TERMINATOR to 0x00

SET global KoreanScriptMap to InitKoreanScriptMap

//
// Script Member Values.
//
SET constant UNSORTABLE to 0
SET constant NONSPACE_MARK to 1
SET constant EXPANSION to 2
SET constant EASTASIA_SPECIAL to 3
SET constant JAMO_SPECIAL to 4
SET constant EXTENSION_A to 5
SET constant PUNCTUATION to 6
SET constant SYMBOL_1 to 7
SET constant SYMBOL_2 to 8
SET constant SYMBOL_3 to 9
SET constant SYMBOL_4 to 10
SET constant SYMBOL_5 to 11
SET constant SYMBOL_6 to 12
SET constant DIGIT to 13
SET constant LATIN to 14
SET constant KANA to 34
SET constant IDEOGRAPH to 128

IF Windows version is Windows Vista, Windows Server 2008,
   Windows 7, or Windows Server 2008 R2 THEN
    SET constant MAX_SPECIAL_CASE to SYMBOL_6
ELSE
    SET constant MAX_SPECIAL_CASE to SYMBOL_5
ENDIF

COMMENT Set the constant for the first script member of the Unicode
COMMENT Private Use Area (PUA) range
SET constant PUA3BYTESTART to 0xA9
COMMENT Set the constant for the last script member of the Unicode
COMMENT Private Use Area (PUA) range
SET constant PUA3BYTEEND to 0xAF
COMMENT Set the constant for the first script member of CJK
COMMENT (Chinese/Japanese/Korean) 3 byte weight range
SET constant CJK3BYTESTART to 0xC0
COMMENT Set the constant for the last script member of CJK
COMMENT (Chinese/Japanese/Korean) 3 byte weight range
SET constant CJK3BYTEEND to 0xEF
ENDIF

SET constant FIRST_SCRIPT to LATIN
SET constant MAX_SCRIPTS to 256

//
// Values for CJK Unified Ideographs Extension A range.
// 0x3400 thru 0x4dbf
//
SET constant SCRIPT_MEMBER_EXT_A to 254   // SM for Extension A
SET constant PRIMARY_WEIGHT_EXT_A to 255  // AW for Extension A

//
// Lowest weight values.
// Used to remove trailing DW and CW values.
// Also used to keep illegal values out of sort keys.
//
SET constant MIN_DW to 2
SET constant MIN_CW to 2

//
// Bit mask values.
//
// Case Weight (CW) - 8 bits:
//   bit 0   => width
//   bit 1,2 => small kana, sei-on
//   bit 3,4 => upper/lower case
//   bit 5   => kana
//   bit 6,7 => contraction
//
SET constant CONTRACTION_8_MASK to 0xc0
SET constant CONTRACTION_7_MASK to 0xc0
SET constant CONTRACTION_6_MASK to 0xc0
SET constant CONTRACTION_5_MASK to 0x80
SET constant CONTRACTION_4_MASK to 0x80
SET constant CONTRACTION_3_MASK to 0x40
SET constant CONTRACTION_2_MASK to 0x40
SET constant CONTRACTION_MASK to 0xc0
ELSE
    COMMENT Otherwise, only 2-character or 3-character contractions
    COMMENT are supported.
    // Bit-mask to check 2 character contraction or 3 character contraction
    SET constant CONTRACTION_3_MASK to 0xc0
    // Bit-mask to check 2 character contraction
    SET constant CONTRACTION_2_MASK to 0x80
ENDIF

SET constant CASE_UPPER_MASK to 0xe7  // zero out case bits
SET constant CASE_KANA_MASK to 0xdf   // zero out kana bit
SET constant CASE_WIDTH_MASK to 0xfe  // zero out width bit

//
// Masks to isolate the various bits in the case weight.
//
// NOTE: Bit 2 needs to always equal 1 to avoid getting
//       a byte value of either 0 or 1.
//
SET constant CASE_EXTRA_WEIGHT_MASK to 0xc4
SET constant ISOLATE_KANA to (~CASE_KANA_MASK) | CASE_EXTRA_WEIGHT_MASK
SET constant ISOLATE_WIDTH to (~CASE_WIDTH_MASK) | CASE_EXTRA_WEIGHT_MASK

//
// Values for East Asia special case primary weights.
//
SET constant PW_REPEAT to 0
SET constant PW_CHO_ON to 1
SET constant MAX_SPECIAL_PW to PW_CHO_ON

//
// Values for weight 5 - East Asia Extra Weights.
//
SET constant WT_FIVE_KANA to 3
SET constant WT_FIVE_REPEAT to 4
SET constant WT_FIVE_CHO_ON to 5

//
// PW Mask for Cho-On:
// Leaves bit 7 on in PW, so it becomes Repeat
// if it follows Kana N.
//
SET constant CHO_ON_PW_MASK to 0x87

//
// Special weight values
//
SET constant MAP_INVALID_WEIGHT to 0xff

//
// Some Significant Values for Korean Jamo.
// The L, V & T syllables in the 0x1100 Unicode range
// can be composed to characters in the 0xac00 range.
// See The Unicode Standard for details.
//
SET constant NLS_CHAR_FIRST_JAMO to 0x1100           // Begin Jamo range
SET constant NLS_CHAR_LAST_JAMO to 0x11f9            // End Jamo range
SET constant NLS_CHAR_FIRST_VOWEL_JAMO to 0x1160     // First Vowel Jamo
SET constant NLS_CHAR_FIRST_TRAILING_JAMO to 0x11a8  // First Trailing Jamo
SET constant NLS_JAMO_VOWEL_COUNT to 21              // Number of vowel Jamo (V)
SET constant NLS_JAMO_TRAILING_COUNT to 28           // Number of trailing Jamo (T)
SET constant NLS_HANGUL_FIRST_COMPOSED to 0xac00     // Begin composed range

//
// Values for Unicode Weight extra weights (e.g. Jamo (old Hangul)).
// The following uses SM for extra UW weights.
//
SET constant ScriptMember_Extra_UnicodeWeight to 255

// Leading Weight / Vowel Weight / Trailing Weight
// according to the current Jamo class.
//
STRUCTURE JamoSortInfoType
(
    // true for an old Hangul sequence
    OldHangulFlag : Boolean
    // true if U+1160 (Hangul Jungseong Filler) used
    FillerUsed : Boolean
    // index to the prior modern Hangul syllable (L)
    LeadingIndex : 8 bit integer
    // index to the prior modern Hangul syllable (V)
    VowelIndex : 8 bit integer
    // index to the prior modern Hangul syllable (T)
    TrailingIndex : 8 bit integer
    // Weight to offset from other old hangul (L)
    LeadingWeight : 8 bit integer
    // Weight to offset from other old hangul (V)
    VowelWeight : 8 bit integer
    // Weight to offset from other old hangul (T)
    TrailingWeight : 8 bit integer
)

// This is the raw data record type from the data table
STRUCTURE JamoStateDataType
(
    // true for an old Hangul sequence
    OldHangulFlag : Boolean
    // index to the prior modern Hangul syllable (L)
    LeadingIndex : 8 bit integer
    // index to the prior modern Hangul syllable (V)
    VowelIndex : 8 bit integer
    // index to the prior modern Hangul syllable (T)
    TrailingIndex : 8 bit integer
    // weight to distinguish from old Hangul
    ExtraWeight : 8 bit integer
    // number of additional records in this state
    TransitionCount : 8 bit integer
    // Current record in unisort.txt Jamo table:
    // SORTTABLES\JAMOSORT\[Character] section
    JamoRecord : data record
)

COMMENT GetWindowsSortKey
COMMENT
COMMENT On Entry:  SourceString - Unicode String to compute a
COMMENT                           sort key for
COMMENT            SortLocale   - Locale to determine correct
COMMENT                           linguistic sort
COMMENT            Flags        - Bit Flag to control behavior
COMMENT                           of sort key generation.
COMMENT
COMMENT            NORM_IGNORENONSPACE: Ignore diacritic weight
COMMENT            NORM_IGNORECASE:     Ignore case weight
COMMENT            NORM_IGNOREKANATYPE: Ignore Japanese Katakana/Hiragana
COMMENT                                 difference
COMMENT            NORM_IGNOREWIDTH:    Ignore Chinese/Japanese/Korean
COMMENT                                 half-width and full-width difference.
COMMENT
COMMENT On Exit:   SortKey      - Byte array containing the
COMMENT                           computed sort key.
COMMENT
PROCEDURE GetWindowsSortKey(IN SourceString : Unicode String,
                            IN SortLocale : LCID,
                            IN Flags : 32 bit integer,
                            OUT SortKey : BYTE String)

COMMENT Compute flags for sort conditions
COMMENT Based on the case/kana/width flags,
COMMENT turn off bits in case mask when comparing case weight.
SET CaseMask to 0xff
IF (NORM_IGNORECASE bit is on in Flags) THEN
    SET CaseMask to CaseMask LOGICAL AND with CASE_UPPER_MASK
ENDIF
IF (NORM_IGNOREKANATYPE bit is on in Flags) THEN
    SET CaseMask to CaseMask LOGICAL AND with CASE_KANA_MASK
ENDIF
IF (NORM_IGNOREWIDTH bit is on in Flags) THEN
    SET CaseMask to CaseMask LOGICAL AND with CASE_WIDTH_MASK
ENDIF

COMMENT Windows 7 and Windows Server 2008 R2 use 3-byte
COMMENT (instead of 2-byte) sequence for Unicode Weights
COMMENT for Private Use Area (PUA) and some Chinese/Japanese/Korean
COMMENT (CJK) script members.
COMMENT Does this sort have a 3-byte Unicode Weight (CJK sorts)?
IF Windows version is Windows 7 or Windows Server 2008 R2 THEN
    COMMENT Check if the locale can have 3-byte Unicode weight
    SET Is3ByteWeightLocale to CALL Check3ByteWeightLocale(SortLocale)
ENDIF

IF Windows version is Windows Vista, Windows Server 2008,
   Windows 7, or Windows Server 2008 R2 THEN
    COMMENT For Windows Vista, Windows Server 2008, Windows 7, and
    COMMENT Windows Server 2008 R2, the algorithm
    COMMENT does not remap the script for Korean locale
    SET IsKoreanLocale to false
ELSE
    IF SortLocale is LCID_KOREAN or
       SortLocale is LCID_KOREAN_UNICODE_SORT THEN
        SET IsKoreanLocale to true
        IF KoreanScriptMap is null THEN
            CALL InitKoreanScriptMap
        ENDIF
    ELSE
        SET IsKoreanLocale to false
    ENDIF
ENDIF

//
// Allocate buffer to hold different levels of sort key weights.
// UnicodeWeights/ExtraWeights/SpecialWeights will eventually
// be collected together, in that order, into the returned
// SortKey byte string.
//
// Maximum expansion size is 3 times the input size
//
// Unicode Weight => 4 word (16 bit) length
// (extension A and Jamo need extra words)
SET UnicodeWeights to new empty string of UnicodeWeightType
SET DiacriticWeights to new empty string of BYTE
SET CaseWeights to new empty string of BYTE
// Extra Weight => 4 byte length (4 weights, 1 byte each) FE Special
SET ExtraWeights to new empty string of ExtraWeightType
// Special Weight => dword length (2 words each of 16 bits)
SET SpecialWeights to new empty string of SpecialWeightType

//
// Go through the string, code point by code point,
// testing for contractions and Hungarian special character sequence
//
// loop presumes 0 based index for source string
FOR SourceIndex is 0 to Length(SourceString) - 1
    //
    // Get weights
    // CharacterWeight will contain all of the weight information
    // for the character tested.
    //
    SET CharacterWeight to CALL GetCharacterWeights
        WITH (SortLocale, SourceString[SourceIndex])
    SET ScriptMember to CharacterWeight.ScriptMember

    // Special case weights have script members less than
    // MAX_SPECIAL_CASE (11)
    IF ScriptMember is greater than MAX_SPECIAL_CASE THEN
        //
        // No special case on character, but has to check for
        // contraction characters and Hungarian special
        // character sequence characters.
        //
        SET HasHungarianSpecialCharacterSequence to
            CALL TestHungarianCharacterSequences
            WITH (SortLocale, SourceString, SourceIndex)
        SET Result to CALL GetContractionType WITH (CharacterWeight)
        CASE Result OF
            "3-character Contraction":
                COMMENT This is only possible for Windows versions that
                COMMENT are Windows NT 4.0 through Windows Server 2003
                SET ContractionFound to CALL SortkeyContractionHandler
                    WITH (SortLocale, SourceString, SourceIndex,
                          HasHungarianSpecialCharacterSequence, 3,
                          UnicodeWeights, DiacriticWeights, CaseWeights)
                IF ContractionFound is true THEN
                    COMMENT Break out of the case statement
                    BREAK
                ENDIF
                COMMENT If no contraction is found, fall through into
                COMMENT additional cases.
                FALLTHROUGH

            "2-character Contraction":
                COMMENT This is only possible for Windows versions that are
                COMMENT Windows NT 4.0 through Windows Server 2003
                SET ContractionFound to CALL SortkeyContractionHandler
                    WITH (SortLocale, SourceString, SourceIndex,
                          HasHungarianSpecialCharacterSequence, 2,
                          UnicodeWeights, DiacriticWeights, CaseWeights)
                IF ContractionFound is true THEN
                    COMMENT Break out of the case statement
                    BREAK
                ENDIF
                COMMENT If no contraction is found, fall through into the
                COMMENT OTHERS case.
                COMMENT Since "3-character contraction" or "2-character contraction"
                COMMENT are the only two possible values for
                COMMENT Windows NT 4.0 through Windows Server 2003, all calls to
                COMMENT SortkeyContractionHandler will return false.
                COMMENT So, the fallthrough will go directly to the OTHERS section
                FALLTHROUGH

            "6-character contraction, 7-character contraction,
             or 8-character contraction":
                SET ContractionFound to CALL SortkeyContractionHandler
                    WITH (SortLocale, SourceString, SourceIndex,
                          HasHungarianSpecialCharacterSequence, 8,
                          UnicodeWeights, DiacriticWeights, CaseWeights)
                IF ContractionFound is true THEN
                    COMMENT Break out of the case statement
                    BREAK
                ELSE
                    SET ContractionFound to CALL SortkeyContractionHandler
                        WITH (SortLocale, SourceString, SourceIndex,
                              HasHungarianSpecialCharacterSequence, 7,
                              UnicodeWeights, DiacriticWeights, CaseWeights)
                ENDIF
                IF ContractionFound is true THEN
                    COMMENT Break out of the case statement
                    BREAK
                ELSE
                    SET ContractionFound to CALL SortkeyContractionHandler
                        WITH (SortLocale, SourceString, SourceIndex,
                              HasHungarianSpecialCharacterSequence, 6,
                              UnicodeWeights, DiacriticWeights, CaseWeights)
                ENDIF
                IF ContractionFound is true THEN
                    COMMENT Break out of the case statement
                    BREAK
                ENDIF
                COMMENT If no contraction is found, fall through into
                COMMENT additional cases.
                FALLTHROUGH

            "4-character contraction or 5-character contraction":
                SET ContractionFound to CALL SortkeyContractionHandler
                    WITH (SortLocale, SourceString, SourceIndex,
                          HasHungarianSpecialCharacterSequence, 5,
                          UnicodeWeights, DiacriticWeights, CaseWeights)
                IF ContractionFound is true THEN
                    COMMENT Break out of the case statement
                    BREAK
                ELSE
                    SET ContractionFound to CALL SortkeyContractionHandler
                        WITH (SortLocale, SourceString, SourceIndex,
                              HasHungarianSpecialCharacterSequence, 4,
                              UnicodeWeights, DiacriticWeights, CaseWeights)
                ENDIF
                IF ContractionFound is true THEN
                    COMMENT Break out of the case statement
                    BREAK
                ENDIF
                COMMENT If no contraction is found, fall through into
                COMMENT additional cases.
                FALLTHROUGH

            "2-character contraction or 3-character contraction":
                SET ContractionFound to CALL SortkeyContractionHandler
                    WITH (SortLocale, SourceString, SourceIndex,
                          HasHungarianSpecialCharacterSequence, 3,
                          UnicodeWeights, DiacriticWeights, CaseWeights)
                IF ContractionFound is true THEN
                    COMMENT Break out of the case statement
                    BREAK
                ELSE
                    SET ContractionFound to CALL SortkeyContractionHandler
                        WITH (SortLocale, SourceString, SourceIndex,
                              HasHungarianSpecialCharacterSequence, 2,
                              UnicodeWeights, DiacriticWeights, CaseWeights)
                ENDIF
                IF ContractionFound is true THEN
                    COMMENT Break out of the case statement
                    BREAK
                ENDIF
                COMMENT If no contraction is found, fall through into
                COMMENT additional cases.
                FALLTHROUGH

            OTHERS :
                IF Windows version is greater than Windows Server 2008 R2
                   or Windows 7 THEN
                    COMMENT In Windows Server 2008 R2 or Windows 7,
                    COMMENT Private Use Area (PUA) code points
                    COMMENT and some CJK (Chinese/Japanese/Korean) sorts
                    COMMENT might need 3 byte weights
                    COMMENT Store normal Unicode weight first. Note that there is no
                    COMMENT adjustment of Korean weight anymore.
                    SET UnicodeWeight to
                        CorrectUnicodeWeight(CharacterWeight, FALSE)
                    COMMENT Assume 3-byte Unicode Weight is not used first.
                    COMMENT The algorithm will check this later.
                    SET UnicodeWeight.ThirdByteWeight to 0

                    IF (ScriptMember is equal to or greater than PUA3BYTESTART)
                       AND (ScriptMember is less than or equal to PUA3BYTEEND) THEN
                        SET IsScriptMemberPUA3ByteWeight to true
                    ELSE
                        SET IsScriptMemberPUA3ByteWeight to false
                    ENDIF

                    IF (ScriptMember is equal to or greater than CJK3BYTESTART)
                       AND (ScriptMember is less than or equal to CJK3BYTEEND) THEN
                        SET IsScriptMemberCJK3ByteWeight to true
                    ELSE
                        SET IsScriptMemberCJK3ByteWeight to false
                    ENDIF

                    IF (IsScriptMemberPUA3ByteWeight is true) OR
                       (Is3ByteWeightLocale AND
                        IsScriptMemberCJK3ByteWeight is true) THEN
                        COMMENT PUA code points and some CJK sorts need 3 byte weights
                        SET UnicodeWeight.ThirdByteWeight to
                            CharacterWeight.DiacriticWeight
                    ELSE
                        COMMENT Normal Diacritic Weight
                        APPEND CharacterWeight.DiacriticWeight
                            to DiacriticWeights as a BYTE
                    ENDIF
                    APPEND UnicodeWeight to UnicodeWeights
                    SET CaseWeight to GetCaseWeight(CharacterWeight)
                    APPEND CharacterWeight.CaseWeight to CaseWeights as a BYTE
                ELSE
                    SET UnicodeWeight to
                        CorrectUnicodeWeight(CharacterWeight, IsKoreanLocale)
                    APPEND UnicodeWeight to UnicodeWeights
                    APPEND CharacterWeight.DiacriticWeight
                        to DiacriticWeights as a BYTE
                    SET CaseWeight to GetCaseWeight(CharacterWeight)
                    APPEND CharacterWeight.CaseWeight to CaseWeights as a BYTE
                ENDIF
        ENDCASE
    ELSE
        CALL SpecialCaseHandler WITH (SourceString, SourceIndex,
                                      UnicodeWeights, ExtraWeights,
                                      SpecialWeights, SortLocale,
                                      IsKoreanLocale)
    ENDIF
ENDFOR

//
// Store the Unicode Weights in the destination buffer.
//
FOR each UnicodeWeight in UnicodeWeights
    //
    // Copy Unicode weight to destination buffer.
    //
    APPEND UnicodeWeight.ScriptMember to SortKey as a BYTE
    APPEND UnicodeWeight.PrimaryWeight to SortKey as a BYTE
    IF Windows version is greater than Windows Server 2008 R2
       or Windows 7 THEN
        IF UnicodeWeight.ThirdByteWeight is not 0 THEN
            COMMENT When 3-byte Unicode Weight is used, append the additional
            COMMENT BYTE into SortKey
            APPEND UnicodeWeight.ThirdByteWeight to SortKey as a BYTE
        ENDIF
    ENDIF
ENDFOR

//
// Copy Separator to destination buffer.
//
APPEND SORTKEY_SEPARATOR to SortKey as a BYTE

//
// Store Diacritic Weights in the destination buffer.
//
IF (NORM_IGNORENONSPACE bit is not turned on in Flags) THEN
    IF (IsReverseDW is TRUE) THEN
        //
        // Reverse diacritics:
        //   - remove diacritics from left to right.
        //   - store diacritics from right to left.
        //
        FOR each DiacriticWeight in DiacriticWeights
            in the "first in first out" order
            IF DiacriticWeight <= MIN_DW THEN
                REMOVE DiacriticWeight from DiacriticWeights
            ELSE
                BREAK from the current FOR loop
            ENDIF
        ENDFOR
        FOR each DiacriticWeight in DiacriticWeights
            in the "last in first out" order
            //
            // Copy diacritic weight to destination buffer.
            //
            APPEND DiacriticWeight to SortKey as a BYTE
        ENDFOR
    ELSE
        //
        // Regular diacritics:
        //   - remove diacritics from right to left.
        //   - store diacritics from left to right.
        //
        FOR each DiacriticWeight in DiacriticWeights
            in the "last in first out" order
            IF DiacriticWeight <= MIN_DW THEN
                REMOVE DiacriticWeight from DiacriticWeights
            ELSE
                BREAK from the current FOR loop
            ENDIF
        ENDFOR
        FOR each DiacriticWeight in DiacriticWeights
            in the "first in first out" order
            //
            // Copy diacritic weight to destination buffer.
            //
            APPEND DiacriticWeight to SortKey as a BYTE
        ENDFOR
    ENDIF
ENDIF

//
// Copy Separator to destination buffer.
//
APPEND SORTKEY_SEPARATOR to SortKey as a BYTE

//
// Store case Weights
//
//   - Eliminate minimum CW.
//   - Copy case weights to destination buffer.
//
IF (NORM_IGNORECASE bit is not turned on in Flags OR
    NORM_IGNOREWIDTH bit is not turned on in Flags) THEN
    FOR each CaseWeight in CaseWeights in the "last in first out" order
        IF CaseWeight <= MIN_CW THEN
            REMOVE CaseWeight from CaseWeights
        ELSE
            BREAK from the current FOR loop
        ENDIF
    ENDFOR
    FOR each CaseWeight in CaseWeights
        //
        // Copy case weight to destination buffer.
        //
        APPEND CaseWeight to SortKey as a BYTE
    ENDFOR
ENDIF

//
// Copy Separator to destination buffer.
//
APPEND SORTKEY_SEPARATOR to SortKey as a BYTE

//
// Store the Extra Weights in the destination buffer for
// EAST ASIA Special.
//
//   - Eliminate unnecessary XW.
//   - Copy extra weights to destination buffer.
//
IF Length(ExtraWeights) is greater than 0 THEN
    IF (NORM_IGNORENONSPACE bit is turned on in Flags) THEN
        APPEND 0xff to SortKey as a BYTE
        APPEND 0x02 to SortKey as a BYTE
    ENDIF

    // Append W6 group to SortKey
    // Trim unused values from the end of the string
    SET EndExtraWeight to Length(ExtraWeights) - 1
    WHILE EndExtraWeight greater than 0 and
          ExtraWeights[EndExtraWeight].W6 == 0xe4
        DECREMENT EndExtraWeight
    ENDWHILE
    SET ExtraWeightIndex to 0
    WHILE ExtraWeightIndex is less than or equal to EndExtraWeight
        APPEND ExtraWeights[ExtraWeightIndex].W6 to SortKey as a BYTE
        INCREMENT ExtraWeightIndex
    ENDWHILE
    // Append W6 separator
    APPEND 0xff to SortKey as a BYTE

    // Append W7 group to SortKey
    // Trim unused values from the end of the string
    SET EndExtraWeight to Length(ExtraWeights) - 1
    WHILE EndExtraWeight greater than 0 and
          ExtraWeights[EndExtraWeight].W7 == 0xe4
        DECREMENT EndExtraWeight
    ENDWHILE
    SET ExtraWeightIndex to 0
    WHILE ExtraWeightIndex is less than or equal to EndExtraWeight
        APPEND ExtraWeights[ExtraWeightIndex].W7 to SortKey as a BYTE
        INCREMENT ExtraWeightIndex
    ENDWHILE
    // Append W7 separator
    APPEND 0xff to SortKey as a BYTE
ENDIF

//
// Copy Separator to destination buffer.
//
APPEND SORTKEY_SEPARATOR to SortKey as a BYTE

//
// Store the Special Weights in the destination buffer.
//
//   - Copy special weights to destination buffer.
//
FOR each SpecialWeight in SpecialWeights
    // High byte (most significant)
    SET Byte1 to SpecialWeight.Position >> 8
    // Low byte (least significant)
    SET Byte2 to SpecialWeight.Position & 0xff
    APPEND Byte1 to SortKey as a BYTE
    APPEND Byte2 to SortKey as a BYTE
    APPEND SpecialWeight.ScriptMember to SortKey as a BYTE
    APPEND SpecialWeight.PrimaryWeight to SortKey as a BYTE
ENDFOR

//
// Copy terminator to destination buffer.
//
APPEND SORTKEY_TERMINATOR to SortKey as a BYTE
RETURN SortKey
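The final assembly stage of the algorithm above can be sketched in Python. This is a simplified illustration and not the Windows implementation: it assumes 2-byte Unicode weights, regular (non-reversed) diacritic order, no flags set, and no extra or special weights. The function names and the sample weight values are invented for the example and do not come from any sorting weight table.

```python
SORTKEY_SEPARATOR = 0x01
SORTKEY_TERMINATOR = 0x00
MIN_DW = 2  # lowest diacritic weight, trimmed from the end
MIN_CW = 2  # lowest case weight, trimmed from the end


def trim_trailing(weights, minimum):
    """Drop trailing weights that are <= minimum (right-to-left removal)."""
    end = len(weights)
    while end > 0 and weights[end - 1] <= minimum:
        end -= 1
    return list(weights[:end])


def assemble_sort_key(unicode_weights, diacritic_weights, case_weights):
    """Concatenate the weight levels into one sort key byte string.

    unicode_weights is a list of (script_member, primary_weight) pairs.
    Extra and special weights are assumed empty in this sketch, so their
    sections contribute only a separator.
    """
    key = bytearray()
    for script_member, primary_weight in unicode_weights:
        key += bytes([script_member, primary_weight])
    key.append(SORTKEY_SEPARATOR)
    key += bytes(trim_trailing(diacritic_weights, MIN_DW))
    key.append(SORTKEY_SEPARATOR)
    key += bytes(trim_trailing(case_weights, MIN_CW))
    key.append(SORTKEY_SEPARATOR)
    # Extra weights would be written here (none in this sketch).
    key.append(SORTKEY_SEPARATOR)
    # Special weights would be written here (none in this sketch).
    key.append(SORTKEY_TERMINATOR)
    return bytes(key)


# Invented weights for a two-character string whose first character
# carries a non-minimal case weight.
example = assemble_sort_key([(0x0E, 0x02), (0x0E, 0x09)], [2, 2], [0x12, 2])
print(example.hex(" "))
```

Because trailing minimum weights are removed, strings that differ only in length at the primary level still produce compact keys that compare correctly byte by byte.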
<5> Section 3.1.5.2.16: The following MapOldHangulSortKey algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
COMMENT MapOldHangulSortKey
COMMENT
COMMENT On Entry:  SourceString   - Unicode String to test
COMMENT            SourceIndex    - Index of leading Jamo to start
COMMENT                             from
COMMENT            SortLocale     - Locale to use for linguistic
COMMENT                             sort data
COMMENT            UnicodeWeights - String to store any Unicode
COMMENT                             weight found
COMMENT                             for this character(s)
COMMENT
COMMENT On Exit:   CharactersRead - Number of old Hangul found
COMMENT            UnicodeWeights - Any Unicode weights found are
COMMENT                             appended
COMMENT
PROCEDURE MapOldHangulSortKey(IN SourceString : Unicode String,
                              IN SourceIndex : 32 bit integer,
                              IN SortLocale : LCID,
                              INOUT UnicodeWeights : String of UnicodeWeightType,
                              IN IsKoreanLocale : Boolean,
                              OUT CharactersRead : 32 bit integer)

SET CurrentIndex to SourceIndex
SET JamoSortInfo to empty JamoSortInfoType

// Get any Old Hangul Leading Jamo composition for our Leading Jamo
SET JamoClass to CALL GetJamoComposition
    WITH (SourceString, SourceIndex, "Leading Jamo Class", JamoSortInfo)

IF JamoClass is equal to "Vowel Jamo Class" THEN
    // A Vowel Jamo, try to find an
    // Old Hangul Vowel Jamo composition.
    SET JamoClass to CALL GetJamoComposition
        WITH (SourceString, SourceIndex, "Vowel Jamo Class", JamoSortInfo)
ENDIF

IF JamoClass is equal to "Trailing Jamo Class" THEN
    // A Trailing Jamo, try to find an
    // Old Hangul Trailing Jamo composition.
    SET JamoClass to CALL GetJamoComposition
        WITH (SourceString, SourceIndex, "Trailing Jamo Class", JamoSortInfo)
ENDIF

// A valid leading and vowel sequence and this is
// old Hangul...
IF JamoSortInfo.OldHangulFlag is true THEN
    // Compute the modern hangul syllable prior to this composition
    // Uses formula from Unicode 3.0 Section 3.11 p54
    // "Hangul Syllable Composition"
    // This converts a U+11.. sequence to a U+AC00 character
    SET ModernHangul to
        (JamoSortInfo.LeadingIndex * NLS_JAMO_VOWEL_COUNT
         + JamoSortInfo.VowelIndex) * NLS_JAMO_TRAILING_COUNT
        + JamoSortInfo.TrailingIndex + NLS_HANGUL_FIRST_COMPOSED

    IF JamoSortInfo.FillerUsed is true THEN
        // If the filler is used, sort before the modern Hangul,
        // instead of after
        DECREMENT ModernHangul
        // If falling off the modern Hangul syllable block...
        IF ModernHangul is less than NLS_HANGUL_FIRST_COMPOSED THEN
            // Sort after the previous character
            // (Circled Hangul Kiyeok A)
            SET ModernHangul to 0x326e
        ENDIF
        // Shift the leading weight past any old Hangul
        // that sorts after this modern Hangul
        SET JamoSortInfo.LeadingWeight to
            JamoSortInfo.LeadingWeight + 0x80
    ENDIF

    // Store the weights
    SET CharacterWeight to CALL GetCharacterWeights
        WITH (SortLocale, ModernHangul)
    SET UnicodeWeight to CALL CorrectUnicodeWeight
        WITH (CharacterWeight, IsKoreanLocale)
    APPEND UnicodeWeight to UnicodeWeights

    // Add additional weights
    SET UnicodeWeight to CALL MakeUnicodeWeight
        WITH (ScriptMember_Extra_UnicodeWeight,
              JamoSortInfo.LeadingWeight, false)
    APPEND UnicodeWeight to UnicodeWeights
    SET UnicodeWeight to CALL MakeUnicodeWeight
        WITH (ScriptMember_Extra_UnicodeWeight,
              JamoSortInfo.VowelWeight, false)
    APPEND UnicodeWeight to UnicodeWeights
    SET UnicodeWeight to CALL MakeUnicodeWeight
        WITH (ScriptMember_Extra_UnicodeWeight,
              JamoSortInfo.TrailingWeight, false)
    APPEND UnicodeWeight to UnicodeWeights

    // Return the characters consumed
    SET CharactersRead to CurrentIndex - SourceIndex
    RETURN CharactersRead
ENDIF

// Otherwise it isn't a valid old Hangul composition
// and don't do anything with it
SET CharactersRead to 0
RETURN CharactersRead
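The syllable arithmetic used by MapOldHangulSortKey is the standard Hangul composition formula from The Unicode Standard. The Python sketch below illustrates only that arithmetic and the filler adjustment. The function names are assumptions for this example, and the constants repeat the values defined earlier in this appendix.

```python
import unicodedata

NLS_JAMO_VOWEL_COUNT = 21
NLS_JAMO_TRAILING_COUNT = 28
NLS_HANGUL_FIRST_COMPOSED = 0xAC00
CIRCLED_HANGUL_KIYEOK_A = 0x326E


def compose_modern_hangul(leading_index, vowel_index, trailing_index):
    """Map (L, V, T) indices to a precomposed syllable code point."""
    return ((leading_index * NLS_JAMO_VOWEL_COUNT + vowel_index)
            * NLS_JAMO_TRAILING_COUNT
            + trailing_index
            + NLS_HANGUL_FIRST_COMPOSED)


def anchor_code_point(leading_index, vowel_index, trailing_index, filler_used):
    """Code point whose weights anchor an old Hangul sequence.

    When the filler is used, the sequence sorts before the modern
    syllable, so the anchor moves back by one, or to U+326E when that
    would fall off the start of the syllable block.
    """
    code_point = compose_modern_hangul(leading_index, vowel_index,
                                       trailing_index)
    if filler_used:
        code_point -= 1
        if code_point < NLS_HANGUL_FIRST_COMPOSED:
            code_point = CIRCLED_HANGUL_KIYEOK_A
    return code_point


# L index 18 (U+1112), V index 0 (U+1161), T index 4 (U+11AB)
syllable = chr(compose_modern_hangul(18, 0, 4))
assert syllable == unicodedata.normalize("NFC", "\u1112\u1161\u11ab")
print(hex(ord(syllable)))  # 0xd55c
```

The cross-check against `unicodedata.normalize` works because canonical composition (NFC) applies the same formula to modern Jamo sequences.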
<6> Section 3.1.5.2.17: The GetJamoComposition algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
<7> Section 3.1.5.2.18: The following GetJamoStateData algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
COMMENT GetJamoStateData
COMMENT
COMMENT On Entry:  Character     - Unicode Character to get Jamo
COMMENT                            information for
COMMENT
COMMENT On Exit:   JamoStateData - Jamo state information from
COMMENT                            the data file
COMMENT
COMMENT Jamo State information looks like this in the database:
COMMENT
COMMENT SORTTABLES
COMMENT ...
COMMENT JAMOSORT 395
COMMENT ...
COMMENT 0x1172 4
COMMENT 0x1172 0x00 0x00 0x11 0x00 0x38 0x03 ; U+1172
COMMENT 0x1161 0x01 0x00 0x00 0x00 0x00 0x01 ; U+1172,1161
COMMENT 0x1175 0x01 0x00 0x11 0x1b 0x3a 0x00 ; U+1172,1161,1175
COMMENT 0x1169 0x01 0x00 0x11 0x1b 0x3f 0x00 ; U+1172,1169
PROCEDURE GetJamoStateData (IN Character : Unicode Character,
                            OUT JamoStateData : JamoStateDataType)

// Get the Jamo section for this character.
// If Character was 0x1172, this would access the following section:
// 0x1172 4
// 0x1172 0x00 0x00 0x11 0x00 0x38 0x03 ; U+1172 record 0
// 0x1161 0x01 0x00 0x00 0x00 0x00 0x01 ; U+1172,1161 record 1
// 0x1175 0x01 0x00 0x11 0x1b 0x3a 0x00 ; U+1172,1161,1175 record 2
// 0x1169 0x01 0x00 0x11 0x1b 0x3f 0x00 ; U+1172,1169 record 3
//    |    |    |    |    |    |    |      |
// Field 1 2    3    4    5    6    7      Comment
OPEN SECTION JamoSection where name is
    SORTTABLES\JAMOSORT\[Character] from unisort.txt

// Now open the first record
SELECT RECORD JamoRecord FROM JamoSection WHERE record index is 0

// Now gather the information from that record.
SET JamoStateData.OldHangulFlag to JamoRecord.Field2
SET JamoStateData.LeadingIndex to JamoRecord.Field3
SET JamoStateData.VowelIndex to JamoRecord.Field4
SET JamoStateData.TrailingIndex to JamoRecord.Field5
SET JamoStateData.ExtraWeight to JamoRecord.Field6
SET JamoStateData.TransitionCount to JamoRecord.Field7

// Remember the record
SET JamoStateData.JamoRecord to JamoRecord
RETURN JamoStateData
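The field mapping described above can be illustrated with a short Python sketch that parses one Jamo record line. It assumes the record is seven whitespace-separated hexadecimal fields followed by an optional `;` comment, as in the sample records shown in the algorithm. The class name and function name are hypothetical, and the actual layout of unisort.txt might differ in details not shown here.

```python
from dataclasses import dataclass


@dataclass
class JamoStateData:
    old_hangul_flag: bool   # Field 2
    leading_index: int      # Field 3
    vowel_index: int        # Field 4
    trailing_index: int     # Field 5
    extra_weight: int       # Field 6
    transition_count: int   # Field 7


def parse_jamo_record(line: str) -> JamoStateData:
    """Parse one record line such as
    '0x1172 0x00 0x00 0x11 0x00 0x38 0x03 ; U+1172'."""
    data_part = line.split(";", 1)[0]          # drop the trailing comment
    values = [int(field, 16) for field in data_part.split()]
    if len(values) != 7:
        raise ValueError(f"expected 7 fields, got {len(values)}")
    _character, flag, leading, vowel, trailing, extra, count = values
    return JamoStateData(bool(flag), leading, vowel, trailing, extra, count)


state = parse_jamo_record("0x1172 0x00 0x00 0x11 0x00 0x38 0x03 ; U+1172")
print(state)
```

Field 1 (the character itself) is read but not stored, matching the pseudocode, which copies only fields 2 through 7 into the state record.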
<8> Section 3.1.5.2.19: The FindNewJamoState algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
<9> Section 3.1.5.2.20: The following UpdateJamoSortInfo algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
COMMENT UpdateJamoSortInfo
COMMENT
COMMENT On Entry:  JamoClass     - The current Jamo Class
COMMENT            JamoStateData - Information about the new
COMMENT                            character state
COMMENT            JamoSortInfo  - Information about the character
COMMENT                            state
COMMENT
COMMENT On Exit:   JamoSortInfo  - Updated with information about
COMMENT                            the new state based on JamoClass
COMMENT                            and JamoStateData
COMMENT
PROCEDURE UpdateJamoSortInfo(IN JamoClass : enumeration,
                             IN JamoStateData : JamoStateDataType,
                             INOUT JamoSortInfo : JamoSortInfoType)

// Record if this is a Jamo unique to old Hangul
SET JamoSortInfo.OldHangulFlag to
    JamoSortInfo.OldHangulFlag | JamoStateData.OldHangulFlag

// Update the indices if the new ones are higher than the current
// ones.
IF JamoStateData.LeadingIndex is greater than
   JamoSortInfo.LeadingIndex THEN
    SET JamoSortInfo.LeadingIndex to JamoStateData.LeadingIndex
ENDIF
IF JamoStateData.VowelIndex is greater than
   JamoSortInfo.VowelIndex THEN
    SET JamoSortInfo.VowelIndex to JamoStateData.VowelIndex
ENDIF
IF JamoStateData.TrailingIndex is greater than
   JamoSortInfo.TrailingIndex THEN
    SET JamoSortInfo.TrailingIndex to JamoStateData.TrailingIndex
ENDIF

// Update the extra weights according to the current Jamo class.
CASE JamoClass OF
    "Leading Jamo Class":
        IF JamoStateData.ExtraWeight is greater than
           JamoSortInfo.LeadingWeight THEN
            SET JamoSortInfo.LeadingWeight to JamoStateData.ExtraWeight
        ENDIF
    "Vowel Jamo Class":
        IF JamoStateData.ExtraWeight is greater than
           JamoSortInfo.VowelWeight THEN
            SET JamoSortInfo.VowelWeight to JamoStateData.ExtraWeight
        ENDIF
    "Trailing Jamo Class":
        IF JamoStateData.ExtraWeight is greater than
           JamoSortInfo.TrailingWeight THEN
            SET JamoSortInfo.TrailingWeight to JamoStateData.ExtraWeight
        ENDIF
ENDCASE

RETURN JamoSortInfo
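The update rule above amounts to a logical OR on the old Hangul flag and a running maximum on every index and on the class-specific weight. A minimal Python sketch follows. The class names, field names, and the short class labels ("leading", "vowel", "trailing") are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class JamoState:
    """Subset of the per-record state used by the update."""
    old_hangul_flag: bool = False
    leading_index: int = 0
    vowel_index: int = 0
    trailing_index: int = 0
    extra_weight: int = 0


@dataclass
class JamoSortInfo:
    old_hangul_flag: bool = False
    leading_index: int = 0
    vowel_index: int = 0
    trailing_index: int = 0
    leading_weight: int = 0
    vowel_weight: int = 0
    trailing_weight: int = 0


# Which weight field each Jamo class updates.
_WEIGHT_FIELD = {
    "leading": "leading_weight",
    "vowel": "vowel_weight",
    "trailing": "trailing_weight",
}


def update_jamo_sort_info(jamo_class, state, info):
    """Fold one state record into the accumulated sort info."""
    info.old_hangul_flag = info.old_hangul_flag or state.old_hangul_flag
    info.leading_index = max(info.leading_index, state.leading_index)
    info.vowel_index = max(info.vowel_index, state.vowel_index)
    info.trailing_index = max(info.trailing_index, state.trailing_index)
    field = _WEIGHT_FIELD[jamo_class]
    setattr(info, field, max(getattr(info, field), state.extra_weight))
    return info


info = JamoSortInfo()
update_jamo_sort_info("vowel", JamoState(False, 0, 0x11, 0, 0x38), info)
update_jamo_sort_info("vowel", JamoState(True, 0, 0x11, 0x1B, 0x3A), info)
print(info)
```

Because every field only ever grows, the order in which records of the same class are folded in does not change the result.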
<10> Section 3.1.5.2.21: The IsJamo algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
<11> Section 3.1.5.2.22: The IsCombiningJamo algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
<12> Section 3.1.5.2.23: The following IsJamoLeading algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
COMMENT IsJamoLeading
COMMENT
COMMENT On Entry:  SourceCharacter - Unicode Character to test
COMMENT
COMMENT On Exit:   Result          - true if SourceCharacter is a
COMMENT                              leading Jamo
COMMENT
COMMENT NOTE: Only call this if the character is known to be a Jamo
COMMENT       syllable. This function only helps distinguish between
COMMENT       the different types of Jamo, so only call it if
COMMENT       IsJamo() has returned true.
COMMENT
PROCEDURE IsJamoLeading(IN SourceCharacter : Unicode Character,
                        OUT Result: boolean)

IF SourceCharacter is less than NLS_CHAR_FIRST_VOWEL_JAMO THEN
    SET Result to true
ELSE
    SET Result to false
ENDIF
RETURN Result
<13> Section 3.1.5.2.24: The IsJamoVowel algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
<14> Section 3.1.5.2.25: The following IsJamoTrailing algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
COMMENT IsJamoTrailing
COMMENT
COMMENT On Entry:  SourceCharacter - Unicode Character to test
COMMENT
COMMENT On Exit:   Result          - true if this is a trailing Jamo
COMMENT
COMMENT NOTE: Only call this if the character is known to be a Jamo
COMMENT       syllable. This function only helps distinguish between
COMMENT       the different types of Jamo, so only call it if
COMMENT       IsJamo() has returned true.
COMMENT
PROCEDURE IsJamoTrailing(IN SourceCharacter : Unicode Character,
                         OUT Result: boolean)

IF SourceCharacter is greater than or equal to
   NLS_CHAR_FIRST_TRAILING_JAMO THEN
    SET Result to true
ELSE
    SET Result to false
ENDIF
RETURN Result
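The Jamo class tests reduce to range checks against the Jamo constants defined earlier in this appendix. The Python sketch below combines them into a single classifier for illustration. The function name and the returned labels are assumptions, and the vowel branch is inferred from the two boundary constants rather than taken from a listed algorithm.

```python
NLS_CHAR_FIRST_JAMO = 0x1100
NLS_CHAR_LAST_JAMO = 0x11F9
NLS_CHAR_FIRST_VOWEL_JAMO = 0x1160
NLS_CHAR_FIRST_TRAILING_JAMO = 0x11A8


def classify_jamo(character: str) -> str:
    """Return 'leading', 'vowel', or 'trailing' for a Jamo code point.

    Raises ValueError for characters outside the Jamo range, mirroring
    the note that the class tests are valid only after an IsJamo check.
    """
    code_point = ord(character)
    if not NLS_CHAR_FIRST_JAMO <= code_point <= NLS_CHAR_LAST_JAMO:
        raise ValueError(f"U+{code_point:04X} is not in the Jamo range")
    if code_point < NLS_CHAR_FIRST_VOWEL_JAMO:
        return "leading"
    if code_point < NLS_CHAR_FIRST_TRAILING_JAMO:
        return "vowel"
    return "trailing"


for ch in "\u1112\u1161\u11ab":
    print(f"U+{ord(ch):04X}", classify_jamo(ch))
```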
<15> Section 3.1.5.4: The IdnToNameprepUnicode, IdnToAscii, and IdnToUnicode algorithms are not applicable to Windows NT, Windows 2000, Windows XP, or Windows Server 2003. These algorithms follow the IDNA2003 standards for Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2 operating system. Otherwise, the algorithms follow the IDNA2008+UTS46 standards.
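To illustrate what IDNA2003 conversion looks like in practice, the following Python sketch uses the standard library's built-in `idna` codec, which implements IDNA2003 (RFC 3490). It is shown only as a point of comparison and is not the Windows implementation; the wrapper function names and the `.example` host names are assumptions made for this example.

```python
def to_ascii_idna2003(name: str) -> bytes:
    """Convert a Unicode host name to its ASCII (Punycode) form
    using Python's IDNA2003 codec."""
    return name.encode("idna")


def to_unicode_idna2003(name: bytes) -> str:
    """Convert an ASCII (Punycode) host name back to Unicode."""
    return name.decode("idna")


print(to_ascii_idna2003("bücher.example"))    # b'xn--bcher-kva.example'
print(to_unicode_idna2003(b"xn--bcher-kva.example"))

# IDNA2003 (Nameprep) maps the German sharp s to "ss".
print(to_ascii_idna2003("straße.example"))    # b'strasse.example'
```

The sharp s case is one of the well-known points where IDNA2003 and IDNA2008 differ: IDNA2008 treats the character as valid in its own right instead of mapping it to "ss", so the same input can yield different ASCII names depending on which standard a product version follows.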
<16> Section 3.1.5.4.6: This version is not used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
<17> Section 3.1.5.4.7: This version is used in Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.