6 Appendix A: Product Behavior

04/23/2024

The information in this specification is applicable to the following Microsoft products or supplemental software. References to product versions include released service packs.

Windows NT operating system
Windows 2000 operating system
Windows XP operating system
Windows Server 2003 operating system
Windows Vista operating system
Windows Server 2008 operating system
Windows 7 operating system
Windows Server 2008 R2 operating system
Windows 8 operating system
Windows Server 2012 operating system
Windows 8.1 operating system
Windows Server 2012 R2 operating system
Windows 10 operating system
Windows Server 2016 operating system
Windows Server operating system
Windows Server 2019 operating system
Windows Server 2022 operating system
Windows 11 operating system
Windows Server 2025 operating system

Exceptions, if any, are noted in this section. If an update version, service pack or Knowledge Base (KB) number appears with a product name, the behavior changed in that update. The new behavior also applies to subsequent updates unless otherwise specified. If a product edition appears with the product version, behavior is different in that product edition.

Unless otherwise specified, any statement of optional behavior in this specification that is prescribed using the terms "SHOULD" or "SHOULD NOT" implies product behavior in accordance with the SHOULD or SHOULD NOT prescription. Unless otherwise specified, the term "MAY" implies that the product does not follow the prescription.

<1> Section 3.1.5.2.3: Only Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2 use record count for DEFAULT.

<2> Section 3.1.5.2.3: An LCID is used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
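
An LCID stores a 16-bit language identifier in its low word and a 4-bit sort identifier in bits 16 through 19. The Python sketch below mirrors the Windows SDK macros MAKELCID, LANGIDFROMLCID, and SORTIDFROMLCID. The Python function names are hypothetical. The sketch shows that LCID_KOREAN_UNICODE_SORT (0x010412), used in the pseudocode of note <4>, names the same language as LCID_KOREAN (0x0412) but carries sort identifier 1.

```python
# Hypothetical Python equivalents of the Windows SDK macros
# MAKELCID, LANGIDFROMLCID, and SORTIDFROMLCID.


def make_lcid(lang_id, sort_id):
    """Combine a language ID (bits 0-15) and a sort ID (bits 16-19)."""
    return (sort_id << 16) | lang_id


def lang_id_from_lcid(lcid):
    """Extract the 16-bit language identifier."""
    return lcid & 0xFFFF


def sort_id_from_lcid(lcid):
    """Extract the 4-bit sort identifier."""
    return (lcid >> 16) & 0xF


LCID_KOREAN = make_lcid(0x0412, 0)
LCID_KOREAN_UNICODE_SORT = make_lcid(0x0412, 1)

print(hex(LCID_KOREAN), hex(LCID_KOREAN_UNICODE_SORT))  # 0x412 0x10412
```
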

<3> Section 3.1.5.2.3.1: The files in the download map to specific Windows versions as follows:

Version	File Name
Windows NT 4.0 operating system, Windows 2000, Windows XP, and Windows Server 2003	Windows NT 4.0 through Windows Server 2003 Sorting Weight Table.txt
Windows Vista	Windows Vista Sorting Weight Table.txt
Windows Server 2008	Windows Server 2008 Sorting Weight Table.txt
Windows 7 and Windows Server 2008 R2	Windows 7 and Windows Server 2008 R2 Sorting Weight Table.txt
Windows 8, Windows 8.1, Windows Server 2012, and Windows Server 2012 R2	Windows 8 and Windows Server 2012 Sorting Weight Table.txt; Windows 8 Upper Case Mapping Table.txt
Windows 10, Windows Server 2016, Windows Server operating system, and Windows Server 2019	Windows 10 Sorting Weight Table.txt
Windows Server 2022	Windows Server 2022 Sorting Weight Table.txt
Windows 11	Windows 11 Sorting Weight Table.txt

<4> Section 3.1.5.2.4: The following algorithm for generation of sort keys for a specific UTF-16 string is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, Windows Server 2008 R2, Windows 8, Windows 8.1, Windows Server 2012, Windows Server 2012 R2, Windows 10, Windows Server 2016, Windows Server operating system, Windows Server 2019, and Windows Server 2022.

    STRUCTURE CharacterWeightType
    (
         ScriptMember:    8 bit integer
         PrimaryWeight:   8 bit integer
         DiacriticWeight: 8 bit integer
         CaseWeight:      8 bit integer
    )
    
    STRUCTURE UnicodeWeightType
    (
         ScriptMember:    8 bit integer
         PrimaryWeight:   8 bit integer
         ThirdByteWeight: 8 bit integer
    )
    
    STRUCTURE SpecialWeightType
    (
         Position:       16 bit integer
         ScriptMember:    8 bit integer
         PrimaryWeight:   8 bit integer
    )
    
    STRUCTURE ExtraWeightType
    (
         W6:              8 bit integer
         W7:              8 bit integer
    )
    SET constant LCID_KOREAN to 0x0412
    SET constant LCID_KOREAN_UNICODE_SORT to 0x010412
    SET constant LCID_HUNGARIAN to 0x040e
    
    SET constant SORTKEY_SEPARATOR to 0x01
    SET constant SORTKEY_TERMINATOR to 0x00
    
    SET global KoreanScriptMap to InitKoreanScriptMap
    
    //
    //  Script Member Values.
    //
    SET constant UNSORTABLE       to 0
    SET constant NONSPACE_MARK    to 1
    SET constant EXPANSION        to 2
    SET constant EASTASIA_SPECIAL to 3
    SET constant JAMO_SPECIAL     to 4
    SET constant EXTENSION_A      to 5
    SET constant PUNCTUATION      to 6
    
    SET constant SYMBOL_1         to 7
    SET constant SYMBOL_2         to 8
    SET constant SYMBOL_3         to 9
    SET constant SYMBOL_4         to 10
    SET constant SYMBOL_5         to 11
    SET constant SYMBOL_6         to 12
    
    SET constant DIGIT            to 13
    
    SET constant LATIN            to 14
    SET constant KANA             to 34
    SET constant IDEOGRAPH        to 128
    
    IF Windows version is Windows Vista, Windows Server 2008, Windows 7, or
       Windows Server 2008 R2 THEN
        SET constant MAX_SPECIAL_CASE to SYMBOL_6

    ELSE
        SET constant MAX_SPECIAL_CASE to SYMBOL_5
    ENDIF

    COMMENT Set the constant for the first script member of the Unicode
    COMMENT Private Use Area (PUA) range.
    SET constant PUA3BYTESTART to 0xA9
    COMMENT Set the constant for the last script member of the Unicode
    COMMENT Private Use Area (PUA) range.
    SET constant PUA3BYTEEND to 0xAF

    COMMENT Set the constant for the first script member of the CJK
    COMMENT (Chinese/Japanese/Korean) 3-byte weight range.
    SET constant CJK3BYTESTART to 0xC0
    COMMENT Set the constant for the last script member of the CJK
    COMMENT (Chinese/Japanese/Korean) 3-byte weight range.
    SET constant CJK3BYTEEND to 0xEF
    SET constant FIRST_SCRIPT     to LATIN
    SET constant MAX_SCRIPTS      to 256
    
    //
    //  Values for CJK Unified Ideographs Extension A range.
    //    0x3400 through 0x4dbf
    //
    SET constant SCRIPT_MEMBER_EXT_A  to 254       // SM for Extension A
    SET constant PRIMARY_WEIGHT_EXT_A to 255       // AW for Extension A
    
    //
    //  Lowest weight values.
    //  Used to remove trailing DW and CW values.
    //  Also used to keep illegal values out of sort keys.
    //
    
    SET constant MIN_DW to 2
    SET constant MIN_CW to 2
    
    //
    //  Bit mask values.
    //
    //  Case Weight (CW) - 8 bits:
    //    bit 0   => width
    //    bit 1,2 => small kana, sei-on
    //    bit 3,4 => upper/lower case
    //    bit 5   => kana
    //    bit 6,7 => contraction
    //
    
    
    IF Windows version is later than Windows Server 2003 THEN
        COMMENT Contractions of 2 through 8 characters are supported.
        SET constant CONTRACTION_8_MASK to 0xc0
        SET constant CONTRACTION_7_MASK to 0xc0
        SET constant CONTRACTION_6_MASK to 0xc0
        SET constant CONTRACTION_5_MASK to 0x80
        SET constant CONTRACTION_4_MASK to 0x80
        SET constant CONTRACTION_3_MASK to 0x40
        SET constant CONTRACTION_2_MASK to 0x40

        SET constant CONTRACTION_MASK to 0xc0

    ELSE
        COMMENT Otherwise (Windows NT 4.0 through Windows Server 2003),
        COMMENT only 2-character or 3-character contractions are supported.
        COMMENT Bit mask to check for a 2-character or 3-character contraction
        SET constant CONTRACTION_3_MASK to 0xc0
        COMMENT Bit mask to check for a 2-character contraction
        SET constant CONTRACTION_2_MASK to 0x80
    ENDIF
    
    SET constant CASE_UPPER_MASK to 0xe7  // zero out case bits
    SET constant CASE_KANA_MASK  to 0xdf  // zero out kana bit
    SET constant CASE_WIDTH_MASK to 0xfe  // zero out width bit
    
    //
    //  Masks to isolate the various bits in the case weight.
    //
    //  NOTE: Bit 2 needs to always equal 1 to avoid getting
    //        a byte value of either 0 or 1.
    //
    
    SET constant CASE_EXTRA_WEIGHT_MASK to 0xc4
    SET constant ISOLATE_KANA to
                 (~CASE_KANA_MASK) | CASE_EXTRA_WEIGHT_MASK
    SET constant ISOLATE_WIDTH to 
                 (~CASE_WIDTH_MASK) | CASE_EXTRA_WEIGHT_MASK
    
    //
    //  Values for East Asia special case primary weights.
    //
    SET constant PW_REPEAT      to 0
    SET constant PW_CHO_ON      to 1
    SET constant MAX_SPECIAL_PW to PW_CHO_ON
    
    //
    //  Values for weight 5 - East Asia Extra Weights.
    //
    SET constant WT_FIVE_KANA to 3
    SET constant WT_FIVE_REPEAT to 4
    SET constant WT_FIVE_CHO_ON to 5
    
    //
    //  PW Mask for Cho-On:
    //  Leaves bit 7 on in PW, so it becomes Repeat
    //  if it follows Kana N.
    //
    SET constant CHO_ON_PW_MASK to 0x87
    
    //
    //  Special weight values
    //
    SET constant MAP_INVALID_WEIGHT to 0xff
    
    //
    //  Some Significant Values for Korean Jamo.
    //  The L, V & T syllables in the 0x1100 Unicode range
    //  can be composed to characters in the 0xac00 range.
    //  See The Unicode Standard for details.
    //
    SET constant NLS_CHAR_FIRST_JAMO          to 0x1100  // Begin Jamo range
    SET constant NLS_CHAR_LAST_JAMO           to 0x11f9  // End Jamo range
    SET constant NLS_CHAR_FIRST_VOWEL_JAMO    to 0x1160  // First Vowel Jamo
    SET constant NLS_CHAR_FIRST_TRAILING_JAMO to 0x11a8  // First Trailing Jamo
    SET constant NLS_JAMO_VOWEL_COUNT         to 21      // Number of vowel Jamo (V)
    SET constant NLS_JAMO_TRAILING_COUNT      to 28      // Number of trailing Jamo (T)
    SET constant NLS_HANGUL_FIRST_COMPOSED    to 0xac00  // Begin composed range
    
    //
    //  Values for Unicode Weight extra weights (e.g. Jamo (old Hangul)).
    //  The following uses SM for extra UW weights.
    //
    SET constant ScriptMember_Extra_UnicodeWeight to 255
    //  Leading Weight / Vowel Weight / Trailing Weight
    //  according to the current Jamo class.
    //
    STRUCTURE JamoSortInfoType
    (
         // true for an old Hangul sequence
         OldHangulFlag : Boolean
         
         // true if U+1160 (Hangul Jungseong Filler) used
         FillerUsed : Boolean
    
         // index to the prior modern Hangul syllable (L)
         LeadingIndex : 8 bit integer
    
         // index to the prior modern Hangul syllable (V)
         VowelIndex : 8 bit integer
    
         // index to the prior modern Hangul syllable (T)
         TrailingIndex : 8 bit integer
    
         // Weight to offset from other old hangul (L)
         LeadingWeight : 8 bit integer
    
         // Weight to offset from other old hangul (V)
         VowelWeight : 8 bit integer
    
         // Weight to offset from other old hangul (T)
         TrailingWeight : 8 bit integer
    )
    
    // This is the raw data record type from the data table
    STRUCTURE JamoStateDataType
    (
         // true for an old Hangul sequence
         OldHangulFlag : Boolean
    
         // index to the prior modern Hangul syllable (L)
         LeadingIndex : 8 bit integer
    
         // index to the prior modern Hangul syllable (V)
         VowelIndex : 8 bit integer
    
         // index to the prior modern Hangul syllable (T)
         TrailingIndex : 8 bit integer
    
         // weight to distinguish from old Hangul
         ExtraWeight : 8 bit integer
    
         // number of additional records in this state
         TransitionCount : 8 bit integer
    
         // Current record in the unisort.txt Jamo table
         // (SORTTABLES\JAMOSORT\[Character] section)
         JamoRecord : data record
    )
    COMMENT GetWindowsSortKey
    COMMENT
    COMMENT  On Entry:  SourceString - Unicode String to compute a
    COMMENT                            sort key for
    COMMENT             SortLocale   - Locale to determine correct 
    COMMENT                            linguistic sort
    COMMENT             Flags        - Bit flags that control the behavior
    COMMENT                            of sort key generation:
    COMMENT
    COMMENT  NORM_IGNORENONSPACE:   Ignore diacritic weight.
    COMMENT  NORM_IGNORECASE:       Ignore case weight.
    COMMENT  NORM_IGNOREKANATYPE:   Ignore the Japanese Katakana/Hiragana
    COMMENT                         difference.
    COMMENT  NORM_IGNOREWIDTH:      Ignore the Chinese/Japanese/Korean
    COMMENT                         half-width and full-width difference.
    COMMENT
    COMMENT  On Exit:   SortKey      - Byte array containing the
    COMMENT                            computed sort key.
    COMMENT
    
    PROCEDURE GetWindowsSortKey(IN SourceString : Unicode String,
                                IN SortLocale :   LCID,
                                IN Flags : 32 bit integer,
                                OUT SortKey : BYTE String)
    
    COMMENT Compute flags for sort conditions
    COMMENT Based on the case/kana/width flags,
    COMMENT   turn off bits in case mask when comparing case weight.
    
    SET CaseMask to 0xff
    
    IF (NORM_IGNORECASE bit is on in Flags) THEN
        SET CaseMask to CaseMask LOGICAL AND with CASE_UPPER_MASK
    ENDIF

    IF (NORM_IGNOREKANATYPE bit is on in Flags) THEN
        SET CaseMask to CaseMask LOGICAL AND with CASE_KANA_MASK
    ENDIF

    IF (NORM_IGNOREWIDTH bit is on in Flags) THEN
        SET CaseMask to CaseMask LOGICAL AND with CASE_WIDTH_MASK
    ENDIF
    
    COMMENT Windows 7 and Windows Server 2008 R2 use a 3-byte
    COMMENT (instead of 2-byte) sequence for the Unicode weights of
    COMMENT Private Use Area (PUA) and some Chinese/Japanese/Korean
    COMMENT (CJK) script members.
    COMMENT Does this sort have a 3-byte Unicode weight (CJK sorts)?
    IF Windows version is Windows 7 or Windows Server 2008 R2 THEN
       COMMENT Check whether the locale can have a 3-byte Unicode weight
       SET Is3ByteWeightLocale to CALL Check3ByteWeightLocale(SortLocale)
    ENDIF
    
    
    IF Windows version is Windows Vista, Windows Server 2008, Windows 7, or
       Windows Server 2008 R2 THEN
        COMMENT For Windows Vista, Windows Server 2008, Windows 7, and
        COMMENT Windows Server 2008 R2, the algorithm
        COMMENT does not remap the script for the Korean locale.
        SET IsKoreanLocale to false
    ELSE
       IF SortLocale is LCID_KOREAN or
          SortLocale is LCID_KOREAN_UNICODE_SORT THEN
             SET IsKoreanLocale to true
             IF KoreanScriptMap is null THEN
                 CALL InitKoreanScriptMap
             ENDIF
       ELSE
           SET IsKoreanLocale to false
       ENDIF
    ENDIF
    
    //
    //  Allocate buffers to hold the different levels of sort key weights.
    //  UnicodeWeights, DiacriticWeights, CaseWeights, ExtraWeights, and
    //  SpecialWeights are eventually collected together, in that order,
    //  into the returned SortKey byte string.
    //
    //  Maximum expansion size is 3 times the input size.
    //
    
    // Unicode Weight => 4 word (16 bit) length
    // (extension A and Jamo need extra words)
    SET UnicodeWeights to new empty string of UnicodeWeightType
    
    SET DiacriticWeights to new empty string of BYTE
    SET CaseWeights to new empty string of BYTE
    
    // Extra Weight=>4 byte length (4 weights, 1 byte each) FE Special
    SET ExtraWeights to new empty string of ExtraWeightType
    
    // Special Weight => dword length (2 words each of 16 bits)
    SET SpecialWeights to new empty string of SpecialWeightType
    
    //
    // Go through the string, code point by code point,
    // testing for contractions and Hungarian special character sequence
    //
    
    // loop presumes 0 based index for source string
    FOR SourceIndex is 0 to Length(SourceString) -1
        //
        // Get weights
        // CharacterWeight will contain all of the weight information
        // for the character tested.
        //
    
        SET CharacterWeight to CALL GetCharacterWeights
            WITH (SortLocale, SourceString[SourceIndex])
    
        SET ScriptMember to CharacterWeight.ScriptMember
    
        // Special-case weights have script members less than or equal to
        // MAX_SPECIAL_CASE (11, or 12 on Windows Vista through
        // Windows Server 2008 R2)
        IF ScriptMember is greater than MAX_SPECIAL_CASE THEN
    
            //
            //  No special case on character, but has to check for
            //  contraction characters and Hungarian special 
            //  character sequence characters.
            //
    
            SET HasHungarianSpecialCharacterSequence to CALL
                TestHungarianCharacterSequences
                     WITH (SortLocale, SourceString, SourceIndex)
    
            SET Result to CALL GetContractionType WITH (CharacterWeight)
    
            CASE Result OF
               
               "3-character Contraction":
                   COMMENT This is only possible for Windows versions that 
                   COMMENT are Windows NT 4.0 through Windows Server 2003
                   SET ContractionFound to CALL SortkeyContractionHandler
                     WITH (SortLocale, SourceString, SourceIndex,
                           HasHungarianSpecialCharacterSequence, 3,
                           UnicodeWeights, DiacriticWeights, CaseWeights)
                   IF ContractionFound is true THEN
                       COMMENT Break out of the case statement
                       BREAK
                   ENDIF
                   COMMENT If no contraction is found, fall through into additional cases.
                   FALLTHROUGH
    
               "2-character Contraction":
                   COMMENT This is only possible for Windows versions that are 
                   COMMENT Windows NT 4.0 through Windows Server 2003
                   SET ContractionFound to CALL SortkeyContractionHandler
                    WITH (SortLocale, SourceString, SourceIndex,
                          HasHungarianSpecialCharacterSequence, 2,
                          UnicodeWeights, DiacriticWeights, CaseWeights)
                   IF ContractionFound is true THEN
                       COMMENT Break out of the case statement
                       BREAK
                   ENDIF
                   COMMENT If no contraction is found, fall through into the remaining cases.
                   COMMENT Because "3-character Contraction" and "2-character Contraction"
                   COMMENT are the only possible contraction types on Windows NT 4.0
                   COMMENT through Windows Server 2003, the SortkeyContractionHandler
                   COMMENT calls in the remaining cases return false, so control
                   COMMENT effectively falls through to the OTHERS case.
                   FALLTHROUGH
    
               "6-character contraction, 7-character contraction, or 8-character contraction":
                   SET ContractionFound to CALL SortkeyContractionHandler
                    WITH (SortLocale, SourceString, SourceIndex,
                          HasHungarianSpecialCharacterSequence, 8,
                          UnicodeWeights, DiacriticWeights, CaseWeights)
                   IF ContractionFound is true THEN
                       COMMENT Break out of the case statement
                       BREAK
                   ELSE
                       SET ContractionFound to CALL SortkeyContractionHandler
                        WITH (SortLocale, SourceString, SourceIndex,
                              HasHungarianSpecialCharacterSequence, 7,
                              UnicodeWeights, DiacriticWeights, CaseWeights)
                   ENDIF
                   IF ContractionFound is true THEN
                       COMMENT Break out of the case statement
                       BREAK
                   ELSE
                       SET ContractionFound to CALL SortkeyContractionHandler
                        WITH (SortLocale, SourceString, SourceIndex,
                              HasHungarianSpecialCharacterSequence, 6,
                              UnicodeWeights, DiacriticWeights, CaseWeights)
                   ENDIF
                   IF ContractionFound is true THEN
                       COMMENT Break out of the case statement
                       BREAK
                   ENDIF
                   COMMENT If no contraction is found, fall through into additional cases.
                   FALLTHROUGH
    
               "4-character contraction or 5-character contraction":
                   SET ContractionFound to CALL SortkeyContractionHandler
                    WITH (SortLocale, SourceString, SourceIndex,
                          HasHungarianSpecialCharacterSequence, 5,
                          UnicodeWeights, DiacriticWeights, CaseWeights)
                   IF ContractionFound is true THEN
                       COMMENT Break out of the case statement
                       BREAK
                   ELSE
                       SET ContractionFound to CALL SortkeyContractionHandler
                        WITH (SortLocale, SourceString, SourceIndex,
                              HasHungarianSpecialCharacterSequence, 4,
                              UnicodeWeights, DiacriticWeights, CaseWeights)
                   ENDIF
                   IF ContractionFound is true THEN
                       COMMENT Break out of the case statement
                       BREAK
                   ENDIF
                   COMMENT If no contraction is found, fall through into additional cases.
                   FALLTHROUGH
    
               "2-character contraction or 3-character contraction":
                   SET ContractionFound to CALL SortkeyContractionHandler
                    WITH (SortLocale, SourceString, SourceIndex,
                          HasHungarianSpecialCharacterSequence, 3,
                          UnicodeWeights, DiacriticWeights, CaseWeights)
                   IF ContractionFound is true THEN
                       COMMENT Break out of the case statement
                       BREAK
                   ELSE
                       SET ContractionFound to CALL SortkeyContractionHandler
                        WITH (SortLocale, SourceString, SourceIndex,
                              HasHungarianSpecialCharacterSequence, 2,
                              UnicodeWeights, DiacriticWeights, CaseWeights)
                   ENDIF
                   IF ContractionFound is true THEN
                       COMMENT Break out of the case statement
                       BREAK
                   ENDIF
                   COMMENT If no contraction is found, fall through into additional cases.
                   FALLTHROUGH
    
    
               OTHERS :
                  IF Windows version is Windows 7, Windows Server 2008 R2, or later
                    THEN
                      COMMENT Starting with Windows 7 and Windows Server 2008 R2,
                      COMMENT Private Use Area (PUA) code points
                      COMMENT and some CJK (Chinese/Japanese/Korean) sorts
                      COMMENT might need 3-byte weights.
                      COMMENT Store the normal Unicode weight first. Note that the
                      COMMENT Korean weight is no longer adjusted.
                      SET UnicodeWeight to
                         CorrectUnicodeWeight(CharacterWeight, FALSE)
                      COMMENT Initially assume that no 3-byte Unicode weight is used.
                      COMMENT The algorithm checks this below.
                      SET UnicodeWeight.ThirdByteWeight to 0
    
                      IF (ScriptMember is equal to or greater than PUA3BYTESTART) AND
                         (ScriptMember is less than or equal to PUA3BYTEEND) THEN
                          SET IsScriptMemberPUA3ByteWeight to true
                      ELSE
                          SET IsScriptMemberPUA3ByteWeight to false
                      ENDIF

                      IF (ScriptMember is equal to or greater than CJK3BYTESTART) AND
                         (ScriptMember is less than or equal to CJK3BYTEEND) THEN
                          SET IsScriptMemberCJK3ByteWeight to true
                      ELSE
                          SET IsScriptMemberCJK3ByteWeight to false
                      ENDIF
                      IF (IsScriptMemberPUA3ByteWeight is true) OR
                         (Is3ByteWeightLocale is true AND
                          IsScriptMemberCJK3ByteWeight is true) THEN
                          COMMENT PUA code points and some CJK sorts need 3-byte weights
                          SET UnicodeWeight.ThirdByteWeight to CharacterWeight.DiacriticWeight
                      ELSE
                          COMMENT Normal diacritic weight
                          APPEND CharacterWeight.DiacriticWeight to DiacriticWeights as a BYTE
                      ENDIF
                      APPEND UnicodeWeight to UnicodeWeights
    
                      SET CaseWeight to GetCaseWeight(CharacterWeight)
                      APPEND CaseWeight to CaseWeights as a BYTE
    
                  ELSE
    
                      SET UnicodeWeight to 
                         CorrectUnicodeWeight(CharacterWeight, IsKoreanLocale)
                      APPEND UnicodeWeight to UnicodeWeights
                      APPEND CharacterWeight.DiacriticWeight to DiacriticWeights
                             as a BYTE
                      SET CaseWeight to GetCaseWeight(CharacterWeight)
                      APPEND CaseWeight to CaseWeights as a BYTE
                  ENDIF
           ENDCASE
        ELSE
           CALL SpecialCaseHandler WITH (SourceString, SourceIndex,
                      UnicodeWeights, ExtraWeights, SpecialWeights,
                      SortLocale, IsKoreanLocale)
        ENDIF
    ENDFOR
    
    //
    //  Store the Unicode Weights in the destination buffer.
    //
    FOR each UnicodeWeight in UnicodeWeights
        //
        //  Copy Unicode weight to destination buffer.
        //
        APPEND UnicodeWeight.ScriptMember to SortKey as a BYTE
        APPEND UnicodeWeight.PrimaryWeight to SortKey as a BYTE
        IF Windows version is Windows 7, Windows Server 2008 R2, or later THEN
            IF UnicodeWeight.ThirdByteWeight is not 0 THEN
                COMMENT When a 3-byte Unicode weight is used, append the
                COMMENT additional BYTE to SortKey.
                APPEND UnicodeWeight.ThirdByteWeight to SortKey as a BYTE
            ENDIF
        ENDIF
    
    ENDFOR
    
    //
    //  Copy Separator to destination buffer.
    //
    APPEND SORTKEY_SEPARATOR to SortKey as a BYTE
    
    //
    //  Store Diacritic Weights in the destination buffer.
    //
    IF (NORM_IGNORENONSPACE bit is not turned on in Flags) THEN
        IF (IsReverseDW is TRUE) THEN
           //
           //  Reverse diacritics:
           //    - remove diacritics from left  to right.
           //    - store  diacritics from right to left.
           //
           FOR each DiacriticWeight in
               DiacriticWeights in the "first in first out" order
              IF DiacriticWeight <= MIN_DW THEN
                 REMOVE DiacriticWeight from DiacriticWeights
              ELSE
                 BREAK from the current FOR loop
              ENDIF
           ENDFOR
    
           FOR each DiacriticWeight in
               DiacriticWeights in the "last in first out" order
              //
          //  Copy diacritic weight to destination buffer.
              //
              APPEND DiacriticWeight to SortKey as a BYTE
           ENDFOR
        ELSE
           //
           //  Regular diacritics:
           //    - remove diacritics from right to left.
           //    - store  diacritics from left  to right.
           FOR each DiacriticWeight in
               DiacriticWeights in the "last in first out" order
               IF DiacriticWeight <= MIN_DW THEN
                  REMOVE DiacriticWeight from DiacriticWeights
               ELSE
                  BREAK from the current FOR loop
               ENDIF
           ENDFOR
    
           FOR each DiacriticWeight in
               DiacriticWeights in the order of "first in first out"
               //
               //  Copy diacritic weight to destination buffer.
               //
               APPEND DiacriticWeight to SortKey as a BYTE
           ENDFOR
        ENDIF
    ENDIF
    
    //
    //  Copy Separator to destination buffer.
    //
    APPEND SORTKEY_SEPARATOR to SortKey as a BYTE
    
    //
    //  Store case Weights
    //
    //    - Eliminate minimum CW.
    //    - Copy case weights to destination buffer.
    //
    IF (NORM_IGNORECASE bit is not turned on in Flags
         OR NORM_IGNOREWIDTH bit is not turned on in Flags) THEN
        FOR each CaseWeight in CaseWeights
            in the "last in first out" order
            IF CaseWeight <= MIN_CW THEN
               REMOVE CaseWeight from CaseWeights
            ELSE
               BREAK from the current FOR loop
            ENDIF
        ENDFOR
    
        FOR each CaseWeight in CaseWeights
           //
           //  Copy case weight to destination buffer.
           //
           APPEND CaseWeight to SortKey as a BYTE
        ENDFOR
    ENDIF
    
    //
    //  Copy Separator to destination buffer.
    //
    APPEND SORTKEY_SEPARATOR to SortKey as a BYTE
    
    //
    //  Store the Extra Weights in the destination buffer for
    //  EAST ASIA Special.
    //
    //    - Eliminate unnecessary XW.
    //    - Copy extra weights to destination buffer.
    //
    IF Length(ExtraWeights) is greater than 0 THEN
        IF (NORM_IGNORENONSPACE bit is turned on in Flags) THEN
           APPEND 0xff to SortKey as a BYTE
           APPEND 0x02 to SortKey as a BYTE
        ENDIF
    
       // Append W6 group to SortKey
       // Trim unused values from the end of the string
       SET EndExtraWeight to Length(ExtraWeights) - 1
    
       WHILE EndExtraWeight is greater than 0 and
            ExtraWeights[EndExtraWeight].W6 == 0xe4
          DECREMENT EndExtraWeight
       ENDWHILE

       SET ExtraWeightIndex to 0
       WHILE ExtraWeightIndex is less than or equal to EndExtraWeight
          APPEND ExtraWeights[ExtraWeightIndex].W6
            to SortKey as a BYTE
          INCREMENT ExtraWeightIndex
       ENDWHILE

       // Append W6 separator
       APPEND 0xff to SortKey as a BYTE

       // Append W7 group to SortKey
       // Trim unused values from the end of the string
       SET EndExtraWeight to Length(ExtraWeights) - 1
       WHILE EndExtraWeight is greater than 0 and
             ExtraWeights[EndExtraWeight].W7 == 0xe4
          DECREMENT EndExtraWeight
       ENDWHILE

       SET ExtraWeightIndex to 0
       WHILE ExtraWeightIndex is less than or equal to EndExtraWeight
          APPEND ExtraWeights[ExtraWeightIndex].W7 to SortKey as a BYTE
          INCREMENT ExtraWeightIndex
       ENDWHILE
    
       // Append W7 separator
       APPEND 0xff to SortKey as a BYTE
    ENDIF
    
    //
    //  Copy Separator to destination buffer.
    //
    APPEND SORTKEY_SEPARATOR to SortKey as a BYTE
    
    //
    //  Store the Special Weights in the destination buffer.
    //
    //    - Copy special weights to destination buffer.
    //
    FOR each SpecialWeight in SpecialWeights
       // High byte (most significant)
       SET Byte1 to SpecialWeight.Position >> 8
       // Low byte (least significant)
       SET Byte2 to SpecialWeight.Position & 0xff
       APPEND Byte1 to SortKey as a BYTE
       APPEND Byte2 to SortKey as a BYTE
       APPEND SpecialWeight.ScriptMember to SortKey as a BYTE
       APPEND SpecialWeight.PrimaryWeight to SortKey as a BYTE
    ENDFOR
    
    //
    //  Copy terminator to destination buffer.
    //
    APPEND SORTKEY_TERMINATOR to SortKey as a BYTE
    
    RETURN SortKey
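The final steps of the pseudocode above (trimming and appending the W6 and W7 groups, then the separator, the 4-byte special weights, and the terminator) can be sketched in Python as follows. This is an illustration only, assuming the branch that appends the W6 and W7 groups is taken. The byte values chosen for SORTKEY_SEPARATOR (0x01) and SORTKEY_TERMINATOR (0x00), and the names ExtraWeight, SpecialWeight, trimmed_group, and append_tail, are assumptions for this sketch rather than definitions taken from this specification.

```python
from dataclasses import dataclass

# Values used by the pseudocode above.
UNUSED_EXTRA_WEIGHT = 0xE4  # trailing W6/W7 values equal to this are trimmed
GROUP_SEPARATOR = 0xFF      # written after the W6 group and after the W7 group
# Assumed values for this sketch (not defined in this excerpt).
SORTKEY_SEPARATOR = 0x01
SORTKEY_TERMINATOR = 0x00


@dataclass
class ExtraWeight:
    w6: int
    w7: int


@dataclass
class SpecialWeight:
    position: int  # 16-bit position, written high byte first
    script: int
    weight: int


def trimmed_group(values: list[int]) -> list[int]:
    """Drop trailing 0xe4 values; index 0 is never dropped ("greater than 0")."""
    end = len(values) - 1
    while end > 0 and values[end] == UNUSED_EXTRA_WEIGHT:
        end -= 1
    return values[: end + 1]


def append_tail(prefix: bytes, extras: list[ExtraWeight],
                specials: list[SpecialWeight]) -> bytes:
    """Append the W6/W7 groups, separator, special weights, and terminator."""
    key = bytearray(prefix)
    key += bytes(trimmed_group([e.w6 for e in extras]))
    key.append(GROUP_SEPARATOR)
    key += bytes(trimmed_group([e.w7 for e in extras]))
    key.append(GROUP_SEPARATOR)
    key.append(SORTKEY_SEPARATOR)
    for sw in specials:
        # High byte, low byte, script, weight (4 bytes per special weight).
        key += bytes([(sw.position >> 8) & 0xFF, sw.position & 0xFF,
                      sw.script, sw.weight])
    key.append(SORTKEY_TERMINATOR)
    return bytes(key)


key = append_tail(b"", [ExtraWeight(0x10, 0xE4), ExtraWeight(0xE4, 0xE4)],
                  [SpecialWeight(0x0102, 0x06, 0x80)])
```

Note that, because the trimming loop stops at index 0, a group whose values are all 0xe4 still contributes one byte, exactly as the pseudocode's "greater than 0" condition implies.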

<5> Section 3.1.5.2.16: The following MapOldHangulSortKey algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

 COMMENT MapOldHangulSortKey
 COMMENT
 COMMENT  On Entry:  SourceString   - Unicode String to test
 COMMENT             SourceIndex    - Index of the leading Jamo to start
 COMMENT                              from
 COMMENT             SortLocale     - Locale to use for linguistic
 COMMENT                              sort data
 COMMENT             UnicodeWeights - String to store any Unicode
 COMMENT                              weights found
 COMMENT                              for this character(s)
 COMMENT             IsKoreanLocale - Whether the sort locale is Korean;
 COMMENT                              passed to CorrectUnicodeWeight
 COMMENT
 COMMENT  On Exit:   CharactersRead - Number of characters consumed by
 COMMENT                              the old Hangul composition
 COMMENT             UnicodeWeights - Any Unicode weights found are
 COMMENT                              appended
 COMMENT
  
 PROCEDURE MapOldHangulSortKey(IN SourceString : Unicode String,
                    IN SourceIndex : 32 bit integer,
                    IN SortLocale : LCID,
                    INOUT UnicodeWeights : String of UnicodeWeightType,
                    IN IsKoreanLocale : Boolean,
                    OUT CharactersRead : 32 bit integer)
  
 SET CurrentIndex to SourceIndex
 SET JamoSortInfo to empty JamoSortInfoType
  
 // Get any Old Hangul Leading Jamo composition for our Leading Jamo.
 // CurrentIndex is passed so that it advances past each Jamo consumed;
 // CharactersRead is computed from it below.
 SET JamoClass to CALL GetJamoComposition WITH (SourceString,
                 CurrentIndex, "Leading Jamo Class", JamoSortInfo)
  
 IF JamoClass is equal to "Vowel Jamo Class" THEN
     // A Vowel Jamo, try to find an
     // Old Hangul Vowel Jamo composition.
     SET JamoClass to CALL GetJamoComposition WITH (SourceString,
                 CurrentIndex, "Vowel Jamo Class", JamoSortInfo)
 ENDIF
  
 IF JamoClass is equal to "Trailing Jamo Class" THEN
     // A Trailing Jamo, try to find an
     // Old Hangul Trailing Jamo composition.
     SET JamoClass to CALL GetJamoComposition WITH (SourceString,
                 CurrentIndex, "Trailing Jamo Class", JamoSortInfo)
 ENDIF
  
 // A valid leading and vowel sequence and this is 
 // old Hangul...
 IF JamoSortInfo.OldHangulFlag is true THEN
  
     // Compute the modern Hangul syllable prior to this composition.
     // Uses the formula from Unicode 3.0, Section 3.11, p. 54,
     // "Hangul Syllable Composition".
     // This converts a U+11xx Jamo sequence to a precomposed syllable
     // in the block starting at U+AC00.
  
     SET ModernHangul to (JamoSortInfo.LeadingIndex *
                NLS_JAMO_VOWELCOUNT + JamoSortInfo.VowelIndex) *
                NLS_JAMO_TRAILING_COUNT + JamoSortInfo.TrailingIndex +
                NLS_HANGUL_FIRST_SYLLABLE
  
     IF JamoSortInfo.FillerUsed is true THEN
         // If the filler is used, sort before the modern Hangul, 
         // instead of after
         DECREMENT ModernHangul
  
         // If falling off the modern Hangul syllable block...
         IF ModernHangul is less than NLS_HANGUL_FIRST_SYLLABLE THEN
             // Sort after the previous character
             // (Circled Hangul Kiyeok A)
             SET ModernHangul to 0x326e
         ENDIF
  
         // Shift the leading weight past any old Hangul
         // that sorts after this modern Hangul
         SET JamoSortInfo.LeadingWeight to
             JamoSortInfo.LeadingWeight + 0x80
     ENDIF
  
     // Store the weights
     SET CharacterWeight to CALL GetCharacterWeights WITH (ModernHangul)
     SET UnicodeWeight to CALL CorrectUnicodeWeight
             WITH (CharacterWeight, IsKoreanLocale)
     APPEND UnicodeWeight to UnicodeWeights
  
     // Add additional weights
     SET UnicodeWeight to CALL MakeUnicodeWeight WITH
             (ScriptMember_Extra_UnicodeWeight,
              JamoSortInfo.LeadingWeight, false)
     APPEND UnicodeWeight to UnicodeWeights

     SET UnicodeWeight to CALL MakeUnicodeWeight WITH
             (ScriptMember_Extra_UnicodeWeight,
              JamoSortInfo.VowelWeight, false)
     APPEND UnicodeWeight to UnicodeWeights

     SET UnicodeWeight to CALL MakeUnicodeWeight WITH
             (ScriptMember_Extra_UnicodeWeight,
              JamoSortInfo.TrailingWeight, false)
     APPEND UnicodeWeight to UnicodeWeights
  
     // Return the characters consumed
     SET CharactersRead to CurrentIndex - SourceIndex
     RETURN CharactersRead
 ENDIF
  
 // Otherwise, this is not a valid old Hangul composition;
 // consume no characters and add no weights.
  
 SET CharactersRead to 0
 RETURN CharactersRead
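The syllable computation in MapOldHangulSortKey, including the filler adjustment, can be sketched in Python as below. This sketch assumes that NLS_HANGUL_FIRST_SYLLABLE, NLS_JAMO_VOWELCOUNT, and NLS_JAMO_TRAILING_COUNT take the standard Unicode Hangul composition values (SBase = 0xAC00, VCount = 21, TCount = 28) and that the indices are zero-based as in the Unicode algorithm; the function name modern_hangul_anchor is hypothetical.

```python
# Assumed to match the standard Unicode Hangul composition constants.
NLS_HANGUL_FIRST_SYLLABLE = 0xAC00
NLS_JAMO_VOWELCOUNT = 21
NLS_JAMO_TRAILING_COUNT = 28
CIRCLED_HANGUL_KIYEOK_A = 0x326E  # fallback when stepping below U+AC00


def modern_hangul_anchor(leading: int, vowel: int, trailing: int,
                         filler_used: bool,
                         leading_weight: int) -> tuple[int, int]:
    """Return (code point whose weights are used, adjusted leading weight)."""
    syllable = ((leading * NLS_JAMO_VOWELCOUNT + vowel)
                * NLS_JAMO_TRAILING_COUNT
                + trailing + NLS_HANGUL_FIRST_SYLLABLE)
    if filler_used:
        # Sort just before the modern syllable instead of after it.
        syllable -= 1
        if syllable < NLS_HANGUL_FIRST_SYLLABLE:
            syllable = CIRCLED_HANGUL_KIYEOK_A
        # Shift past any old Hangul that sorts after this modern syllable.
        leading_weight += 0x80
    return syllable, leading_weight
```

For example, indices (0, 0, 0) yield U+AC00 and (18, 20, 27) yield U+D7A3, the first and last precomposed Hangul syllables.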

<6> Section 3.1.5.2.17: The GetJamoComposition algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

<7> Section 3.1.5.2.18: The following GetJamoStateData algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

 COMMENT GetJamoStateData
 COMMENT
 COMMENT  On Entry:  Character     - Unicode Character to get Jamo 
 COMMENT                             information for
 COMMENT 
 COMMENT  On Exit:   JamoStateData - Jamo state information from 
 COMMENT                             the data file
 COMMENT
 COMMENT  Jamo State information looks like this in the database:
 COMMENT
 COMMENT   SORTTABLES
 COMMENT     ...
 COMMENT     JAMOSORT 395
 COMMENT     ...
 COMMENT   0x1172 4
 COMMENT     0x1172 0x00 0x00 0x11 0x00 0x38 0x03 ; U+1172
 COMMENT     0x1161 0x01 0x00 0x00 0x00 0x00 0x01 ; U+1172,1161
 COMMENT     0x1175 0x01 0x00 0x11 0x1b 0x3a 0x00 ; U+1172,1161,1175
 COMMENT     0x1169 0x01 0x00 0x11 0x1b 0x3f 0x00 ; U+1172,1169
  
 PROCEDURE GetJamoStateData (IN Character : Unicode Character,
                             OUT JamoStateData : JamoStateDataType)
  
 // Get the Jamo section for this character.
 // If Character was 0x1172, this would access the following section:
 // 0x1172 4
 //    0x1172 0x00 0x00 0x11 0x00 0x38 0x03 ; U+1172           record 0
 //    0x1161 0x01 0x00 0x00 0x00 0x00 0x01 ; U+1172,1161      record 1
 //    0x1175 0x01 0x00 0x11 0x1b 0x3a 0x00 ; U+1172,1161,1175 record 2
 //    0x1169 0x01 0x00 0x11 0x1b 0x3f 0x00 ; U+1172,1169      record 3
 //    |     |    |    |    |    |    |       |
 // Field 1  2    3    4    5    6    7       Comment
  
 OPEN SECTION JamoSection
      where name is SORTTABLES\JAMOSORT\[Character] from unisort.txt
  
 // Now open the first record
 SELECT RECORD JamoRecord FROM JamoSection WHERE record index is 0
  
 // Now gather the information from that record.
 SET JamoStateData.OldHangulFlag   to JamoRecord.Field2
 SET JamoStateData.LeadingIndex    to JamoRecord.Field3
 SET JamoStateData.VowelIndex      to JamoRecord.Field4
 SET JamoStateData.TrailingIndex   to JamoRecord.Field5
 SET JamoStateData.ExtraWeight     to JamoRecord.Field6
 SET JamoStateData.TransitionCount to JamoRecord.Field7
  
 // Remember the record
 SET JamoStateData.DataRecord to JamoRecord
  
 RETURN JamoStateData
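A single record of the Jamo section shown above could be read as follows. This is a sketch that assumes the whitespace-separated, semicolon-commented record layout of the example records; the function parse_jamo_record and the Python field names are hypothetical and simply mirror Fields 1 through 7 in the pseudocode.

```python
from dataclasses import dataclass


@dataclass
class JamoStateData:
    character: int         # Field 1
    old_hangul_flag: int   # Field 2
    leading_index: int     # Field 3
    vowel_index: int       # Field 4
    trailing_index: int    # Field 5
    extra_weight: int      # Field 6
    transition_count: int  # Field 7


def parse_jamo_record(line: str) -> JamoStateData:
    """Parse one record, e.g. '0x1172 0x00 0x00 0x11 0x00 0x38 0x03 ; U+1172'."""
    fields = line.split(";", 1)[0].split()  # drop the trailing comment
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    # int(..., 16) accepts the 0x prefix used in the data file.
    return JamoStateData(*(int(f, 16) for f in fields))


record0 = parse_jamo_record("0x1172 0x00 0x00 0x11 0x00 0x38 0x03 ; U+1172")
```

Record 0 describes the character by itself; its TransitionCount (3 here) matches the number of records that follow it in the section.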

<8> Section 3.1.5.2.19: The FindNewJamoState algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

<9> Section 3.1.5.2.20: The following UpdateJamoSortInfo algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

 COMMENT UpdateJamoSortInfo
 COMMENT
 COMMENT  On Entry:  JamoClass     - The current Jamo Class
 COMMENT             JamoStateData - Information about the new
 COMMENT                             character state
 COMMENT             JamoSortInfo  - Information about the character
 COMMENT                             state
 COMMENT
 COMMENT  On Exit:   JamoSortInfo  - Updated with information about
 COMMENT                             the new state based on JamoClass
 COMMENT                             and JamoStateData
 COMMENT
  
 PROCEDURE UpdateJamoSortInfo(IN JamoClass : enumeration,
                              IN JamoStateData : JamoStateDataType,
                              INOUT JamoSortInfo : JamoSortInfoType)
  
 // Record if this is a Jamo unique to old Hangul
 SET JamoSortInfo.OldHangulFlag to
     JamoSortInfo.OldHangulFlag | JamoStateData.OldHangulFlag
  
 //  Update the indices if the new ones are higher than the current
 //  ones.
 IF JamoStateData.LeadingIndex
    is greater than JamoSortInfo.LeadingIndex THEN
    SET JamoSortInfo.LeadingIndex to JamoStateData.LeadingIndex
 ENDIF
  
 IF JamoStateData.VowelIndex
    is greater than JamoSortInfo.VowelIndex THEN
    SET JamoSortInfo.VowelIndex to JamoStateData.VowelIndex
 ENDIF
  
 IF JamoStateData.TrailingIndex
    is greater than JamoSortInfo.TrailingIndex THEN
    SET JamoSortInfo.TrailingIndex to JamoStateData.TrailingIndex
 ENDIF
  
 //  Update the extra weights according to the current Jamo class.
 CASE JamoClass OF
    "Leading Jamo Class":
       IF JamoStateData.ExtraWeight
          is greater than JamoSortInfo.LeadingWeight THEN
          SET JamoSortInfo.LeadingWeight to JamoStateData.ExtraWeight
       ENDIF
  
    "Vowel Jamo Class":
       IF JamoStateData.ExtraWeight
          is greater than JamoSortInfo.VowelWeight THEN
          SET JamoSortInfo.VowelWeight to JamoStateData.ExtraWeight
       ENDIF
  
    "Trailing Jamo Class":
       IF JamoStateData.ExtraWeight
          is greater than JamoSortInfo.TrailingWeight THEN
          SET JamoSortInfo.TrailingWeight to JamoStateData.ExtraWeight
       ENDIF
 ENDCASE
  
 RETURN JamoSortInfo
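The update rule in UpdateJamoSortInfo is a running maximum: the old Hangul flag is OR-ed in, each index keeps its largest value, and only the extra weight for the current Jamo class is raised. A Python sketch follows; the enum, dataclasses, and the function name update_jamo_sort_info are hypothetical stand-ins for the pseudocode types, and the example values come from records 0 and 2 of the U+1172 section shown earlier.

```python
from dataclasses import dataclass
from enum import Enum


class JamoClass(Enum):
    LEADING = "Leading Jamo Class"
    VOWEL = "Vowel Jamo Class"
    TRAILING = "Trailing Jamo Class"


@dataclass
class JamoStateData:
    old_hangul_flag: bool
    leading_index: int
    vowel_index: int
    trailing_index: int
    extra_weight: int


@dataclass
class JamoSortInfo:
    old_hangul_flag: bool = False
    leading_index: int = 0
    vowel_index: int = 0
    trailing_index: int = 0
    leading_weight: int = 0
    vowel_weight: int = 0
    trailing_weight: int = 0


def update_jamo_sort_info(jamo_class: JamoClass, state: JamoStateData,
                          info: JamoSortInfo) -> JamoSortInfo:
    # Remember whether any Jamo seen so far is unique to old Hangul.
    info.old_hangul_flag = info.old_hangul_flag or state.old_hangul_flag
    # Keep the highest index seen for each position.
    info.leading_index = max(info.leading_index, state.leading_index)
    info.vowel_index = max(info.vowel_index, state.vowel_index)
    info.trailing_index = max(info.trailing_index, state.trailing_index)
    # Only the extra weight of the current class is updated.
    if jamo_class is JamoClass.LEADING:
        info.leading_weight = max(info.leading_weight, state.extra_weight)
    elif jamo_class is JamoClass.VOWEL:
        info.vowel_weight = max(info.vowel_weight, state.extra_weight)
    else:
        info.trailing_weight = max(info.trailing_weight, state.extra_weight)
    return info


info = JamoSortInfo()
update_jamo_sort_info(JamoClass.VOWEL, JamoStateData(False, 0, 0x11, 0, 0x38), info)
update_jamo_sort_info(JamoClass.VOWEL, JamoStateData(True, 0, 0x11, 0x1B, 0x3A), info)
```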

<10> Section 3.1.5.2.21: The IsJamo algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

<11> Section 3.1.5.2.22: The IsCombiningJamo algorithm is not used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

<12> Section 3.1.5.2.23: The following IsJamoLeading algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

 COMMENT IsJamoLeading 
 COMMENT
 COMMENT  On Entry:  SourceCharacter - Unicode Character to test
 COMMENT
 COMMENT  On Exit:   Result          - true if SourceCharacter is a
 COMMENT                               leading Jamo
 COMMENT
 COMMENT NOTE: Only call this if the character is known to be a Jamo
 COMMENT       syllable. This function only helps distinguish between
 COMMENT       the different types of Jamo, so only call it if
 COMMENT       IsJamo() has returned true.
 COMMENT
  
 PROCEDURE IsJamoLeading(IN SourceCharacter : Unicode Character,
                         OUT Result: boolean)
  
 IF SourceCharacter is less than NLS_CHAR_FIRST_VOWEL_JAMO THEN
      SET Result to true
 ELSE
      SET Result to false
 ENDIF
  
 RETURN Result

<13> Section 3.1.5.2.24: The IsJamoVowel algorithm is not applicable to Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

<14> Section 3.1.5.2.25: The following IsJamoTrailing algorithm is only used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

 COMMENT IsJamoTrailing
 COMMENT
 COMMENT  On Entry:  SourceCharacter - Unicode Character to test
 COMMENT
 COMMENT  On Exit:   Result          - true if this is a trailing Jamo
 COMMENT
 COMMENT NOTE: Only call this if the character is known to be a Jamo
 COMMENT       syllable. This function only helps distinguish between
 COMMENT       the different types of Jamo, so only call it if
 COMMENT       IsJamo() has returned true.
 COMMENT
  
 PROCEDURE IsJamoTrailing(IN SourceCharacter : Unicode Character,
                          OUT Result: boolean)
  
 IF SourceCharacter is greater than
    or equal to NLS_CHAR_FIRST_TRAILING_JAMO THEN
      SET Result to true
 ELSE
      SET Result to false
 ENDIF
  
 RETURN Result
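Taken together, the Jamo predicates (IsJamo, IsJamoLeading, IsJamoVowel, IsJamoTrailing) reduce to range checks on the code point. The Python sketch below assumes the Hangul Jamo block boundaries as laid out in Unicode 3.0: leading consonants from U+1100, vowels from U+1160 (Hangul Jungseong Filler), trailing consonants from U+11A8, and U+11F9 as the last Jamo. The constant values and the function name jamo_kind are assumptions for illustration.

```python
# Assumed Jamo boundaries (Unicode 3.0 Hangul Jamo block).
NLS_CHAR_FIRST_JAMO = 0x1100
NLS_CHAR_LAST_JAMO = 0x11F9
NLS_CHAR_FIRST_VOWEL_JAMO = 0x1160
NLS_CHAR_FIRST_TRAILING_JAMO = 0x11A8


def jamo_kind(ch: str) -> str:
    """Classify a Jamo as 'leading', 'vowel', or 'trailing'."""
    cp = ord(ch)
    # Equivalent of requiring IsJamo() to be true before classifying.
    if not NLS_CHAR_FIRST_JAMO <= cp <= NLS_CHAR_LAST_JAMO:
        raise ValueError(f"U+{cp:04X} is not a Jamo")
    if cp < NLS_CHAR_FIRST_VOWEL_JAMO:
        return "leading"
    if cp < NLS_CHAR_FIRST_TRAILING_JAMO:
        return "vowel"
    return "trailing"
```

A single comparison is enough for the leading and trailing cases only because the caller has already confirmed that the character is a Jamo; the vowel case needs both bounds.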

<15> Section 3.1.5.4: The IdnToNameprepUnicode, IdnToAscii, and IdnToUnicode algorithms are not applicable to Windows NT, Windows 2000, Windows XP, or Windows Server 2003. In Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2, these algorithms follow the IDNA2003 standards. In all other versions, they follow the IDNA2008+UTS46 standards.
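For comparison only, Python's built-in "idna" codec implements IDNA2003 (RFC 3490 ToASCII and ToUnicode with Nameprep), so it can illustrate the IDNA2003 behavior described above. It is not the Windows IdnToAscii or IdnToUnicode implementation, and the wrapper names below are hypothetical.

```python
def to_ascii_idna2003(domain: str) -> bytes:
    # Nameprep each label, then Punycode-encode non-ASCII labels with "xn--".
    return domain.encode("idna")


def to_unicode_idna2003(domain: bytes) -> str:
    # Decode "xn--" labels back to Unicode.
    return domain.decode("idna")
```

One visible difference between the two standards: IDNA2003 Nameprep case-folds "ß" to "ss", so "straße.example" becomes "strasse.example", whereas IDNA2008 treats "ß" as a distinct valid character (UTS #46 transitional processing still maps it to "ss").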

<16> Section 3.1.5.4.6: This version is not used in Windows NT, Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.

<17> Section 3.1.5.4.7: This version is used in Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2.
