3.1.5.4.6 IDNA2008+UTS46 NormalizeForIdna
NormalizeForIdna prepares the input string for encoding, using the mapping/normalization rules provided by IDNA2008+UTS46 (IDNA2008 with [TR46] applied).<16>
COMMENT NormalizeForIdna2008
COMMENT On Entry: SourceString - Unicode String to prepare for IDNA
COMMENT           Flags - Bit flags to control behavior
COMMENT                   of IDN validation
COMMENT
COMMENT           IDN_ALLOW_UNASSIGNED: During validation, allow Unicode
COMMENT                   code points that are not assigned.
COMMENT
COMMENT On Exit:  OutputString - Unicode String containing the mapped
COMMENT                   form of the input

PROCEDURE NormalizeForIdna2008 (IN SourceString : Unicode String,
                                IN Flags: 32 bit integer,
                                OUT OutputString : Unicode String)

COMMENT Mapping is done per the tables published by Unicode by following
COMMENT RFC5892 as modified by UTS#46 section 2 "Unicode IDNA Compatibility Processing"
COMMENT Appendix A of RFC5892 is NOT applied.
COMMENT Effectively this mapping is merely applying the latest IdnaMappingTable.txt
COMMENT mappings, including the "deviation" mappings from http://www.unicode.org/Public/idna/
COMMENT
COMMENT Apply UTS#46 Section 4 steps 1 & 2 to the string with the "Transitional Processing"
COMMENT option for the four "deviation" characters. Steps 3 and 4 are done by the caller.
COMMENT http://www.unicode.org/reports/tr46/#Processing

OPEN mapping FILE "http://www.unicode.org/Public/idna/6.3.0/IdnaMappingTable.txt"

SET OutputString TO ""

FOREACH character IN SourceString
    FIND RECORD data IN mapping WHERE LINE CONTAINS character
    IF (data IS EMPTY) THEN
        IF (IDN_ALLOW_UNASSIGNED bit IS NOT ON in Flags) THEN
            RETURN ERROR
        ELSE
            APPEND character TO OutputString
        ENDIF
    ELSE
        SWITCH (data FIELD statusValue)
            CASE "valid"
            CASE "disallowed_STD3_valid"
                BREAK
            CASE "ignored"
                SET character TO ""
                BREAK
            CASE "mapped"
            CASE "disallowed_STD3_mapped"
            CASE "deviation"
                SET character TO data FIELD mappingValue
                BREAK
        ENDSWITCH
        APPEND character TO OutputString
    ENDIF
ENDFOREACH

RETURN OutputString
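
The mapping loop above can be illustrated with a short Python sketch. This is not the normative algorithm, only one possible reading of the pseudocode under stated assumptions: the rows in SAMPLE_TABLE are hand-written in the IdnaMappingTable.txt layout rather than copied from the published file, the helper names parse_table and normalize_for_idna are hypothetical, the numeric value chosen for IDN_ALLOW_UNASSIGNED is illustrative, and the status "disallowed_STD3_mapped" is assumed to be handled like "mapped". A code point with no record in the sample is treated as unassigned.

```python
# Illustrative flag value (an assumption made for this sketch).
IDN_ALLOW_UNASSIGNED = 0x01

# Hand-written sample rows in the IdnaMappingTable.txt layout
# ("code point(s) ; status ; mapping # comment"); not a verbatim excerpt.
SAMPLE_TABLE = """\
002D..002E ; valid                  # HYPHEN-MINUS..FULL STOP
0044       ; mapped    ; 0064       # LATIN CAPITAL LETTER D
0046       ; mapped    ; 0066       # LATIN CAPITAL LETTER F
005F       ; disallowed_STD3_valid  # LOW LINE
0061..007A ; valid                  # LATIN SMALL LETTER A..Z
00AD       ; ignored                # SOFT HYPHEN
00DF       ; deviation ; 0073 0073  # LATIN SMALL LETTER SHARP S
"""


def parse_table(text):
    """Parse table rows into (first, last, status, mapping) tuples."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        first, _, last = fields[0].partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        mapping = ""
        if len(fields) > 2 and fields[2]:
            mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())
        records.append((lo, hi, fields[1], mapping))
    return records


def normalize_for_idna(source, flags, records):
    """Apply the per-character mapping loop from the pseudocode."""
    out = []
    for ch in source:
        cp = ord(ch)
        rec = next((r for r in records if r[0] <= cp <= r[1]), None)
        if rec is None:
            # No record: fail unless unassigned code points are allowed.
            if not flags & IDN_ALLOW_UNASSIGNED:
                raise ValueError("unassigned code point U+%04X" % cp)
            out.append(ch)
            continue
        status, mapping = rec[2], rec[3]
        if status == "ignored":
            continue  # character is dropped
        if status in ("mapped", "disallowed_STD3_mapped", "deviation"):
            out.append(mapping)  # replace with the mapping value
        else:
            # "valid", "disallowed_STD3_valid", and any status the
            # pseudocode's SWITCH does not list pass through unchanged.
            out.append(ch)
    return "".join(out)


RECORDS = parse_table(SAMPLE_TABLE)
```

With these sample rows, calling normalize_for_idna("Fa\u00adß.De", 0, RECORDS) lowercases the capitals, drops the soft hyphen, and applies the transitional "deviation" mapping of U+00DF to "ss". A real implementation would load the full published table and would likely index it for faster lookup than the linear scan used here.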