Comparing and Sorting Data for a Specific Culture
Alphabetical order and conventions for sequencing items vary from culture to culture. For example, sort order can be case-sensitive or case-insensitive. It can be phonetically based or based on the appearance of the character. In East Asian languages, sorts are ordered by the stroke and radical of ideographs. Sorts can also vary depending on the fundamental order that the language and culture use for the alphabet. For example, the Danish language has an Æ character that it sorts after Z in the alphabet. The German language also has this character, but sorts it like ae, after A in the alphabet. A world-ready application must be able to compare and sort data on a per-culture basis in order to support culture-specific and language-specific sorting conventions.
Note |
---|
There are scenarios where culture-sensitive behavior is not desirable. For more information about when and how to perform culture-insensitive operations, see Culture-Insensitive String Operations. |
Comparing Strings
The CompareInfo class provides a set of methods you can use to perform culture-sensitive string comparisons. The CultureInfo class has a CultureInfo.CompareInfo property that returns an instance of this class. This property defines how to compare and sort strings for a specific culture. The String.Compare method uses the CompareInfo property of the current culture to compare strings. String.Compare returns a negative integer if string1 is less than string2, zero if string1 and string2 are equal, and a positive integer if string1 is greater than string2.
The following code example illustrates how two strings can be evaluated differently by the String.Compare method depending on the culture used to perform the comparison. First, the CurrentCulture is set to Danish in Denmark and the strings "Apple" and "Æble" are compared. The Danish language treats the character Æ as an individual letter, sorting it after Z in the alphabet. Therefore, the string "Æble" is greater than "Apple" for the Danish culture. Next, the CurrentCulture is set to English in the U.S. and the strings "Apple" and "Æble" are compared again. This time, the string "Æble" is determined to be less than "Apple", because the English culture treats the character Æ as equivalent to the letters "AE", so "Æble" sorts as if it were "AEble", which precedes "Apple".
Imports System
Imports System.Globalization
Imports System.Threading
Imports Microsoft.VisualBasic
Public Class TestClass
    Public Shared Sub Main()
        Dim str1 As String = "Apple"
        Dim str2 As String = "Æble"
        ' Sets the CurrentCulture to Danish in Denmark.
        Thread.CurrentThread.CurrentCulture = New CultureInfo("da-DK")
        ' Compares the two strings using the current culture's sort rules.
        Dim result1 As Integer = String.Compare(str1, str2)
        Console.WriteLine(ControlChars.NewLine + "When the CurrentCulture is ""da-DK""," + _
            ControlChars.NewLine + "the result of comparing {0} with {1} is: {2}", _
            str1, str2, result1)
        ' Sets the CurrentCulture to English in the U.S.
        Thread.CurrentThread.CurrentCulture = New CultureInfo("en-US")
        ' Compares the two strings again.
        Dim result2 As Integer = String.Compare(str1, str2)
        Console.WriteLine(ControlChars.NewLine + "When the CurrentCulture is ""en-US""," + _
            ControlChars.NewLine + "the result of comparing {0} with {1} is: {2}", _
            str1, str2, result2)
    End Sub
End Class
using System;
using System.Globalization;
using System.Threading;
public class CompareStringSample
{
    public static void Main()
    {
        string str1 = "Apple";
        string str2 = "Æble";
        // Sets the CurrentCulture to Danish in Denmark.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
        // Compares the two strings using the current culture's sort rules.
        int result1 = String.Compare(str1, str2);
        Console.WriteLine("\nWhen the CurrentCulture is \"da-DK\",\nthe result of " +
            "comparing {0} with {1} is: {2}", str1, str2, result1);
        // Sets the CurrentCulture to English in the U.S.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
        // Compares the two strings again.
        int result2 = String.Compare(str1, str2);
        Console.WriteLine("\nWhen the CurrentCulture is \"en-US\",\nthe result of " +
            "comparing {0} with {1} is: {2}", str1, str2, result2);
    }
}
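If you execute this code, the output appears as follows. Only the sign of each result is significant; String.Compare is not guaranteed to return exactly -1 or 1.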
When the CurrentCulture is "da-DK",
the result of comparing Apple with Æble is: -1
When the CurrentCulture is "en-US",
the result of comparing Apple with Æble is: 1
For more information on comparing strings, see String Class and Comparing Strings.
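The example above switches comparison rules by changing Thread.CurrentThread.CurrentCulture. As a minimal sketch of an alternative, you can call CompareInfo.Compare on the CompareInfo object of a specific culture, which leaves the thread's culture untouched. The helper name CompareFor is hypothetical. Because newer .NET runtimes can obtain collation data from ICU rather than from Windows, the signs printed for "da-DK" and "en-US" are not guaranteed to match the output shown above on every platform.

```csharp
using System;
using System.Globalization;

public static class ExplicitCultureCompare
{
    // Hypothetical helper: compares a and b with the sort rules of the named
    // culture. CurrentCulture is neither read nor modified.
    public static int CompareFor(string cultureName, string a, string b)
    {
        // GetCultureInfo returns a cached, read-only CultureInfo;
        // an empty name ("") selects the invariant culture.
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        // Same contract as String.Compare: < 0, 0, or > 0.
        return culture.CompareInfo.Compare(a, b, CompareOptions.None);
    }

    public static void Main()
    {
        foreach (string name in new[] { "da-DK", "en-US" })
        {
            // Only the sign of the result is meaningful.
            int sign = Math.Sign(CompareFor(name, "Apple", "Æble"));
            Console.WriteLine("{0}: sign of Compare(\"Apple\", \"Æble\") = {1}", name, sign);
        }
    }
}
```

The String.Compare(String, String, CultureInfo, CompareOptions) overload offers the same per-call control if you prefer to stay with String methods.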
Using Alternate Sort Orders
Some cultures support more than one sort order. For example, the culture "zh-CN" (Chinese in China) supports a sort by pronunciation (the default) and a sort by stroke count. When you create a CultureInfo object from a culture name such as "zh-CN", the default sort order is used. To use the alternate sort order, create a CultureInfo object from the LCID of the alternate sort order, and then obtain a CompareInfo object for string comparisons from its CultureInfo.CompareInfo property. Alternatively, you can create a CompareInfo object directly by calling the CompareInfo.GetCompareInfo(Int32) method with the LCID of the alternate sort order.
The following table lists the cultures that support alternate sort orders and the LCIDs for the default and alternate sort orders.
Culture name | Language-country/region | Default sort name and LCID | Alternate sort name and LCID |
---|---|---|---|
es-ES | Spanish - Spain | International: 0x00000C0A | Traditional: 0x0000040A |
zh-TW | Chinese - Taiwan | Stroke Count: 0x00000404 | Bopomofo: 0x00030404 |
zh-CN | Chinese - China | Pronunciation: 0x00000804 | Stroke Count: 0x00020804 |
zh-HK | Chinese - Hong Kong SAR | Stroke Count: 0x00000C04 | Stroke Count: 0x00020C04 |
zh-SG | Chinese - Singapore | Pronunciation: 0x00001004 | Stroke Count: 0x00021004 |
zh-MO | Chinese - Macao SAR | Pronunciation: 0x00001404 | Stroke Count: 0x00021404 |
ja-JP | Japanese - Japan | Default: 0x00000411 | Unicode: 0x00010411 |
ko-KR | Korean - Korea | Default: 0x00000412 | Korean Xwansung - Unicode: 0x00010412 |
de-DE | German - Germany | Dictionary: 0x00000407 | Phone Book Sort DIN: 0x00010407 |
hu-HU | Hungarian - Hungary | Default: 0x0000040E | Technical Sort: 0x0001040E |
ka-GE | Georgian - Georgia | Traditional: 0x00000437 | Modern Sort: 0x00010437 |
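A minimal sketch of selecting the alternate (stroke count) sort order for "zh-CN" by LCID, using the values from the table above. It assumes the runtime recognizes alternate-sort LCIDs; because that support can vary by platform, the hypothetical helper TryGetCompareInfo returns null instead of throwing when an LCID is rejected. The two sample characters are illustrative choices and are not taken from the original text.

```csharp
using System;
using System.Globalization;

public static class AlternateSortSample
{
    // LCIDs copied from the table above for zh-CN (Chinese - China).
    public const int ZhCnPronunciation = 0x00000804; // default sort
    public const int ZhCnStrokeCount = 0x00020804;   // alternate sort

    // Hypothetical helper: returns the CompareInfo for an LCID, or null when
    // the runtime rejects it (assumption: alternate-sort support varies).
    public static CompareInfo TryGetCompareInfo(int lcid)
    {
        try
        {
            return CompareInfo.GetCompareInfo(lcid);
        }
        catch (Exception ex) when (ex is ArgumentException || ex is PlatformNotSupportedException)
        {
            // CultureNotFoundException derives from ArgumentException.
            return null;
        }
    }

    public static void Main()
    {
        // Illustrative characters whose pronunciation order and stroke-count
        // order are expected to differ: 一 (yi, 1 stroke), 阿 (a, 7 strokes).
        string first = "一", second = "阿";
        foreach (int lcid in new[] { ZhCnPronunciation, ZhCnStrokeCount })
        {
            CompareInfo ci = TryGetCompareInfo(lcid);
            if (ci == null)
            {
                Console.WriteLine("0x{0:X8}: not supported on this runtime", lcid);
                continue;
            }
            Console.WriteLine("0x{0:X8} ({1}): sign of Compare = {2}",
                lcid, ci.Name, Math.Sign(ci.Compare(first, second)));
        }
    }
}
```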
Searching Strings
You can use the overloaded CompareInfo.IndexOf method to return the zero-based index of a character or substring within a specified string. The method returns a negative integer if the character or substring is not found in the specified string. When searching for a specified character using CompareInfo.IndexOf, be aware that the method overloads that accept a CompareOptions parameter perform the comparison differently than the method overloads that do not accept a CompareOptions parameter. The CompareInfo.IndexOf overloads that search for a char (Char in Visual Basic) and do not take a parameter of type CompareOptions perform a culture-sensitive search. This means that if char is a Unicode value representing a precomposed character, such as the ligature 'Æ' (\u00C6), it might be considered equivalent to any occurrence of its components in the correct sequence, such as "AE" (\u0041\u0045), depending on the culture. To perform an ordinal (culture-insensitive) search, where a char is considered equivalent to another char only if the Unicode values are the same, use one of the CompareInfo.IndexOf overloads that take a CompareOptions parameter. Set the CompareOptions parameter to the CompareOptions.Ordinal value.
You can also use the overloads of the String.IndexOf method that search for a char to perform an ordinal search. Note that the overloads of the String.IndexOf method that search for a string and do not take a StringComparison parameter perform a culture-sensitive search.
The following code example illustrates how the results returned by the CompareInfo.IndexOf(string, char) method depend on culture. A CultureInfo is created for "da-DK" (Danish in Denmark). Next, overloads of the CompareInfo.IndexOf method are used to search for the character 'Æ' in the strings "Æble" and "aeble". Note that for the "da-DK" culture, the CompareInfo.IndexOf overload that takes the CompareOptions.Ordinal parameter and the overload that takes no CompareOptions parameter return the same results: the character 'Æ' matches only the Unicode code value \u00C6.
Imports System
Imports System.Globalization
Imports Microsoft.VisualBasic
Public Class CompareClass
    Public Shared Sub Main()
        Dim str1 As String = "Æble"
        Dim str2 As String = "aeble"
        Dim find As Char = "Æ"c
        ' Creates a CultureInfo for Danish in Denmark.
        Dim ci As New CultureInfo("da-DK")
        ' Culture-sensitive searches (no CompareOptions argument).
        Dim result1 As Integer = ci.CompareInfo.IndexOf(str1, find)
        Dim result2 As Integer = ci.CompareInfo.IndexOf(str2, find)
        ' Ordinal searches: a match requires identical Unicode code values.
        Dim result3 As Integer = ci.CompareInfo.IndexOf(str1, find, CompareOptions.Ordinal)
        Dim result4 As Integer = ci.CompareInfo.IndexOf(str2, find, CompareOptions.Ordinal)
        Console.WriteLine(ControlChars.NewLine + "CultureInfo is set to {0}", ci.DisplayName)
        Console.WriteLine(ControlChars.NewLine + "Using CompareInfo.IndexOf(string, char) method" + _
            ControlChars.NewLine + "the result of searching for {0} in the string {1} is: {2}", _
            find, str1, result1)
        Console.WriteLine(ControlChars.NewLine + "Using CompareInfo.IndexOf(string, char) method" + _
            ControlChars.NewLine + "the result of searching for {0} in the string {1} is: {2}", _
            find, str2, result2)
        Console.WriteLine(ControlChars.NewLine + "Using CompareInfo.IndexOf(string, char, CompareOptions) method" + _
            ControlChars.NewLine + "the result of searching for {0} in the string {1} is: {2}", _
            find, str1, result3)
        Console.WriteLine(ControlChars.NewLine + "Using CompareInfo.IndexOf(string, char, CompareOptions) method" + _
            ControlChars.NewLine + "the result of searching for {0} in the string {1} is: {2}", _
            find, str2, result4)
    End Sub
End Class
using System;
using System.Globalization;
public class CompareClass
{
    public static void Main()
    {
        string str1 = "Æble";
        string str2 = "aeble";
        char find = 'Æ';
        // Creates a CultureInfo for Danish in Denmark.
        CultureInfo ci = new CultureInfo("da-DK");
        // Culture-sensitive searches (no CompareOptions argument).
        int result1 = ci.CompareInfo.IndexOf(str1, find);
        int result2 = ci.CompareInfo.IndexOf(str2, find);
        // Ordinal searches: a match requires identical Unicode code values.
        int result3 = ci.CompareInfo.IndexOf(str1, find, CompareOptions.Ordinal);
        int result4 = ci.CompareInfo.IndexOf(str2, find, CompareOptions.Ordinal);
        Console.WriteLine("\nCultureInfo is set to {0}", ci.DisplayName);
        Console.WriteLine("\nUsing CompareInfo.IndexOf(string, char) method\n" +
            "the result of searching for {0} in the string {1} is: {2}", find, str1, result1);
        Console.WriteLine("\nUsing CompareInfo.IndexOf(string, char) method\n" +
            "the result of searching for {0} in the string {1} is: {2}", find, str2, result2);
        Console.WriteLine("\nUsing CompareInfo.IndexOf(string, char, CompareOptions) method\n" +
            "the result of searching for {0} in the string {1} is: {2}", find, str1, result3);
        Console.WriteLine("\nUsing CompareInfo.IndexOf(string, char, CompareOptions) method\n" +
            "the result of searching for {0} in the string {1} is: {2}", find, str2, result4);
    }
}
This code produces the following output:
CultureInfo is set to Danish (Denmark)
Using CompareInfo.IndexOf(string, char) method
the result of searching for Æ in the string Æble is: 0
Using CompareInfo.IndexOf(string, char) method
the result of searching for Æ in the string aeble is: -1
Using CompareInfo.IndexOf(string, char, CompareOptions) method
the result of searching for Æ in the string Æble is: 0
Using CompareInfo.IndexOf(string, char, CompareOptions) method
the result of searching for Æ in the string aeble is: -1
If you replace CultureInfo ci = new CultureInfo("da-DK"); with CultureInfo ci = new CultureInfo("en-US");, the CompareInfo.IndexOf overload that takes the CompareOptions.Ordinal parameter and the overload that takes no CompareOptions parameter return different results. The culture-sensitive comparison performed by CompareInfo.IndexOf(string, char) evaluates the character 'Æ' as equivalent to its components "ae". The ordinal (culture-insensitive) comparison performed by CompareInfo.IndexOf(string, char, CompareOptions.Ordinal) does not treat 'Æ' as equivalent to "ae", because their Unicode code values do not match.
When you recompile and execute the code for the "en-US" culture, the following output is produced:
CultureInfo is set to English (United States)
Using CompareInfo.IndexOf(string, char) method
the result of searching for Æ in the string Æble is: 0
Using CompareInfo.IndexOf(string, char) method
the result of searching for Æ in the string aeble is: 0
Using CompareInfo.IndexOf(string, char, CompareOptions) method
the result of searching for Æ in the string Æble is: 0
Using CompareInfo.IndexOf(string, char, CompareOptions) method
the result of searching for Æ in the string aeble is: -1
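The searches above all go through CompareInfo. The following is a minimal sketch of the String.IndexOf alternatives described earlier in this section: the char overload and the string overload with StringComparison.Ordinal are both ordinal, so 'Æ' is never found in "aeble". The helper OrdinalSearches is hypothetical; the StringComparison.CurrentCulture search is included only for contrast, and its result depends on the current culture and platform.

```csharp
using System;

public static class OrdinalSearchSample
{
    // Hypothetical helper: runs the two String.IndexOf searches that are
    // ordinal regardless of culture and returns both indexes (-1 = not found).
    public static (int ByChar, int ByOrdinalString) OrdinalSearches(string source, char value)
    {
        // String.IndexOf(char) compares code units ordinally.
        int byChar = source.IndexOf(value);
        // The string overload is ordinal only when asked to be.
        int byOrdinalString = source.IndexOf(value.ToString(), StringComparison.Ordinal);
        return (byChar, byOrdinalString);
    }

    public static void Main()
    {
        foreach (string s in new[] { "Æble", "aeble" })
        {
            var r = OrdinalSearches(s, 'Æ');
            // Culture-sensitive search for contrast; depends on CurrentCulture.
            int cultural = s.IndexOf("Æ", StringComparison.CurrentCulture);
            Console.WriteLine("{0}: char={1}, ordinal string={2}, current culture={3}",
                s, r.ByChar, r.ByOrdinalString, cultural);
        }
    }
}
```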
Sorting Strings
The Array class provides an overloaded Array.Sort method that allows you to sort arrays based on the CultureInfo.CurrentCulture property. In the following example, an array of three strings is created. First, the CurrentCulture is set to "en-US" and the Array.Sort method is called. The resulting sort order is based on sorting conventions for the "en-US" culture. Next, the CurrentCulture is set to "da-DK" and the Array.Sort method is called again. Notice how the resulting sort order differs from the "en-US" results because the sorting conventions for the "da-DK" culture are used.
Imports System
Imports System.Globalization
Imports System.Threading
Imports Microsoft.VisualBasic
Public Class ArraySort
    Public Shared Sub Main()
        Dim str1 As String = "Apple"
        Dim str2 As String = "Æble"
        Dim str3 As String = "Zebra"
        ' Creates and initializes a new Array to store the strings.
        Dim stringArray As Array = Array.CreateInstance(GetType(String), 3)
        stringArray.SetValue(str1, 0)
        stringArray.SetValue(str2, 1)
        stringArray.SetValue(str3, 2)
        ' Displays the values of the Array.
        Console.WriteLine(ControlChars.NewLine + "The Array initially contains the following strings:")
        PrintIndexAndValues(stringArray)
        ' Sets the CurrentCulture to "en-US".
        Thread.CurrentThread.CurrentCulture = New CultureInfo("en-US")
        ' Sorts the values of the Array.
        Array.Sort(stringArray)
        ' Displays the values of the Array.
        Console.WriteLine(ControlChars.NewLine + "After sorting for the culture ""en-US"":")
        PrintIndexAndValues(stringArray)
        ' Sets the CurrentCulture to "da-DK".
        Thread.CurrentThread.CurrentCulture = New CultureInfo("da-DK")
        ' Sorts the values of the Array.
        Array.Sort(stringArray)
        ' Displays the values of the Array.
        Console.WriteLine(ControlChars.NewLine + "After sorting for the culture ""da-DK"":")
        PrintIndexAndValues(stringArray)
    End Sub
    Public Shared Sub PrintIndexAndValues(myArray As Array)
        For i As Integer = myArray.GetLowerBound(0) To myArray.GetUpperBound(0)
            Console.WriteLine(ControlChars.Tab + "[{0}]:" + ControlChars.Tab + "{1}", _
                i, myArray.GetValue(i))
        Next i
    End Sub
End Class
using System;
using System.Threading;
using System.Globalization;
public class ArraySort
{
    public static void Main(String[] args)
    {
        String str1 = "Apple";
        String str2 = "Æble";
        String str3 = "Zebra";
        // Creates and initializes a new Array to store the strings.
        Array stringArray = Array.CreateInstance(typeof(String), 3);
        stringArray.SetValue(str1, 0);
        stringArray.SetValue(str2, 1);
        stringArray.SetValue(str3, 2);
        // Displays the values of the Array.
        Console.WriteLine("\nThe Array initially contains the following strings:");
        PrintIndexAndValues(stringArray);
        // Sets the CurrentCulture to "en-US".
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
        // Sorts the values of the Array.
        Array.Sort(stringArray);
        // Displays the values of the Array.
        Console.WriteLine("\nAfter sorting for the culture \"en-US\":");
        PrintIndexAndValues(stringArray);
        // Sets the CurrentCulture to "da-DK".
        Thread.CurrentThread.CurrentCulture = new CultureInfo("da-DK");
        // Sorts the values of the Array.
        Array.Sort(stringArray);
        // Displays the values of the Array.
        Console.WriteLine("\nAfter sorting for the culture \"da-DK\":");
        PrintIndexAndValues(stringArray);
    }
    public static void PrintIndexAndValues(Array myArray)
    {
        for (int i = myArray.GetLowerBound(0); i <= myArray.GetUpperBound(0); i++)
            Console.WriteLine("\t[{0}]:\t{1}", i, myArray.GetValue(i));
    }
}
This code produces the following output:
The Array initially contains the following strings:
[0]: Apple
[1]: Æble
[2]: Zebra
After sorting for the culture "en-US":
[0]: Æble
[1]: Apple
[2]: Zebra
After sorting for the culture "da-DK":
[0]: Apple
[1]: Zebra
[2]: Æble
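Setting CurrentCulture, as in the example above, affects every culture-sensitive operation on the thread. As a sketch of a more contained alternative, you can pass a culture-specific comparer to Array.Sort. StringComparer.Create(CultureInfo, Boolean) returns a comparer that uses the given culture's rules. The helper SortFor is hypothetical, and exact orderings can vary with the platform's collation data.

```csharp
using System;
using System.Globalization;

public static class CultureSortSample
{
    // Hypothetical helper: returns a sorted copy of items ordered by the given
    // culture's rules, leaving the input array and CurrentCulture untouched.
    public static string[] SortFor(CultureInfo culture, string[] items)
    {
        string[] copy = (string[])items.Clone();
        // ignoreCase: false gives a case-sensitive comparer backed by
        // culture.CompareInfo.
        Array.Sort(copy, StringComparer.Create(culture, false));
        return copy;
    }

    public static void Main()
    {
        string[] words = { "Apple", "Æble", "Zebra" };
        foreach (string name in new[] { "en-US", "da-DK" })
        {
            string[] sorted = SortFor(CultureInfo.GetCultureInfo(name), words);
            Console.WriteLine("{0}: {1}", name, string.Join(", ", sorted));
        }
    }
}
```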
Using Sort Keys
Sort keys are used to support culturally sensitive sorts. Based on the Unicode Standard, each character in a string is given several categories of sort weights, including alphabetic, case, and diacritic weights. A sort key serves as the repository of these weights for a particular string. For example, a sort key might contain a string of alphabetic weights, followed by a string of case weights, and so on. For additional information on sort key concepts, see the Unicode Standard at www.unicode.org.
In the .NET Framework, the SortKey class maps a string to its sort key, and its SortKey.OriginalString property returns the string from which the key was created. You can use the CompareInfo.GetSortKey method to create a sort key for a string that you specify. The resulting sort key is a sequence of bytes that can differ depending on the culture of the CompareInfo object and on the CompareOptions value that you specify. For example, if you specify CompareOptions.IgnoreCase when creating a sort key, a string comparison that uses the sort key ignores case.
After you create a sort key for a string, you can pass it as a parameter to methods provided by the SortKey class. The SortKey.Compare method compares two sort keys. Because SortKey.Compare performs a simple byte-by-byte comparison, it is much faster than String.Compare. In sorting-intensive applications, you can improve performance by generating and storing sort keys for all the strings that the application uses. When a sort or comparison operation is required, you can use the sort keys rather than the strings.
The following code example creates sort keys for two strings when the CurrentCulture is set to "da-DK". It compares the two strings using the SortKey.Compare method and displays the results. The SortKey.Compare method returns a negative integer if string1 is less than string2, zero (0) if string1 and string2 are equal, and a positive integer if string1 is greater than string2. Next, the CurrentCulture is set to the "en-US" culture and sort keys are created for the same strings. The sort keys for the strings are compared and the results are displayed. Notice that the sort results differ based on the culture. The results are identical to those of the example in Comparing Strings earlier in this topic. Comparing existing sort keys with SortKey.Compare is faster than calling String.Compare, but generating each sort key has its own cost, so the gain comes from creating a key once and comparing it many times.
Imports System
Imports System.Threading
Imports System.Globalization
Imports Microsoft.VisualBasic
Public Class SortKeySample
    Public Shared Sub Main()
        Dim str1 As String = "Apple"
        Dim str2 As String = "Æble"
        ' Sets the CurrentCulture to "da-DK".
        Dim dk As New CultureInfo("da-DK")
        Thread.CurrentThread.CurrentCulture = dk
        ' Creates a culturally sensitive sort key for str1.
        Dim sc1 As SortKey = dk.CompareInfo.GetSortKey(str1)
        ' Creates a culturally sensitive sort key for str2.
        Dim sc2 As SortKey = dk.CompareInfo.GetSortKey(str2)
        ' Compares the two sort keys and displays the results.
        Dim result1 As Integer = SortKey.Compare(sc1, sc2)
        Console.WriteLine(ControlChars.NewLine + "When the CurrentCulture is ""da-DK""," + _
            ControlChars.NewLine + "the result of comparing {0} with {1} is: {2}", _
            str1, str2, result1)
        ' Sets the CurrentCulture to "en-US".
        Dim enus As New CultureInfo("en-US")
        Thread.CurrentThread.CurrentCulture = enus
        ' Creates a culturally sensitive sort key for str1.
        Dim sc3 As SortKey = enus.CompareInfo.GetSortKey(str1)
        ' Creates a culturally sensitive sort key for str2.
        Dim sc4 As SortKey = enus.CompareInfo.GetSortKey(str2)
        ' Compares the two sort keys and displays the results.
        Dim result2 As Integer = SortKey.Compare(sc3, sc4)
        Console.WriteLine(ControlChars.NewLine + "When the CurrentCulture is ""en-US""," + _
            ControlChars.NewLine + "the result of comparing {0} with {1} is: {2}", _
            str1, str2, result2)
    End Sub
End Class
using System;
using System.Threading;
using System.Globalization;
public class SortKeySample
{
    public static void Main(String[] args)
    {
        String str1 = "Apple";
        String str2 = "Æble";
        // Sets the CurrentCulture to "da-DK".
        CultureInfo dk = new CultureInfo("da-DK");
        Thread.CurrentThread.CurrentCulture = dk;
        // Creates a culturally sensitive sort key for str1.
        SortKey sc1 = dk.CompareInfo.GetSortKey(str1);
        // Creates a culturally sensitive sort key for str2.
        SortKey sc2 = dk.CompareInfo.GetSortKey(str2);
        // Compares the two sort keys and displays the results.
        int result1 = SortKey.Compare(sc1, sc2);
        Console.WriteLine("\nWhen the CurrentCulture is \"da-DK\",\nthe result of " +
            "comparing {0} with {1} is: {2}", str1, str2, result1);
        // Sets the CurrentCulture to "en-US".
        CultureInfo enus = new CultureInfo("en-US");
        Thread.CurrentThread.CurrentCulture = enus;
        // Creates a culturally sensitive sort key for str1.
        SortKey sc3 = enus.CompareInfo.GetSortKey(str1);
        // Creates a culturally sensitive sort key for str2.
        SortKey sc4 = enus.CompareInfo.GetSortKey(str2);
        // Compares the two sort keys and displays the results.
        int result2 = SortKey.Compare(sc3, sc4);
        Console.WriteLine("\nWhen the CurrentCulture is \"en-US\",\nthe result of " +
            "comparing {0} with {1} is: {2}", str1, str2, result2);
    }
}
This code produces the following output:
When the CurrentCulture is "da-DK",
the result of comparing Apple with Æble is: -1
When the CurrentCulture is "en-US",
the result of comparing Apple with Æble is: 1
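The sort key discussion above recommends generating keys once and then comparing the keys rather than the strings. A minimal sketch of that pattern follows; SortWithKeys is a hypothetical helper. Array.Sort is not a stable sort, so strings whose keys are equal (for example "apple" and "Apple" under CompareOptions.IgnoreCase) may appear in either order.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class SortKeyCacheSample
{
    // Hypothetical helper: builds each SortKey once, then orders the keys with
    // SortKey.Compare (byte-by-byte) instead of re-comparing the strings.
    public static string[] SortWithKeys(CompareInfo compareInfo, string[] items, CompareOptions options)
    {
        var keys = new SortKey[items.Length];
        for (int i = 0; i < items.Length; i++)
        {
            keys[i] = compareInfo.GetSortKey(items[i], options);
        }
        string[] sorted = (string[])items.Clone();
        // Array.Sort(keys, items, comparer) permutes both arrays together,
        // so every string stays paired with its own key.
        Array.Sort(keys, sorted, Comparer<SortKey>.Create((x, y) => SortKey.Compare(x, y)));
        return sorted;
    }

    public static void Main()
    {
        CompareInfo da = CultureInfo.GetCultureInfo("da-DK").CompareInfo;
        string[] words = { "Zebra", "Æble", "apple", "Apple" };
        // IgnoreCase is baked into the keys when they are created.
        Console.WriteLine(string.Join(", ", SortWithKeys(da, words, CompareOptions.IgnoreCase)));
    }
}
```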
Normalization
You can normalize strings to either uppercase or lowercase before sorting. Rules for string sorting and casing are language-specific. For example, even within Latin-script-based languages, there are different composition and sorting rules. There are only a few languages (including English) where the sort order matches the order of the code points (for example, A [65] comes before B [66]).
You should not rely on code points to perform accurate sorting and string comparisons. In addition, the .NET Framework does not enforce or guarantee a specific form of normalization. You are responsible for performing the appropriate normalization in the applications that you develop.
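Because normalization is left to the application, one possible approach is sketched below: convert strings to a single Unicode normalization form before comparing them ordinally, and normalize case with a culture-independent method. Choosing NormalizationForm.FormC is an assumption for this example, and the helper ToCanonical is hypothetical.

```csharp
using System;
using System.Text;

public static class NormalizationSample
{
    // Hypothetical helper: brings a string to one canonical form. NFC composes
    // "e" followed by U+0301 into the single code point U+00E9.
    public static string ToCanonical(string s) => s.Normalize(NormalizationForm.FormC);

    public static void Main()
    {
        string precomposed = "caf\u00E9"; // é as one code point
        string decomposed = "cafe\u0301"; // e followed by a combining acute accent
        // Ordinal comparison sees different code points before normalization.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False
        Console.WriteLine(string.Equals(ToCanonical(precomposed), ToCanonical(decomposed),
            StringComparison.Ordinal)); // True
        // Case normalization that does not depend on the current culture.
        Console.WriteLine(ToCanonical(decomposed).ToUpperInvariant());
    }
}
```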
See Also
Reference
CompareInfo Class
SortKey Class
Concepts
Culture-Insensitive String Operations