About Character Sets

11/18/2015

4/8/2010

A character set is a group of characters from a given language. For example, the ASCII character set is the standard United States-English character set. MLang provides a number of APIs to help you work with multiple character sets, including APIs that convert text to Unicode and APIs that perform font linking.

The following terms and definitions pertain to character sets and will help you better understand the MLang methods:

Encoding
A mapping of a character to a sequence of bits. In this documentation, every encoding other than Unicode is called a multibyte encoding.
Charset
The application of an encoding to each character in a character set. In other words, it is a character set in which every character has been assigned a numeric value that is unique within the encoding.
Code page
A unique physical implementation of a charset. In the MLang API, code pages are usually identified by a DWORD that is treated as a set of bit flags: each bit in the DWORD represents a specific code page. When a bit is set to 1, its corresponding code page is a member of the set; when the bit is set to 0, its code page is not a member. Thus, the DWORD 0x1e0000 represents the four code pages corresponding to the bits 0x100000, 0x80000, 0x40000, and 0x20000.
Font linking
The process of creating customized fonts that can display text in characters from a variety of different languages. This functionality is especially useful when dealing with Unicode strings, which can contain characters from many character sets at once.
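
The bit-flag representation described under "Code page" can be illustrated with a short sketch. The example below is written in Python for brevity and does not call MLang; the function names code_page_bits and has_code_page are hypothetical and exist only for this illustration. It assumes nothing about which code page each bit stands for, since that mapping is not given here; it only shows how a DWORD such as 0x1e0000 breaks down into individual bits.

```python
def code_page_bits(mask: int) -> list[int]:
    """Return the single-bit flags set in a 32-bit mask, highest bit first.

    Hypothetical helper for illustration only; it is not an MLang function.
    """
    mask &= 0xFFFFFFFF  # a DWORD is an unsigned 32-bit value
    return [1 << bit for bit in range(31, -1, -1) if mask & (1 << bit)]


def has_code_page(mask: int, flag: int) -> bool:
    """Return True if every bit of `flag` is set in `mask` (set membership)."""
    return (mask & flag) == flag


if __name__ == "__main__":
    flags = code_page_bits(0x1E0000)
    print([hex(f) for f in flags])
    # prints ['0x100000', '0x80000', '0x40000', '0x20000']
    print(has_code_page(0x1E0000, 0x40000))  # prints True
    print(has_code_page(0x1E0000, 0x10000))  # prints False
```

Combining the four single-bit values with a bitwise OR gives 0x1e0000 again, which is why one DWORD can describe a whole set of code pages at once.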
