Support for Unicode

Unicode Tasks | Multibyte Character Set (MBCS) Tasks

A “wide character” is a two-byte multilingual character code. Most characters used in modern computing worldwide, including technical symbols and special publishing characters, can be represented according to the Unicode specification as a single wide character. (Characters outside Unicode's Basic Multilingual Plane are encoded in UTF-16 as a pair of 16-bit units, called a surrogate pair.) Because each of the commonly used characters is represented in a fixed size of 16 bits, using wide characters simplifies programming with international character sets.

A wide-character string is stored in a wchar_t[] array and is pointed to by a wchar_t* pointer. Any ASCII character literal can be written as a wide character by prefixing it with the letter L. For example, L'\0' is the terminating wide (16-bit) null character. Similarly, any ASCII string literal can be written as a wide-character string literal by prefixing it with the letter L (L"Hello").
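
The following is a minimal sketch of working with an L-prefixed literal, not code from the MFC sources. The helper names wide_length and wide_storage_bytes are hypothetical and exist only for illustration. Note that sizeof(wchar_t) is 2 with Visual C++ on Windows but is commonly 4 with other compilers, so the sketch never hard-codes the size.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: count the characters in a wide string by scanning
// for the terminating L'\0'. It does the same job as std::wcslen and is
// written out only to make the terminator visible.
std::size_t wide_length(const wchar_t* text) {
    assert(text != nullptr);
    std::size_t count = 0;
    while (text[count] != L'\0') {
        ++count;
    }
    return count;
}

// Hypothetical helper: bytes needed to store the string, including the
// terminating wide null character.
std::size_t wide_storage_bytes(const wchar_t* text) {
    return (wide_length(text) + 1) * sizeof(wchar_t);
}
```

For example, wide_length(L"Hello") returns 5, and wide_storage_bytes(L"Hello") returns 6 * sizeof(wchar_t), which is 12 bytes when wchar_t is 16 bits wide.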

Generally, wide characters take more space in memory than multibyte characters but are faster to process, because every character occupies one fixed-size unit and code does not have to test for lead and trail bytes. In addition, a multibyte encoding can represent only one locale at a time, whereas Unicode represents all of the world's character sets simultaneously.

The MFC framework is Unicode-enabled throughout, except for the database classes. (ODBC is not Unicode-enabled.) MFC accomplishes Unicode enabling by using “portable” macros throughout, as shown in the following table:

Portable Data Types in MFC

Non-portable data type(s)                 Replaced by this macro
char                                      _TCHAR
char*, LPSTR (Win32 data type)            LPTSTR
const char*, LPCSTR (Win32 data type)     LPCTSTR
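
To illustrate the idea behind the table, here is a simplified sketch of how a portable character type can switch between char and wchar_t at compile time. Every name in it (SKETCH_UNICODE, sketch_tchar, sketch_lptstr, sketch_lpctstr, SKETCH_T, sketch_strlen, greeting_length) is a hypothetical stand-in chosen so that the sketch compiles without the Windows headers. It is not the actual definition of _TCHAR, LPTSTR, or LPCTSTR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for the portable macros; the real ones come from
// the Windows and C run-time headers and are selected by the build settings.
#ifdef SKETCH_UNICODE
typedef wchar_t sketch_tchar;            // Unicode build: wide characters
#define SKETCH_T(literal) L##literal     // turns "x" into L"x"
#else
typedef char sketch_tchar;               // ANSI/MBCS build: narrow characters
#define SKETCH_T(literal) literal        // leaves "x" unchanged
#endif

typedef sketch_tchar*       sketch_lptstr;   // plays the role of LPTSTR
typedef const sketch_tchar* sketch_lpctstr;  // plays the role of LPCTSTR

// Overloads let the same call compile for either character type.
inline std::size_t sketch_strlen(const char* s)    { return std::strlen(s); }
inline std::size_t sketch_strlen(const wchar_t* s) { return std::wcslen(s); }

// The same source line works in both builds.
std::size_t greeting_length() {
    sketch_lpctstr greeting = SKETCH_T("Hello");
    assert(greeting != nullptr);
    return sketch_strlen(greeting);
}
```

Compiling with or without SKETCH_UNICODE defined changes the underlying character type, but greeting_length() returns 5 either way.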

Class CString uses _TCHAR as its base and provides constructors and operators for easy conversions. Most string operations for Unicode can be written by using the same logic used for handling the Windows ANSI character set, except that the basic unit of operation is a 16-bit character instead of an 8-bit byte. Unlike working with multibyte character sets (MBCS), you do not have to (and should not) treat a Unicode character as if it were two distinct bytes.
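
As a rough illustration of "the same logic, with a wider unit," the sketch below uppercases a wide string one wchar_t at a time, exactly as a loop over char would for an ANSI string. The function name upper_in_place is hypothetical. The sketch uses the standard std::towupper function, whose results for non-ASCII characters depend on the current C locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>

// Hypothetical helper: convert a null-terminated wide string to uppercase
// in place and return how many characters were changed. Each wchar_t is
// handled as one unit; it is never split into separate bytes.
std::size_t upper_in_place(wchar_t* text) {
    assert(text != nullptr);
    std::size_t changed = 0;
    for (std::size_t i = 0; text[i] != L'\0'; ++i) {
        const wchar_t upper = static_cast<wchar_t>(
            std::towupper(static_cast<std::wint_t>(text[i])));
        if (upper != text[i]) {
            text[i] = upper;
            ++changed;
        }
    }
    return changed;
}
```

Called on a writable buffer that holds L"hello", the function changes all five characters and leaves the buffer holding L"HELLO".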

See Also   Support for Using wmain