C++ String Literals

A string literal represents a sequence of characters that together form a null-terminated string. The characters must be enclosed between double quotation marks. There are the following kinds of string literals:

  • Narrow string literals, represented as "xxx".

  • Wide string literals, represented as L"xxx".

  • Raw string literals, represented as R"ddd(xxx)ddd", where ddd is a delimiter. Raw string literals may be either narrow (represented with R) or wide (represented with LR).

A narrow string literal is a null-terminated array of constant char that contains any graphic character except the double quotation mark ("), backslash (\), or newline character. A narrow string literal may contain the escape sequences listed in C++ Character Literals.

const char *narrow = "abcd";

// represents the string: yes\no
const char *escaped = "yes\\no";

A wide string literal is a null-terminated array of constant wchar_t that contains any graphic character except the double quotation mark ("), backslash (\), or newline character. A wide string literal may contain the escape sequences listed in C++ Character Literals.

const wchar_t* wide = L"zyxw";
const wchar_t* newline = L"hello\ngoodbye";

A raw string literal is a null-terminated array of either constant char or constant wchar_t that can contain any graphic character, including the double quotation mark ("), backslash (\), or newline character. Raw string literals are often used in regular expressions that use character classes, and in HTML strings and XML strings. For examples, see Bjarne Stroustrup's FAQ on C++11.

// represents the string: An unescaped \ character
const char* raw_narrow = R"(An unescaped \ character)";

// represents the string: An unescaped " character
const wchar_t* raw_wide = LR"(An unescaped " character)";

A delimiter is a user-defined sequence of up to 16 characters that immediately precedes the opening parenthesis of a raw string literal and immediately follows its closing parenthesis. You can use a delimiter to disambiguate strings that contain both double quotation marks and parentheses. Without a delimiter, the first )" sequence ends the literal, so this causes a compiler error:

// meant to represent the string: )"
const char* bad_parens = R"()")";

But a delimiter resolves it:

const char* good_parens = R"xyz()")xyz";
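
As a minimal sketch of what the delimited literal contains, the following compares it with the same two characters written as an ordinary literal. The names escaped_parens and same_parens_string are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstring>

// Raw literal with the delimiter xyz; its content is the two characters )"
const char* good_parens = R"xyz()")xyz";

// The same two characters written as an ordinary literal with an escaped quote.
const char* escaped_parens = ")\"";

// Returns true when both spellings produce the same string.
bool same_parens_string() {
    return std::strcmp(good_parens, escaped_parens) == 0;
}
```

The delimiter itself is not part of the string; only the characters between the opening and closing parentheses are.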

You can construct a raw string literal in which there is a newline (not the escaped character) in the source:

// represents the string: hello
//goodbye
const wchar_t* newline = LR"(hello
goodbye)";
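
The following sketch suggests that a newline typed directly into a raw string literal should be equivalent to the \n escape sequence in an ordinary literal. The names raw_newline, escaped_newline, and same_newline_string are assumptions made for this example.

```cpp
#include <cwchar>

// Raw wide literal that spans two source lines.
const wchar_t* raw_newline = LR"(hello
goodbye)";

// Ordinary wide literal that uses the \n escape sequence instead.
const wchar_t* escaped_newline = L"hello\ngoodbye";

// Returns true when both literals hold the same characters.
bool same_newline_string() {
    return std::wcscmp(raw_newline, escaped_newline) == 0;
}
```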

Size of String Literals

The size (in bytes) of a narrow string literal is the number of characters plus 1 (for the terminating null character). The size (in bytes) of a wide string literal is the number of characters times 2 plus 2 (for the terminating null character), because wchar_t is 2 bytes wide in Visual C++. This shows how to compute the size of a wide string literal:

const wchar_t* str = L"Hello!";
const size_t byteSize = (wcslen(str) + 1) * sizeof(wchar_t);

Notice that strlen() and wcslen() do not include the size of the terminating null character.
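
As a rough illustration, sizeof can be applied directly to a literal to get its size in bytes, terminator included, and the result can be compared with the run-time calculation. The names narrow_bytes, wide_bytes, and wide_bytes_at_runtime are hypothetical. The checks are written in terms of sizeof(wchar_t), which is 2 in Visual C++ but 4 on many other platforms.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// sizeof on a literal yields the size of the whole array, including the null terminator.
constexpr std::size_t narrow_bytes = sizeof("Hello!");   // 6 characters + 1
constexpr std::size_t wide_bytes   = sizeof(L"Hello!");  // (6 + 1) * sizeof(wchar_t)

static_assert(narrow_bytes == 7, "six characters plus the terminator");
static_assert(wide_bytes == 7 * sizeof(wchar_t), "seven wide characters");

// Run-time equivalent for a pointer, where sizeof would only give the pointer size.
std::size_t wide_bytes_at_runtime(const wchar_t* s) {
    return (std::wcslen(s) + 1) * sizeof(wchar_t);
}
```

Note that sizeof only reports the literal's size when applied to the literal or to an array; applied to a pointer such as const wchar_t*, it returns the size of the pointer.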

The maximum length of a string literal is 65535 bytes. This limit applies to both narrow string literals and wide string literals.

Modifying String Literals

Because string literals are constants, trying to modify them (for example, str[2] = 'A') causes a compiler error.

Microsoft Specific

In Visual C++ you can use a string literal to initialize a pointer to non-const char or wchar_t. This is allowed in C code, but is deprecated in C++98 and removed in C++11. An attempt to modify the string causes an access violation, as in this example:

wchar_t* str = L"hello";
str[2] = L'a'; // run-time error: access violation

You can cause the compiler to emit an error when a string literal is converted to a non-const character pointer by setting the /Zc:strictStrings (Disable string literal type conversion) compiler option. It is a good practice to use the auto keyword to declare pointers that are initialized with string literals, because it resolves to the correct (const) type. The following example catches an attempt to write to a string literal at compile time:

auto str = L"hello";
str[2] = L'a'; // Compiler error: you cannot assign to a variable that is const
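
The deduced type can be checked at compile time. This is a small sketch that assumes a C++11 compiler; the variable name greeting and the function third_char are hypothetical.

```cpp
#include <type_traits>

// auto deduces a pointer to const wchar_t from a wide string literal.
auto greeting = L"hello";

static_assert(std::is_same<decltype(greeting), const wchar_t*>::value,
              "auto should deduce const wchar_t* from a wide string literal");

// Reading through the pointer is allowed; writing through it would not compile.
wchar_t third_char() {
    return greeting[2];
}
```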

In some cases, identical string literals may be pooled to save space in the executable file. In string-literal pooling, the compiler causes all references to a particular string literal to point to the same location in memory, instead of having each reference point to a separate instance of the string literal. To enable string pooling, use the /GF compiler option.

End Microsoft Specific

Concatenating Adjacent String Literals

Adjacent string literals are concatenated. This declaration:

char str[] = "12" "34";

is identical to this declaration:

char str[] = "1234";

and to this declaration:

char str[] = "12\
34";
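
The equivalence of the first two forms can be checked with a short sketch like the following. The array names spliced and plain and the function same_contents are assumptions chosen for this illustration.

```cpp
#include <cstring>

char spliced[] = "12" "34";  // adjacent literals are joined by the compiler
char plain[]   = "1234";     // the same characters in a single literal

// Both arrays hold four characters plus one terminating null.
static_assert(sizeof(spliced) == 5, "four characters plus the terminator");
static_assert(sizeof(spliced) == sizeof(plain), "both arrays have the same size");

// Returns true when the two arrays are identical byte for byte.
bool same_contents() {
    return std::memcmp(spliced, plain, sizeof(plain)) == 0;
}
```

Only one terminating null character is added, after the concatenated result.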

Using embedded hexadecimal escape codes to specify string constants can cause unexpected results. The following example seeks to create a string literal that contains the ASCII 5 character, followed by the characters f, i, v, and e:

"\x05five"

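The actual result is the character with hexadecimal value 5F, which is the ASCII code for an underscore, followed by the characters i, v, and e. This happens because a \x escape sequence consumes every hexadecimal digit that follows it, and f is a valid hexadecimal digit. To get the correct result, you can use one of these: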

"\005five"     // Use octal constant.
"\x05" "five"  // Use string splicing.
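
The difference between the three spellings can be observed in their sizes and first characters. The following is a sketch that assumes an ASCII-compatible execution character set; the array names greedy, octal, and spliced are hypothetical.

```cpp
#include <cstring>

constexpr char greedy[]  = "\x05five";     // \x05f is read as one escape: 0x5F
constexpr char octal[]   = "\005five";     // an octal escape takes at most three digits
constexpr char spliced[] = "\x05" "five";  // the escape ends where the first literal ends

// The greedy form loses a character: '_', 'i', 'v', 'e', and the terminator.
static_assert(sizeof(greedy) == 5, "hex escape absorbed the letter f");
static_assert(greedy[0] == 0x5F, "first character is 0x5F");

// The two corrected forms keep all five characters plus the terminator.
static_assert(sizeof(octal) == 6, "octal form keeps the letter f");
static_assert(sizeof(spliced) == 6, "spliced form keeps the letter f");
static_assert(octal[0] == 5 && spliced[0] == 5, "first character is ASCII 5");
```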

String Literals with Unicode Characters

Surrogate pairs and supplementary characters (as in UTF-16) are represented with a \U prefix. These are wide strings rather than single characters, and are represented with double quotation marks rather than single quotation marks. The U, u, and u8 prefixes are not supported.

const wchar_t* str1 = L"\U0002008A";
const wchar_t* str2 = L"\UD869DED6";
const wchar_t* str3 = L"\Udc00c800";
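
As a hedged illustration of the first declaration, the number of wchar_t elements that a supplementary character occupies depends on the width of wchar_t. With the 2-byte wchar_t used by Visual C++, U+2008A is stored as a surrogate pair (two elements); on platforms where wchar_t is 4 bytes, it is stored as one element. The names supplementary and expected_code_units are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <cwchar>

// One supplementary character, U+2008A, written with the \U prefix.
const wchar_t* supplementary = L"\U0002008A";

// Number of wchar_t elements expected for that character on this platform:
// two (a surrogate pair) when wchar_t is 2 bytes, otherwise one.
std::size_t expected_code_units() {
    return sizeof(wchar_t) == 2 ? 2 : 1;
}
```

Because wcslen counts wchar_t elements rather than characters, wcslen(supplementary) is expected to match expected_code_units() rather than always being 1.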

For more information about Unicode, see Unicode. For more information about surrogate pairs, see Surrogate Pairs and Supplementary Characters.

See Also

Reference

C++ Literals