© ISO/IEC N4810
6. Adjacent string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted
into a token (5.6). The resulting tokens are syntactically and semantically analyzed and translated as a
translation unit. [Note: The process of analyzing and translating the tokens may occasionally result in
one token being replaced by a sequence of other tokens (13.2). — end note] It is implementation-defined
whether the sources for module units and header units on which the current translation unit has an
interface dependency (10.1, 10.3) are required to be available. [Note: Source files, translation units and
translated translation units need not necessarily be stored as files, nor need there be any one-to-one
correspondence between these entities and any external representation. The description is conceptual
only, and does not specify any particular implementation. — end note]
8. Translated translation units and instantiation units are combined as follows: [Note: Some or all of
these may be supplied from a library. — end note] Each translated translation unit is examined
to produce a list of required instantiations. [Note: This may include instantiations which have been
explicitly requested (13.8.2). — end note] The definitions of the required templates are located. It
is implementation-defined whether the source of the translation units containing these definitions
is required to be available. [Note: An implementation could encode sufficient information into the
translated translation unit so as to ensure the source is not required here. — end note] All the required
instantiations are performed to produce instantiation units. [Note: These are similar to translated
translation units, but contain no references to uninstantiated templates and no template definitions.
— end note] The program is ill-formed if any instantiation fails.
9. All external entity references are resolved. Library components are linked to satisfy external references
to entities not defined in the current translation. All such translator output is collected into a program
image which contains information needed for execution in its execution environment.
5.3 Character sets [lex.charset]
1 The basic source character set consists of 96 characters: the space character, the control characters representing
horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:⁹
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
2 The universal-character-name construct provides a way to name other characters.
hex-quad:
hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
universal-character-name:
\u hex-quad
\U hex-quad hex-quad
The character designated by the universal-character-name \U00NNNNNN is that character that has U+NNNNNN as a
code point short identifier; the character designated by the universal-character-name \uNNNN is that character
that has U+NNNN as a code point short identifier. If a universal-character-name does not correspond
to a code point in ISO/IEC 10646 or if a universal-character-name corresponds to a surrogate code point, the
program is ill-formed. Additionally, if a universal-character-name outside the c-char-sequence, s-char-sequence,
or r-char-sequence of a character or string literal corresponds to a control character or to a character in the
basic source character set, the program is ill-formed.¹⁰
[Note: ISO/IEC 10646 code points are within the
range 0x0-0x10FFFF (inclusive). A surrogate code point is a value in the range 0xD800-0xDFFF (inclusive).
A control character is a character whose code point is in either of the ranges 0x0-0x1F or 0x7F-0x9F (both
inclusive). — end note]
3 The basic execution character set and the basic execution wide-character set shall each contain all the
members of the basic source character set, plus control characters representing alert, backspace, and carriage
return, plus a null character (respectively, null wide character), whose value is 0. For each basic execution
9) The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC
10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source
character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document
how the basic source characters are represented in source files.
10) A sequence of characters resembling a universal-character-name in an r-char-sequence (5.13.5) does not form a
universal-character-name.