C++17 Standard Draft: Language Evolution and Core Features

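This document is the C++ working draft N4741, dated 2018-04-02. It revises N4727 and was edited by Richard Smith of Google Inc. The draft gives developers a detailed view of where C++ is heading. It is an early-stage draft, known to be incomplete and to contain formatting errors, but it remains an important reference for C++ programming. Its content ranges from scope and normative references to the specification of individual language features. Clause 1 states the scope of the standard. The following clauses cover the key concepts of C++:

4.1 Implementation compliance: the requirements that an implementation must meet, which keep programs consistent and portable.
4.2 Structure of this document: how the standard itself is organized, so that readers can find what they need.
4.3 Syntax notation: the notation used to write the grammar of C++.
5 Lexical conventions: how source text is translated, covering character sets, preprocessing tokens, alternative tokens, identifiers, keywords, operators, and punctuators.
6.1 Declarations and definitions: the rules for declaring and defining variables, functions, classes, and other entities.
6.2 One-definition rule: limits how often an entity may be defined in a program, which avoids duplicate definitions and link errors.
6.3 Scope: how the scope of a name is determined.
6.4 Name lookup: how names of variables, functions, and other entities are found and resolved.
6.5 Program and linkage: the rules for linkage and for how names in different translation units relate to one another.
6.6 Memory and objects: the memory model, object lifetime, and storage duration.
6.7 Types: fundamental types and compound types.
6.8 Program execution: how a program executes, including expression evaluation order and program start and termination.
7 Standard conversions: the rules for implicit conversions between types.

Because this is an early draft, readers should expect immature or incorrect passages. It is still a useful resource for developers who want to follow the direction of C++ and write code that conforms to the standard.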

4.2 Structure of this document [intro.structure]

Clause 5 through Clause 19 describe the C++ programming language. That description includes detailed syntactic specifications in a form described in 4.3. For convenience, Annex A repeats all such syntactic specifications.

Clause 21 through Clause 33 and Annex D (the library clauses) describe the C++ standard library. That description includes detailed descriptions of the entities and macros that constitute the library, in a form described in Clause 20.

Annex B recommends lower bounds on the capacity of conforming implementations.

Annex C summarizes the evolution of C++ since its first published description, and explains in detail the differences between C++ and C. Certain features of C++ exist solely for compatibility purposes; Annex D describes those features.

Throughout this document, each example is introduced by “[ Example: ” and terminated by “ — end example ]”.

Each note is introduced by “[ Note: ” and terminated by “ — end note ]”. Examples and notes may be nested.

4.3 Syntax notation [syntax]

In the syntax notation used in this document, syntactic categories are indicated by italic type, and literal words and characters in constant width type. Alternatives are listed on separate lines except in a few cases where a long set of alternatives is marked by the phrase “one of”. If the text of an alternative is too long to fit on a line, the text is continued on subsequent lines indented from the first one. An optional terminal or non-terminal symbol is indicated by the subscript “opt”, so

{ expression_opt }

indicates an optional expression enclosed in braces.

Names for syntactic categories have generally been chosen according to the following rules:

— (2.1) X-name is a use of an identifier in a context that determines its meaning (e.g., class-name, typedef-name).

— (2.2) X-id is an identifier with no context-dependent meaning (e.g., qualified-id).

— (2.3) X-seq is one or more X's without intervening delimiters (e.g., declaration-seq is a sequence of declarations).

— (2.4) X-list is one or more X's separated by intervening commas (e.g., identifier-list is a sequence of identifiers separated by commas).

4.4 Acknowledgments [intro.ack]

The C++ programming language as described in this document is based on the language as described in Chapter R (Reference Manual) of Stroustrup: The C++ Programming Language (second edition, Addison-Wesley Publishing Company, ISBN 0-201-53992-6, copyright 1991 AT&T). That, in turn, is based on the C programming language as described in Appendix A of Kernighan and Ritchie: The C Programming Language

Portions of the library Clauses of this document are based on work by P.J. Plauger, which was published as The Draft Standard C++ Library

POSIX® is a registered trademark of the Institute of Electrical and Electronic Engineers, Inc.

ECMAScript® is a registered trademark of Ecma International.

All rights in these originals are reserved.

5 Lexical conventions [lex]

5.1 Separate translation [lex.separate]

The text of the program is kept in units called source files in this document. A source file together with all the headers (20.5.1.2) and source files included (19.2) via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion (19.1) preprocessing directives, is called a translation unit. [ Note: A C++ program need not all be translated at the same time. — end note ]

[ Note: Previously translated translation units and instantiation units can be preserved individually or in libraries. The separate translation units of a program communicate (6.5) by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units can be separately translated and then later linked to produce an executable program (6.5). — end note ]

5.2 Phases of translation [lex.phases]

The precedence among the syntax rules of translation is specified by the following phases.

1. Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Any source file character not in the basic source character set (5.3) is replaced by the universal-character-name that designates that character. An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted (5.4) in a raw string literal.

2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.

3. The source file is decomposed into preprocessing tokens (5.4) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified. The process of dividing a source file's characters into preprocessing tokens is context-dependent. [ Example: See the handling of < within a #include preprocessing directive. — end example ]

4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation (19.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.

5. Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set (5.13.3, 5.13.5); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.

7) Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.

8) A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing " or >. A partial comment would arise from a source file ending with an unclosed /* comment.

9) An implementation need not convert all non-corresponding source characters to the same execution character.

6. Adjacent string literal tokens are concatenated.

7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token (5.6). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit. [ Note: The process of analyzing and translating the tokens may occasionally result in one token being replaced by a sequence of other tokens (17.2). — end note ] [ Note: Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation. — end note ]

8. Translated translation units and instantiation units are combined as follows: [ Note: Some or all of these may be supplied from a library. — end note ] Each translated translation unit is examined to produce a list of required instantiations. [ Note: This may include instantiations which have been explicitly requested (17.8.2). — end note ] The definitions of the required templates are located. It is implementation-defined whether the source of the translation units containing these definitions is required to be available. [ Note: An implementation could encode sufficient information into the translated translation unit so as to ensure the source is not required here. — end note ] All the required instantiations are performed to produce instantiation units. [ Note: These are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions. — end note ] The program is ill-formed if any instantiation fails.

9. All external entity references are resolved. Library components are linked to satisfy external references to entities not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
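Two of the phases above are easy to observe from ordinary code: line splicing (phase 2) and concatenation of adjacent string literals (phase 6). The following is a minimal sketch rather than text from the draft; the names GREETING and greeting are hypothetical and exist only for illustration.

```cpp
#include <cstring>

// Phase 2: the backslash-newline pair is deleted, so the two physical
// lines below form a single logical #define line.
#define GREETING "hello, " \
                 "world"

// Phase 4 expands the macro into two adjacent string literal tokens;
// phase 6 then concatenates them into one literal.
const char* greeting() { return GREETING; }

// Concatenation happens during translation, so the size is a constant:
// "ab" "cd" is the 4-character literal "abcd" plus the terminating null.
static_assert(sizeof("ab" "cd") == 5, "adjacent literals form one literal");
```

Calling greeting() yields a pointer to the single string "hello, world", which can be checked with std::strcmp.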

5.3 Character sets [lex.charset]

The basic source character set consists of 96 characters: the space character, the control characters representing

horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

0 1 2 3 4 5 6 7 8 9

_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~

! = , \ " '

The universal-character-name construct provides a way to name other characters.

hex-quad:

hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

universal-character-name:

\u hex-quad

\U hex-quad hex-quad

The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800–0xDFFF, inclusive), the program is ill-formed. Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed.
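The two spellings of a universal-character-name can be compared directly against code point values. This is an illustrative sketch, not part of the draft; the names e_acute, grinning_face, and code_units are hypothetical. It relies on char16_t string literals being encoded as UTF-16, so a code point above 0xFFFF occupies two code units.

```cpp
#include <cstddef>

// \uNNNN designates the character whose short name is 0000NNNN.
constexpr char32_t e_acute = U'\u00E9';

// \UNNNNNNNN designates the character whose short name is NNNNNNNN.
constexpr char32_t grinning_face = U'\U0001F600';

// Number of code units in a char16_t string literal, excluding the
// terminating null character.
template <std::size_t N>
constexpr std::size_t code_units(const char16_t (&)[N]) { return N - 1; }

static_assert(e_acute == 0xE9, "four-digit form");
static_assert(grinning_face == 0x1F600, "eight-digit form");
static_assert(code_units(u"\u00E9") == 1, "fits in one UTF-16 code unit");
static_assert(code_units(u"\U0001F600") == 2, "encoded as a surrogate pair");
```

Writing a surrogate code point itself, such as \uD83D, would make the program ill-formed under the rule above, so the example names the full code point instead.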

The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose value is 0. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.

10) The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.

11) A sequence of characters resembling a universal-character-name in an r-char-sequence (5.13.5) does not form a universal-character-name.
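The requirement in 5.3 that each decimal digit after 0 has a value one greater than the previous digit is what makes the familiar c - '0' idiom portable. A minimal sketch, with digit_value as a hypothetical helper name:

```cpp
// Returns 0..9 for a decimal digit character and -1 for anything else.
// The subtraction is portable because the digit characters are required
// to have consecutive values.
constexpr int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

static_assert(digit_value('0') == 0, "first digit");
static_assert(digit_value('9') == 9, "last digit");
```

The text gives this guarantee only for the decimal digits. It says nothing comparable about letters, so 'a' through 'z' should not be assumed to be contiguous.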

5.4 Preprocessing tokens [lex.pptoken]

preprocessing-token:

header-name

identifier

pp-number

character-literal

user-defined-character-literal

string-literal

user-defined-string-literal

preprocessing-op-or-punc

each non-white-space character that cannot be one of the above

Each preprocessing token that is converted to a token (5.6) shall have the lexical form of a keyword, an identifier, a literal, an operator, or a punctuator.

A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. The categories of preprocessing token are: header names, identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories. If a ' or a " character matches the last category, the behavior is undefined.

Preprocessing tokens can be separated by white space; this consists of comments (5.7), or white-space

characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. As described in Clause

19, in certain circumstances during translation phase 4, white space (or the absence thereof) serves as more

than preprocessing token separation. White space can appear within a preprocessing token only as part of a

header name or between the quotation characters in a character literal or string literal.

If the input stream has been parsed into preprocessing tokens up to a given character:

— (3.1) If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (universal-character-names and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern

encoding-prefix_opt R raw-string

— (3.2) Otherwise, if the next three characters are <:: and the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:.

— (3.3) Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name (5.8) is only formed within a #include directive (19.2).

[ Example:

#define R "x"

const char* s = R"y"; // ill-formed raw string, not "x" "y"

— end example ]

[ Example: The program fragment 0xe+foo is parsed as a preprocessing number token (one that is not a valid floating or integer literal token), even though a parse as three preprocessing tokens 0xe, +, and foo might produce a valid expression (for example, if foo were a macro defined as 1). Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating literal token), whether or not E is a macro name. — end example ]

[ Example: The program fragment x+++++y is parsed as x ++ ++ + y, which, if x and y have integral types, violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression. — end example ]
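The three tokenization rules in 5.4 can be seen in code that does compile. The sketch below is illustrative only; munch_demo, template_arg_demo, and raw_length are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Rule (3.3), longest match: "a+++b" is tokenized as a ++ + b,
// so it means (a++) + b and never a + (++b).
int munch_demo(int& a, int b) {
    return a+++b;
}

// Rule (3.2): in "<::std" the character after "<::" is neither ':' nor '>',
// so '<' is a token by itself instead of starting the alternative token "<:".
std::size_t template_arg_demo() {
    std::vector<::std::string> names{"lex", "pptoken"};
    return names.size();
}

// Rule (3.1): inside a raw string literal the phase 1 replacement is
// reverted, so \u00E9 stays as six separate characters.
constexpr std::size_t raw_length = sizeof(R"(\u00E9)") - 1;
static_assert(raw_length == 6, "no universal-character-name is formed");
```

With a equal to 1, munch_demo(a, 2) returns 3 and leaves a equal to 2. That matches the (a++) + b reading.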

5.5 Alternative tokens [lex.digraph]

Alternative token representations are provided for some operators and punctuators.

In all respects of the language, each alternative token behaves the same, respectively, as its primary token,

except for its spelling.

The set of alternative tokens is defined in Table 1.

Table 1 — Alternative tokens

Alternative  Primary    Alternative  Primary    Alternative  Primary
<%           {          and          &&         and_eq       &=
%>           }          bitor        |          or_eq        |=
<:           [          or           ||         xor_eq       ^=
:>           ]          xor          ^          not          !
%:           #          compl        ~          not_eq       !=
%:%:         ##         bitand       &
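Because each alternative token behaves exactly like its primary token, the two spellings can be mixed freely. The following sketch uses hypothetical function names and is written with alternative tokens only to show the equivalence. Some compilers accept these spellings only in a conforming mode, so treat this as an illustration rather than a recommended style.

```cpp
// <% and %> stand for { and }, while bitand, and, and not stand for
// &, &&, and !.
bool all_bits_set(unsigned flags, unsigned mask) <%
    return (flags bitand mask) == mask and not (mask == 0u);
%>

// <: and :> stand for [ and ].
unsigned third_element() <%
    unsigned values<:3:> = <% 1u, 2u, 4u %>;
    return values<:2:>;
%>

// compl stands for ~.
unsigned invert_low_byte(unsigned v) {
    return (compl v) bitand 0xFFu;
}
```

For example, all_bits_set(6u, 2u) is true, third_element() returns 4, and invert_low_byte(0x0Fu) returns 0xF0.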

5.6 Tokens [lex.token]

token:

identifier

keyword

literal

operator

punctuator

There are five kinds of tokens: identifiers, keywords, literals, operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, “white space”), as described below, are ignored except as they serve to separate tokens. [ Note: Some white space is required to separate otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic characters. — end note ]

5.7 Comments [lex.comment]

The characters /* start a comment, which terminates with the characters */. These comments do not nest. The characters // start a comment, which terminates immediately before the next new-line character. If there is a form-feed or a vertical-tab character in such a comment, only white-space characters shall appear between it and the new-line that terminates the comment; no diagnostic is required. [ Note: The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly, the comment characters // and /* have no special meaning within a /* comment. — end note ]
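The comment rules interact with translation phase 3, where each comment is replaced by one space character. A small sketch, with comments_demo as a hypothetical name:

```cpp
int comments_demo() {
    // The block comment below is replaced by one space, so the
    // declaration reads "int total = 0;".
    int/* separates the two tokens */total = 0;

    total += 1; // a "/*" in a line comment does not open a block comment

    /* a "//" in a block comment does not hide the terminator */ total += 10;

    return total;
}
```

All three statements take effect, so comments_demo() returns 11.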

5.8 Header names [lex.header]

header-name:

< h-char-sequence >

" q-char-sequence "

h-char-sequence:

h-char

h-char-sequence h-char

h-char:

any member of the source character set except new-line and >

q-char-sequence:

q-char

q-char-sequence q-char

q-char:

any member of the source character set except new-line and "

12) These include “digraphs” and additional reserved words. The term “digraph” (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters. Nonetheless, those alternative tokens that aren't lexical keywords are colloquially known as “digraphs”.

13) Thus the “stringized” values (19.3.2) of [ and <: will be different, maintaining the source spelling, but the tokens can otherwise be freely interchanged.

14) Literals include strings and character and numeric literals.
