C++20 Working Draft: Overview of the Latest Standard and Key Features


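The C++20 working draft N4810, dated March 15, 2019, is an early draft of the C++ language standard that outlines the key features and changes planned for C++20. The document runs to 1,755 pages and aims to give C++ programmers comprehensive guidance on language design, implementation details, and syntax rules. The following points are drawn from selected sections:

1. **Scope**: Section 6.3 describes the scope rules, including the scope of variables and how names in nested scopes and class members are handled.
2. **One-definition rule**: Section 6.2 states the principle that each entity (such as a function or class) has exactly one definition in the whole program, which prevents duplicate-definition problems at link time.
3. **Name lookup**: Section 6.4 details how identifiers for variables, functions, and other entities are looked up, covering the chain of enclosing scopes, implicit lookup rules, and ways to avoid name conflicts.
4. **Type system**: Section 6.7 covers types; related C++20 work includes stronger support for template metaprogramming and possible improvements to generic programming, intended to make code more flexible and reusable.
5. **Expressions**: Section 7.1 introduces the clause on expressions, which covers operators, syntax, and expression structure, including possible refinements to lambda expressions and updates to arithmetic and logical operators.
6. **Lexical conventions**: Sections 5.4 to 5.13 cover the lexical rules applied around preprocessing, including preprocessing tokens, alternative tokens, comments, header names, and literals.
7. **Implementation compliance**: Section 4.1 sets out the requirements on implementers, so that different compilers support the features compatibly and behave consistently.

Because this is an early working draft, it may be incomplete or contain errors, and its formatting has not been polished. Use it with care and consult later revisions for more accurate information. The changes in C++20 affect how code is written and are intended to improve performance and readability, while posing new challenges for compiler implementers. For the C++ community, understanding and adopting these features is important for keeping pace with modern C++.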

Names for syntactic categories have generally been chosen according to the following rules:

(2.1) — X-name is a use of an identifier in a context that determines its meaning (e.g., class-name, typedef-name).

(2.2) — X-id is an identifier with no context-dependent meaning (e.g., qualified-id).

(2.3) — X-seq is one or more X's without intervening delimiters (e.g., declaration-seq is a sequence of declarations).

(2.4) — X-list is one or more X's separated by intervening commas (e.g., identifier-list is a sequence of identifiers separated by commas).

4.4 Acknowledgments [intro.ack]

The C++ programming language as described in this document is based on the language as described in Chapter R (Reference Manual) of Stroustrup: The C++ Programming Language (second edition, Addison-Wesley Publishing Company, ISBN 0-201-53992-6, copyright 1991 AT&T). That, in turn, is based on the C programming language as described in Appendix A of Kernighan and Ritchie: The C Programming Language.

Portions of the library Clauses of this document are based on work by P.J. Plauger, which was published as The Draft Standard C++ Library.

POSIX® is a registered trademark of the Institute of Electrical and Electronic Engineers, Inc.

ECMAScript® is a registered trademark of Ecma International.

All rights in these originals are reserved.


5 Lexical conventions [lex]

5.1 Separate translation [lex.separate]

The text of the program is kept in units called source files in this document. A source file together with all the headers (16.5.1.2) and source files included (15.2) via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion (15.1) preprocessing directives, is called a translation unit. [Note: A C++ program need not all be translated at the same time. — end note]

[Note: Previously translated translation units and instantiation units can be preserved individually or in libraries. The separate translation units of a program communicate (6.5) by (for example) calls to functions whose identifiers have external or module linkage, manipulation of objects whose identifiers have external or module linkage, or manipulation of data files. Translation units can be separately translated and then later linked to produce an executable program (6.5). — end note]

5.2 Phases of translation [lex.phases]

The precedence among the syntax rules of translation is specified by the following phases.

1. Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Any source file character not in the basic source character set (5.3) is replaced by the universal-character-name that designates that character. An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted (5.4) in a raw string literal.

2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.

3. The source file is decomposed into preprocessing tokens (5.4) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified. The process of dividing a source file's characters into preprocessing tokens is context-dependent. [Example: See the handling of < within a #include preprocessing directive. — end example]

4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation (15.5.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.

5. Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set (5.13.3, 5.13.5); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.

Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.

A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing " or >. A partial comment would arise from a source file ending with an unclosed /* comment.

8) An implementation need not convert all non-corresponding source characters to the same execution character.


6. Adjacent string literal tokens are concatenated.

7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token (5.6). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit. [Note: The process of analyzing and translating the tokens may occasionally result in one token being replaced by a sequence of other tokens (13.2). — end note] It is implementation-defined whether the sources for module units and header units on which the current translation unit has an interface dependency (10.1, 10.3) are required to be available. [Note: Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation. — end note]

8. Translated translation units and instantiation units are combined as follows: [Note: Some or all of these may be supplied from a library. — end note] Each translated translation unit is examined to produce a list of required instantiations. [Note: This may include instantiations which have been explicitly requested (13.8.2). — end note] The definitions of the required templates are located. It is implementation-defined whether the source of the translation units containing these definitions is required to be available. [Note: An implementation could encode sufficient information into the translated translation unit so as to ensure the source is not required here. — end note] All the required instantiations are performed to produce instantiation units. [Note: These are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions. — end note] The program is ill-formed if any instantiation fails.

9. All external entity references are resolved. Library components are linked to satisfy external references to entities not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
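Some of the translation phases can be observed directly in ordinary code. The following is a minimal sketch, not part of the draft: the macro and function names (SPLICED_VALUE, comment_as_space, concatenated) are hypothetical and chosen only for illustration. It shows line splicing (phase 2), comment replacement (phase 3), and string literal concatenation (phase 6).

```cpp
#include <string>

// Phase 2: a backslash immediately followed by a new-line is deleted,
// so the two physical lines below form a single logical line.
#define SPLICED_VALUE (40 + \
                       2)

// Phase 3: each comment is replaced by one space character, so the
// comment below separates the tokens "int" and "value".
inline int comment_as_space() {
    int/* replaced by one space */value = SPLICED_VALUE;
    return value;
}

// Phase 6: adjacent string literal tokens are concatenated.
// The comment between the literals became white space in phase 3.
inline std::string concatenated() {
    return "trans" "lation " /* no effect on the result */ "unit";
}
```

Calling comment_as_space() yields 42, and concatenated() yields the single string "translation unit".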

5.3 Character sets [lex.charset]

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & |
    ~ ! = , \ " '

The universal-character-name construct provides a way to name other characters.

    hex-quad:
        hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

    universal-character-name:
        \u hex-quad
        \U hex-quad hex-quad

The character designated by the universal-character-name \U00NNNNNN is that character that has U+NNNNNN as a code point short identifier; the character designated by the universal-character-name \uNNNN is that character that has U+NNNN as a code point short identifier. If a universal-character-name does not correspond to a code point in ISO/IEC 10646 or if a universal-character-name corresponds to a surrogate code point, the program is ill-formed. Additionally, if a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character or to a character in the basic source character set, the program is ill-formed. [Note: ISO/IEC 10646 code points are within the range 0x0-0x10FFFF (inclusive). A surrogate code point is a value in the range 0xD800-0xDFFF (inclusive). A control character is a character whose code point is in either of the ranges 0x0-0x1F or 0x7F-0x9F (both inclusive). — end note]
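The two spellings of a universal-character-name can be compared at compile time. This is a small sketch for illustration only; the names short_form, long_form, beyond_bmp, is_surrogate, and is_control are hypothetical, and the two helper functions simply restate the ranges given in the note above.

```cpp
// \u takes one hex-quad (4 hex digits); \U takes two (8 hex digits).
// Both literals below name the code point U+00E9.
constexpr char32_t short_form = U'\u00E9';
constexpr char32_t long_form  = U'\U000000E9';
static_assert(short_form == long_form, "both spellings name U+00E9");
static_assert(short_form == 0xE9, "a char32_t literal holds the code point value");

// A code point above U+FFFF needs the eight-digit \U form.
constexpr char32_t beyond_bmp = U'\U0001F600';
static_assert(beyond_bmp == 0x1F600, "eight-digit form names U+1F600");

// Ranges from the note: surrogate code points and control characters.
constexpr bool is_surrogate(char32_t c) {
    return c >= 0xD800 && c <= 0xDFFF;
}
constexpr bool is_control(char32_t c) {
    return c <= 0x1F || (c >= 0x7F && c <= 0x9F);
}
static_assert(is_surrogate(0xD800) && !is_surrogate(0xE9), "surrogate range");
static_assert(is_control(0x7F) && !is_control(U'A'), "control range");
```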

The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose value is 0. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.

The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.

10) A sequence of characters resembling a universal-character-name in an r-char-sequence (5.13.5) does not form a universal-character-name.
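The guarantee that the decimal digits have consecutive values is what makes the common "subtract '0'" idiom portable. A brief sketch follows; digit_value is a hypothetical helper name, not something defined by the draft.

```cpp
// Converts a decimal digit character to its numeric value, or -1 if the
// character is not a decimal digit. This relies only on the rule that each
// digit after '0' has a value one greater than the previous digit.
constexpr int digit_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

static_assert(digit_value('0') == 0, "first digit");
static_assert(digit_value('7') == 7, "consecutive digit values");
static_assert('9' - '0' == 9, "digits span exactly ten consecutive values");
static_assert(digit_value('x') == -1, "non-digits are rejected");
```

The text quoted above gives this guarantee only for the decimal digits, so the same arithmetic on letters (for example 'z' - 'a') is not covered by it.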

5.4 Preprocessing tokens [lex.pptoken]

    preprocessing-token:
        header-name
        identifier
        pp-number
        character-literal
        user-defined-character-literal
        string-literal
        user-defined-string-literal
        preprocessing-op-or-punc
        each non-white-space character that cannot be one of the above

Each preprocessing token that is converted to a token (5.6) shall have the lexical form of a keyword, an identifier, a literal, an operator, or a punctuator.

A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. The categories of preprocessing token are: header names, identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories. If a ' or a " character matches the last category, the behavior is undefined. Preprocessing tokens can be separated by white space; this consists of comments (5.7), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. As described in Clause 15, in certain circumstances during translation phase 4, white space (or the absence thereof) serves as more than preprocessing token separation. White space can appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.

If the input stream has been parsed into preprocessing tokens up to a given character:

(3.1) — If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (universal-character-names and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern

    encoding-prefix_opt R raw-string

(3.2) — Otherwise, if the next three characters are <:: and the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:.

(3.3) — Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name (5.8) is only formed

    (3.3.1) — within a #include directive (15.2),

    (3.3.2) — within a has-include-expression, or

    (3.3.3) — outside of any preprocessing directive, if applying phase 4 of translation to the sequence of preprocessing tokens produced thus far is valid and results in an import-seq (15.4).

[Example:

    #define R "x"
    const char* s = R"y"; // ill-formed raw string, not "x" "y"

— end example]
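For contrast with the ill-formed example above, the sketch below shows well-formed raw string literals and how they differ from ordinary ones. It is illustrative only; the function names ordinary, raw, and raw_ucn are hypothetical.

```cpp
#include <string>

// In an ordinary string literal, \n is an escape sequence: the result
// has three characters ('a', new-line, 'b').
inline std::string ordinary() { return "a\nb"; }

// In a raw string literal the characters between R"( and )" are kept
// verbatim: the result has four characters ('a', '\\', 'n', 'b').
inline std::string raw() { return R"(a\nb)"; }

// A sequence that resembles a universal-character-name is also left
// alone inside a raw string: six characters, starting with a backslash.
inline std::string raw_ucn() { return R"(\u00E9)"; }
```

Here ordinary().size() is 3, raw().size() is 4, and raw_ucn().size() is 6.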

[Example: The program fragment 0xe+foo is parsed as a preprocessing number token (one that is not a valid floating or integer literal token), even though a parse as three preprocessing tokens 0xe, +, and foo might produce a valid expression (for example, if foo were a macro defined as 1). Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating literal token), whether or not E is a macro name. — end example]

[Example: The program fragment x+++++y is parsed as x ++ ++ + y, which, if x and y have integral types, violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression. — end example]
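The longest-sequence rule (3.3) and the <:: exception (3.2) both show up in valid code as well. The following sketch uses hypothetical names (post_increment_then_add, hex_plus, Item, make_items) and is only meant to illustrate the tokenization, not to recommend this style.

```cpp
#include <vector>

// "x+++y" is split into the longest tokens first: x ++ + y,
// which means (x++) + y. The return value encodes the sum and the new x.
inline int post_increment_then_add(int x, int y) {
    int sum = x+++y;
    return sum * 100 + x;
}

// Written with white space, 0xe and + are separate tokens. Without it,
// "0xe+foo" would form a single preprocessing number that is not a valid
// literal, as described in the example above.
inline int hex_plus(int foo) { return 0xe + foo; }

// "<::" followed by a character other than ':' or '>': the '<' is a token
// by itself, so this reads as vector< ::Item > rather than starting with
// the alternative token "<:" (which would mean '[').
struct Item { int id; };
inline std::vector<::Item> make_items() { return { Item{1}, Item{2} }; }
```

With x = 1 and y = 2, post_increment_then_add returns 302: the sum is 3 and x has become 2.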

5.5 Alternative tokens [lex.digraph]

Alternative token representations are provided for some operators and punctuators.

In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling. The set of alternative tokens is defined in Table 1.

Table 1 — Alternative tokens

    Alternative  Primary    Alternative  Primary    Alternative  Primary
    <%           {          and          &&         and_eq       &=
    %>           }          bitor        |          or_eq        |=
    <:           [          or           ||         xor_eq       ^=
    :>           ]          xor          ^          not          !
    %:           #          compl        ~          not_eq       !=
    %:%:         ##         bitand       &
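The entries in Table 1 can be used anywhere their primary tokens can. A short sketch follows, with hypothetical function names (in_range, low_nibble_inverted, second_element); it assumes a compiler running in a conforming mode, since some compilers may need a conformance option before they accept the keyword-like alternatives.

```cpp
// "and" behaves exactly like "&&".
inline bool in_range(int v, int lo, int hi) {
    return v >= lo and v <= hi;
}

// "compl" behaves like "~" and "bitand" like "&": this is (~v) & 0xF.
inline int low_nibble_inverted(int v) {
    return compl v bitand 0xF;
}

// Digraphs: <% %> stand for { }, and <: :> stand for [ ].
inline int second_element() <%
    int data<:3:> = <% 10, 20, 30 %>;
    return data<:1:>;
%>
```

For example, low_nibble_inverted(5) is 10 (binary 0101 inverted within four bits is 1010), and second_element() returns 20.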

5.6 Tokens [lex.token]

    token:
        identifier
        keyword
        literal
        operator
        punctuator

There are five kinds of tokens: identifiers, keywords, literals, operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, "white space"), as described below, are ignored except as they serve to separate tokens. [Note: Some white space is required to separate otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic characters. — end note]

5.7 Comments [lex.comment]

The characters /* start a comment, which terminates with the characters */. These comments do not nest. The characters // start a comment, which terminates immediately before the next new-line character. If there is a form-feed or a vertical-tab character in such a comment, only white-space characters shall appear between it and the new-line that terminates the comment; no diagnostic is required. [Note: The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly, the comment characters // and /* have no special meaning within a /* comment. — end note]
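The interaction between the two comment styles can be checked with a small function. This is an illustrative sketch; comment_demo is a hypothetical name.

```cpp
inline int comment_demo() {
    int total = 0;
    // Inside a block comment, "//" has no special meaning, so the comment
    // ends at the closing marker and the statement after it still runs.
    total += 1; /* contains // but ends here */ total += 10;
    // Inside a line comment, block-comment markers have no special meaning,
    // so the next line is ordinary code.
    total += 100; // contains /* and */ but ends at the new-line
    return total;
}
```

All three additions execute, so comment_demo() returns 111.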

5.8 Header names [lex.header]

    header-name:
        < h-char-sequence >
        " q-char-sequence "

    h-char-sequence:
        h-char
        h-char-sequence h-char

11) These include "digraphs" and additional reserved words. The term "digraph" (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters. Nonetheless, those alternative tokens that aren't lexical keywords are colloquially known as "digraphs".

12) Thus the "stringized" values (15.5.2) of [ and <: will be different, maintaining the source spelling, but the tokens can otherwise be freely interchanged.

13) Literals include strings and character and numeric literals.
