C++17 Standard Draft: Language Evolution and Core Features
Updated 2024-07-18
PDF, 5.66 MB
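This document is working draft N4741 of the C++ standard, dated April 2, 2018. It revises N4727 and was edited by Richard Smith of Google Inc. The draft is meant to give developers a detailed view of where C++ is heading. It is an early draft, known to be incomplete and to contain formatting errors, but it remains an important reference for C++ programming.
The standard covers a broad range of material, from its scope and normative references to the individual language features. Clause 1 states the scope of the standard and its intended area of application. The clauses that follow cover the core concepts of C++: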
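4.1 Implementation compliance: the requirements an implementation must meet to conform to the standard, which is what keeps code consistent and portable.
4.2 Structure of this document: how the standard itself is organized, so that readers can find what they need.
4.3 Syntax notation: the notation used to write the grammar, such as italic syntactic categories, constant-width literals, and the "opt" subscript for optional elements.
5 Lexical conventions: how source text is turned into tokens, including character sets, preprocessing tokens, alternative tokens, identifiers, keywords, operators, and punctuators. These rules are the basis for understanding and compiling a program.
6.1 Declarations and definitions: the rules for declaring and defining variables, functions, classes, and other entities, and the role they play in a program.
6.2 One-definition rule: limits how many definitions of an entity a program may contain, which prevents duplicate definitions and link-time conflicts.
6.3 Scope: how the scope of a name is determined, which matters for encapsulation and reuse.
6.4 Name lookup: how the names of variables, functions, and other entities are found and resolved.
6.5 Program and linkage: how translation units are combined into a program, and how names with internal or external linkage relate across translation units.
6.6 Memory and objects: the memory model, object lifetime, and storage duration.
6.7 Types: fundamental types and compound types.
6.8 Program execution: how a program executes, including the sequencing of evaluations and program start and termination.
7 Standard conversions: the rules for implicit conversions between types.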
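Because this is an early draft, readers should expect parts of it to be immature or wrong. Even so, it is an important reference for developers who want to follow the direction of C++ and its technical specification. Studying it helps developers write code that conforms to the standard and is more reliable and efficient.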
© ISO/IEC N4741
4.2 Structure of this document [intro.structure]
1 Clause 5 through Clause 19 describe the C++ programming language. That description includes detailed syntactic specifications in a form described in 4.3. For convenience, Annex A repeats all such syntactic specifications.
2 Clause 21 through Clause 33 and Annex D (the library clauses) describe the C++ standard library. That description includes detailed descriptions of the entities and macros that constitute the library, in a form described in Clause 20.
3 Annex B recommends lower bounds on the capacity of conforming implementations.
4 Annex C summarizes the evolution of C++ since its first published description, and explains in detail the differences between C++ and C. Certain features of C++ exist solely for compatibility purposes; Annex D describes those features.
5 Throughout this document, each example is introduced by “[ Example: ” and terminated by “ — end example ]”. Each note is introduced by “[ Note: ” and terminated by “ — end note ]”. Examples and notes may be nested.
4.3 Syntax notation [syntax]
1 In the syntax notation used in this document, syntactic categories are indicated by italic type, and literal words and characters in constant width type. Alternatives are listed on separate lines except in a few cases where a long set of alternatives is marked by the phrase “one of”. If the text of an alternative is too long to fit on a line, the text is continued on subsequent lines indented from the first one. An optional terminal or non-terminal symbol is indicated by the subscript “opt”, so
    { expression_opt }
indicates an optional expression enclosed in braces.
2 Names for syntactic categories have generally been chosen according to the following rules:
— (2.1) X-name is a use of an identifier in a context that determines its meaning (e.g., class-name, typedef-name).
— (2.2) X-id is an identifier with no context-dependent meaning (e.g., qualified-id).
— (2.3) X-seq is one or more X’s without intervening delimiters (e.g., declaration-seq is a sequence of declarations).
— (2.4) X-list is one or more X’s separated by intervening commas (e.g., identifier-list is a sequence of identifiers separated by commas).
4.4 Acknowledgments [intro.ack]
1 The C++ programming language as described in this document is based on the language as described in Chapter R (Reference Manual) of Stroustrup: The C++ Programming Language (second edition, Addison-Wesley Publishing Company, ISBN 0-201-53992-6, copyright ©1991 AT&T). That, in turn, is based on the C programming language as described in Appendix A of Kernighan and Ritchie: The C Programming Language (Prentice-Hall, 1978, ISBN 0-13-110163-3, copyright ©1978 AT&T).
2 Portions of the library Clauses of this document are based on work by P.J. Plauger, which was published as The Draft Standard C++ Library (Prentice-Hall, ISBN 0-13-117003-1, copyright ©1995 P.J. Plauger).
3 POSIX® is a registered trademark of the Institute of Electrical and Electronic Engineers, Inc.
4 ECMAScript® is a registered trademark of Ecma International.
5 All rights in these originals are reserved.
5 Lexical conventions [lex]
5.1 Separate translation [lex.separate]
1 The text of the program is kept in units called source files in this document. A source file together with all the headers (20.5.1.2) and source files included (19.2) via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion (19.1) preprocessing directives, is called a translation unit. [ Note: A C++ program need not all be translated at the same time. — end note ]
2 [ Note: Previously translated translation units and instantiation units can be preserved individually or in libraries. The separate translation units of a program communicate (6.5) by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units can be separately translated and then later linked to produce an executable program (6.5). — end note ]
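The note in paragraph 2 can be pictured with a small sketch. It is written as a single listing for convenience; the comments mark where it would be split into separate source files. The file names (counter.h, counter.cpp, main.cpp) and the names next_id, g_calls, and use_from_other_unit are invented for illustration and do not come from the standard.

```cpp
#include <cassert>

// ---- would live in a shared header, e.g. counter.h (hypothetical) ----
extern int g_calls;   // declaration of an object with external linkage
int next_id();        // declaration of a function with external linkage

// ---- would live in counter.cpp (first translation unit) ----
int g_calls = 0;      // the single definition of the object
int next_id() {       // the single definition of the function
    ++g_calls;
    return 100 + g_calls;
}

// ---- would live in main.cpp (second translation unit) ----
// Only the declarations are visible here; linking connects the call
// and the object access to the definitions in the other unit.
int use_from_other_unit() {
    int first = next_id();
    int second = next_id();
    assert(second == first + 1);
    return g_calls;
}
```

When the three parts are really split into files, each .cpp file is translated on its own and the results are linked afterwards, which is the separate translation the note describes.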
5.2 Phases of translation [lex.phases]
1 The precedence among the syntax rules of translation is specified by the following phases.⁷
1. Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Any source file character not in the basic source character set (5.3) is replaced by the universal-character-name that designates that character. An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted (5.4) in a raw string literal.
2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.
3. The source file is decomposed into preprocessing tokens (5.4) and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment.⁸ Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is unspecified. The process of dividing a source file’s characters into preprocessing tokens is context-dependent. [ Example: See the handling of < within a #include preprocessing directive. — end example ]
4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation (19.3.3), the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
5. Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set (5.13.3, 5.13.5); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.⁹
6. Adjacent string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token (5.6). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit. [ Note: The process of analyzing and translating the tokens may occasionally result in one token being replaced by a sequence of other tokens (17.2). — end note ] [ Note: Source files, translation units and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation. — end note ]
8. Translated translation units and instantiation units are combined as follows: [ Note: Some or all of these may be supplied from a library. — end note ] Each translated translation unit is examined to produce a list of required instantiations. [ Note: This may include instantiations which have been explicitly requested (17.8.2). — end note ] The definitions of the required templates are located. It is implementation-defined whether the source of the translation units containing these definitions is required to be available. [ Note: An implementation could encode sufficient information into the translated translation unit so as to ensure the source is not required here. — end note ] All the required instantiations are performed to produce instantiation units. [ Note: These are similar to translated translation units, but contain no references to uninstantiated templates and no template definitions. — end note ] The program is ill-formed if any instantiation fails.
9. All external entity references are resolved. Library components are linked to satisfy external references to entities not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
7) Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.
8) A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that requires a terminating sequence of characters, such as a header-name that is missing the closing " or >. A partial comment would arise from a source file ending with an unclosed /* comment.
9) An implementation need not convert all non-corresponding source characters to the same execution character.
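Several of the phases above have effects that can be observed from ordinary code. The following is a minimal sketch, not a test of any particular implementation; the macro and function names (GREETING, SQUARE, greeting, two, nine) are made up for the illustration.

```cpp
#include <cassert>
#include <cstring>

// Phase 2: the backslash and the new-line after it are deleted, so the
// macro definition below is a single logical source line.
#define GREETING "hello, " \
                 "world"

// Phase 6: the two adjacent string literals become one literal.
const char* greeting() { return GREETING; }

// Phase 3: each comment is replaced by one space character, so the
// expression below reads as `1 + 1`.
int two() { return 1/* comment */+/* comment */1; }

// Phase 4: macro invocations are expanded before the resulting tokens
// are analyzed in phase 7.
#define SQUARE(x) ((x) * (x))
int nine() { return SQUARE(1 + 2); }
```

Because implementations only have to behave as if the phases were separate (footnote 7), the sketch shows the required results and says nothing about how a given compiler organizes its passes.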
5.3 Character sets [lex.charset]
1 The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:¹⁰
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
2 The universal-character-name construct provides a way to name other characters.
    hex-quad:
        hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
    universal-character-name:
        \u hex-quad
        \U hex-quad hex-quad
The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal-character-name corresponds to a surrogate code point (in the range 0xD800–0xDFFF, inclusive), the program is ill-formed. Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed.¹¹
3 The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose value is 0. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.
10) The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.
11) A sequence of characters resembling a universal-character-name in an r-char-sequence (5.13.5) does not form a universal-character-name.
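Two of the guarantees in this subclause can be checked at compile time. The sketch below assumes a C++11 or later compiler; digit_value is a helper name invented for the illustration.

```cpp
#include <cassert>

// A universal-character-name designates a character by its ISO/IEC 10646
// short name. In a char32_t literal the value is that code point.
static_assert(U'\u00E9' == 0x00E9, "four hex digits");
static_assert(U'\U0001F600' == 0x1F600, "eight hex digits");

// Paragraph 3: each decimal digit after 0 has a value one greater than
// the previous digit, so subtracting '0' gives the numeric value.
constexpr int digit_value(char c) { return c - '0'; }
static_assert(digit_value('7') == 7, "digits are contiguous");

// Paragraph 3: the null character has the value 0.
static_assert('\0' == 0, "null character");
```

The digit guarantee does not extend to letters: the text makes no promise that 'a' through 'z' are contiguous, so the same subtraction trick is not portable for them.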
5.4 Preprocessing tokens [lex.pptoken]
    preprocessing-token:
        header-name
        identifier
        pp-number
        character-literal
        user-defined-character-literal
        string-literal
        user-defined-string-literal
        preprocessing-op-or-punc
        each non-white-space character that cannot be one of the above
1 Each preprocessing token that is converted to a token (5.6) shall have the lexical form of a keyword, an identifier, a literal, an operator, or a punctuator.
2 A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. The categories of preprocessing token are: header names, identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories. If a ' or a " character matches the last category, the behavior is undefined. Preprocessing tokens can be separated by white space; this consists of comments (5.7), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. As described in Clause 19, in certain circumstances during translation phase 4, white space (or the absence thereof) serves as more than preprocessing token separation. White space can appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.
3 If the input stream has been parsed into preprocessing tokens up to a given character:
— (3.1) If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (universal-character-names and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern
    encoding-prefix_opt R raw-string
— (3.2) Otherwise, if the next three characters are <:: and the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:.
— (3.3) Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name (5.8) is only formed within a #include directive (19.2).
[ Example:
    #define R "x"
    const char* s = R"y"; // ill-formed raw string, not "x" "y"
— end example ]
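Rules (3.1) and (3.2) can be seen in a short sketch, assuming a C++11 or later compiler. The names ordinary, raw, delimited, Global, Box, and boxed are invented for the illustration, and the delimiter xy is an arbitrary choice.

```cpp
#include <cassert>
#include <cstring>

// In an ordinary string literal, backslash-n is one escape sequence.
const char ordinary[] = "a\nb";        // 3 characters plus terminator
// In a raw string literal the backslash is kept exactly as written.
const char raw[] = R"(a\nb)";          // 4 characters plus terminator
// A delimiter lets the body contain a closing parenthesis and a quote.
const char delimited[] = R"xy(say ")" twice)xy";

// Rule (3.2): `<::` followed by neither `:` nor `>` is lexed as `<`
// and then `::`, not as the alternative token `<:` followed by `:`.
struct Global {};
template <class T> struct Box { T value; };
Box<::Global> boxed;
```

Without rule (3.2), the longest-match rule in (3.3) would turn the start of `Box<::Global>` into the alternative token for `[`, and the declaration would not parse.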
4 [ Example: The program fragment 0xe+foo is parsed as a preprocessing number token (one that is not a valid floating or integer literal token), even though a parse as three preprocessing tokens 0xe, +, and foo might produce a valid expression (for example, if foo were a macro defined as 1). Similarly, the program fragment 1E1 is parsed as a preprocessing number (one that is a valid floating literal token), whether or not E is a macro name. — end example ]
5 [ Example: The program fragment x+++++y is parsed as x ++ ++ + y, which, if x and y have integral types, violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression. — end example ]
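The longest-match behavior in the two examples above also shows up in code that does compile. A small sketch, with function names (munch_sum, hex_plus_one, ten) chosen only for the illustration:

```cpp
#include <cassert>

// `x+++y` is tokenized greedily as `x ++ + y`, that is, (x++) + y.
int munch_sum(int& x, int y) { return x+++y; }

// Written without spaces, `0xe+1` would be one preprocessing number
// that is not a valid literal. The spaces give three separate tokens.
int hex_plus_one() { return 0xe + 1; }

// `1E1` is one preprocessing number and a valid floating literal.
double ten() { return 1E1; }
```

The practical advice that follows from the rule is to put white space around operators whenever two tokens could otherwise merge into a longer one.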
5.5 Alternative tokens [lex.digraph]
1 Alternative token representations are provided for some operators and punctuators.¹²
2 In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling.¹³ The set of alternative tokens is defined in Table 1.
Table 1 — Alternative tokens
    Alternative  Primary    Alternative  Primary    Alternative  Primary
    <%           {          and          &&         and_eq       &=
    %>           }          bitor        |          or_eq        |=
    <:           [          or           ||         xor_eq       ^=
    :>           ]          xor          ^          not          !
    %:           #          compl        ~          not_eq       !=
    %:%:         ##         bitand       &
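A few entries from Table 1 in use, as a sketch. All function and macro names here (in_range, low_bits, first_of, STR, digraph_spelling, primary_spelling) are invented for the illustration. Some compilers are known to accept the keyword-like alternatives only in a conforming mode, so treat the listing as something to try rather than a portability guarantee.

```cpp
#include <cassert>
#include <cstring>

// Keyword-like alternative tokens: and, not, bitand, compl.
bool in_range(int v, int lo, int hi) { return v >= lo and not (v > hi); }
unsigned low_bits(unsigned v) { return v bitand compl 0xF0u; }

// Digraphs: <% %> stand for braces, <: :> stand for brackets.
int first_of(const int (&a)<:3:>) <% return a<:0:>; %>

// Stringizing keeps the source spelling, so the two results differ
// even though the tokens are otherwise interchangeable.
#define STR(x) #x
const char* digraph_spelling() { return STR(<:); }
const char* primary_spelling() { return STR([); }
```

The last two functions show the one place where the spelling is observable: the stringized text follows what was typed, not the primary token.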
5.6 Tokens [lex.token]
    token:
        identifier
        keyword
        literal
        operator
        punctuator
1 There are five kinds of tokens: identifiers, keywords, literals,¹⁴ operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, “white space”), as described below, are ignored except as they serve to separate tokens. [ Note: Some white space is required to separate otherwise adjacent identifiers, keywords, numeric literals, and alternative tokens containing alphabetic characters. — end note ]
5.7 Comments [lex.comment]
1 The characters /* start a comment, which terminates with the characters */. These comments do not nest. The characters // start a comment, which terminates immediately before the next new-line character. If there is a form-feed or a vertical-tab character in such a comment, only white-space characters shall appear between it and the new-line that terminates the comment; no diagnostic is required. [ Note: The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly, the comment characters // and /* have no special meaning within a /* comment. — end note ]
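The note can be illustrated with a short sketch; the function names two and three are arbitrary.

```cpp
#include <cassert>

int two() {
    // A `//` inside a block comment does not start a line comment,
    // so the `+ 1` that follows the block comment is still code.
    return 1 /* note: // has no effect here */ + 1;
}

int three() {
    // A /* inside a line comment does not open a block comment.
    return 3;
}
```

Because block comments do not nest, commenting out a region that already contains a block comment ends at the first closing sequence, which is a common source of surprises.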
5.8 Header names [lex.header]
    header-name:
        < h-char-sequence >
        " q-char-sequence "
    h-char-sequence:
        h-char
        h-char-sequence h-char
    h-char:
        any member of the source character set except new-line and >
    q-char-sequence:
        q-char
        q-char-sequence q-char
    q-char:
        any member of the source character set except new-line and "
12) These include “digraphs” and additional reserved words. The term “digraph” (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters. Nonetheless, those alternative tokens that aren’t lexical keywords are colloquially known as “digraphs”.
13) Thus the “stringized” values (19.3.2) of [ and <: will be different, maintaining the source spelling, but the tokens can otherwise be freely interchanged.
14) Literals include strings and character and numeric literals.