(1) Individual code units which form parts of a surrogate pair can be encoded using this escape sequence.
(2) Any Unicode character can be encoded this way, but characters outside the Basic Multilingual Plane (BMP)
will be encoded using a surrogate pair if Python is compiled to use 16-bit code units (the default). Individual
code units which form parts of a surrogate pair can be encoded using this escape sequence.
(3) As in Standard C, up to three octal digits are accepted.
(4) Unlike in Standard C, exactly two hex digits are required.
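As a brief illustration of notes (3) and (4), the following sketch checks how many digits an octal or a hex escape consumes. The variable names are illustrative only and are not part of the reference text.

```python
# Octal escapes take one to three octal digits.
bell = "\7"                 # one octal digit: same character as "\x07"
letter_a = "\101"           # three octal digits: octal 101 is 65, i.e. "A"
a_then_one = "\1011"        # only the first three digits form the escape

# Hex escapes consume two hex digits after \x.
hex_a = "\x41"              # hex 41 is 65, i.e. "A"
hex_a_then_one = "\x411"    # "\x41" followed by the ordinary character "1"

print(repr(letter_a))        # 'A'
print(repr(a_then_one))      # 'A1'
print(repr(hex_a_then_one))  # 'A1'
```

Any digit after the third octal digit, or after the second hex digit, is simply the next character of the string.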
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left
in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output
is more easily recognized as broken.) It is also important to note that the escape sequences marked as “(Unicode
only)” in the table above fall into the category of unrecognized escapes for non-Unicode string literals.
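The rule for unrecognized escapes can be checked with a small sketch. The escape \q is chosen here only as an example of a sequence that is not in the table. Newer interpreters emit a warning for such escapes, so the literal is evaluated at run time with warnings silenced; that wrapper is an assumption made for portability and is not required by the rule itself.

```python
import warnings

# Evaluate the literal from its source text so that any warning a newer
# interpreter emits for the unrecognized escape can be silenced locally.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    unknown = eval(r'"\q"')      # the evaluated source text is: "\q"

print(len(unknown))              # 2
print(unknown == "\\q")          # True: the backslash stayed in the string
```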
When an ‘r’ or ‘R’ prefix is present, a character following a backslash is included in the string without change,
and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a
backslash and a lowercase ‘n’. String quotes can be escaped with a backslash, but the backslash remains in the
string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote;
r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically,
a raw string cannot end in a single backslash (since the backslash would escape the following quote character).
Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string,
not as a line continuation.
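The raw-string rules above can be sketched as follows. The names are illustrative, and the invalid literal r"\" is compiled from its source text because it cannot appear directly in a working program.

```python
newline_raw = r"\n"          # two characters: backslash and 'n'
quote_raw = r"\""            # two characters: backslash and double quote

print(len(newline_raw))      # 2
print(quote_raw == '\\"')    # True

# The source text r"\" (r, quote, backslash, quote) is not a complete literal.
source = 'r"\\"'
try:
    compile(source, "<example>", "eval")
    single_backslash_ok = True
except SyntaxError:
    single_backslash_ok = False
print(single_backslash_ok)   # False

# Backslash-newline: kept in a raw string, a line continuation otherwise.
raw_continued = r"""a\
b"""
plain_continued = """a\
b"""
print(len(raw_continued))    # 4: 'a', backslash, newline, 'b'
print(plain_continued)       # ab
```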
When an ‘r’ or ‘R’ prefix is used in conjunction with a ‘u’ or ‘U’ prefix, then the \uXXXX escape sequence is
processed while all other backslashes are left in the string. For example, the string literal ur"\u0062\n"
consists of three Unicode characters: ‘LATIN SMALL LETTER B’, ‘REVERSE SOLIDUS’, and ‘LATIN SMALL
LETTER N’. Backslashes can be escaped with a preceding backslash; however, both remain in the string. As a
result, \uXXXX escape sequences are only recognized when there is an odd number of backslashes.
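The ur prefix is not accepted by Python 3, so the literal itself cannot be shown in a form that runs on every interpreter. As a rough stand-in, the sketch below uses the standard raw_unicode_escape codec, which applies a comparable rule: \uXXXX is processed only after an odd number of backslashes, and all other backslashes are left alone. Treat it as an approximation of the literal syntax, not as the tokenizer's own implementation.

```python
# Bytes holding the text between the quotes of ur"\u0062\n".
one_backslash = b"\\u0062\\n"
decoded = one_backslash.decode("raw_unicode_escape")
print(len(decoded))          # 3: 'b', backslash, 'n'

# Two backslashes before u0062: an even count, so nothing is processed.
two_backslashes = b"\\\\u0062"
kept = two_backslashes.decode("raw_unicode_escape")
print(len(kept))             # 7: two backslashes followed by 'u0062'
```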
2.4.2 String literal concatenation
Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are
allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to
"helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings
conveniently across long lines, or even to add comments to parts of strings, for example:
    import re

    re.compile("[A-Za-z_]"       # letter or underscore
               "[A-Za-z0-9_]*"   # letter, digit or underscore
              )
Note that this feature is defined at the syntactical level, but implemented at compile time. The ‘+’ operator must
be used to concatenate string expressions at run time. Also note that literal concatenation can use different quoting
styles for each component (even mixing raw strings and triple quoted strings).
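A minimal sketch of these points follows. The names greeting, pattern, prefix, and joined are illustrative, and the failing expression is compiled from its source text so that the example as a whole still runs.

```python
# Adjacent literals with different quoting styles form a single string.
greeting = "hello" 'world'
print(greeting)              # helloworld

# Raw, triple-quoted, and ordinary literals can be mixed.
pattern = (r"\d+"            # raw literal: backslash and 'd' kept as-is
           """-"""           # triple-quoted literal
           '\t')             # ordinary literal: a single tab character
print(len(pattern))          # 5

# Concatenation of expressions at run time needs the + operator.
prefix = "hello"
joined = prefix + "world"
print(joined == greeting)    # True

# A name next to a literal is not literal concatenation.
try:
    compile('prefix "world"', "<example>", "eval")
    adjacent_name_ok = True
except SyntaxError:
    adjacent_name_ok = False
print(adjacent_name_ok)      # False
```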