The Python Language Reference, Release 3.6.1
stringliteral   ::= [stringprefix](shortstring | longstring)
stringprefix    ::= "r" | "u" | "R" | "U" | "f" | "F"
                    | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
shortstring     ::= "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring      ::= "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::= shortstringchar | stringescapeseq
longstringitem  ::= longstringchar | stringescapeseq
shortstringchar ::= <any source character except "\" or newline or the quote>
longstringchar  ::= <any source character except "\">
stringescapeseq ::= "\" <any source character>
bytesliteral   ::= bytesprefix(shortbytes | longbytes)
bytesprefix    ::= "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes     ::= "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes      ::= "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem ::= shortbyteschar | bytesescapeseq
longbytesitem  ::= longbyteschar | bytesescapeseq
shortbyteschar ::= <any ASCII character except "\" or newline or the quote>
longbyteschar  ::= <any ASCII character except "\">
bytesescapeseq ::= "\" <any ASCII character>
One syntactic restriction not indicated by these productions is that whitespace is not allowed between the
stringprefix or bytesprefix and the rest of the literal. The source character set is defined by the encoding
declaration; it is UTF-8 if no encoding declaration is given in the source file; see section Encoding declarations.
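As a quick check of the prefix productions and the whitespace restriction, the following sketch (assuming Python 3.6 or later, since the f prefixes require it) compiles every listed prefix against a short literal and then shows that inserting a space breaks the literal. The helper name is_valid_literal is hypothetical and exists only for this illustration.

```python
# Every prefix from the stringprefix and bytesprefix productions,
# plus the empty prefix for a plain string literal.
STRING_PREFIXES = ["", "r", "u", "R", "U", "f", "F",
                   "fr", "Fr", "fR", "FR", "rf", "rF", "Rf", "RF"]
BYTES_PREFIXES = ["b", "B", "br", "Br", "bR", "BR", "rb", "rB", "Rb", "RB"]


def is_valid_literal(source: str) -> bool:
    """Return True if `source` compiles as a single expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


# Each prefix is accepted when written directly against the opening quote.
for prefix in STRING_PREFIXES + BYTES_PREFIXES:
    assert is_valid_literal(prefix + "'abc'"), prefix

# With whitespace in between, the prefix letters become a separate name
# followed by a string, which is not a valid expression.
assert not is_valid_literal("r 'abc'")
assert not is_valid_literal("b 'abc'")
```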
In plain English: Both types of literals can be enclosed in matching single quotes (') or double quotes ("). They can
also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted
strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as
newline, backslash itself, or the quote character.
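The quoting and escaping rules in the preceding paragraph can be observed directly. This minimal sketch (the variable names are illustrative only) shows that the four quote styles yield equal str values, and that a backslash neutralizes a quote, another backslash, or a physical newline inside a literal.

```python
# All four quoting forms produce equal str objects.
single = 'spam'
double = "spam"
triple_single = '''spam'''
triple_double = """spam"""
assert single == double == triple_single == triple_double

# A backslash stops a quote from ending the literal ...
assert 'it\'s' == "it's"
# ... escapes the backslash itself (the value has three characters) ...
assert len('a\\b') == 3
# ... and a backslash at the end of a physical line removes that newline.
joined = 'spam\
eggs'
assert joined == 'spameggs'
```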
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str
type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with
escapes.
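A short sketch of the ASCII restriction on bytes literals: non-ASCII byte values are written with \x escapes, while a non-ASCII character typed directly into a bytes literal is rejected at compile time. The UTF-8 encoding of "é" used here is just a convenient example value.

```python
# Byte values of 128 or greater are spelled with escapes.
data = b'caf\xc3\xa9'
assert isinstance(data, bytes)
assert len(data) == 5
assert data[-1] == 0xa9 and data[-1] >= 128
assert data.decode('utf-8') == 'café'

# The prefix letter may be upper or lower case.
assert B'abc' == b'abc'

# A non-ASCII character written literally in a bytes literal is a syntax error.
try:
    compile("b'café'", "<literal>", "eval")
except SyntaxError:
    rejected = True
else:
    rejected = False
assert rejected
```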
As of Python 3.3, string literals may once again be prefixed with u, to simplify maintenance of dual Python 2.x and
3.x codebases.
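In Python 3 the u prefix is accepted purely for compatibility. As this small sketch illustrates, it has no effect on the resulting value: a u-prefixed literal is an ordinary str equal to the unprefixed literal.

```python
# The u (or U) prefix changes nothing: the result is a plain str.
legacy = u'value'
assert type(legacy) is str
assert legacy == 'value'
assert U'value' == legacy
```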
Both string and bytes literals may optionally be prefixed with the letter 'r' or 'R'; such literals are called raw strings
and treat backslashes as literal characters. As a result, '\U' and '\u' escapes in raw string literals are not treated
specially. Because Python 2.x's raw unicode literals behave differently from Python 3.x's, the 'ur' syntax is not
supported.
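The effect of the r prefix can be seen by comparing lengths. This sketch shows that backslashes survive unchanged in raw str and raw bytes literals (including \u escapes), that rb and br spell the same literal, and that the Python 2 ur prefix does not compile. The helper name compiles is hypothetical, defined only for this example.

```python
# In a raw string the backslash is kept as an ordinary character.
assert len('\n') == 1            # one newline character
assert len(r'\n') == 2           # a backslash followed by 'n'
assert r'\n' == '\\n'

# \u escapes are not processed in raw strings either.
assert '\u00e9' == 'é'
assert len(r'\u00e9') == 6

# Raw bytes literals behave the same way; 'rb' and 'br' are synonyms.
assert rb'\x00' == br'\x00' == b'\\x00'
assert len(rb'\x00') == 4


def compiles(source):
    """Return True if `source` compiles as a single expression."""
    try:
        compile(source, '<literal>', 'eval')
    except SyntaxError:
        return False
    return True


# The Python 2 'ur' prefix is not part of the Python 3 grammar.
assert not compiles("ur'x'")
assert compiles("r'x'")
```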
New in version 3.3: The 'rb' prefix of raw bytes literals has been added as a synonym of 'br'.
New in version 3.3: Support for the unicode legacy literal (u'value') was reintroduced to simplify the maintenance
of dual Python 2.x and 3.x codebases. See PEP 414 for more information.
A string literal with 'f' or 'F' in its prefix is a formatted string literal; see Formatted string literals. The 'f' may
be combined with 'r', but not with 'b' or 'u'; therefore raw formatted strings are possible, but formatted bytes
literals are not.
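A minimal sketch of the allowed and forbidden f prefix combinations, assuming Python 3.6 or later: replacement fields are evaluated in both plain and raw formatted strings, while any combination of f with b or u fails to compile. The helper compiles is hypothetical and only used for this illustration.

```python
name = 'world'
assert f'hello {name}' == 'hello world'
assert F'{name!r}' == "'world'"

# Raw formatted strings: fields are evaluated, backslashes are kept.
assert rf'{name}\n' == 'world\\n'
assert fR'{name}\n' == Rf'{name}\n'


def compiles(source):
    """Return True if `source` compiles as a single expression."""
    try:
        compile(source, '<literal>', 'eval')
    except SyntaxError:
        return False
    return True


# 'f' cannot be combined with 'b' or 'u'.
assert not compiles("fb'x'")
assert not compiles("bf'x'")
assert not compiles("fu'x'")
```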
In triple-quoted literals, unescaped newlines and quotes are allowed (and are retained), except that three unescaped
quotes in a row terminate the literal. (A "quote" is the character used to open the literal, i.e. either ' or ".)
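The triple-quote rule can be illustrated with a short sketch: newlines and runs of one or two quotes are kept verbatim, and escaping one quote is enough to embed three in a row without ending the literal.

```python
text = '''first line
it's "quoted" and ''two'' quotes are fine
last line'''
assert text.count('\n') == 2          # both unescaped newlines are retained
assert "it's" in text and '"quoted"' in text
assert "''two''" in text

# Three unescaped quotes in a row end the literal; escape one to embed them.
embedded = '''a \'''b'''
assert embedded == "a '''b"
```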
Unless an 'r' or 'R' prefix is present, escape sequences in string and bytes literals are interpreted according to rules
similar to those used by Standard C. The recognized escape sequences are:
10 Chapter 2. Lexical analysis