4.1. Scanning¶
- token¶
A single atomic unit in a Crowbar source file. May be a keyword, an identifier, a constant, a string literal, or a punctuator. Keywords, identifiers, and constants (except for character constants) must have either whitespace or a comment separating them. Punctuators, string literals, and character constants do not require explicit separation from adjacent tokens.
- keyword¶
One of the literal words
bool, break, case, const, continue, default, do, else, enum, false, float32, float64, for, fragile, function, if, include, int8, int16, int32, int64, intaddr, intmax, intsize, opaque, return, sizeof, struct, switch, true, uint8, uint16, uint32, uint64, uintaddr, uintmax, uintsize, union, void, or while.
- identifier¶
A nonempty sequence of characters blah blah blah
Todo
figure out https://www.unicode.org/reports/tr31/tr31-33.html
- constant¶
A numeric (or numeric-equivalent) value specified directly within the code. May be a decimal constant, a binary constant, an octal constant, a hexadecimal constant, a floating-point constant, a hexadecimal floating-point constant, or a character constant. Any of these except for the character constant may contain underscores; these are ignored by the compiler and are meaningful only to humans reading the code.
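As an illustration of how the numeric forms relate, the following Python sketch classifies a candidate constant using the regular expressions given in the entries of this section and computes its value with underscores ignored. The names `CONSTANT_PATTERNS`, `classify_constant`, and `constant_value` are hypothetical and not part of Crowbar. Rejecting text that contains no digits at all (such as a lone `_`) is an assumption, since this section does not say what such text denotes.

```python
import re

# Patterns copied from the constant entries of this section.
# The kind names and the table itself are illustrative only.
CONSTANT_PATTERNS = [
    ("decimal", r"[0-9_]+"),
    ("binary", r"0[bB][01_]+"),
    ("octal", r"0o[0-7_]+"),
    ("hexadecimal", r"0[xX][0-9a-fA-F]+"),
    ("floating-point", r"[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?"),
    ("hexadecimal floating-point",
     r"0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+"),
]


def classify_constant(text):
    """Return the kind of numeric constant `text` is, or None if no pattern matches."""
    for kind, pattern in CONSTANT_PATTERNS:
        if re.fullmatch(pattern, text):
            return kind
    return None


def constant_value(text):
    """Return the numeric value denoted by `text`, ignoring underscores."""
    kind = classify_constant(text)
    digits = text.replace("_", "")  # underscores carry no meaning
    # Assumption: text with no digits left raises ValueError from int()/float().
    if kind == "decimal":
        return int(digits, 10)
    if kind == "binary":
        return int(digits[2:], 2)   # drop the 0b / 0B prefix
    if kind == "octal":
        return int(digits[2:], 8)   # drop the 0o prefix
    if kind == "hexadecimal":
        return int(digits[2:], 16)  # drop the 0x / 0X prefix
    if kind == "floating-point":
        return float(digits)
    if kind == "hexadecimal floating-point":
        # Python's float.fromhex expects a 0x prefix, so swap out 0fx / 0FX.
        return float.fromhex("0x" + digits[3:])
    raise ValueError(f"not a numeric constant: {text!r}")
```

Under this sketch, `classify_constant("6e3")` returns `None` because no pattern matches it, while `6.0e3` is accepted as a floating-point constant.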
- decimal constant¶
A sequence of characters matching the regular expression
[0-9_]+
Denotes the numeric value of the given sequence of decimal digits.
- binary constant¶
A sequence of characters matching the regular expression
0[bB][01_]+
Denotes the numeric value of the given sequence of binary digits (after the 0[bB] prefix has been removed).
- octal constant¶
A sequence of characters matching the regular expression
0o[0-7_]+
Denotes the numeric value of the given sequence of octal digits (after the 0o prefix has been removed).
- hexadecimal constant¶
A sequence of characters matching the regular expression
0[xX][0-9a-fA-F]+
Denotes the numeric value of the given sequence of hexadecimal digits (after the 0[xX] prefix has been removed).
- floating-point constant¶
A sequence of characters matching the regular expression
[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?
Note
Unlike in C and many other languages, 6e3 in Crowbar is not a valid floating-point constant. The Crowbar-compatible spelling is 6.0e3.
Denotes the numeric value of the given decimal number, optionally expressed in scientific notation. That is, XeY denotes \(X \times 10^Y\).
- hexadecimal floating-point constant¶
A sequence of characters matching the regular expression
0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+
Denotes the numeric value of the given hexadecimal number expressed in binary scientific notation. That is, XpY denotes \(X \times 2^Y\).
- character constant¶
A pair of single quotes (') surrounding either a single character or an escape sequence. The single character may not be a single quote or a backslash (\). Denotes the Unicode scalar value for either the single surrounded character or the character denoted by the escape sequence.
- escape sequence¶
One of the following pairs of characters:
  - \', denoting the single quote '
  - \", denoting the double quote "
  - \\, denoting the backslash \
  - \r, denoting the carriage return (U+000D)
  - \n, denoting the line feed, or newline (U+000A)
  - \t, denoting the (horizontal) tab (U+0009)
  - \0, denoting a null character (U+0000)
Or a sequence of characters matching one of the following regular expressions:
  - \\x[0-9a-fA-F]{2}, denoting the numeric value of the given two hexadecimal digits
  - \\u[0-9a-fA-F]{4}, denoting the numeric value of the given four hexadecimal digits
  - \\U[0-9a-fA-F]{8}, denoting the numeric value of the given eight hexadecimal digits
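The character constant and escape sequence rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `SIMPLE_ESCAPES`, `NUMERIC_ESCAPE`, `decode_escape`, and `character_constant_value` are hypothetical, and the sketch does not check that a numeric escape actually names a valid Unicode scalar value.

```python
import re

# The seven two-character escape sequences and the characters they denote.
SIMPLE_ESCAPES = {
    r"\'": "'",
    r'\"': '"',
    r"\\": "\\",
    r"\r": "\r",
    r"\n": "\n",
    r"\t": "\t",
    r"\0": "\0",
}

# The three numeric forms, copied from the regular expressions above.
NUMERIC_ESCAPE = re.compile(
    r"\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}"
)


def decode_escape(seq):
    """Return the numeric value denoted by a single escape sequence."""
    if seq in SIMPLE_ESCAPES:
        return ord(SIMPLE_ESCAPES[seq])
    if NUMERIC_ESCAPE.fullmatch(seq):
        return int(seq[2:], 16)  # skip the backslash and the x / u / U
    raise ValueError(f"not an escape sequence: {seq!r}")


def character_constant_value(text):
    """Return the value denoted by a character constant such as 'a' or '\\n'."""
    if len(text) < 3 or text[0] != "'" or text[-1] != "'":
        raise ValueError(f"not a character constant: {text!r}")
    inner = text[1:-1]
    if inner.startswith("\\"):
        return decode_escape(inner)
    # A bare single quote is excluded; a bare backslash was handled above.
    if len(inner) == 1 and inner != "'":
        return ord(inner)
    raise ValueError(f"not a character constant: {text!r}")
```

Here a lone backslash between the quotes is rejected because it is not a complete escape sequence, matching the rule that the single character may not be a backslash.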
- string literal¶
A pair of double quotes (") surrounding a sequence whose elements are either single characters or escape sequences. No single-character element may be the double quote or the backslash. Denotes the UTF-8-encoded sequence of bytes representing the sequence of characters that are specified between the quotes, either directly or via an escape sequence.
- punctuator¶
One of the following literal sequences of characters, listed here separated by spaces:
[ ] ( ) { } . , + - * / % ; : ! & | ^ ~ > < = -> ++ -- >> << <= >= == != && || += -= *= /= %= &= |= ^=
- whitespace¶
A nonempty sequence of characters, each of which has a Unicode general category of either Control (Cc) or Separator (Z). Separates tokens.
- comment¶
Text that the compiler should ignore. May be a line comment or a block comment.
- line comment¶
A sequence of characters beginning with the characters // (outside of a string literal or comment) and ending with a newline character U+000A.
- block comment¶
A sequence of characters beginning with the characters /* (outside of a string literal or comment) and ending with the characters */.
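The whitespace and comment rules above can be sketched in Python as follows. The names `is_whitespace_char`, `SKIPPABLE`, and `strip_comments` are hypothetical. Several choices are assumptions that this section does not settle: character constants are copied through alongside string literals so that a quote inside one is not mistaken for the start of a string, a line comment is replaced by its terminating newline and a block comment by a single space, and unterminated comments are left untouched.

```python
import re
import unicodedata


def is_whitespace_char(ch):
    """True if `ch` has general category Control (Cc) or any Separator (Z*)."""
    category = unicodedata.category(ch)
    return category == "Cc" or category.startswith("Z")


# Alternatives are tried left to right at each position, so comment openers
# inside a string literal or character constant are consumed as part of it.
SKIPPABLE = re.compile(
    r'"(?:[^"\\]|\\.)*"'     # string literal
    r"|'(?:[^'\\]|\\.)+'"    # character constant (assumption: also protected)
    r"|//[^\n]*\n"           # line comment, through the terminating U+000A
    r"|/\*.*?\*/",           # block comment, ending at the first */
    re.DOTALL,
)


def strip_comments(source):
    """Return `source` with comments removed and token separation preserved."""
    def replace(match):
        text = match.group(0)
        if text.startswith("//"):
            return "\n"  # keep the line break that ended the comment
        if text.startswith("/*"):
            return " "   # a comment still separates adjacent tokens
        return text      # literals are copied through unchanged
    return SKIPPABLE.sub(replace, source)
```

Because a block comment ends at the first `*/`, this sketch does not treat block comments as nesting, which follows from a comment opener being recognized only outside of a comment.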