4.1. Scanning
- token
A single atomic unit in a Crowbar source file. May be a keyword, an identifier, a constant, a string literal, or a punctuator. Adjacent keywords, identifiers, and constants (other than character constants) must be separated by whitespace or a comment. Punctuators, string literals, and character constants need no explicit separation from adjacent tokens.
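The separation rule above can be read as a predicate over pairs of adjacent token kinds. The following is a minimal Python sketch of that reading; the Kind enumeration and the needs_separator and render helpers are hypothetical names used only for illustration, not part of any Crowbar tool.

```python
from enum import Enum, auto


class Kind(Enum):
    """Token kinds named in the token entry (hypothetical representation)."""
    KEYWORD = auto()
    IDENTIFIER = auto()
    CONSTANT = auto()  # every constant form except character constants
    CHARACTER_CONSTANT = auto()
    STRING_LITERAL = auto()
    PUNCTUATOR = auto()


# Kinds that may not directly touch one another.
NEEDS_SPACE = {Kind.KEYWORD, Kind.IDENTIFIER, Kind.CONSTANT}


def needs_separator(left: Kind, right: Kind) -> bool:
    """True if whitespace or a comment must come between these two tokens."""
    return left in NEEDS_SPACE and right in NEEDS_SPACE


def render(tokens):
    """Join (kind, text) pairs, inserting a single space only where required."""
    parts = []
    previous = None
    for kind, text in tokens:
        if previous is not None and needs_separator(previous, kind):
            parts.append(" ")
        parts.append(text)
        previous = kind
    return "".join(parts)
```

The predicate encodes only the rule as stated. It does not keep two punctuators apart: if the scanner uses longest match (an assumption; this section does not say), rendering + followed by + as ++ would be read back as a single ++ token, so a real pretty-printer would need an extra check for such pairs.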
- keyword
One of the literal words bool, break, case, const, continue, default, do, else, enum, false, float32, float64, for, fragile, function, if, include, int8, int16, int32, int64, intaddr, intmax, intsize, opaque, return, sizeof, struct, switch, true, uint8, uint16, uint32, uint64, uintaddr, uintmax, uintsize, union, void, or while.
- identifier
A nonempty sequence of characters blah blah blah
Todo
Figure out identifier syntax per https://www.unicode.org/reports/tr31/tr31-33.html (UAX #31, Unicode Identifier and Pattern Syntax).
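Because the identifier rules are still a todo, the sketch below assumes, purely for illustration, a C-style ASCII pattern [A-Za-z_][A-Za-z0-9_]*. It shows one common way a scanner tells keywords from identifiers: read the complete word first, then look it up in the keyword set. WORD and classify_word are hypothetical names.

```python
import re

# The 40 keywords listed in the keyword entry.
KEYWORDS = frozenset(
    """
    bool break case const continue default do else enum false float32 float64
    for fragile function if include int8 int16 int32 int64 intaddr intmax
    intsize opaque return sizeof struct switch true uint8 uint16 uint32 uint64
    uintaddr uintmax uintsize union void while
    """.split()
)

# Hypothetical ASCII-only identifier pattern, standing in for the rule
# that the todo above leaves open.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_word(word: str) -> str:
    """Return "keyword" or "identifier" for a complete word."""
    if not WORD.fullmatch(word):
        raise ValueError(f"not a word: {word!r}")
    return "keyword" if word in KEYWORDS else "identifier"
```

Looking up only complete words is what makes whiles an identifier rather than the keyword while followed by s; the token entry's separation rule guarantees that a keyword is never directly followed by further word characters.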
- constant
A numeric (or numeric-equivalent) value specified directly in the code. May be a decimal constant, a binary constant, an octal constant, a hexadecimal constant, a floating-point constant, a hexadecimal floating-point constant, or a character constant. Any of these except a character constant may contain underscores; the compiler ignores them, and they serve only as visual separators for human readers.
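The value rules in the decimal, binary, octal, hexadecimal, floating-point, and hexadecimal floating-point entries below can be sketched in Python as follows. The sketch matches an already-isolated token against each pattern with re.fullmatch, drops underscores, and returns an exact int or fractions.Fraction rather than a machine float, since the entries speak of numeric values and not of a particular floating-point format. Two assumptions are built in: the hexadecimal pattern admits underscores, following the general rule in this entry, and a token made only of underscores (which the decimal pattern technically matches) is rejected. constant_value is a hypothetical name.

```python
import re
from fractions import Fraction
from typing import Union

# Patterns from the constant entries, with capture groups added.
# Assumption: HEXADECIMAL also admits underscores, per the general rule.
DECIMAL = re.compile(r"[0-9_]+")
BINARY = re.compile(r"0[bB]([01_]+)")
OCTAL = re.compile(r"0o([0-7_]+)")
HEXADECIMAL = re.compile(r"0[xX]([0-9a-fA-F_]+)")
FLOATING = re.compile(r"([0-9_]+)\.([0-9_]+)(?:[eE]([+-]?[0-9_]+))?")
HEX_FLOATING = re.compile(
    r"0(?:fx|FX)([0-9a-fA-F_]+)\.([0-9a-fA-F_]+)[pP]([+-]?[0-9_]+)"
)


def strip(digits: str) -> str:
    """Underscores only help human readers, so drop them."""
    return digits.replace("_", "")


def constant_value(token: str) -> Union[int, Fraction]:
    """Exact value of a numeric constant token.

    Raises ValueError if the token matches none of the patterns, or if only
    underscores remain where digits are required (int() rejects "").
    """
    if DECIMAL.fullmatch(token):
        return int(strip(token), 10)
    for pattern, base in ((BINARY, 2), (OCTAL, 8), (HEXADECIMAL, 16)):
        m = pattern.fullmatch(token)
        if m:
            return int(strip(m.group(1)), base)
    # Floating forms: mantissa digits in `base`, exponent scales by `radix`.
    for pattern, base, radix in ((FLOATING, 10, 10), (HEX_FLOATING, 16, 2)):
        m = pattern.fullmatch(token)
        if m:
            whole, frac = strip(m.group(1)), strip(m.group(2))
            mantissa = Fraction(int(whole + frac, base), base ** len(frac))
            exponent = int(strip(m.group(3) or "0"))  # exponent is always decimal
            return mantissa * Fraction(radix) ** exponent
    raise ValueError(f"not a numeric constant: {token!r}")
```

For example, constant_value("6.0e3") equals 6000 and constant_value("0fx1.8p3") equals 12, while constant_value("6e3") raises ValueError, in line with the note on the floating-point entry.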
- decimal constant
A sequence of characters matching the regular expression:
[0-9_]+
Denotes the numeric value of the given sequence of decimal digits.
- binary constant
A sequence of characters matching the regular expression:
0[bB][01_]+
Denotes the numeric value of the given sequence of binary digits (after the 0[bB] prefix has been removed).
- octal constant
A sequence of characters matching the regular expression:
0o[0-7_]+
Denotes the numeric value of the given sequence of octal digits (after the 0o prefix has been removed).
- hexadecimal constant
A sequence of characters matching the regular expression:
0[xX][0-9a-fA-F_]+
Denotes the numeric value of the given sequence of hexadecimal digits (after the 0[xX] prefix has been removed).
- floating-point constant
A sequence of characters matching the regular expression:
[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?
Note
Unlike in C and many other languages, 6e3 is not a valid floating-point constant in Crowbar. The Crowbar-compatible spelling is 6.0e3.
Denotes the numeric value of the given decimal number, optionally expressed in scientific notation. That is, XeY denotes \(X \times 10^{Y}\).
- hexadecimal floating-point constant
A sequence of characters matching the regular expression:
0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+
Denotes the numeric value of the given hexadecimal number expressed in binary scientific notation. That is, XpY denotes \(X \times 2^{Y}\), where the mantissa X is written in hexadecimal and the exponent Y in decimal.
- character constant
A pair of single quotes (') surrounding either a single character or an escape sequence. The single character may not be a single quote (') or a backslash (\). Denotes the Unicode scalar value of either the surrounded character or the character denoted by the escape sequence.
- escape sequence
One of the following pairs of characters:
  - \', denoting the single quote (')
  - \", denoting the double quote (")
  - \\, denoting the backslash (\)
  - \r, denoting the carriage return (U+000D)
  - \n, denoting the line feed, or newline (U+000A)
  - \t, denoting the (horizontal) tab (U+0009)
  - \0, denoting a null character (U+0000)
Or a sequence of characters matching one of the following regular expressions:
  - \\x[0-9a-fA-F]{2}, denoting the numeric value of the given two hexadecimal digits
  - \\u[0-9a-fA-F]{4}, denoting the numeric value of the given four hexadecimal digits
  - \\U[0-9a-fA-F]{8}, denoting the numeric value of the given eight hexadecimal digits
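The escape sequence table above, together with its use in character constants and in the string literal entry that follows, can be sketched in Python as below. read_escape, char_constant_value, and string_literal_bytes are hypothetical names. Two assumptions are worth flagging: \x, \u, and \U are treated as naming a character, so in a string literal \xFF becomes the two UTF-8 bytes of U+00FF rather than the single byte 0xFF; and char_constant_value does not check that a \u or \U escape names a valid Unicode scalar value, while string_literal_bytes leaves that check to Python's chr() and UTF-8 encoder, which raise ValueError.

```python
import string

# Two-character escapes: the character after the backslash -> code point.
SIMPLE_ESCAPES = {
    "'": 0x27, '"': 0x22, "\\": 0x5C,
    "r": 0x0D, "n": 0x0A, "t": 0x09, "0": 0x00,
}
# Numeric escapes: the letter after the backslash -> exact number of hex digits.
HEX_ESCAPE_WIDTHS = {"x": 2, "u": 4, "U": 8}


def read_escape(text, i):
    """Decode the escape sequence starting at text[i].

    Returns (code point, index just past the escape sequence).
    """
    if i + 1 >= len(text) or text[i] != "\\":
        raise ValueError(f"incomplete escape sequence at index {i}")
    kind = text[i + 1]
    if kind in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[kind], i + 2
    width = HEX_ESCAPE_WIDTHS.get(kind)
    if width is None:
        raise ValueError(f"unknown escape sequence \\{kind}")
    digits = text[i + 2 : i + 2 + width]
    if len(digits) != width or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"\\{kind} needs exactly {width} hexadecimal digits")
    return int(digits, 16), i + 2 + width


def char_constant_value(token):
    """Code point denoted by a character constant such as 'a' or '\\n'."""
    if len(token) < 3 or token[0] != "'" or token[-1] != "'":
        raise ValueError(f"not a character constant: {token!r}")
    body = token[1:-1]
    if body[0] == "\\":
        value, end = read_escape(body, 0)
        if end != len(body):
            raise ValueError("extra characters after escape sequence")
        return value
    # A lone character, which may not be a single quote or a backslash.
    if len(body) != 1 or body in ("'", "\\"):
        raise ValueError(f"not a character constant: {token!r}")
    return ord(body)


def string_literal_bytes(token):
    """UTF-8 bytes denoted by a string literal token (quotes included)."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError(f"not a string literal: {token!r}")
    body, i, chars = token[1:-1], 0, []
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            value, i = read_escape(body, i)
            chars.append(chr(value))  # raises ValueError above U+10FFFF
        elif ch == '"':
            raise ValueError("unescaped double quote inside string literal")
        else:
            chars.append(ch)
            i += 1
    # encode() raises UnicodeEncodeError (a ValueError) for lone surrogates.
    return "".join(chars).encode("utf-8")
```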
- string literal
A pair of double quotes (") surrounding a sequence whose elements are either single characters or escape sequences. No single-character element may be a double quote (") or a backslash (\). Denotes the UTF-8 encoding, as a sequence of bytes, of the characters specified between the quotes, whether written directly or as escape sequences.
- punctuator
One of the following literal sequences of characters (listed here separated by spaces):
[ ] ( ) { } . , + - * / % ; : ! & | ^ ~ > < = -> ++ -- >> << <= >= == != && || += -= *= /= %= &= |= ^=
- whitespace
A nonempty sequence of characters, each of which has a Unicode general category of either Control (Cc) or Separator (Z, comprising Zs, Zl, and Zp). Separates tokens.
- comment
Text that the compiler should ignore. May be a line comment or a block comment.
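Whitespace and comments are what a scanner discards between tokens. The sketch below skips both, using Python's unicodedata.category to apply the whitespace entry's Cc/Z test and handling the line and block comment forms defined in the entries that follow. It is meant to be called only between tokens, so the "outside of a string literal or comment" condition holds by construction. The function names are hypothetical, and two behaviors are assumptions this section does not settle: a line comment at the very end of a file without a trailing newline is accepted, and an unterminated block comment is an error.

```python
import unicodedata


def is_whitespace_char(ch: str) -> bool:
    """Control (Cc) or any Separator category (Zs, Zl, Zp)."""
    category = unicodedata.category(ch)
    return category == "Cc" or category.startswith("Z")


def skip_trivia(src: str, i: int) -> int:
    """Return the first index at or after i that is not whitespace or comment."""
    while i < len(src):
        if is_whitespace_char(src[i]):
            i += 1
        elif src.startswith("//", i):
            end = src.find("\n", i + 2)
            # Assumption: a final line comment may end at end of file instead.
            i = len(src) if end == -1 else end + 1
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end == -1:
                # Assumption: an unterminated block comment is an error.
                raise ValueError(f"unterminated block comment at index {i}")
            i = end + 2
        else:
            break
    return i
```

Searching for */ from i + 2 means that /*/ does not close itself, since the opening and closing delimiters cannot share the *.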
- line comment
A sequence of characters beginning with the characters // (outside of a string literal or comment) and ending with a newline character (U+000A).
- block comment
A sequence of characters beginning with the characters /* (outside of a string literal or comment) and ending with the characters */.
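The punctuator list contains many pairs where one entry is a prefix of another, such as > and >=, and the comment entries begin with / just like the / and /= punctuators. The sketch below resolves both by checking for comment openers first and then trying the longest punctuator. Longest match (maximal munch) is an assumption borrowed from C; this section does not state a tie-break rule. scan_punctuator and split_punctuators are hypothetical names.

```python
# Every punctuator listed in the punctuator entry.
PUNCTUATORS = (
    "[ ] ( ) { } . , + - * / % ; : ! & | ^ ~ > < = "
    "-> ++ -- >> << <= >= == != && || += -= *= /= %= &= |= ^="
).split()

# Longest first, so that e.g. ">=" is tried before ">".
BY_LENGTH = sorted(PUNCTUATORS, key=len, reverse=True)


def scan_punctuator(src: str, i: int):
    """Return the punctuator starting at src[i], or None if there is none.

    Assumes longest match; the section does not state a tie-break rule.
    """
    if src.startswith(("//", "/*"), i):
        return None  # the start of a comment, not the / punctuator
    for p in BY_LENGTH:
        if src.startswith(p, i):
            return p
    return None


def split_punctuators(src: str):
    """Split a run of punctuators written with no whitespace between them."""
    out, i = [], 0
    while i < len(src):
        p = scan_punctuator(src, i)
        if p is None:
            raise ValueError(f"no punctuator at index {i}")
        out.append(p)
        i += len(p)
    return out
```

Under that assumption, split_punctuators("->>=") gives ["->", ">="], and "<<=" splits into "<<" and "=" because <<= is not in the list.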