Scanning
--------

.. glossary::

    token
        A single atomic unit in a Crowbar source file.
        May be a :term:`keyword`, an :term:`identifier`, a :term:`constant`, a :term:`string literal`, or a :term:`punctuator`.

        Keywords, identifiers, and constants (except for :term:`character constant`\ s) must have either whitespace or a comment separating them.
        Punctuators, string literals, and character constants do not require explicit separation from adjacent tokens.

    keyword
        One of the literal words ``bool``, ``break``, ``case``, ``const``, ``continue``, ``default``, ``do``, ``else``, ``enum``, ``false``, ``float32``, ``float64``, ``for``, ``fragile``, ``function``, ``if``, :crowbar:ref:`include `, ``int8``, ``int16``, ``int32``, ``int64``, ``intaddr``, ``intmax``, ``intsize``, ``opaque``, ``return``, ``sizeof``, ``struct``, ``switch``, ``true``, ``uint8``, ``uint16``, ``uint32``, ``uint64``, ``uintaddr``, ``uintmax``, ``uintsize``, ``union``, ``void``, or ``while``.

    identifier
        A nonempty sequence of characters blah blah blah

        .. todo:: figure out https://www.unicode.org/reports/tr31/tr31-33.html

    constant
        A numeric (or numeric-equivalent) value specified directly within the code.
        May be a :term:`decimal constant`, a :term:`binary constant`, an :term:`octal constant`, a :term:`hexadecimal constant`, a :term:`floating-point constant`, a :term:`hexadecimal floating-point constant`, or a :term:`character constant`.

        Any of these except for the character constant may contain underscores; these are ignored by the compiler and only meaningful to humans reading the code.

    decimal constant
        A sequence of characters matching the regular expression ``[0-9_]+``.
        Denotes the numeric value of the given sequence of decimal digits.

    binary constant
        A sequence of characters matching the regular expression ``0[bB][01_]+``.
        Denotes the numeric value of the given sequence of binary digits (after the ``0[bB]`` prefix has been removed).
    octal constant
        A sequence of characters matching the regular expression ``0o[0-7_]+``.
        Denotes the numeric value of the given sequence of octal digits (after the ``0o`` prefix has been removed).

    hexadecimal constant
        A sequence of characters matching the regular expression ``0[xX][0-9a-fA-F_]+``.
        Denotes the numeric value of the given sequence of hexadecimal digits (after the ``0[xX]`` prefix has been removed).

    floating-point constant
        A sequence of characters matching the regular expression ``[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?``.

        .. note::

            Unlike in C and many other languages, ``6e3`` in Crowbar is not a valid floating-point constant.
            The Crowbar-compatible spelling is ``6.0e3``.

        Denotes the numeric value of the given decimal number, optionally expressed in scientific notation.
        That is, ``XeY`` denotes :math:`X * 10^Y`.

    hexadecimal floating-point constant
        A sequence of characters matching the regular expression ``0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+``.
        Denotes the numeric value of the given hexadecimal number expressed in binary scientific notation.
        That is, ``XpY`` denotes :math:`X * 2^Y`.

    character constant
        A pair of single quotes ``'`` surrounding either a single character or an :term:`escape sequence`.
        The single character may not be a single quote or a backslash ``\``.
        Denotes the Unicode scalar value for either the single surrounded character or the character denoted by the escape sequence.
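As a rough illustration of how the numeric constant entries above fit together, the following Python sketch maps the spelling of a constant to its value. The names ``constant_value``, ``INTEGER_FORMS``, ``DECIMAL_FLOAT``, and ``HEX_FLOAT`` are hypothetical and not part of Crowbar. The sketch assumes that underscores are simply deleted before conversion, that hexadecimal constants admit underscores as the constant entry says of every non-character form, and that a spelling with no digits left after deleting underscores is rejected. The last point is not settled by this section.

```python
import re

# Integer forms: (pattern with the digit run captured, numeric base).
# fullmatch() makes the alternatives mutually exclusive, so order is cosmetic.
INTEGER_FORMS = [
    (re.compile(r"0[bB]([01_]+)"), 2),
    (re.compile(r"0o([0-7_]+)"), 8),
    # Assumption: underscores are allowed here, per the "constant" entry.
    (re.compile(r"0[xX]([0-9a-fA-F_]+)"), 16),
    (re.compile(r"([0-9_]+)"), 10),
]

DECIMAL_FLOAT = re.compile(r"[0-9_]+\.[0-9_]+([eE][+-]?[0-9_]+)?")
HEX_FLOAT = re.compile(r"0(fx|FX)[0-9a-fA-F_]+\.[0-9a-fA-F_]+[pP][+-]?[0-9_]+")


def constant_value(text: str):
    """Return the int or float denoted by a numeric constant's spelling."""
    for pattern, base in INTEGER_FORMS:
        match = pattern.fullmatch(text)
        if match:
            digits = match.group(1).replace("_", "")
            if not digits:
                # Assumption: a run made only of underscores is an error.
                raise ValueError(f"no digits in {text!r}")
            return int(digits, base)

    cleaned = text.replace("_", "")
    if DECIMAL_FLOAT.fullmatch(text):
        # float() itself raises ValueError if a digit run was all underscores.
        return float(cleaned)
    if HEX_FLOAT.fullmatch(text):
        # Swap the three-character 0fx/0FX prefix for the 0x prefix that
        # float.fromhex understands; its "p" exponent is a power of two.
        return float.fromhex("0x" + cleaned[3:].lower())
    raise ValueError(f"not a numeric constant: {text!r}")
```

For example, ``constant_value("0b1010_1010")`` evaluates the binary digits after dropping the underscore, while ``constant_value("6e3")`` raises ``ValueError`` because the spelling lacks the required decimal point.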
    escape sequence
        One of the following pairs of characters:

        * ``\'``, denoting the single quote ``'``
        * ``\"``, denoting the double quote ``"``
        * ``\\``, denoting the backslash ``\``
        * ``\r``, denoting the carriage return (U+000D)
        * ``\n``, denoting the line feed, or newline (U+000A)
        * ``\t``, denoting the (horizontal) tab (U+0009)
        * ``\0``, denoting a null character (U+0000)

        Or a sequence of characters matching one of the following regular expressions:

        * ``\\x[0-9a-fA-F]{2}``, denoting the numeric value of the given two hexadecimal digits
        * ``\\u[0-9a-fA-F]{4}``, denoting the numeric value of the given four hexadecimal digits
        * ``\\U[0-9a-fA-F]{8}``, denoting the numeric value of the given eight hexadecimal digits

    string literal
        A pair of double quotes ``"`` surrounding a sequence whose elements are either single characters or escape sequences.
        No single-character element may be the double quote or the backslash.
        Denotes the UTF-8-encoded sequence of bytes representing the sequence of characters which, either directly or via an escape sequence, are specified between the quotes.

    punctuator
        One of the literal sequences of characters ``[``, ``]``, ``(``, ``)``, ``{``, ``}``, ``.``, ``,``, ``+``, ``-``, ``*``, ``/``, ``%``, ``;``, ``:``, ``!``, ``&``, ``|``, ``^``, ``~``, ``>``, ``<``, ``=``, ``->``, ``++``, ``--``, ``>>``, ``<<``, ``<=``, ``>=``, ``==``, ``!=``, ``&&``, ``||``, ``+=``, ``-=``, ``*=``, ``/=``, ``%=``, ``&=``, ``|=``, or ``^=``.

    whitespace
        A nonempty sequence of characters that each has a Unicode general category of either Control (``Cc``) or Separator (``Z``).
        Separates tokens.

    comment
        Text that the compiler should ignore.
        May be a :term:`line comment` or a :term:`block comment`.

    line comment
        A sequence of characters beginning with the characters ``//`` (outside of a :term:`string literal` or :term:`comment`) and ending with a newline character U+000A.
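The escape sequence and string literal entries above can be sketched in Python as follows. The names ``string_literal_bytes``, ``SIMPLE_ESCAPES``, and ``ELEMENT`` are hypothetical. The sketch assumes that ``\x``, ``\u``, and ``\U`` escapes all name a Unicode scalar value which is then UTF-8 encoded, and that surrogate code points and values above U+10FFFF are rejected. This section does not spell out either point, so treat both as one possible reading.

```python
import re

# Two-character escapes, keyed by the character after the backslash.
SIMPLE_ESCAPES = {
    "'": "'", '"': '"', "\\": "\\",
    "r": "\r", "n": "\n", "t": "\t", "0": "\0",
}

# One element of a string literal body: a hex escape (groups 1-3),
# a simple escape (group 4), or a single ordinary character (group 5).
ELEMENT = re.compile(
    r"""
      \\x([0-9a-fA-F]{2})
    | \\u([0-9a-fA-F]{4})
    | \\U([0-9a-fA-F]{8})
    | \\(['"\\rnt0])
    | ([^"\\])
    """,
    re.VERBOSE,
)


def string_literal_bytes(literal: str) -> bytes:
    """Return the UTF-8 bytes denoted by a double-quoted string literal."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string literal must be wrapped in double quotes")
    body = literal[1:-1]
    chars = []
    pos = 0
    while pos < len(body):
        match = ELEMENT.match(body, pos)
        if match is None:
            raise ValueError(f"invalid string element at offset {pos}")
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        if hex_digits is not None:
            value = int(hex_digits, 16)
            # Assumption: only Unicode scalar values are accepted.
            if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
                raise ValueError(f"not a Unicode scalar value: {value:#x}")
            chars.append(chr(value))
        elif match.group(4) is not None:
            chars.append(SIMPLE_ESCAPES[match.group(4)])
        else:
            chars.append(match.group(5))
        pos = match.end()
    return "".join(chars).encode("utf-8")
```

A bare backslash or an unknown escape such as ``\q`` matches none of the alternatives, so the function raises ``ValueError`` instead of guessing at a meaning.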
    block comment
        A sequence of characters beginning with the characters ``/*`` (outside of a :term:`string literal` or :term:`comment`) and ending with the characters ``*/``.
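The whitespace and comment entries above describe everything a scanner may skip between tokens. A minimal Python sketch of that skipping step follows. The names ``is_whitespace`` and ``skip_trivia`` are hypothetical. The sketch assumes that a line comment running to the end of the file without a final newline is still accepted and that an unterminated block comment is an error; this section states neither. Block comments end at the first ``*/``, so they do not nest.

```python
import unicodedata


def is_whitespace(ch: str) -> bool:
    """True if ch has general category Cc or one of the Z* categories."""
    category = unicodedata.category(ch)
    return category == "Cc" or category.startswith("Z")


def skip_trivia(source: str, pos: int = 0) -> int:
    """Return the index of the first character at or after pos that is
    neither whitespace nor part of a comment."""
    while pos < len(source):
        if is_whitespace(source[pos]):
            pos += 1
        elif source.startswith("//", pos):
            newline = source.find("\n", pos)
            # Assumption: end of input also terminates a line comment.
            pos = len(source) if newline == -1 else newline + 1
        elif source.startswith("/*", pos):
            end = source.find("*/", pos + 2)
            if end == -1:
                # Assumption: an unterminated block comment is an error.
                raise ValueError("unterminated block comment")
            pos = end + 2
        else:
            break
    return pos
```

The caller is expected to invoke ``skip_trivia`` only between tokens, which is what keeps ``//`` and ``/*`` inside a string literal from being mistaken for comment openers.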