4.4. Defining Types

TypeDefinition <- StructDefinition / EnumDefinition / UnionDefinition

Crowbar has three kinds of user-defined types: structs, enums, and unions.

Compile-time Behavior:

When a type is defined, the compiler must then allow that type to be used.

Runtime Behavior:

The definition of a type has no runtime behavior.

StructDefinition <- NormalStructDefinition / OpaqueStructDefinition
NormalStructDefinition <- 'struct' identifier '{' VariableDeclaration+ '}'

A struct defines a composite type with one or more members. Its members are stored in memory in the order in which they are defined, and each member occupies the same amount of space as a standalone value of its type.

Todo

figure out alignment & padding
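As a rough illustration of declaration-order storage, here is a minimal sketch in C rather than Crowbar, on the assumption that a Crowbar struct is laid out like the corresponding C struct. The struct Point3 and both function names are hypothetical and exist only for this example. The sketch checks member order only and deliberately says nothing about padding, which is still unresolved above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, used only for illustration. */
struct Point3 {
    int32_t x;
    int32_t y;
    int32_t z;
};

/* Returns 1 if the members are stored in declaration order, 0 otherwise. */
int members_in_declaration_order(void) {
    return offsetof(struct Point3, x) == 0
        && offsetof(struct Point3, x) < offsetof(struct Point3, y)
        && offsetof(struct Point3, y) < offsetof(struct Point3, z);
}

/* Asserts that the struct is large enough to hold every member.
 * Any padding beyond that is implementation-specific in C. */
void check_point3_size(void) {
    assert(sizeof(struct Point3) >= 3 * sizeof(int32_t));
}
```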

OpaqueStructDefinition <- 'opaque' 'struct' identifier ';'

An opaque struct is a struct whose name is part of an API boundary but whose contents are not. Its size is left unspecified, and it can only be used as the target of a pointer.
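The same idea exists in C as an incomplete struct type, which may help when reading Crowbar code that wraps a C API. The sketch below is C, not Crowbar, and struct Counter and the counter_* functions are hypothetical names invented for this example. Code on the caller's side of the boundary sees only the name and can hold only a pointer; the full definition is visible to the implementation alone.

```c
#include <assert.h>
#include <stdlib.h>

/* API boundary: callers see the name but not the contents. */
struct Counter;                              /* incomplete (opaque) type */
struct Counter *counter_new(void);
void counter_increment(struct Counter *c);
int counter_value(const struct Counter *c);
void counter_free(struct Counter *c);

/* Implementation side: the full definition lives here. */
struct Counter {
    int count;
};

/* Allocates a counter starting at zero; returns NULL on allocation failure. */
struct Counter *counter_new(void) {
    struct Counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->count = 0;
    }
    return c;
}

void counter_increment(struct Counter *c) {
    assert(c != NULL);
    c->count += 1;
}

int counter_value(const struct Counter *c) {
    assert(c != NULL);
    return c->count;
}

void counter_free(struct Counter *c) {
    free(c);
}
```

Because the size is unknown outside the implementation, a caller cannot declare a struct Counter by value or take its size, which matches the restriction that an opaque struct is only usable as the target of a pointer.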

EnumDefinition <- 'enum' identifier '{' EnumMember (',' EnumMember)* ','? '}'
EnumMember <- identifier ('=' Expression)?

An enum defines a type which can take one of several specified values.

Todo

define enum value assignment, type-related behavior

UnionDefinition <- RobustUnionDefinition / FragileUnionDefinition

Unions as implemented in C are not robust by default, and care must be taken to ensure that they are only used robustly. Crowbar unions are robust unless declared otherwise; for the purpose of interoperability with C, a union may instead be defined as fragile.

RobustUnionDefinition <- 'union' identifier '{' VariableDeclaration UnionBody '}'
UnionBody <- 'switch' '(' identifier ')' '{' UnionBodySet+ '}'
UnionBodySet <- CaseSpecifier+ (VariableDeclaration / ';')

A robust union, or simply union, in Crowbar is what is known more broadly as a tagged union. It packages some data alongside an enum value, with the type of the data depending on that value. Because the enum value indicates which data is present, it is also known as a tag.

The top-level variable declaration creates the tag, whose type must be an enum. The switch parameter must be the name of the tag, and each case declares the data associated with the given value or values of the tag.

This allows extra data to be stored alongside enum values while using minimal additional space in memory. All the fields under the switch overlap in memory, so the tag must be used to determine which field is available.

For example:

enum TokenType {
    Identifier,
    Constant,
    Operator,
    Whitespace,
}

union Token {
    enum TokenType type;

    switch (type) {
        case Identifier: (const byte) * name;
        case Constant: intmax value;
        case Operator: (const byte) * op;
        case Whitespace: ;
    }
}

defines a union Token type, in which the type field determines which of the other fields in the union is valid. When type is Whitespace, no other field is available.
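For readers more familiar with C, the Token union corresponds roughly to the hand-written tagged union below. This is a sketch under assumptions, not a statement of how a Crowbar compiler lowers unions: it assumes Crowbar's byte maps to C's char and intmax to intmax_t, and the names data and token_constant_or are hypothetical. In C nothing stops code from reading the wrong member, so the switch on the tag is a convention the programmer must follow by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum TokenType { Identifier, Constant, Operator, Whitespace };

struct Token {
    enum TokenType type;     /* the tag */
    union {                  /* all members share the same storage */
        const char *name;    /* valid when type == Identifier */
        intmax_t value;      /* valid when type == Constant */
        const char *op;      /* valid when type == Operator */
    } data;                  /* nothing is valid when type == Whitespace */
};

/* Returns the token's value if it is a Constant, otherwise fallback.
 * The tag is checked before the matching member is read. */
intmax_t token_constant_or(const struct Token *tok, intmax_t fallback) {
    assert(tok != NULL);
    switch (tok->type) {
        case Constant:
            return tok->data.value;
        case Identifier:
        case Operator:
        case Whitespace:
            return fallback;
    }
    return fallback;
}
```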

Todo

go into more depth about how tagged unions work

FragileUnionDefinition <- 'fragile' 'union' identifier '{' VariableDeclaration+ '}'

A fragile union also allows for storing one of several different types of data. However, there is no internal indication of which type of data is actually being stored in the union. As such, in non-trivial cases no compiler can predict which field is or is not valid, and any statement which reads a field of a fragile union must itself be a FragileStatement.

The size of a fragile union is the size of its largest member. The address of each member is the address of the union object itself. The member which was most recently set retains its value. Reading another member whose size is no larger than that of the most recently set member interprets the leading bytes of the most recently set member as a value of the type of the member being read.
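These layout rules mirror those of a plain C union, so they can be sketched in C. The union Mixed and the function names below are hypothetical and exist only for illustration. One caveat: C permits trailing padding in a union, so the size check here uses "at least as large as" rather than the exact equality stated above for Crowbar.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical union with a small and a large member. */
union Mixed {
    uint8_t  small;
    uint32_t large;
};

/* Returns 1 if every member starts at the address of the union itself. */
int members_share_address(void) {
    union Mixed m;
    return (void *)&m.small == (void *)&m
        && (void *)&m.large == (void *)&m;
}

/* Sets the large member, then reads the small one. Returns 1 if the small
 * member equals the first stored byte of the large member. Which byte of
 * the numeric value that is depends on the platform's byte order. */
int small_reads_first_byte(uint32_t v) {
    union Mixed m;
    uint8_t first;
    assert(sizeof(union Mixed) >= sizeof(uint32_t));
    m.large = v;
    memcpy(&first, &m.large, 1);
    return m.small == first;
}
```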

For example, the functions test1 and test2 are equivalent; each returns the bytes of its float32 argument reinterpreted as a uint32:

fragile union Example {
    float32 float_data;
    uint32 uint_data;
}

uint32 test1(float32 arg) {
    union Example temp;
    temp.float_data = arg;
    fragile return temp.uint_data;
}

uint32 test2(float32 arg) {
    float32* temp = &arg;
    fragile uint32* temp_casted = (uint32*)temp;
    return *temp_casted;
}
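For comparison, the same reinterpretation can be written in C. This sketch assumes that float is a 32-bit IEEE 754 type, as float32 suggests, and it keeps the names test1 and test2 only to parallel the Crowbar functions. The second function uses memcpy instead of a pointer cast, because dereferencing a float object through a uint32_t pointer violates C's aliasing rules; that substitution is a choice made for this sketch, not something the Crowbar example implies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

union Example {
    float    float_data;
    uint32_t uint_data;
};

/* Writes one member and reads the other. */
uint32_t test1(float arg) {
    union Example temp;
    temp.float_data = arg;
    return temp.uint_data;
}

/* Copies the bytes of the float directly into a uint32_t. */
uint32_t test2(float arg) {
    uint32_t out;
    assert(sizeof(float) == sizeof(uint32_t)); /* assumption: 32-bit float */
    memcpy(&out, &arg, sizeof out);
    return out;
}
```

On a platform with IEEE 754 floats, both functions return 0x3F800000 for the argument 1.0f.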